Machine Learning News Hubb

Scaling laws for reward model overoptimization

by admin
March 17, 2023
in Artificial Intelligence


In reinforcement learning from human feedback, it is common to optimize against a reward model trained to predict human preferences. Because the reward model is an imperfect proxy, optimizing its value too much can hinder ground truth performance, in accordance with Goodhart’s law. This effect has been frequently observed, but not carefully measured due to the expense of collecting human preference data. In this work, we use a synthetic setup in which a fixed “gold-standard” reward model plays the role of humans, providing labels used to train a proxy reward model. We study how the gold reward model score changes as we optimize against the proxy reward model using either reinforcement learning or best-of-n sampling. We find that this relationship follows a different functional form depending on the method of optimization, and that in both cases its coefficients scale smoothly with the number of reward model parameters. We also study the effect on this relationship of the size of the reward model dataset, the number of reward model and policy parameters, and the coefficient of the KL penalty added to the reward in the reinforcement learning setup. We explore the implications of these empirical results for theoretical considerations in AI alignment.
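The best-of-n part of this setup can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not the authors' experimental code: the "gold" reward is a standard normal draw, the proxy is that value plus independent Gaussian noise, and the names `bon_kl`, `best_of_n_trial`, `sweep`, and `noise_std` are hypothetical. The KL divergence between the best-of-n distribution and the base distribution is computed with the commonly used analytic expression log n - (n - 1)/n. The toy only shows the gap between proxy and gold scores widening as optimization pressure grows; it does not attempt to reproduce the functional forms or coefficients reported in the work.

```python
import math
import random


def bon_kl(n: int) -> float:
    """KL (in nats) between best-of-n sampling and the base distribution."""
    return math.log(n) - (n - 1) / n


def best_of_n_trial(n: int, noise_std: float, rng: random.Random):
    """Draw n candidates and return (proxy, gold) for the proxy-best one."""
    best_proxy, best_gold = -math.inf, 0.0
    for _ in range(n):
        gold = rng.gauss(0.0, 1.0)                 # toy gold reward (assumption)
        proxy = gold + rng.gauss(0.0, noise_std)   # imperfect proxy (assumption)
        if proxy > best_proxy:
            best_proxy, best_gold = proxy, gold
    return best_proxy, best_gold


def sweep(ns=(1, 4, 16, 64), noise_std=1.0, trials=2000, seed=0):
    """Return rows of (n, KL, mean proxy score, mean gold score)."""
    rng = random.Random(seed)
    rows = []
    for n in ns:
        picks = [best_of_n_trial(n, noise_std, rng) for _ in range(trials)]
        mean_proxy = sum(p for p, _ in picks) / trials
        mean_gold = sum(g for _, g in picks) / trials
        rows.append((n, bon_kl(n), mean_proxy, mean_gold))
    return rows


if __name__ == "__main__":
    for n, kl, proxy, gold in sweep():
        print(f"n={n:3d}  KL={kl:.3f}  proxy={proxy:.3f}  gold={gold:.3f}")
```

In this toy, selecting on the proxy also selects for favorable noise, so the proxy score of the chosen sample overstates its gold score, and the overstatement grows with n.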


