In this video we review a new paper titled "Self-Rewarding Language Models" by Meta AI. The paper was published on the same day that Mark Zuckerberg announced that Meta AI is working towards building open-source AGI, and it may be a step in that direction.

The paper introduces a method for self-aligning a pre-trained large language model (LLM), offered as an alternative to standard RLHF and RLAIF.
The model is trained with DPO on preference pairs built from its own responses, which the model itself evaluates.
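
For readers who want to see what "training with DPO" optimizes, here is a minimal sketch of the standard DPO loss for a single preference pair. It is not code from the paper: the function name, argument names, and the default beta=0.1 are assumptions chosen for illustration, and the inputs are assumed to be summed log-probabilities of each full response under the trained model and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    All names and the default beta are illustrative assumptions.
    """
    # How much more likely each response is under the policy than under
    # the frozen reference model (difference of log-probabilities).
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp

    # The loss is -log(sigmoid(margin)); a larger margin means the policy
    # favors the chosen response more strongly than the reference does.
    margin = beta * (chosen_ratio - rejected_ratio)

    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log(2). The loss drops as the policy shifts probability toward the chosen response and away from the rejected one.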

The researchers evaluated the method with Llama 2 70B and report impressive results compared with Claude 2, Gemini Pro and GPT-4. That said, a lot of follow-up research is still needed.

Watch the video to learn more.

Post - https://aipapersacademy.com/self-rewarding-language-models/
Paper - https://arxiv.org/abs/2401.10020

-----------------------------------------------------------------------------------------------
✉️ Join the newsletter - https://aipapersacademy.com/newsletter/

👍 Please like & subscribe if you enjoy this content

We use VideoScribe to edit our videos - https://tidd.ly/44TZEiX (affiliate)
-----------------------------------------------------------------------------------------------
Chapters:
0:00 Paper Introduction
1:04 High-Level Idea
2:38 Self-Rewarding Language Models Method
4:41 Results

AI Papers Academy

Meta AI released a research paper on Self-Rewarding Language Models, coinciding with Mark Zuckerberg's announcement of Meta's goal to build open-source AGI. The approach enables a single LLM to act as both the instruction-following model and its own reward model. Starting from a base model (Llama 2 70B) fine-tuned on instruction and evaluation datasets, each iteration of the method generates new prompts, produces multiple responses per prompt, has the model score those responses itself, and trains the model with DPO on the resulting preference pairs. Experiments show progressive improvement across iterations (M1→M3), with M3 reaching a win rate of about 20% against GPT-4 Turbo on AlpacaEval 2.0, outperforming several strong baselines. Both instruction-following ability and reward-modeling accuracy improve with each iteration, though the effect of running further iterations remains an open research question.
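
The self-scoring step described above can be sketched as follows. This is a simplified illustration, not the authors' code: `generate` and `judge` are hypothetical callables standing in for the same LLM acting as responder and as evaluator, and the choices of four candidates per prompt and of pairing the highest-scored response with the lowest-scored one are assumptions that should be checked against the paper.

```python
from typing import Callable, Optional, Tuple


def build_preference_pair(
    prompt: str,
    generate: Callable[[str], str],      # hypothetical: samples one response
    judge: Callable[[str, str], float],  # hypothetical: model scores its own response
    num_candidates: int = 4,             # assumed number of samples per prompt
) -> Optional[Tuple[str, str, str]]:
    """Return (prompt, chosen, rejected), or None if all scores are tied."""
    # Sample several candidate responses for the same prompt.
    candidates = [generate(prompt) for _ in range(num_candidates)]

    # Have the model score each of its own responses.
    scored = [(judge(prompt, response), response) for response in candidates]

    # Pair the highest-scored response with the lowest-scored one.
    best = max(scored, key=lambda item: item[0])
    worst = min(scored, key=lambda item: item[0])

    # A tie carries no preference signal, so the prompt is skipped.
    if best[0] == worst[0]:
        return None
    return prompt, best[1], worst[1]
```

Pairs collected this way across many prompts would form the DPO training set for the next model in the sequence (M1 producing the data for M2, and so on).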

Self-Rewarding Language Models by Meta AI - Path to Open-Source AGI?