In this video, we dive into a new Meta research paper: "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution".
This paper introduces SWE-RL, a new reinforcement learning method for real-world software engineering. By training large language models (LLMs) directly on the evolution of real GitHub projects, SWE-RL makes them better at software engineering tasks.
We break down:
 • How Meta curated 11 million pull requests from GitHub.
 • How the SWE-RL training pipeline works.
 • SWE-RL's state-of-the-art results on SWE-bench Verified among open-source models under 100B parameters.
 
🔗 Written Review: https://aipapersacademy.com/swe-rl/
🔗 Paper Link: https://arxiv.org/abs/2502.18449
🔗 Code: https://github.com/facebookresearch/swe-rl
___________________
🔔 Subscribe for more AI paper reviews!

📩 Join the newsletter → https://aipapersacademy.com/newsletter/

Become a patron - https://www.patreon.com/aipapersacademy

The video was edited using VideoScribe - https://tidd.ly/44TZEiX
___________________
#airesearch #metaai #swe_rl #reinforcementlearning #llm

Chapters:
0:00 Introduction
1:15 GitHub PRs Curation
3:20 SWE-RL Training
5:42 Aha Moments
6:39 SWE-RL Results

AI Papers Academy

Meta's SWE-RL paper proposes scaling reinforcement learning to real-world software engineering tasks, addressing limitations of models like DeepSeek R1 that focus on competitive programming. Researchers curated a dataset of ~11 million high-quality GitHub pull requests from 4.6 million repositories, using issue descriptions, comments, and code context as training inputs. The model (Llama3-SWE-RL-70B) is trained using Group Relative Policy Optimization (GRPO) with a rule-based reward derived from the similarity between the predicted patch and the actual merged patch. The resulting model achieves 41% pass@1 on SWE-bench Verified, setting a new state of the art for open-source models under 100B parameters. Notably, RL training produces emergent general reasoning behaviors, including divide-and-conquer strategies and self-reflection, that transfer to out-of-domain tasks not seen during training, where the RL-trained model consistently outperforms a supervised fine-tuning baseline.
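
For readers who want to see the reward idea in code, here is a minimal Python sketch of a similarity-based patch reward followed by GRPO-style group normalization. It is an illustration under assumptions, not the paper's implementation: the function names, the use of `None` to mark a response with no parseable patch, the -1.0 penalty for that case, and the toy patches are all hypothetical. The similarity score uses `difflib.SequenceMatcher` from the Python standard library.

```python
import difflib
import statistics


def patch_reward(predicted, oracle):
    """Score a predicted patch against the merged (oracle) patch.

    Returns a similarity in [0, 1], or -1.0 when no patch could be parsed.
    Assumption: None stands for a malformed or unparseable model response.
    """
    if predicted is None:
        return -1.0
    return difflib.SequenceMatcher(None, predicted, oracle).ratio()


def group_relative_advantages(rewards):
    """Normalize each reward by the mean and std of its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # Every sample scored the same, so none is preferred over another.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Toy group of three sampled responses for one issue (hypothetical data).
oracle = "-    return a - b\n+    return a + b\n"
samples = [
    oracle,                                      # exact match
    "-    return a - b\n+    return b + a\n",    # close, but not identical
    None,                                        # no valid patch produced
]
rewards = [patch_reward(s, oracle) for s in samples]
advantages = group_relative_advantages(rewards)
```

In this sketch the exact match gets the highest advantage and the unparseable response the lowest, which is the signal a GRPO-style policy update would use to favor responses closer to the merged patch.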

SWE-RL by Meta — Reinforcement Learning for Software Engineering LLMs