Meta's KernelEvolve is an agentic AI system that automates the generation and optimization of hardware kernels for heterogeneous accelerators, including NVIDIA GPUs, AMD GPUs, and Meta's custom MTIA silicon. Rather than generating code in one shot, it frames kernel optimization as a structured search problem and explores it with Monte Carlo tree search and evolutionary strategies. Three components drive the search: an LLM synthesizer guided by dynamic, context-aware prompts; a retrieval-augmented knowledge base that injects hardware-specific documentation; and an automated evaluation framework that checks correctness and profiles performance. Reported results include an inference throughput improvement of more than 60% for Meta's Andromeda Ads model on NVIDIA GPUs and a training throughput improvement of more than 25% on MTIA chips, compressing weeks of expert kernel engineering into hours. The system also supports proprietary hardware like MTIA by injecting chip-specific documentation at runtime, which enables kernel generation for hardware absent from any public LLM training data. Successful optimization trajectories are distilled into reusable skills and used to post-train smaller specialized models via agentic reinforcement learning.
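
To make the "optimization as search" framing concrete, here is a minimal sketch of an evolutionary loop with a correctness gate followed by performance ranking. It is not KernelEvolve's implementation: the search space (`BLOCK_SIZES`, `UNROLLS`), the `Candidate` type, and the functions `is_correct`, `latency_ms`, `mutate`, and `evolve` are all hypothetical names invented for illustration. Random mutation stands in for the LLM synthesizer, a synthetic cost formula stands in for on-device profiling, and the Monte Carlo tree search component is left out entirely.

```python
import random
from dataclasses import dataclass

# Hypothetical tuning knobs; a real system searches over kernel source code.
BLOCK_SIZES = [16, 32, 64, 128, 256]
UNROLLS = [1, 2, 4, 8]


@dataclass(frozen=True)
class Candidate:
    block_size: int
    unroll: int


def is_correct(c: Candidate) -> bool:
    # Stand-in for a numerical check against a reference implementation.
    # Assumption: configurations above a made-up resource budget are rejected.
    return c.block_size * c.unroll <= 1024


def latency_ms(c: Candidate) -> float:
    # Synthetic cost model standing in for profiling on real hardware.
    # By construction it is lowest at block_size=128, unroll=4.
    return 1.0 + abs(c.block_size - 128) / 64 + abs(c.unroll - 4) / 4


def mutate(c: Candidate, rng: random.Random) -> Candidate:
    # Stand-in for the LLM synthesizer proposing a revised kernel:
    # resample exactly one of the two knobs.
    if rng.random() < 0.5:
        return Candidate(rng.choice(BLOCK_SIZES), c.unroll)
    return Candidate(c.block_size, rng.choice(UNROLLS))


def evolve(generations: int = 30, population: int = 6, seed: int = 0) -> Candidate:
    rng = random.Random(seed)
    pool = [Candidate(16, 1)]  # a deliberately slow but valid starting point
    for _ in range(generations):
        children = [mutate(rng.choice(pool), rng) for _ in range(population)]
        # Correctness gate first, then keep the fastest survivors.
        # Parents stay in the pool, so the best candidate never gets worse.
        valid = [c for c in set(pool + children) if is_correct(c)]
        pool = sorted(valid, key=latency_ms)[:population]
    return pool[0]


best = evolve()
print(best, latency_ms(best))
```

Keeping parents in the pool alongside their children is a simple form of elitism: a generation of poor proposals cannot discard the best kernel found so far. The ordering of the two checks also matters, because a fast candidate that fails the correctness check is never ranked at all.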

17m read time. From engineering.fb.com
Table of contents
The Challenge: The Bottleneck of Explosive Kernel Growth
How KernelEvolve Addresses These Challenges
KernelEvolve: Searching for Optimal Kernels
Enabling Proprietary AI Chips
KernelEvolve’s Impact Across Benchmark and Production
How It All Fits Together
Looking Ahead
Read the Paper
Acknowledgements
