Meta's Generative Ads Model (GEM) is a foundation model that processes billions of daily user-ad interactions to improve ads recommendations across its platforms. The system uses LLM-scale training infrastructure with hybrid parallelism strategies, combining Hybrid Sharded Distributed Parallel for dense components and two-dimensional parallelism for sparse embedding tables across thousands of GPUs. Key optimizations include custom GPU kernels, PyTorch 2.0 graph compilation, FP8 quantization, and NCCLX communication collectives. GEM transfers knowledge to hundreds of vertical models through direct and hierarchical distillation strategies, achieving a 23x effective FLOPs improvement while reducing training bottlenecks through 5x faster job startup and 7x faster compilation times.

3m read timeFrom infoq.com
Post cover image

Sort: