LyftLearn Evolution: Rethinking ML Platform Architecture

Written by Yaroslav Yatsiuk

At Lyft, machine learning (ML) is the engine behind our most critical business functions, from dispatch and …

LyftEng is the central hub for engineering insights and technology updates from Lyft's engineering team. Through articles, tech talks, and open-source contributions, it covers engineering challenges, innovation projects, and software development best practices. Readers can learn about Lyft's engineering culture, its technology stack, and its contributions to the broader engineering community.

Lyft Engineering

Lyft migrated its ML platform from a fully Kubernetes-based architecture to a hybrid one. AWS SageMaker now runs offline training and batch workloads, while Kubernetes continues to handle online model serving. The transition reduced operational complexity by eliminating custom orchestration logic, background watchers, and cluster-management overhead. Key technical challenges included:

- replicating the Kubernetes runtime environment on SageMaker
- building cross-platform Docker images
- reducing startup times with SOCI (Seekable OCI) indexes and warm pools
- solving cross-cluster networking for Spark

The migration was designed to be invisible to users. It required no changes to ML code, while significantly improving system reliability and reducing compute costs.
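The hybrid split can be illustrated with a minimal Python sketch. This is not LyftLearn's actual code, and the following are hypothetical illustrations:

- the `Workload` record and the `route`, `sagemaker_training_request`, and `k8s_deployment` functions
- the default values: instance type, volume size, replica count, runtime limit, and keep-alive period

Offline workloads are turned into keyword arguments for the SageMaker `CreateTrainingJob` API. Its `ResourceConfig.KeepAlivePeriodInSeconds` field keeps instances in a warm pool between jobs. Serving workloads become a Kubernetes `Deployment` manifest. Both paths reuse the same container image, mirroring the goal of running unchanged ML code on either backend. Submitting batch workloads as training jobs is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple

# Hypothetical workload description; field names are illustrative only.
@dataclass
class Workload:
    name: str                # used as job name / Deployment name
    image: str               # the same container image runs on either backend
    kind: str                # "training", "batch", or "serving"
    instance_type: str = "ml.m5.xlarge"
    instance_count: int = 1
    replicas: int = 2

# Assumption: both offline kinds are submitted as SageMaker training jobs.
OFFLINE_KINDS = {"training", "batch"}


def sagemaker_training_request(w: Workload, role_arn: str, output_s3: str,
                               keep_alive_s: int = 600) -> Dict[str, Any]:
    """Build kwargs for boto3's sagemaker client create_training_job()."""
    # SageMaker accepts a warm-pool keep-alive period of 0 to 3600 seconds.
    if not 0 <= keep_alive_s <= 3600:
        raise ValueError("KeepAlivePeriodInSeconds must be in [0, 3600]")
    return {
        "TrainingJobName": w.name,
        "AlgorithmSpecification": {
            "TrainingImage": w.image,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3},
        "ResourceConfig": {
            "InstanceType": w.instance_type,
            "InstanceCount": w.instance_count,
            "VolumeSizeInGB": 50,
            # Retain provisioned instances in a warm pool so a later job with a
            # matching resource configuration can skip instance provisioning.
            "KeepAlivePeriodInSeconds": keep_alive_s,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 24 * 3600},
    }


def k8s_deployment(w: Workload) -> Dict[str, Any]:
    """Build an apps/v1 Deployment manifest for an online serving workload."""
    labels = {"app": w.name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": w.name, "labels": labels},
        "spec": {
            "replicas": w.replicas,
            # The selector must match the pod template's labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": w.name, "image": w.image}]},
            },
        },
    }


def route(w: Workload, role_arn: str, output_s3: str) -> Tuple[str, Dict[str, Any]]:
    """Send offline work to SageMaker and online serving to Kubernetes."""
    if w.kind in OFFLINE_KINDS:
        return "sagemaker", sagemaker_training_request(w, role_arn, output_s3)
    if w.kind == "serving":
        return "kubernetes", k8s_deployment(w)
    raise ValueError(f"unknown workload kind: {w.kind!r}")


if __name__ == "__main__":
    image = "example-registry/model:1.0"  # placeholder image reference
    for wl in (Workload("eta-train", image, "training"),
               Workload("eta-serve", image, "serving")):
        backend, spec = route(wl, "arn:aws:iam::123456789012:role/example",
                              "s3://example-bucket/output")
        print(wl.name, "->", backend)
```

The sketch stops short of calling any service. To submit the job, pass the SageMaker dict to `boto3.client("sagemaker").create_training_job(**spec)`. To create the serving workload, apply the Deployment manifest with a Kubernetes client. Keeping the image identical on both paths is what lets the routing decision stay outside user code.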