How Lyft Built an ML Platform That Serves Millions of Predictions Per Second

Lyft built LyftLearn Serving, an ML platform that handles millions of predictions per second on a microservices architecture. Instead of running one shared monolithic system, Lyft generates an independent microservice for each team from configuration templates. The platform separates data plane concerns (runtime performance, inference execution) from control plane concerns (deployment, versioning, testing). Key features include automated model self-tests, flexible library support (TensorFlow, PyTorch), and dual interfaces for engineers and data scientists. The architecture uses Flask/Gunicorn for HTTP serving, Kubernetes for orchestration, and Envoy for load balancing. Over 40 teams migrated from the legacy system, gaining team autonomy while keeping the platform consistent.
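To make the idea of an automated model self-test concrete, here is a minimal sketch in Python. It assumes a self-test means running the model on a few sample inputs and comparing the results with stored expected outputs. The class `SelfTestingModel`, the function `toy_predict`, and the `distance_km` feature are hypothetical names chosen for illustration, not LyftLearn Serving's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class SelfTestingModel:
    """Hypothetical wrapper: a predict function bundled with its own test cases."""

    name: str
    predict: Callable[[Any], float]
    # Each case pairs a sample input with the output the model should produce.
    test_cases: List[Tuple[Any, float]] = field(default_factory=list)
    tolerance: float = 1e-6

    def run_self_test(self) -> List[str]:
        """Return failure messages; an empty list means every case passed."""
        failures = []
        for features, expected in self.test_cases:
            actual = self.predict(features)
            if abs(actual - expected) > self.tolerance:
                failures.append(
                    f"{self.name}: input={features!r} "
                    f"expected={expected} got={actual}"
                )
        return failures


def toy_predict(features: dict) -> float:
    # Stand-in for a real model: a fixed linear function of one feature.
    return 2.0 * features["distance_km"] + 1.0


model = SelfTestingModel(
    name="toy_fare_model",
    predict=toy_predict,
    test_cases=[
        ({"distance_km": 3.0}, 7.0),
        ({"distance_km": 0.0}, 1.0),
    ],
)

# A serving process could run this check before accepting traffic
# and refuse to load a model whose self-test reports failures.
failures = model.run_self_test()
print("self-test passed" if not failures else failures)
```

Keeping the test cases next to the model means the same check can run wherever the model is loaded, so a packaging or dependency mismatch surfaces before the model serves real requests.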

#machine-learning #python #kubernetes #microservices #lyft

Jan 13 • 13m read time • From blog.bytebytego.com

Table of contents

Two Planes of Complexity
The Requirements Problem
The Microservices Solution
The Runtime Architecture
The Configuration Generator
Model Self-Tests
How an Inference Request Flows Through the System
Development Workflow and Documentation
Conclusion