Uber's Michelangelo ML platform evolved from a monolithic architecture to a cloud-native Kubernetes-based system to handle 30 million predictions per second. Key engineering solutions include 100+ custom CRDs with a MySQL-backed storage abstraction to bypass etcd scaling limits, a federated batch scheduling layer using PropagationPolicy CRDs to eliminate stranded compute across regional clusters, a Python-native workflow engine called Uniflow for ML lifecycle orchestration, and a multi-cloud compute mesh spanning GCP, AWS, Azure, and OCI. The platform now supports 40 million trips per day across 70+ countries. Future plans focus on autonomous self-healing through AI agents for debugging, intelligent CI/CD governance, and zero-toil framework upgrades.
Table of contents
Introduction: the scaling wall
The architecture: solving the “platform reality”
Impact: production reality at Uber
Future directions: the next operational frontier