Uber has made significant progress in scaling its AI/ML infrastructure, both by migrating workloads from on-prem to the cloud and by optimizing its existing fleet. The team implemented a unified federation layer for batch workloads, upgraded network bandwidth to improve training efficiency, and upgraded host memory to raise GPU allocation rates. On the new-infrastructure side, they are evaluating the price-performance ratios of cloud GPU SKUs and improving LLM training efficiency through memory offload.
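The post doesn't specify which offload mechanism Uber uses; as one common approach (an assumption, not Uber's documented setup), frameworks like DeepSpeed can offload optimizer state and parameters from GPU to host memory via a ZeRO configuration, trading PCIe transfer time for larger trainable model sizes:

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true }
  }
}
```

With a config like this, optimizer state and sharded parameters live in pinned host RAM and are streamed to the GPU only when needed, which is what makes memory offload a lever for training larger LLMs on a fixed GPU fleet.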

From uber.com · 8 min read
Table of contents

- Goal and Key Metrics
- Optimizing Existing On-prem Infrastructure
- Building New Infrastructure
- Acknowledgments
