Uber uses a mix of open-source and closed-source large language models (LLMs) to power applications such as Uber Eats recommendations, customer support chatbots, and code development. Its training infrastructure combines PyTorch, Ray, DeepSpeed, and Kubernetes for distributed training on both on-premises and cloud-based NVIDIA GPUs. Through continued pre-training and fine-tuning, Uber adapts models to handle large-scale traffic efficiently, achieving performance comparable to industry-leading models such as GPT-4.
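To make the fine-tuning setup above concrete, the sketch below shows a minimal DeepSpeed configuration of the kind typically passed to a PyTorch training job. All values here are illustrative assumptions, not Uber's actual settings:

```json
{
  "train_batch_size": 256,
  "gradient_accumulation_steps": 8,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu" }
  }
}
```

A training script would hand this file to `deepspeed.initialize(...)`, which wraps the PyTorch model so that optimizer state is partitioned (ZeRO stage 2) across the GPU workers that Ray or Kubernetes schedules.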
Table of contents
- Infrastructure Stack
- Training Stack
- Distributed Training Pipeline
- Training Results
- Acknowledgements