David Cheney discusses the architecture that powers GitHub Copilot, an LLM-powered code completion service.


InfoQ

GitHub Copilot serves over 400 million code completion requests a day with a response time under 200 milliseconds. Its backend keeps latency low by using HTTP/2 and regional deployments to minimize network latency, and it relies on copilot-proxy to authenticate tokens and route traffic efficiently. Request cancellation and adaptive models further improve performance and scalability.
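To make the idea of request cancellation concrete, here is a minimal sketch of how a completion service might drop stale work when a newer request arrives from the same editor session. It is an illustration only, not Copilot's actual implementation: `generate_completion`, `CompletionSession`, and the 50 ms delay are hypothetical names and values, and Python's `asyncio` stands in for whatever the real service uses.

```python
import asyncio
from typing import List, Optional, Tuple


async def generate_completion(prompt: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for a slow model call."""
    await asyncio.sleep(delay)
    return f"completion for {prompt!r}"


class CompletionSession:
    """Keeps at most one in-flight completion per (hypothetical) editor session."""

    def __init__(self) -> None:
        self._inflight = None  # the most recent asyncio.Task, if any
        self.cancelled = 0     # how many stale requests were cancelled

    async def request(self, prompt: str) -> Optional[str]:
        # A newer keystroke makes the previous request stale, so cancel it
        # instead of letting it consume model time.
        if self._inflight is not None and not self._inflight.done():
            self._inflight.cancel()
            self.cancelled += 1

        task = asyncio.create_task(generate_completion(prompt))
        self._inflight = task
        try:
            return await task
        except asyncio.CancelledError:
            # Simplification: treat any cancellation as "superseded".
            return None


async def demo() -> Tuple[List[Optional[str]], int]:
    session = CompletionSession()
    # Three keystrokes in quick succession; only the last result is useful.
    results = await asyncio.gather(
        session.request("de"),
        session.request("def"),
        session.request("def f"),
    )
    return list(results), session.cancelled


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the first two requests resolve to `None` because each was superseded before its simulated model call finished, and only the final prompt produces a completion. A real service would also need to propagate the cancellation across the network, which is one reason a protocol with per-stream cancellation such as HTTP/2 is a natural fit.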

How GitHub Copilot Serves 400 Million Completion Requests a Day

Building a Cloud Hosted Autocompletion Service

Dealing with a Heterogeneous Client Population