Copilot Arena is a Visual Studio Code extension that evaluates large language models (LLMs) in real-world settings by collecting developer preferences during their actual coding workflow. The platform has attracted over 11,000 users and has served numerous code completions and completion battles, in which users choose between completions from different models. The resulting data offer insights into user preferences and into how different models perform on various tasks. The evaluation underscores the importance of human feedback in measuring model performance, in contrast to static benchmarks. The authors encourage extending the platform with more nuanced feedback mechanisms.

8m read time · From blog.ml.cmu.edu
