Copilot Arena is a Visual Studio Code extension that evaluates large language models (LLMs) in real-world settings by collecting developer preferences during their actual workflow. The platform has attracted over 11,000 users and supports numerous code completions and completion battles. The collected data offers insights into user preferences and into how different models perform across tasks. The evaluation highlights the importance of human feedback for measuring model performance, in contrast to static benchmarks. Extending the platform with more nuanced feedback mechanisms is encouraged.