Simon Willison's blog mixes technical tutorials, data analysis projects, and reflections on technology and society. With a focus on Python, SQL, and web development, Simon writes about building web applications, working with data, and exploring emerging technologies. Topics include data visualization, API design, and software engineering best practices.

Simon Willison

The Chatbot Arena, a prominent platform for evaluating large language models (LLMs) through vibes-based assessments, has drawn criticism over its ranking methods. A recent study argues that private testing practices may bias scores in favor of certain proprietary models. The paper raises concerns about unfair sampling rates and a lack of transparency around model testing. It also scrutinizes the arena's practice of letting vendors submit multiple model variants and then cherry-pick the highest-scoring result. OpenRouter is suggested as a potential alternative for unbiased LLM rankings.
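To see why cherry-picking among multiple submissions can inflate a score, here is a minimal simulation sketch. It is not taken from the paper or from the arena's actual scoring code. It assumes every submitted variant has the same true score and that each measurement carries independent Gaussian noise. The function names, the true score of 1200, and the noise level of 15 points are all hypothetical values chosen for illustration.

```python
import random
import statistics


def measured_score(true_score, noise_sd, rng):
    """One noisy measurement of a variant (Gaussian noise is an assumption)."""
    return rng.gauss(true_score, noise_sd)


def average_published_score(n_variants, trials=20_000, true_score=1200.0,
                            noise_sd=15.0, seed=0):
    """Average score a vendor would publish if it tests n_variants
    equally good variants privately and keeps only the best result.

    All parameter values are hypothetical, not real leaderboard data.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    published = [
        max(measured_score(true_score, noise_sd, rng) for _ in range(n_variants))
        for _ in range(trials)
    ]
    return statistics.fmean(published)


if __name__ == "__main__":
    for n in (1, 3, 10):
        print(f"{n:>2} variants tested -> average published score "
              f"{average_published_score(n):.1f}")
```

Under these assumptions, a single submission averages out close to the true score, while the published best-of-several score drifts upward as more variants are tested, even though no variant is actually better. The size of the effect depends entirely on the assumed noise level.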

Understanding the recent criticism of the Chatbot Arena