Trip Venturella released Mr. Chatterbox, a 340M-parameter language model trained from scratch on 28,035 Victorian-era British Library texts (2.93B tokens, all pre-1900). Simon Willison explores the model, noting it performs poorly — more like a Markov chain than a modern LLM — likely because the training data is less than half