Best of Simon Willison 2025

  1. Simon Willison · 32 weeks ago

    Claude Skills are awesome, maybe a bigger deal than MCP

    Anthropic introduced Claude Skills, a new pattern for extending LLM capabilities using Markdown files with instructions, scripts, and resources. Skills are token-efficient: at the start of a session the model sees only a short description of each skill, and the full instructions are loaded only when a task needs them. They depend on a code execution environment and are simpler to create than MCP implementations. Skills enable general computer automation well beyond coding tasks, and they can be shared as single files or folders. Because they are just files, skills work with other models too, which could lead to wider adoption than the Model Context Protocol.
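
    To make the token-efficiency point concrete, here is a minimal sketch of how a harness might build the up-front skill index. It assumes each skill is a folder containing a `SKILL.md` whose YAML frontmatter has `name` and `description` fields. The helper names `read_frontmatter` and `build_skill_index` are hypothetical, and the frontmatter parsing is deliberately simplified (plain `key: value` lines only), not a real YAML parser.

```python
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Parse simple `key: value` lines between the leading `---` fences.

    Simplified on purpose (an assumption for this sketch): no nested YAML,
    lists, or multi-line values.
    """
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def build_skill_index(skills_dir: Path) -> str:
    """Return one short line per skill: the only part the model sees up front."""
    entries = []
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        name = meta.get("name", skill_md.parent.name)
        description = meta.get("description", "")
        # Point at the full file so it can be read later, only if relevant.
        entries.append(f"- {name}: {description} (full instructions: {skill_md})")
    return "\n".join(entries)
```

    Only the returned index would go into the prompt; the body of a `SKILL.md` stays on disk until a task calls for it, which is why each installed skill costs so few tokens.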

  2. Simon Willison · 23 weeks ago

    JustHTML is a fascinating example of vibe engineering in action

    JustHTML is a pure-Python HTML5 parser that passes all 9,200+ browser vendor tests and has 100% test coverage. The library was built over several months using AI coding agents (Claude Sonnet, Gemini Pro, Claude Opus) in VS Code, but with extensive human engineering oversight. The developer designed the API, integrated comprehensive test suites, built custom profilers and fuzzers, and made all the architectural decisions, while leaving the code itself to the AI. This is "vibe engineering" (using AI agents professionally, with proper code review, testing, and engineering practices) rather than "vibe coding", which produces unvetted prototypes. The project shows how an experienced engineer can use AI as a typing assistant while keeping responsibility for design, quality, and architecture.
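
    The custom fuzzers mentioned above rest on a simple idea: throw large volumes of random, malformed markup at the parser and record every input that makes it crash. Below is a minimal sketch of that idea, not JustHTML's actual fuzzer. The names `FRAGMENTS`, `random_markup`, `fuzz`, and `stdlib_parse` are hypothetical, and Python's standard-library `html.parser.HTMLParser` stands in for the parser under test.

```python
import random
from html.parser import HTMLParser
from typing import Callable

# Fragments chosen to produce unbalanced, misnested, and truncated markup.
FRAGMENTS = ["<div>", "</div>", "<p>", "</p>", "<b>", "</i>", "<table>", "<td>",
             "<a href='x'>", "&amp;", "&#x", "<", ">", "text", "\"", "="]


def random_markup(rng: random.Random, max_len: int = 30) -> str:
    """Concatenate random fragments into a (usually malformed) document."""
    return "".join(rng.choice(FRAGMENTS) for _ in range(rng.randint(0, max_len)))


def fuzz(parse: Callable[[str], object], iterations: int, seed: int = 0) -> list[tuple[str, str]]:
    """Run `parse` on random inputs; return (input, error) for each crash."""
    rng = random.Random(seed)  # fixed seed so any crash can be reproduced
    crashes = []
    for _ in range(iterations):
        doc = random_markup(rng)
        try:
            parse(doc)
        except Exception as exc:  # any exception counts as a parser bug
            crashes.append((doc, repr(exc)))
    return crashes


def stdlib_parse(doc: str) -> None:
    """Stand-in target: feed the document to the standard-library parser."""
    parser = HTMLParser()
    parser.feed(doc)
    parser.close()
```

    `fuzz(stdlib_parse, 10_000)` returns every input that raised an exception. A real harness for a spec-compliant parser would also compare the resulting trees against expected output rather than only looking for crashes.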

  3. Simon Willison · 23 weeks ago

    Your job is to deliver code you have proven to work

    Software engineers must deliver proven, working code rather than untested contributions. That requires both manual testing (seeing the code work yourself, documenting the steps, trying edge cases) and automated testing (bundling tests with each change). When working with coding agents like Claude Code, developers should instruct the agents to prove their changes work, through testing, before submitting them. The human developer remains accountable for code quality and for providing evidence that changes function correctly.
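
    As a small illustration of bundling tests with a change, here is a hypothetical change (a `slugify` helper) shipped together with tests that cover its edge cases. The function and its expected behaviour are invented for this sketch; the tests are plain `test_*` functions, which pytest discovers automatically and which can also be called directly.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated URL slug (hypothetical helper)."""
    # Decompose accented characters, then drop anything that isn't ASCII.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Replace every run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower())
    return slug.strip("-")


# Tests that ship in the same commit as the change.
def test_basic_title():
    assert slugify("Vibe engineering") == "vibe-engineering"


def test_punctuation_and_repeated_spaces():
    assert slugify("Skills:  a bigger deal than MCP?") == "skills-a-bigger-deal-than-mcp"


def test_accents_are_transliterated():
    assert slugify("Café déjà vu") == "cafe-deja-vu"


def test_empty_and_symbol_only_input():
    assert slugify("") == ""
    assert slugify("!!!") == ""
```

    The edge-case tests double as the evidence a reviewer can rerun, which is the point of submitting them alongside the code rather than afterwards.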

  4. Simon Willison · 26 weeks ago

    Olmo 3 is a fully open LLM

    Ai2 released Olmo 3, a fully open LLM series that includes its complete training data, training process, and checkpoints. The flagship 32B Think model emphasizes interpretability: its reasoning traces are visible, and OlmoTrace can link model outputs back to the training data. Trained on 5.9 trillion tokens from the Dolma 3 Mix dataset (6x fewer tokens than competitors), the series offers four 7B variants and two 32B models. Because the training data is published, it can be audited for potential backdoors, a security concern with open-weight models. Hands-on testing showed better SVG generation than Olmo 2, though OlmoTrace's training data attribution needs refinement.

  5. Simon Willison · 1 year ago

    Highlights from the Claude 4 system prompt

    Anthropic has published the system prompts for its Claude 4 models, which offer insight into prompt engineering and the models' personalities. The prompts give guidance on effective use, limiting hallucinations, and keeping the model safe, and they spell out Claude's capabilities, including its preferred conversation styles and how it ensures copyright compliance.

  6. Simon Willison · 49 weeks ago

    Design Patterns for Securing LLM Agents against Prompt Injections

    A comprehensive research paper by 11 authors from IBM, Google, Microsoft and other organizations presents six design patterns to mitigate prompt injection attacks in LLM agents. The patterns include Action-Selector, Plan-Then-Execute, LLM Map-Reduce, Dual LLM, Code-Then-Execute, and Context-Minimization approaches. Each pattern trades some agent flexibility for security by constraining actions and preventing untrusted input from triggering arbitrary tasks. The paper includes ten detailed case studies covering practical applications like SQL agents, email assistants, and customer service chatbots, providing threat models and mitigation strategies for each scenario.
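
    Of the six patterns, Action-Selector is the simplest to sketch: the LLM may only translate a request into one of a fixed set of pre-approved actions, and the results of those actions are never fed back into the model, so injected text in tool output cannot steer it. The code below is a minimal sketch under that reading, not code from the paper. `ACTIONS`, `select_action`, and the `llm` callable (a stand-in for a real model call) are hypothetical, as is the choice to fall back to a human hand-off for unrecognised replies.

```python
from typing import Callable

# Hypothetical allowlist: the only things the agent is ever permitted to do.
ACTIONS: dict[str, Callable[[], str]] = {
    "show_order_status": lambda: "Your order shipped yesterday.",
    "reset_password": lambda: "A password reset link has been sent.",
    "contact_human": lambda: "Connecting you to a support agent.",
}


def select_action(llm: Callable[[str], str], user_request: str) -> str:
    """Let the LLM pick an action name; never let it see the action's output."""
    prompt = (
        "Reply with exactly one of these action names and nothing else: "
        + ", ".join(ACTIONS)
        + f"\nRequest: {user_request}"
    )
    choice = llm(prompt).strip()
    # Anything outside the allowlist falls back to a safe default.
    if choice not in ACTIONS:
        choice = "contact_human"
    # The result goes straight to the user, not back into the model.
    return ACTIONS[choice]()
```

    An injected instruction can at worst make the model pick a different entry from the allowlist, so the damage is bounded by that list. The cost is the flexibility trade-off described above: the agent cannot do anything the list does not anticipate.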

  7. Simon Willison · 23 weeks ago

    OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI

    OpenAI has quietly added support for skills to ChatGPT and to its Codex CLI tool, following the approach Anthropic introduced in October. Skills are simple folders containing Markdown files and optional resources that LLM tools can read and, where scripts are included, execute. ChatGPT's Code Interpreter now includes a /home/oai/skills folder with skills for handling spreadsheets, docx, and PDFs. Codex CLI added experimental skills support two weeks before the post, allowing users to place custom skills in ~/.codex/skills. The author tested both implementations successfully, creating a PDF with ChatGPT and building a Datasette plugin with Codex CLI using a custom skill. The speed of OpenAI's adoption suggests skills may become a standard pattern for extending LLM capabilities.
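
    Since a skill is just a folder, creating one takes only a few lines of code. The sketch below writes a minimal skill into a given skills directory. The `SKILL.md` file name and the `name`/`description` frontmatter follow Anthropic's published format; whether Codex CLI's experimental support expects exactly this layout is an assumption, and `create_skill` is a hypothetical helper.

```python
from pathlib import Path


def create_skill(skills_dir: Path, name: str, description: str, instructions: str) -> Path:
    """Write <skills_dir>/<name>/SKILL.md and return its path.

    Assumed layout: one folder per skill, YAML frontmatter holding `name`
    and `description`, followed by free-form Markdown instructions.
    """
    skill_dir = skills_dir / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_md = skill_dir / "SKILL.md"
    # Overwrites any existing SKILL.md so the helper can also update a skill.
    skill_md.write_text(
        f"---\nname: {name}\ndescription: {description}\n---\n\n{instructions.strip()}\n",
        encoding="utf-8",
    )
    return skill_md
```

    Passing `Path.home() / ".codex" / "skills"` as the first argument would place the skill in the folder where, per the summary above, Codex CLI looks for custom skills.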

  8. Simon Willison · 33 weeks ago

    Vibe engineering

    Introduces "vibe engineering" as a term for experienced developers who use LLMs and coding agents productively while staying accountable for their code. Unlike "vibe coding" (fast, irresponsible AI-driven development), vibe engineering requires senior-level skills: comprehensive testing, planning, documentation, version control, code review, QA, and research. Coding agents such as Claude Code, OpenAI's Codex CLI, and Gemini CLI make parallel development workflows possible but demand top-tier engineering practices. AI tools amplify existing expertise rather than replace it, which makes traditional software engineering disciplines more important than ever.