Anthropic has open-sourced circuit-tracing tools that generate attribution graphs to reveal the internal decision-making steps of large language models. The library supports popular open-weights models and includes an interactive frontend hosted by Neuronpedia for exploring these graphs. Researchers can trace circuits, visualize and annotate graphs, and test hypotheses by modifying feature values. The tools have been used to study multi-step reasoning and multilingual representations in models like Gemma-2-2b and Llama-3.2-1b, aiming to advance AI interpretability research across the broader community.