Goodfire, a San Francisco startup, has released Silico, a mechanistic interpretability tool that lets researchers and engineers inspect and adjust individual neurons inside AI models during training. Unlike post-hoc auditing tools, Silico enables intervention at every stage of model development, from dataset curation through training, allowing developers to suppress unwanted behaviors, reduce hallucinations, and steer a model's values. The tool uses AI agents to automate much of the interpretability work, making techniques previously limited to top labs accessible to smaller firms. Goodfire demonstrated use cases that include flipping a model's ethical reasoning by boosting transparency-associated neurons, and fixing the well-known error in which models judge 9.11 to be greater than 9.9, traced to Bible-associated neurons (where verse 9:11 follows 9:9) influencing the comparison. Silico is available for a fee, priced on a case-by-case basis.
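Silico's own API is not described in the article, but the underlying technique, scaling an individual neuron's activation to boost or suppress an associated behavior, can be sketched generically. The minimal PyTorch example below uses a forward hook to scale one hidden-layer neuron; `ToyMLP`, `NEURON_IDX`, and `SCALE` are hypothetical stand-ins, not Goodfire's code.

```python
import torch
import torch.nn as nn

# Toy stand-in for one transformer MLP block; in a real model the hook would
# be registered on an actual layer (e.g. a block's MLP activation).
class ToyMLP(nn.Module):
    def __init__(self, d_model=16, d_hidden=64):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.act = nn.GELU()
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        return self.down(self.act(self.up(x)))

NEURON_IDX = 7   # hypothetical index of a neuron tied to some behavior
SCALE = 3.0      # >1 amplifies the behavior; 0 ablates the neuron entirely

def steer_neuron(module, inputs, output):
    # Scale one neuron's activation before it is projected back down.
    # Returning a tensor from a forward hook replaces the module's output.
    output = output.clone()
    output[..., NEURON_IDX] *= SCALE
    return output

model = ToyMLP()
# Hook the nonlinearity's output, i.e. the neuron activations themselves.
handle = model.act.register_forward_hook(steer_neuron)

x = torch.randn(1, 16)
steered = model(x)
handle.remove()
baseline = model(x)
print((steered - baseline).abs().max())  # nonzero: the intervention took effect
```

Setting `SCALE` to 0 is the suppression case described above, while values above 1 correspond to boosting a behavior such as the transparency example.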

From technologyreview.com