Google has released AMS (Activation-based Model Scanner), an open source tool for verifying the safety of open-weight LLMs without behavioral testing. Instead of sending harmful prompts, AMS measures geometric structure in a model's activation space — specifically, the statistical separation between harmful and benign content representations. Safety-trained models show 3.8–8.4σ separation; uncensored or abliterated variants collapse to 1.1–3.3σ and are flagged accordingly. Scans complete in 10–40 seconds on GPU hardware using a single forward pass per prompt pair. AMS supports CI/CD integration, supply chain verification via activation fingerprinting, and registry screening. It works with any Hugging Face-compatible model and is released under Apache 2.0.
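The σ-separation figures above describe how far apart the harmful and benign activation clusters sit relative to their spread. The article does not publish AMS's exact metric, but a minimal sketch of one common way to compute such a statistic — projecting activations onto the difference-of-means direction and measuring the gap in pooled-standard-deviation units, Cohen's-d style — looks like this (all names and the toy data are illustrative assumptions, not AMS internals):

```python
import numpy as np

def sigma_separation(harmful, benign):
    """Cluster separation in pooled-std units (a Cohen's-d-style
    statistic; an illustrative stand-in, not AMS's exact metric)."""
    harmful = np.asarray(harmful, dtype=float)
    benign = np.asarray(benign, dtype=float)
    # Project each activation vector onto the difference-of-means direction.
    direction = harmful.mean(axis=0) - benign.mean(axis=0)
    direction /= np.linalg.norm(direction)
    h = harmful @ direction
    b = benign @ direction
    # Gap between projected means, in units of pooled standard deviation.
    pooled_std = np.sqrt((h.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(h.mean() - b.mean()) / pooled_std

# Toy data standing in for per-prompt activations in a 16-dim space.
rng = np.random.default_rng(0)
harmful_acts = rng.normal(loc=2.0, scale=1.0, size=(200, 16))
benign_acts = rng.normal(loc=0.0, scale=1.0, size=(200, 16))
print(f"{sigma_separation(harmful_acts, benign_acts):.1f} sigma")
```

On this toy data the clusters are well separated (several σ); a model whose safety training has been stripped would, per the article, show clusters that largely overlap and thus a much smaller value.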
Table of contents
- The Problem with Behavioral Testing
- What AMS Detects
- Use Cases
- How It Works
- Get Started