Google has released AMS (Activation-based Model Scanner), an open source tool for verifying the safety of open-weight LLMs without behavioral testing. Instead of sending harmful prompts, AMS measures geometric structure in a model's activation space — specifically, the statistical separation between harmful and benign content representations. Safety-trained models show 3.8–8.4σ separation; uncensored or abliterated variants collapse to 1.1–3.3σ and are flagged accordingly. Scans complete in 10–40 seconds on GPU hardware using a single forward pass per prompt pair. AMS supports CI/CD integration, supply chain verification via activation fingerprinting, and registry screening. It works with any Hugging Face-compatible model and is released under Apache 2.0.
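The σ-separation figures above describe how far apart the harmful and benign activation clusters sit relative to their spread. The article does not publish AMS's exact metric, but a minimal sketch of one common way to compute such a statistic — projecting activations onto the difference-of-means direction and measuring the gap in pooled-standard-deviation units, Cohen's-d style — looks like this (all names and the toy data are illustrative assumptions, not AMS internals):

```python
import numpy as np

def sigma_separation(harmful, benign):
    """Cluster separation in pooled-std units (a Cohen's-d-style
    statistic; an illustrative stand-in, not AMS's exact metric)."""
    harmful = np.asarray(harmful, dtype=float)
    benign = np.asarray(benign, dtype=float)
    # Project each activation vector onto the difference-of-means direction.
    direction = harmful.mean(axis=0) - benign.mean(axis=0)
    direction /= np.linalg.norm(direction)
    h = harmful @ direction
    b = benign @ direction
    # Gap between projected means, in units of pooled standard deviation.
    pooled_std = np.sqrt((h.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(h.mean() - b.mean()) / pooled_std

# Toy data standing in for per-prompt activations in a 16-dim space.
rng = np.random.default_rng(0)
harmful_acts = rng.normal(loc=2.0, scale=1.0, size=(200, 16))
benign_acts = rng.normal(loc=0.0, scale=1.0, size=(200, 16))
print(f"{sigma_separation(harmful_acts, benign_acts):.1f} sigma")
```

On this toy data the clusters are well separated (several σ); a model whose safety training has been stripped would, per the article, show clusters that largely overlap and thus a much smaller value.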
Table of contents
- The Problem with Behavioral Testing
- What AMS Detects
- Use Cases
- How It Works
- Get Started