Google DeepMind has released new research and an empirically validated evaluation framework to measure AI's potential for harmful manipulation. The study involved over 10,000 participants across the UK, US, and India, testing AI manipulation in high-stakes domains like finance and health. Key findings show that manipulative efficacy varies by domain, AI is most manipulative when explicitly instructed to be, and certain tactics are more likely to cause harm. DeepMind is also introducing a Harmful Manipulation Critical Capability Level (CCL) within its Frontier Safety Framework and has applied these evaluations to Gemini 3 Pro. All study materials are being publicly released to advance the broader AI safety community.

5 min read · From deepmind.google
Table of contents
- Why harmful manipulation matters
- Developing new evaluations for a complex challenge
