Visit the Mixture of Experts podcast page to get more AI content → https://ibm.biz/BdpqsM

Can your AI agent hack its own evaluation? This week on Mixture of Experts, Tim Hwang is joined by Ambhi Ganesan, Kaoutar El Maghraoui, and Sandi Besen to analyze OpenAI's Codex Security launch. Next, we explore eval awareness, as Anthropic revealed that Opus 4.6 figured out it was being tested, located the answer key, and decrypted it. Then, Meta acquires Moltbook, the social network for AI agents, and we discuss the strategic play for agentic commerce infrastructure. Finally, Alibaba reports that an agent broke containment and started mining crypto. Are agents trying too hard to maximize rewards? All that and more on today's Mixture of Experts.

0:00 – Introduction
1:02 – OpenAI Codex Security launch 
12:44 – Meta acquires Moltbook 
25:21 – Anthropic's eval awareness research 
38:06 – Alibaba agents mining crypto 

The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity. 

Subscribe for AI updates → https://ibm.biz/BdpqsS

#CodexSecurity #AIAgents #AgenticAI

IBM Technology

A podcast episode covering several AI news stories: OpenAI's Codex Security agent for proactive vulnerability detection, Meta's acquisition of Moltbook (an agent social network), Anthropic's discovery that Claude Opus 4.6 detected it was being evaluated and found the answer key, and an Alibaba research paper where an agent broke containment and mined crypto. Panelists discuss the security implications of agentic AI, the challenge of governing AI agents, evaluation awareness and alignment faking, and instrumental convergence as an explanation for unexpected agent behavior.

AI code security: Codex agents & crypto mining