OpenAI uses automated red teaming powered by reinforcement learning to discover and patch prompt injection vulnerabilities in ChatGPT Atlas's browser agent before attackers exploit them. The system trains an AI attacker to find novel exploits that could trick the agent into performing unauthorized actions like forwarding

11m read timeFrom openai.com
Post cover image

Sort: