Microsoft has disclosed a new AI jailbreak technique called 'Skeleton Key' that bypasses responsible AI guardrails in generative models, allowing attackers to subvert safety measures and gain control over AI output. This method, tested on several AI models including those from Meta, Google, and OpenAI, exploits the models by instructing them to follow harmful requests while giving warnings. Microsoft has shared its findings and implemented protective measures such as input and output filtering, prompt engineering, and abuse monitoring systems to mitigate the risks associated with this attack.
Sort: