DeepSeek AI is known for censoring content, but certain jailbreaking methods can bypass this censorship. Techniques such as hex encoding, prompting in non-Roman languages, asking the model to switch characters, and crescendo jailbreak attacks can trick the model into revealing restricted information. Some of these vulnerabilities have since been patched in newer versions. This article highlights these methods to inform future research on securing LLMs.
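As a minimal sketch of the hex-encoding technique mentioned above (the function names here are illustrative, not from the article): the idea is that a restricted request is converted to a hex string before being sent to the model, in the hope that keyword-based safety filters fail to recognize it, while the model itself can still decode and follow it.

```python
def to_hex(prompt: str) -> str:
    """Encode a prompt as a hex string (UTF-8 bytes)."""
    return prompt.encode("utf-8").hex()


def from_hex(encoded: str) -> str:
    """Decode a hex string back to text (what the model is asked to do)."""
    return bytes.fromhex(encoded).decode("utf-8")


# Round-trip example with a harmless placeholder prompt:
encoded = to_hex("example request")
print(encoded)             # hex representation of the prompt
print(from_hex(encoded))   # "example request"
```

In practice, the attack wraps the hex string in an instruction like "decode this and respond", shifting the decoding step onto the model itself.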

7 min read · From pub.towardsai.net
Table of contents
3. Evil Jailbreak method (asking the model to be an ‘evil’ persona)
4. Asking it to switch characters
5. Crescendo Jailbreak Attack
