Heretic is an automated tool that removes safety alignment from transformer-based language models using directional ablation, without expensive retraining. It combines advanced abliteration techniques with a TPE-based parameter optimizer that automatically searches for parameters minimizing both refusals and divergence from
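To make "directional ablation" concrete, here is a minimal sketch of the underlying linear algebra, not Heretic's actual code. It assumes a weight matrix stored as a list of rows and a direction vector in the matrix's output space. The function names and the `scale` parameter are hypothetical; `scale` stands in for the kind of numeric knob a parameter optimizer could tune.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in weight]


def ablate_direction(weight, direction, scale=1.0):
    """Compute W' = W - scale * r (r^T W), with r the unit direction.

    With scale=1.0, every output of W' is orthogonal to r, so the
    layer can no longer write anything along that direction.
    `scale` is a hypothetical knob, not a documented Heretic parameter.
    """
    r = normalize(direction)
    rows, cols = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r.
    proj = [sum(r[i] * weight[i][j] for i in range(rows)) for j in range(cols)]
    return [
        [weight[i][j] - scale * r[i] * proj[j] for j in range(cols)]
        for i in range(rows)
    ]


# Toy example: a 2x2 layer and an arbitrary direction to remove.
W = [[1.0, 2.0], [3.0, 4.0]]
direction = [3.0, 4.0]
W_ablated = ablate_direction(W, direction)
output = matvec(W_ablated, [1.0, 1.0])
residual = dot(output, normalize(direction))  # close to zero after full ablation
```

In this sketch a smaller `scale` removes only part of the component, trading weaker ablation for a smaller change to the original weights.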

From github.com
Table of contents
Usage
How it works
Prior art
Acknowledgments
License
