What’s the Best Way to Brainwash an LLM?
This title could be clearer and more informative.Try out Clickbait Shieldfor free (5 uses left this month).
An experiment comparing three data formats for injecting a persona (C-3PO) into an LLM via Supervised Fine-Tuning (SFT) with LoRA on Qwen3-4B. The three strategies tested are: chat demonstrations, first-person self-descriptive statements, and third-person synthetic documents (SDF). First-person statements proved most effective for generalization, encoding identity deeply enough to transfer across formats. Demonstrations work well in fixed deployment contexts but don't generalize. Synthetic documents excel at factual accuracy but fail to capture emotional texture. A key practical finding: a well-crafted system prompt alone achieves surprisingly strong persona fidelity, and fine-tuning is only worth the cost when robustness across varied prompts is needed.
Table of contents
Three Theories of Where a Persona LivesThe SetupHow Do You Measure Brainwash Quality?The Perplexity MatrixWhat Do the Actual Responses Look Like?Trait Coverage: The Human CheckThe LLM Judge Couldn’t Tell Them ApartWhat This Experiment Can’t Tell YouSo, What’s the Best Way to Brainwash an LLM?Sort: