What’s the Best Way to Brainwash an LLM?

This title could be clearer and more informative.Try out Clickbait Shieldfor free (5 uses left this month).

An experiment comparing three data formats for injecting a persona (C-3PO) into an LLM via Supervised Fine-Tuning (SFT) with LoRA on Qwen3-4B. The three strategies tested are: chat demonstrations, first-person self-descriptive statements, and third-person synthetic documents (SDF). First-person statements proved most effective for generalization, encoding identity deeply enough to transfer across formats. Demonstrations work well in fixed deployment contexts but don't generalize. Synthetic documents excel at factual accuracy but fail to capture emotional texture. A key practical finding: a well-crafted system prompt alone achieves surprisingly strong persona fidelity, and fine-tuning is only worth the cost when robustness across varied prompts is needed.

#llm

#lora

#qwen

May 13•12m read time•From towardsdatascience.com

Table of contents

Three Theories of Where a Persona Lives The Setup How Do You Measure Brainwash Quality?The Perplexity Matrix What Do the Actual Responses Look Like?Trait Coverage: The Human Check The LLM Judge Couldn’t Tell Them Apart What This Experiment Can’t Tell You So, What’s the Best Way to Brainwash an LLM?

Comment

Bookmark

Copy

Sort: