A developer documents building DecipherLM, an LLM-based Caesar cipher solver for a mixed-shift cryptographic riddle posted on dev.to. The solution evolved through four phases: a failed global master key approach, a noisy line-by-line scoring attempt, a consensus/mode-based trusted pool strategy, and finally a contextual history scoring method. Using Qwen2.5-0.5B for perplexity scoring with a rolling context buffer of previously decrypted lines, the system achieved 100% decryption accuracy. The post compares GPT-2, SmolLM2-135M, SmolLM2-360M, and Qwen2.5-0.5B, finding that bigger models aren't always better at scoring character-level cryptographic noise. Full working Python code is included.
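The sketch below illustrates the contextual history scoring idea summarized above: each candidate Caesar shift of a line is scored by its perplexity under a small causal LM, with previously decrypted lines prepended as rolling context, and the lowest-perplexity candidate wins. The helper names and context-window size are assumptions for illustration; the post's full code differs in its details.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-0.5B"  # scoring model named in the post
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def caesar_shift(text: str, shift: int) -> str:
    """Shift alphabetic characters by `shift` positions, preserving case."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

@torch.no_grad()
def perplexity(text: str) -> float:
    """Perplexity of `text` under the causal LM (lower = more plausible English)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def decrypt_with_history(cipher_lines: list[str], max_context_lines: int = 3) -> list[str]:
    """Pick, per line, the shift whose decryption is most fluent given the rolling context.

    `max_context_lines` (the size of the rolling context buffer) is an assumed value.
    """
    history: list[str] = []
    plaintext: list[str] = []
    for line in cipher_lines:
        context = " ".join(history[-max_context_lines:])
        best = min(
            (caesar_shift(line, s) for s in range(26)),
            key=lambda cand: perplexity((context + " " + cand).strip()),
        )
        plaintext.append(best)
        history.append(best)
    return plaintext
```

Because every line is scored against text the model has already accepted as plausible, a single garbled line is less likely to drag later decisions off course than in a purely line-by-line scoring scheme.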
