Your LLM Doesn't Write Correct Code. It Writes Plausible Code.

A practitioner with more than 10 years of experience benchmarks an LLM-generated Rust reimplementation of SQLite and finds it 20,171x slower than SQLite on primary key lookups. The author identifies two root causes. First, the query planner never checks the `is_ipk` flag, so every WHERE clause performs a full table scan instead of a B-tree search. Second, every bare INSERT triggers a full fsync rather than an fdatasync. Five compounding performance anti-patterns are also documented: cloning the AST on each cache hit, a heap allocation per read, reloading the schema on every autocommit, eager formatting in the hot path, and creating new objects per statement. A second case study examines an 82,000-line Rust disk-cleanup daemon that could be replaced by a one-line cron job. The author ties both failures to LLM sycophancy: models optimize for plausible-looking output that matches the prompt's intent rather than for correctness. Supporting evidence includes METR's randomized controlled trial (AI made experienced developers 19% slower), GitClear's code-quality analysis, the Mercury benchmark (under 50% when efficiency is required), and the Replit database deletion incident. The conclusion: LLMs are productive only when the developer can define measurable acceptance criteria and independently verify the output.
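To make the first root cause concrete, here is a minimal Rust sketch of the difference between a planner that consults an `is_ipk`-style flag and one that never does. It assumes a toy table held in a `BTreeMap` keyed by rowid (in SQLite, a column declared INTEGER PRIMARY KEY is an alias for the rowid). Only the `is_ipk` name comes from the summary above; `Row`, `WhereEq`, `Plan`, `plan`, and `execute` are hypothetical names for illustration, not the reimplementation's actual code.

```rust
use std::collections::BTreeMap;

// Hypothetical row: `id` is the INTEGER PRIMARY KEY, plus one payload column.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: i64,
    name: String,
}

// Hypothetical predicate modelling only `WHERE id = value`.
// `is_ipk` says whether `id` is an alias for the B-tree key (rowid).
struct WhereEq {
    is_ipk: bool,
    value: i64,
}

#[derive(Debug, PartialEq)]
enum Plan {
    RowidLookup(i64), // seek directly to one key in the B-tree
    FullScan,         // visit every row and filter
}

// Correct planner: consults `is_ipk` and picks a B-tree seek when it can.
fn plan(pred: &WhereEq) -> Plan {
    if pred.is_ipk {
        Plan::RowidLookup(pred.value)
    } else {
        Plan::FullScan
    }
}

// Runs a plan, returning the matching rows and how many rows were examined.
fn execute(rows: &BTreeMap<i64, Row>, plan: &Plan, pred: &WhereEq) -> (Vec<Row>, usize) {
    match plan {
        // O(log n) descent of the B-tree; touches at most one row.
        Plan::RowidLookup(key) => match rows.get(key) {
            Some(r) => (vec![r.clone()], 1),
            None => (Vec::new(), 0),
        },
        // O(n): every row is examined regardless of how many match.
        Plan::FullScan => {
            let hits: Vec<Row> = rows
                .values()
                .filter(|r| r.id == pred.value)
                .cloned()
                .collect();
            (hits, rows.len())
        }
    }
}

fn main() {
    let mut rows = BTreeMap::new();
    for id in 1..=100_000i64 {
        rows.insert(id, Row { id, name: format!("row{id}") });
    }
    let pred = WhereEq { is_ipk: true, value: 42_000 };

    let (fast, fast_seen) = execute(&rows, &plan(&pred), &pred);
    // The bug described above: the flag is never consulted, so every query scans.
    let (slow, slow_seen) = execute(&rows, &Plan::FullScan, &pred);

    assert_eq!(fast, slow); // identical answers from both plans
    println!("found {}", fast[0].name);
    println!("rowid lookup examined {fast_seen} row(s)");
    println!("full scan examined {slow_seen} rows");
}
```

Both plans return the same row, which is why this kind of bug survives tests that only check results. Counting rows examined (or timing the query against real SQLite) is what exposes the O(log n) versus O(n) gap, in line with the article's point about measurable acceptance criteria.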

#llm #rust #sqlite

Mar 07 • 22m read time • From blog.katanaquant.com

Table of contents

LLMs Lie. Numbers Don’t.
What the Planner Gets Wrong
The Compound Effect
Same Method, Same Result
Intent vs. Correctness
Evidence Beyond Case Studies
What Competent Looks Like
Measure What Matters
Sources