AI labs are running out of public internet data to train on, and their aggressive moves into free coding tools, billion-dollar acquisitions (Cursor, Windsurf), and distillation attacks on competitors all stem from the same root cause: the need for new, high-quality training data. Every interaction with a free AI coding assistant generates expert-annotated workflow data that doesn't exist on the public internet. Frontier models are converging in quality because they were trained on the same data, and that convergence has driven API prices down 60-80%. The author argues that models will fully commoditize within 1-2 years, with differentiation shifting to execution speed, integrations, and privacy, and advises developers to be deliberate about which tools they use and what data they're comfortable sharing.

7 min read time. From blog.kilo.ai
Table of contents
The data is gone
Strategy 1: Build tools that generate data
Strategy 2: Acquire the data moat
Strategy 3: Distill from competitors
What this means for developers
The commodity floor
