Why AI labs are buying coding tools and giving away free products

Kilo Blog

AI labs are running out of public internet data to train on. Their aggressive moves into free coding tools, billion-dollar acquisitions (Cursor, Windsurf), and distillation attacks on competitors all stem from the same root cause: the need for new, high-quality training data. Every interaction with a free AI coding assistant generates expert-annotated workflow data that doesn't exist on the public internet. Frontier models are converging in quality because they were trained on the same data, and that convergence is driving API prices down 60-80%. The author argues that models will fully commoditize within 1-2 years, with differentiation shifting to execution speed, integrations, and privacy, and advises developers to be deliberate about which tools they use and what data they're comfortable sharing.

The New AI Problem Is a Lack of New Data

Strategy 1: Build tools that generate data