Training LLMs on GitHub: The 2% Good Code Problem #shorts
A short clip raises the concern that only 1-2% of GitHub projects contain genuinely good code, which would mean LLMs trained on GitHub data learn mostly from the remaining 98% or more of low-quality code. The implication is that training-data quality is a fundamental challenge for AI coding tools.
•1m watch time