Incorporating code data into the pre-training of large language models significantly improves their performance on non-coding tasks. The research shows benefits in areas such as natural language reasoning, world knowledge, and generative tasks. Key findings indicate that a balanced mix of code and text and the inclusion of …

7 min read · From notes.aimodels.fyi
Table of contents
- Overview
- Plain English Explanation
- Technical Explanation
- Critical Analysis
- Conclusion
