Including code data during pre-training of large language models can significantly improve their performance on a variety of non-coding tasks, such as natural language reasoning and generative tasks. The study found that a balanced mix of code and text data during initial pre-training, followed by text-centric continued pre-training, produced the strongest overall results.
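The two-phase recipe described above can be sketched as a simple data-mixing schedule. This is a minimal, hypothetical illustration: the function names, the 50% initial code fraction, the 10% continued-training fraction, and the 80% phase-switch point are all assumptions for demonstration, not values reported by the study.

```python
import random

def code_fraction(step, total_steps, switch_frac=0.8,
                  initial_code_frac=0.5, continued_code_frac=0.1):
    """Return the probability of drawing a code document at this step.

    Phase 1 (initial pre-training): balanced code/text mix.
    Phase 2 (text-centric continued training): mostly text.
    All fractions here are illustrative placeholders.
    """
    if step < switch_frac * total_steps:
        return initial_code_frac
    return continued_code_frac

def sample_batch(step, total_steps, code_docs, text_docs,
                 batch_size=4, seed=0):
    """Sample a training batch according to the phase's code fraction."""
    rng = random.Random(seed + step)
    p_code = code_fraction(step, total_steps)
    return [rng.choice(code_docs) if rng.random() < p_code
            else rng.choice(text_docs)
            for _ in range(batch_size)]

if __name__ == "__main__":
    code = ["def f(): pass", "int main() { return 0; }"]
    text = ["The cat sat on the mat.", "Paris is in France."]
    # Early step: balanced mix; late step: text-heavy mix.
    print(code_fraction(0, 100), code_fraction(90, 100))
    print(sample_batch(0, 100, code, text))
```

In a real pre-training pipeline the same idea would be expressed as corpus sampling weights in the data-loader config rather than per-example coin flips, but the schedule structure is the same.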
From notes.aimodels.fyi