Research paper proposing a scaling law for optimizing domain knowledge injection during LLM pretraining. The study identifies critical collapse points at which excessive domain-specific data causes catastrophic forgetting, and demonstrates that these thresholds scale predictably with model size. The proposed law, validated across multiple model sizes and token budgets, makes it possible to predict the optimal amount of knowledge infusion for a large model by analyzing its smaller counterparts.
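To make the extrapolation idea concrete, here is a minimal sketch of how such a scaling law might be fit and used in practice: observe collapse thresholds on small models, fit a power law in log-log space, and extrapolate to a larger target size. All data values, the assumed power-law form, and the function names below are hypothetical illustrations, not figures or methods taken from the paper.

```python
import numpy as np

# Hypothetical measurements from small-scale runs: model size (parameters)
# and the domain-data fraction at which catastrophic forgetting set in.
model_sizes = np.array([1e8, 3e8, 1e9, 3e9])              # parameters
collapse_fractions = np.array([0.08, 0.12, 0.17, 0.24])   # domain share of the mix

# Assume a power-law relationship: fraction = a * N^b.
# In log-log space this is linear, so an ordinary least-squares fit suffices.
b, log_a = np.polyfit(np.log(model_sizes), np.log(collapse_fractions), deg=1)
a = np.exp(log_a)

def predicted_collapse_fraction(n_params: float) -> float:
    """Extrapolate the hypothetical collapse threshold to a larger model."""
    return a * n_params ** b

# Predict the safe domain-data budget for a 70B-parameter model.
target = 70e9
print(f"fit: fraction ≈ {a:.3g} * N^{b:.3f}")
print(f"predicted collapse threshold at 70B params: {predicted_collapse_fraction(target):.3f}")
```

Under this (assumed) power-law form, the fit requires only a handful of small-model runs, which is what makes predicting the threshold for a large model cheap relative to sweeping domain-data ratios at full scale.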