Google DeepMind has released ATLAS, a framework of scaling laws for multilingual language models based on 774 training runs across models from 10M to 8B parameters. ATLAS models how model size, training data volume, and language mixtures interact, introducing a cross-lingual transfer matrix that measures how training on one language transfers to performance in others.
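The article does not give ATLAS's functional form, but scaling-law work of this kind typically fits a parametric loss curve to observed training runs. As an illustrative sketch only (the parameters `E`, `A`, `alpha` and the Chinchilla-style form `L(N) = E + A * N**(-alpha)` are assumptions, not ATLAS's actual model), here is how such a law can be recovered from synthetic runs spanning the 10M-to-8B parameter range mentioned above:

```python
import numpy as np

# Illustrative sketch only: ATLAS's real functional form is not given in the
# article. We fit a toy Chinchilla-style power law L(N) = E + A * N**(-alpha)
# to synthetic "training run" data, the kind of fit scaling-law studies do.

rng = np.random.default_rng(0)

E_true, A_true, alpha_true = 1.7, 400.0, 0.35     # assumed toy parameters
N = np.logspace(7, np.log10(8e9), 30)             # model sizes, 10M to 8B
L = E_true + A_true * N ** (-alpha_true)
L += rng.normal(0.0, 0.01, size=L.shape)          # measurement noise

# Fit: grid-search the irreducible loss E, then least-squares the power law
# on the excess loss (L - E) in log space, where it becomes a straight line.
best = None
for E in np.linspace(1.0, 2.0, 201):
    excess = L - E
    if np.any(excess <= 0):
        continue
    slope, intercept = np.polyfit(np.log(N), np.log(excess), 1)
    resid = np.log(excess) - (slope * np.log(N) + intercept)
    sse = float(np.sum(resid ** 2))
    if best is None or sse < best[0]:
        best = (sse, E, float(np.exp(intercept)), -slope)

_, E_fit, A_fit, alpha_fit = best
print(f"E={E_fit:.2f}  A={A_fit:.1f}  alpha={alpha_fit:.3f}")
```

The same fit-and-extrapolate idea is what lets a scaling law trained on small runs predict the loss of a larger model before it is trained.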
Source: infoq.com