Google DeepMind has released ATLAS, a framework of scaling laws for multilingual language models derived from 774 training runs across models ranging from 10M to 8B parameters. ATLAS captures how model size, training-data volume, and language mixture interact, introducing a cross-lingual transfer matrix that measures how training on one language transfers to performance in others.

From infoq.com
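
To make the idea concrete, here is a minimal, purely illustrative sketch of what a multilingual scaling law with a cross-lingual transfer matrix could look like. It assumes a Chinchilla-style parametric loss L(N, D) = E + A/N^α + B/D_eff^β and reuses the Hoffmann et al. fitted constants as placeholders; the transfer matrix T, the language set, and the pooling rule for effective tokens are all invented for illustration and are not taken from the ATLAS paper.

```python
import numpy as np

# Illustrative sketch (NOT the ATLAS formulation): a Chinchilla-style loss
#   L(N, D_eff) = E + A / N**alpha + B / D_eff**beta
# where the effective token count D_eff for a target language pools tokens
# from every training language, weighted by a cross-lingual transfer matrix T.

# Placeholder constants from the Chinchilla fit (Hoffmann et al., 2022);
# ATLAS fits its own parameters per language, which will differ.
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

# Hypothetical transfer matrix: T[i, j] = value to target language i of one
# token of training language j, relative to a native token (diagonal = 1.0).
languages = ["en", "de", "sw"]
T = np.array([
    [1.0, 0.4, 0.1],   # en <- (en, de, sw)
    [0.5, 1.0, 0.1],   # de <- (en, de, sw)
    [0.2, 0.2, 1.0],   # sw <- (en, de, sw)
])

def predicted_loss(n_params: float, tokens_per_lang: np.ndarray, target: int) -> float:
    """Predicted loss on `target` given model size and per-language token counts."""
    d_eff = float(T[target] @ tokens_per_lang)  # transfer-weighted effective tokens
    return E + A / n_params**alpha + B / d_eff**beta

# Example: an 8B-parameter model trained on 100B en / 20B de / 5B sw tokens.
tokens = np.array([100e9, 20e9, 5e9])
print(f"predicted sw loss: {predicted_loss(8e9, tokens, target=2):.3f}")
```

Under a formulation like this, off-diagonal entries of T determine how much a low-resource language (here the hypothetical "sw") benefits from high-resource data in the mixture, which is the kind of interaction the blurb says ATLAS quantifies.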