MobileLLM is a sub-billion-parameter language model optimized for on-device use cases, presented at ICML 2024. It combines several design choices (SwiGLU activation functions, a deep-and-thin architecture, embedding sharing, and grouped-query attention) and reports notable accuracy gains over prior state-of-the-art models of comparable size. This repository includes training code and instructions for dataset preparation and multi-node training.
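
To make the parameter-budget effect of these choices concrete, here is a minimal parameter-counting sketch. It does not implement MobileLLM. It assumes a LLaMA-style decoder with no bias terms, RMSNorm weight vectors, a three-matrix SwiGLU feed-forward block, and grouped-query attention. The `Config` class, `count_params` function, and all configuration values below are hypothetical and chosen for illustration only. They are not claimed to be the official MobileLLM configuration.

```python
from dataclasses import dataclass, replace


@dataclass
class Config:
    # Hypothetical decoder hyperparameters (illustrative, not the official config).
    vocab_size: int
    dim: int
    n_layers: int
    n_heads: int
    n_kv_heads: int
    ffn_hidden: int
    tie_embeddings: bool = True


def count_params(cfg: Config) -> int:
    """Count weights in a bias-free, LLaMA-style decoder (an assumption)."""
    head_dim = cfg.dim // cfg.n_heads
    # Grouped-query attention: K and V project to n_kv_heads * head_dim
    # instead of the full dim; Q and the output projection stay dim x dim.
    kv_dim = cfg.n_kv_heads * head_dim
    attn = 2 * cfg.dim * cfg.dim + 2 * cfg.dim * kv_dim
    # SwiGLU feed-forward uses three matrices (gate, up, down).
    ffn = 3 * cfg.dim * cfg.ffn_hidden
    # Two RMSNorm weight vectors per block (pre-attention and pre-FFN).
    norms = 2 * cfg.dim
    per_layer = attn + ffn + norms
    embed = cfg.vocab_size * cfg.dim
    # Embedding sharing reuses the input embedding as the output head,
    # so the head adds no parameters when tie_embeddings is True.
    head = 0 if cfg.tie_embeddings else cfg.vocab_size * cfg.dim
    final_norm = cfg.dim
    return cfg.n_layers * per_layer + embed + head + final_norm


if __name__ == "__main__":
    # Illustrative deep-and-thin config: many layers, small hidden width.
    cfg = Config(vocab_size=32000, dim=576, n_layers=30,
                 n_heads=9, n_kv_heads=3, ffn_hidden=1536)
    tied = count_params(cfg)
    untied = count_params(replace(cfg, tie_embeddings=False))
    mha = count_params(replace(cfg, n_kv_heads=cfg.n_heads))
    print(f"tied embeddings:   {tied:,}")    # 124,635,456
    print(f"untied embeddings: {untied:,}")  # 143,067,456
    print(f"full multi-head:   {mha:,}")     # 137,906,496
```

At this scale the vocabulary matrix alone is about 18.4M weights, so tying input and output embeddings saves roughly 13% of an otherwise untied model in this sketch, and sharing K/V heads saves another ~13M. Savings of this kind can then be spent on extra depth.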

4 min read, from github.com