Want to Run LLMs on Your Device? Meet MLC


MLC (Machine Learning Compilation) is a framework that compiles both the model and its runtime for efficient local LLM inference on edge devices. Unlike TensorFlow Lite, ONNX Runtime, or Core ML, MLC targets any environment that supports C++, making it cross-platform across Android, iOS, web, and macOS. It analyzes a model's

4 min read · From callstack.com
Table of contents

- What is MLC?
- How does it work?
- What does this look like in practice?
- Can it run on your device?
- Why do we like MLC?
