TensorFlow Lite and MediaPipe have released the experimental MediaPipe LLM Inference API, which lets Large Language Models (LLMs) run fully on-device. The API supports Web, Android, and iOS, and works with four openly available LLMs: Gemma, Phi 2, Falcon, and Stable LM. Developers can integrate these models into applications in a few steps using the provided SDKs. The release also targets performance, particularly latency, through optimizations made across several libraries and runtimes.