TensorFlow Lite and MediaPipe have released the experimental MediaPipe LLM Inference API, which lets Large Language Models (LLMs) run entirely on-device. The API targets Web, Android, and iOS, and supports four openly available LLMs: Gemma, Phi 2, Falcon, and Stable LM. Developers integrate a model into an application through the platform SDKs in a few steps. The release also focuses on performance, especially latency, which the team improved through optimizations across several libraries and runtimes.

7 min read · From developers.googleblog.com
Table of contents
LLM Inference API
Models
Model Performance
Performance Optimizations
What’s Next
