BMW Group and Google Cloud completed a proof of concept for deploying small language models (SLMs) in vehicles for voice commands. Unlike cloud-dependent LLMs, SLMs can run on-device, avoiding network latency issues. The team built an automated pipeline on Vertex AI to handle the full workflow: model compression (quantization, pruning, knowledge distillation), quality enhancement (LoRA fine-tuning, RL methods like DPO and GRPO), and rigorous evaluation using ROUGE/BLEU metrics and LLM-as-a-judge approaches. The pipeline tests models against BMW's 'Head unit in the cloud' — an AOSP-based infotainment system running natively on cloud compute instances — enabling scalable testing without physical hardware. Source code is published on GitHub.
Table of contents
Small language models: small concept, big potential
Challenges of Integrating foundation models into vehicles
Converting LLMs to SLMs
Post-Compression Quality Enhancement
Evaluating Performance for Automotive Tasks
The Challenge of Finding the Optimal Configuration
Solution: An Automated Workflow for SLM Optimization
Implementation: An Automated Workflow with Vertex AI Pipelines