ExecuTorch, PyTorch's native inference platform, now supports on-device voice workloads across CPU, GPU, and NPU on Linux, macOS, Windows, Android, and iOS. Reference implementations are provided for five voice models: Voxtral Realtime (streaming transcription, ~4B params), Parakeet TDT (offline transcription, 0.6B params), Sortformer (speaker diarization, 117M params), Whisper (offline transcription), and Silero VAD (voice activity detection). The approach uses torch.export() directly on original PyTorch model code with minimal edits, separating model inference from C++ orchestration logic. Quantization (int4/int8) is applied before export. LM Studio is already shipping voice transcription powered by ExecuTorch using Parakeet TDT on macOS and Windows. Sample Android apps and a macOS desktop transcription demo are available in the executorch-examples repository.
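To make the int8 step mentioned above concrete, here is a minimal pure-Python sketch of symmetric per-tensor int8 quantization. It illustrates the general idea only and does not reproduce the quantization scheme or API that ExecuTorch uses. The function names `quantize_int8` and `dequantize` and the sample weights are hypothetical.

```python
# Illustrative sketch only: symmetric per-tensor int8 quantization.
# This is NOT ExecuTorch's quantization API; all names here are hypothetical.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would work.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the int8 representation."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 1.0]  # made-up example values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per element is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

Each weight is stored as one signed byte plus a shared scale, which is where the memory savings come from. int4 follows the same pattern with a narrower integer range, usually with finer-grained (per-group) scales.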

8m read time. From pytorch.org
Table of contents
TL;DR
Voice on the Edge Today
Design Principles
Voice Models in Practice
Sample Applications
Adoption Case Study in production: LM Studio
Get Involved
Acknowledgement
