Google DeepMind's Chintan Parikh presents the Gemma 4 edge models (2B and 4B parameter variants) and the LiteRT on-device inference framework. Key topics include the benefits of edge AI deployment (latency, privacy, cost); new Gemma 4 capabilities such as function calling, structured JSON output, and chain-of-thought reasoning; and a demo gallery app showcasing on-device agent skills. The LiteRT framework supports cross-platform deployment across Android, iOS, macOS, Linux, Windows, Raspberry Pi, and IoT devices, with NPU acceleration from Qualcomm and MediaTek delivering up to 13x performance gains. The models are available on Hugging Face under the Apache 2.0 license and support both PyTorch and JAX conversion paths.
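To illustrate the function-calling capability mentioned above, here is a minimal sketch of how an on-device agent might dispatch a tool call from a model's structured JSON output. The response schema and the `set_timer` tool are assumptions for illustration, not the exact format Gemma or LiteRT uses.

```python
import json

def dispatch_tool_call(model_output: str, tools: dict):
    """Parse a JSON tool call emitted by the model and invoke the matching tool.

    Assumes a hypothetical schema: {"name": ..., "arguments": {...}}.
    """
    call = json.loads(model_output)
    name = call["name"]
    if name not in tools:
        raise KeyError(f"model requested unknown tool: {name}")
    return tools[name](**call.get("arguments", {}))

# Example tool the on-device agent can call.
def set_timer(minutes: int) -> str:
    return f"timer set for {minutes} minutes"

tools = {"set_timer": set_timer}

# Simulated structured output from the model.
reply = '{"name": "set_timer", "arguments": {"minutes": 5}}'
print(dispatch_tool_call(reply, tools))  # → timer set for 5 minutes
```

Constraining the model to emit JSON like this is what lets a small edge model drive real app actions reliably, since the output can be validated before anything is executed.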