Towards Data Science is a community-powered publication that showcases work in data science, machine learning and artificial intelligence. Every day newcomers, seasoned researchers and industry practitioners publish tutorials, research notes and real-world case studies that help the field move forward.

LatentVLA is a novel architecture for autonomous driving that avoids natural language reasoning entirely. Instead, it learns discrete ego-centric latent actions from unlabeled driving data using a self-supervised encoder-decoder framework inspired by LAPO, with a VQ-VAE to discretize continuous action vectors. A Qwen2.5-VL (3.8B) model is trained to predict these latent actions, then distilled into a compact 50M-parameter decision transformer for real-time use. The approach integrates VLM knowledge into existing end-to-end architectures (iPad, Transfuser) via a fusion module using cross-attention in Bird's-Eye-View space. Evaluated on NavSim, LatentVLA achieves state-of-the-art results, though performance gains over baselines are modest. The author notes that open-loop evaluation has significant limitations and argues closed-loop testing would likely reveal larger advantages for reasoning-based approaches.
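To make two of the building blocks above more concrete, here is a minimal NumPy sketch of (1) the nearest-codebook lookup that a VQ-VAE uses to turn continuous action vectors into discrete tokens, and (2) a single-head cross-attention step of the kind a fusion module could use to inject those latent actions into flattened BEV features. It is an illustration under stated assumptions, not the LatentVLA implementation: the codebook size, feature dimension, BEV grid size, the residual connection and the function names `quantize` and `cross_attention` are all hypothetical, and the learned projections, training losses and distillation step are omitted.

```python
import numpy as np


def quantize(z, codebook):
    """Map each continuous vector in z (N, D) to its nearest codebook entry (K, D)."""
    # Squared Euclidean distance between every vector and every code: shape (N, K).
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)      # discrete latent-action tokens
    return idx, codebook[idx]    # tokens and their quantized vectors


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention, without learned projections."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights


rng = np.random.default_rng(0)

# Assumed toy sizes: 16 codes of dimension 8 (not taken from the article).
codebook = rng.normal(size=(16, 8))

# Continuous "action" vectors that sit close to codes 3, 7 and 3.
z = codebook[[3, 7, 3]] + 0.01 * rng.normal(size=(3, 8))
tokens, z_q = quantize(z, codebook)

# Assumed 5x5 BEV grid, flattened to 25 cells of dimension 8.
bev = rng.normal(size=(25, 8))
attended, weights = cross_attention(bev, z_q, z_q)
fused = bev + attended  # residual connection, an assumption of this sketch

print(tokens.tolist(), fused.shape)
```

In this sketch the BEV cells act as queries and the quantized latent actions as keys and values, so each spatial cell decides how much of each latent action to absorb. A real fusion module would add learned query, key and value projections and typically several heads.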