Towards Data Science is a community-powered publication that showcases work in data science, machine learning and artificial intelligence. Every day newcomers, seasoned researchers and industry practitioners publish tutorials, research notes and real-world case studies that help the field move forward.


Multi-Token Prediction (MTP) challenges the standard next-token prediction objective in LLMs by training models to predict several future tokens at once. Research suggests that transformers already encode information about upcoming tokens in their hidden states, and MTP turns that latent foresight into an explicit training objective. The architecture pairs a shared trunk with independent prediction heads. It achieves up to 3x faster inference through self-speculative decoding and 17% better performance on coding benchmarks for larger models. The gains are uneven, though: MTP excels at reasoning tasks but underperforms on knowledge retrieval benchmarks. DeepSeek-V3 has deployed MTP in production, which supports its practical value for improving reasoning capabilities and inference efficiency.
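To make the "shared trunk with independent prediction heads" idea concrete, here is a minimal sketch in plain Python. It is a toy under loud assumptions, not the implementation from any paper or from DeepSeek-V3: the class name `ToyMTPModel`, the helper `mtp_loss`, the random weights, and the single tanh layer standing in for the trunk are all hypothetical. A real trunk is a transformer that sees the whole context, whereas this one only looks at the current token. What the sketch does show is the shape of the objective: the trunk runs once per position, each head predicts a different future offset, and the per-head cross-entropies are summed.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyMTPModel:
    """Hypothetical toy: one shared trunk feeding n_future independent heads."""

    def __init__(self, vocab_size, hidden_size, n_future, seed=0):
        rng = random.Random(seed)
        self.embed = random_matrix(vocab_size, hidden_size, rng)
        self.trunk = random_matrix(hidden_size, hidden_size, rng)
        # One output projection per future offset (t+1, t+2, ..., t+n_future).
        self.heads = [
            random_matrix(vocab_size, hidden_size, rng) for _ in range(n_future)
        ]

    def hidden(self, token):
        # Assumption: a single tanh layer stands in for the transformer trunk.
        return [math.tanh(v) for v in matvec(self.trunk, self.embed[token])]

    def predict(self, token):
        # The trunk is evaluated once; every head reuses the same hidden state.
        h = self.hidden(token)
        return [softmax(matvec(head, h)) for head in self.heads]


def mtp_loss(model, tokens):
    """Average over positions of the summed per-head cross-entropy."""
    n = len(model.heads)
    positions = len(tokens) - n
    if positions <= 0:
        raise ValueError("sequence too short for the number of heads")
    total = 0.0
    for t in range(positions):
        dists = model.predict(tokens[t])
        # Head i is scored against the token i steps ahead of position t.
        for i, dist in enumerate(dists, start=1):
            total += -math.log(dist[tokens[t + i]])
    return total / positions


model = ToyMTPModel(vocab_size=5, hidden_size=4, n_future=3)
print(mtp_loss(model, [0, 1, 2, 3, 4, 0, 1]))
```

With `n_future=1` the loss reduces to ordinary next-token cross-entropy, which is one way to see MTP as a generalization of the standard objective rather than a replacement for it.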

Why We’ve Been Optimizing the Wrong Thing in LLMs for Years

The MTP Architecture: Parallelizing Prediction

Experimental Results: The Scale of Improvement

The Price of Foresight: Shortcomings and Trade-offs