I-DLM (Introspective Diffusion Language Model) is a new approach to diffusion language models (DLMs) that closes the quality gap with autoregressive (AR) models. The core insight is that existing DLMs lack 'introspective consistency': they generate tokens without verifying them, whereas AR models do so implicitly. I-DLM introduces Introspective Strided Decoding (ISD), which generates new tokens and verifies previously generated ones in a single forward pass using a p/q acceptance criterion. The 8B model is claimed to be the first DLM to match the quality of a same-scale AR model, outperforming LLaDA-2.1-mini (16B) by +26 on AIME-24 and +15 on LiveCodeBench-v6 while delivering 2.9–4.1x throughput at high concurrency. A lossless variant (R-ISD) uses gated LoRA to guarantee output that is bit-for-bit identical to that of the base AR model. I-DLM integrates directly into SGLang with no custom infrastructure, and the code, models, and benchmarks are publicly available.
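The summary names a p/q acceptance criterion without spelling it out. As a minimal sketch, assuming it follows the standard speculative-sampling rule (accept a proposed token with probability min(1, p/q), where q is the probability under which the token was proposed and p is the probability assigned to it on verification), the check could look like the following. The function name `accept_prefix` and the stop-at-first-rejection behavior are illustrative assumptions, not I-DLM's actual implementation.

```python
import random


def accept_prefix(p_probs, q_probs, rng):
    """Count how many leading proposed tokens pass a p/q acceptance test.

    p_probs: probability the verifying pass assigns to each proposed token
    q_probs: probability under which each token was originally proposed
    rng:     a random.Random instance (passed in so runs are reproducible)

    Hypothetical helper: assumes the standard speculative-sampling rule.
    """
    accepted = 0
    for p, q in zip(p_probs, q_probs):
        # Accept with probability min(1, p/q); q is assumed to be > 0.
        if rng.random() < min(1.0, p / q):
            accepted += 1
        else:
            break  # the first rejection ends the accepted prefix
    return accepted
```

For example, `accept_prefix([0.5, 0.0, 0.9], [0.4, 0.3, 0.2], random.Random(0))` returns 1: the first token has p/q above 1 and is always accepted, while the second has p = 0 and is always rejected, so the third is never examined.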
Table of contents
Abstract
Why Introspective Consistency?
The I-DLM Method
Results
Speedup Factor Explorer
Documentation & Resources
Citation