Daily Dose of Data Science | Avi Chawla | Substack

Diffusion LLMs (dLLMs) offer an alternative to autoregressive text generation: they start from a fully masked sequence and unmask tokens in parallel using bidirectional attention, which shifts inference from memory-bandwidth-bound to compute-bound. Part 2 of this deep dive covers scaling training from 8B to 100B parameters; converting pre-trained autoregressive models such as LLaMA into diffusion models via attention mask annealing; inference acceleration techniques (block-wise KV caching, confidence-aware parallel decoding, token editing); production serving with SGLang; and hands-on code for running Dream 7B and LLaDA 2.0. Benchmark results show LLaDA 8B matching LLaMA 3 on MMLU and exceeding it on TruthfulQA and HumanEval.
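To make the "start fully masked, unmask in parallel" idea concrete, here is a minimal toy sketch of a confidence-thresholded decoding loop. It is not the Dream 7B or LLaDA API: `toy_denoiser`, `diffusion_decode`, the `threshold` value, and the confidence formula are all hypothetical stand-ins chosen for illustration, and the "model" simply reads the answer from a known target sequence.

```python
import random

MASK = "<mask>"


def toy_denoiser(seq, target, rng):
    """Hypothetical stand-in for one dLLM forward pass.

    Returns {position: (token, confidence)} for every masked position.
    Confidence rises with the fraction of tokens already revealed, a crude
    imitation of bidirectional context sharpening the predictions.
    """
    revealed = sum(tok != MASK for tok in seq) / len(seq)
    preds = {}
    for i, tok in enumerate(seq):
        if tok == MASK:
            conf = min(1.0, 0.3 + 0.5 * rng.random() + 0.5 * revealed)
            preds[i] = (target[i], conf)  # assumption: the toy model is always right
    return preds


def diffusion_decode(target, threshold=0.7, seed=0):
    """Unmask every position whose confidence clears `threshold` in each step."""
    rng = random.Random(seed)
    seq = [MASK] * len(target)  # start from a fully masked sequence
    steps = 0
    while MASK in seq:
        preds = toy_denoiser(seq, target, rng)
        accepted = [i for i, (_, conf) in preds.items() if conf >= threshold]
        if not accepted:
            # Guarantee progress: commit the single most confident position.
            accepted = [max(preds, key=lambda i: preds[i][1])]
        for i in accepted:  # several tokens may be committed in one step
            seq[i] = preds[i][0]
        steps += 1
    return seq, steps


if __name__ == "__main__":
    words = "diffusion models unmask tokens in parallel".split()
    decoded, n_steps = diffusion_decode(words)
    print(" ".join(decoded), f"({n_steps} steps for {len(words)} tokens)")
```

Because each step commits at least one token, and possibly many, the loop never needs more steps than there are tokens. An autoregressive decoder, by contrast, always takes exactly one step per token.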

The Anatomy of Diffusion LLMs