...covered with fundamentals, bottlenecks, and techniques!

Daily Dose of DS publishes bite-sized articles, tutorials, and curated resources for data scientists and aspiring data professionals. It covers data analysis, machine learning, and artificial intelligence, and tracks the latest trends, techniques, and tools so readers can sharpen their skills in a rapidly evolving field.

Daily Dose of Data Science | Avi Chawla | Substack

Part 13 of an LLMOps crash course, focused on LLM inference and optimization. It covers the prefill and decode phases, KV caching and its optimizations (PagedAttention, prefix caching), attention-level techniques (FlashAttention, GQA), speculative decoding, model parallelism strategies, and hands-on comparisons between vLLM and standard inference. It emphasizes that inference optimization is critical for production deployments, where cost, latency, and memory constraints determine whether a model is actually usable at scale.
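To give a rough sense of why memory constraints matter here, the sketch below estimates the size of a KV cache and how grouped-query attention (GQA) shrinks it. It uses the standard accounting of one key and one value vector per layer, per KV head, per token. The function name `kv_cache_bytes` and the model dimensions are hypothetical, chosen only for illustration; they are not taken from the article.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes.

    The factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes fp16/bf16 storage (an assumption).
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model: 32 layers, 32 attention heads, head_dim 128,
# one sequence of 4096 tokens, fp16 cache.
mha_bytes = kv_cache_bytes(num_layers=32, num_kv_heads=32,
                           head_dim=128, seq_len=4096)

# Same model with GQA: 32 query heads share 8 KV heads,
# so the cache holds 4x fewer key/value vectors.
gqa_bytes = kv_cache_bytes(num_layers=32, num_kv_heads=8,
                           head_dim=128, seq_len=4096)

GIB = 1024 ** 3
print(f"MHA cache: {mha_bytes / GIB:.2f} GiB")  # 2.00 GiB
print(f"GQA cache: {gqa_bytes / GIB:.2f} GiB")  # 0.50 GiB
```

The estimate scales linearly with sequence length and batch size, which is why the cache, rather than the model weights, often limits how many requests a server can hold in memory at once.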

A Practical Deep Dive on LLM Inference and Optimization!