A deep dive into RAG chunking strategies based on real production failures. Starting from a compliance incident caused by a split paragraph, the author walks through fixed-size chunking, sentence windows, hierarchical chunking, and semantic chunking, explaining when each works and when it fails. Special attention is given to enterprise document types: PDFs with complex layouts, tables (converted to natural-language sentences for better retrieval), and slide decks with diagrams. A routing framework selects the right parser per document type. RAGAS metrics (context recall, faithfulness, context precision) are used throughout to quantify improvements; context recall, for example, rose from 0.72 to 0.88 after switching to sentence windows for narrative documents.
Table of contents
What Chunking Is and Why Most Engineers Underestimate It
The First Crack: Fixed-Size Chunking
Getting Smarter: Sentence Windows
When Your Documents Have Structure: Hierarchical Chunking
The Alluring Option: Semantic Chunking
The Problem Nobody Talks About: PDFs, Tables, and Slides
A Decision Framework, Not a Ranking
What RAGAS Tells You About Your Chunks
Where This Leaves Us