A practical walkthrough comparing three chunking strategies for RAG-based code knowledge assistants: naive fixed-size splitting, language-aware splitting (LangChain's RecursiveCharacterTextSplitter), and AST-based chunking using Tree-sitter. Each strategy was deployed as a Databricks Knowledge Assistant over a demo codebase,
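The three strategies can be contrasted with the minimal Python sketch below. It is not the author's implementation, and it calls neither LangChain nor Tree-sitter. `recursive_chunks` is a simplified re-implementation of the separator-hierarchy idea behind RecursiveCharacterTextSplitter. `ast_chunks` replaces a Tree-sitter parse tree with Python's built-in `ast` module, so it only handles Python sources. The function names, the `SAMPLE` snippet, and the chunk sizes are hypothetical. The separator list is assumed to match LangChain's defaults for Python.

```python
import ast
from typing import List, Sequence

# Separators for Python source, tried from coarsest to finest.
# Assumption: mirrors LangChain's Language.PYTHON separators; verify locally.
PY_SEPARATORS = ("\nclass ", "\ndef ", "\n\tdef ", "\n\n", "\n", " ", "")


def fixed_size_chunks(source: str, size: int = 200, overlap: int = 0) -> List[str]:
    """Strategy 1: cut every `size` characters, blind to syntax."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    return [source[i:i + size] for i in range(0, len(source), step)]


def recursive_chunks(text: str, size: int = 200,
                     separators: Sequence[str] = PY_SEPARATORS) -> List[str]:
    """Strategy 2 (simplified): split on the coarsest separator, greedily
    merge pieces up to `size`, and recurse on pieces that are still too big."""
    if len(text) <= size:
        return [text] if text else []
    sep, rest = separators[0], separators[1:]
    if sep == "":
        return fixed_size_chunks(text, size)  # last resort: hard character cut
    parts = text.split(sep)
    # Keep each separator attached to the piece it introduces ("\ndef ...").
    pieces = [parts[0]] + [sep + p for p in parts[1:]]
    chunks: List[str] = []
    buf = ""
    for piece in pieces:
        if len(buf) + len(piece) <= size:
            buf += piece
            continue
        if buf:
            chunks.append(buf)
        buf = ""
        if len(piece) <= size:
            buf = piece
        else:
            chunks.extend(recursive_chunks(piece, size, rest))
    if buf:
        chunks.append(buf)
    return chunks


def ast_chunks(source: str) -> List[str]:
    """Strategy 3: one chunk per top-level function or class, using Python's
    ast module as a stand-in for a Tree-sitter syntax tree."""
    lines = source.splitlines(keepends=True)
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so it stays with its definition.
            first = min([node.lineno] + [d.lineno for d in node.decorator_list])
            chunks.append("".join(lines[first - 1:node.end_lineno]))
    # Module-level statements (imports, constants) are skipped in this sketch.
    return chunks


SAMPLE = '''import math


def area(r):
    """Area of a circle."""
    return math.pi * r ** 2


class Shape:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} shape"
'''

if __name__ == "__main__":
    for name, chunks in [
        ("fixed", fixed_size_chunks(SAMPLE, 60)),
        ("recursive", recursive_chunks(SAMPLE, 120)),
        ("ast", ast_chunks(SAMPLE)),
    ]:
        print(f"--- {name}: {len(chunks)} chunks")
        for c in chunks:
            print(repr(c[:40]))
```

With small sizes, the fixed-size splitter tends to cut through the middle of `area` or `Shape`, and the recursive splitter may separate a method from its class. The AST chunker keeps each top-level definition whole. That unit-level boundary is the property AST-based chunking aims to preserve. A Tree-sitter version would apply the same idea to every language it has a grammar for.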
Table of contents
How Knowledge Assistants Work (and Why Code Is Different)
Chunking Strategies
Evaluation Setup with MLflow