A benchmark comparing two code-indexing strategies for Kodit (used in Helix Code Intelligence) found that simple text chunking outperformed program slicing by 14 percentage points on SWE-Bench Verified. Program slicing, which uses syntax trees to extract structurally coherent, dependency-aware code snippets, actually scored below the no-indexing baseline. Across the 25 evaluated instances, chunking achieved a 60% resolve rate, compared with 48% for the baseline and 46% for slicing. The authors' proposed explanation is that LLMs are trained on whole files rather than on synthetic syntax-tree constructs, so feeding them program slices disrupts the way they normally process code.
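To make the contrast concrete, here is a minimal Python sketch of the two ideas. It is not Kodit's implementation. `chunk_text` and `slice_function` are hypothetical names. The window size and overlap are arbitrary assumptions. The "slice" is a much-simplified stand-in: it follows only direct calls between module-level functions, not the full dependency analysis that real program slicing performs. Chunking keeps code exactly as it appears in the file. The slice stitches together pieces from different places, which is the kind of synthetic construct the authors suspect confuses the model.

```python
import ast

def chunk_text(source: str, max_lines: int = 40, overlap: int = 5) -> list[str]:
    """Split source into fixed-size, overlapping windows of lines, ignoring syntax."""
    if overlap >= max_lines:
        raise ValueError("overlap must be smaller than max_lines")
    lines = source.splitlines()
    step = max_lines - overlap
    chunks = []
    for start in range(0, len(lines), step):
        chunks.append("\n".join(lines[start:start + max_lines]))
        if start + max_lines >= len(lines):
            break  # the last window already reaches the end of the file
    return chunks

def slice_function(source: str, name: str) -> str:
    """Simplified 'slice': `name` plus every module-level function it transitively calls."""
    tree = ast.parse(source)
    funcs = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    seen, stack = [], [name]
    while stack:
        current = stack.pop()
        if current in seen or current not in funcs:
            continue
        seen.append(current)
        # Follow plain calls like helper(y); method calls and imports are ignored.
        for node in ast.walk(funcs[current]):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                stack.append(node.func.id)
    # Stitch the collected functions together in their original file order.
    ordered = sorted(seen, key=lambda n: funcs[n].lineno)
    return "\n\n".join(ast.get_source_segment(source, funcs[n]) for n in ordered)

SAMPLE = '''def helper(x):
    return x * 2


def unrelated():
    return 0


def main(y):
    return helper(y) + 1
'''

chunks = chunk_text(SAMPLE, max_lines=4, overlap=1)
snippet = slice_function(SAMPLE, "main")
```

Here the chunker returns three overlapping windows of the file as written, including `unrelated`. The slice returns only `helper` and `main`, joined into a snippet that never appears verbatim in the source.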

From blog.helix.ml