Semlib is a Python library that brings functional programming primitives (map, reduce, sort, filter) to LLM-powered data processing pipelines. Instead of dumping all data into a single LLM prompt, Semlib structures complex tasks into smaller, concurrent sub-tasks described in natural language, improving quality, reducing cost, and handling arbitrarily large datasets. The author built it while automating performance review synthesis for their engineering team, finding the structured multi-step approach outperformed single-shot prompting with Claude. Semlib handles prompting, parsing, concurrency, caching, and cost tracking under the hood, and is positioned as a simpler, more practical alternative to research systems like DocETL, LOTUS, and Palimpzest.
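To make the map/reduce decomposition concrete, here is a minimal sketch in plain Python. It does not use Semlib's actual API: `fake_llm`, `semantic_map`, and `semantic_reduce` are hypothetical names invented for illustration, and `fake_llm` is a deterministic stub standing in for a real model call. It only shows the general shape of the idea: many small, concurrent prompts followed by a pairwise combine step, rather than one large prompt.

```python
import asyncio


async def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call (not a real API).

    Returns the text after the last colon in the prompt, upper-cased,
    so the example runs deterministically without network access.
    """
    await asyncio.sleep(0)  # yield control, as a real network call would
    return prompt.rsplit(":", 1)[-1].strip().upper()


async def semantic_map(items, template, max_concurrency=4):
    """Apply one natural-language instruction to each item concurrently."""
    sem = asyncio.Semaphore(max_concurrency)  # cap the number of in-flight calls

    async def run_one(item):
        async with sem:
            return await fake_llm(template.format(item))

    # gather preserves input order in its results
    return await asyncio.gather(*(run_one(item) for item in items))


async def semantic_reduce(items, template):
    """Fold items left to right, combining two at a time with a prompt."""
    items = list(items)
    acc = items[0]
    for item in items[1:]:
        acc = await fake_llm(template.format(acc, item))
    return acc


async def main():
    notes = ["ships fast", "mentors juniors", "writes clear docs"]
    summaries = await semantic_map(notes, "Summarize this feedback: {}")
    combined = await semantic_reduce(summaries, "Merge these two summaries: {} + {}")
    return summaries, combined


summaries, combined = asyncio.run(main())
print(summaries)
print(combined)
```

Swapping `fake_llm` for a real client call would leave the structure unchanged; the library described above is said to additionally handle parsing, caching, and cost tracking, which this sketch omits.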

10m read time, from anishathalye.com
Table of contents
Semlib: Semantic Data Processing
Origin story
Design
Case study: automating performance reviews with Semlib
Related work
Conclusion
