Semlib is a Python library for building data processing and data analysis pipelines that leverage the power of large language models (LLMs).

Anish Athalye

Semlib is a Python library that brings functional programming primitives (map, reduce, sort, filter) to LLM-powered data processing pipelines. Instead of dumping all the data into a single LLM prompt, Semlib breaks a complex task into smaller, concurrent sub-tasks described in natural language. This improves output quality, reduces cost, and lets pipelines handle arbitrarily large datasets. The author built it while automating the synthesis of performance reviews for their engineering team, and found that the structured, multi-step approach outperformed single-shot prompting with Claude. Under the hood, Semlib handles prompting, response parsing, concurrency, caching, and cost tracking. It is positioned as a simpler, more practical alternative to research systems such as DocETL, LOTUS, and Palimpzest.
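To make the decomposition concrete, here is a minimal sketch of the map-then-reduce pattern in plain Python. It does not use Semlib's actual API: `semantic_map`, `semantic_reduce`, and `fake_llm` are hypothetical names chosen for illustration, and `fake_llm` is an offline stand-in for a real model call (it just upper-cases the text after the prompt's first colon). The map step issues one small prompt per item concurrently, with an `asyncio.Semaphore` capping the number of in-flight requests. The reduce step folds the per-item results into a single answer, one prompt per step.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# Any async function mapping a prompt string to the model's text reply.
LLM = Callable[[str], Awaitable[str]]


async def semantic_map(items: Sequence[str], template: str, llm: LLM,
                       max_concurrency: int = 8) -> list[str]:
    """Run one small natural-language task per item, concurrently."""
    limit = asyncio.Semaphore(max_concurrency)  # cap in-flight LLM calls

    async def run(item: str) -> str:
        async with limit:
            return await llm(template.format(item=item))

    # gather returns results in the same order as the inputs
    return list(await asyncio.gather(*(run(x) for x in items)))


async def semantic_reduce(items: Sequence[str], template: str, llm: LLM,
                          initial: str) -> str:
    """Fold items into a single result, issuing one small prompt per step."""
    acc = initial
    for item in items:
        acc = await llm(template.format(acc=acc, item=item))
    return acc


async def fake_llm(prompt: str) -> str:
    # Hypothetical offline stand-in for a real model call: returns the text
    # after the prompt's first colon, upper-cased.
    await asyncio.sleep(0)  # yield to the event loop, as real network I/O would
    return prompt.split(":", 1)[1].strip().upper()


async def main() -> None:
    # Hypothetical per-engineer feedback snippets.
    feedback = ["alice fixed the build", "bob wrote docs"]
    summaries = await semantic_map(feedback, "Summarize: {item}", fake_llm)
    report = await semantic_reduce(summaries, "Merge: {acc}; {item}",
                                   fake_llm, initial="review")
    print(summaries)  # ['ALICE FIXED THE BUILD', 'BOB WROTE DOCS']
    print(report)     # REVIEW; ALICE FIXED THE BUILD; BOB WROTE DOCS


if __name__ == "__main__":
    asyncio.run(main())
```

A sequential fold is the simplest possible reduce; if the combining prompt is associative, the same fold could instead be done as a tree so that more steps run in parallel. Caching and cost tracking, which Semlib also handles, are omitted from this sketch.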

Semlib: Semantic Data Processing

Case study: automating performance reviews with Semlib