LangExtract is a Python library from Google that uses large language models (LLMs) to extract structured information from unstructured text. It provides precise source grounding, mapping each extraction to its exact location in the source text, and it can generate interactive HTML visualizations for reviewing the results. It supports a range of LLMs, including Gemini and local models served through Ollama. Long documents are handled through optimized chunking and parallel processing. Setup is minimal: an extraction task is defined with a handful of few-shot examples. The project also includes specialized applications in medical text processing, such as medication extraction and radiology report structuring.
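To make "source grounding", "chunking", and "parallel processing" concrete, here is a minimal sketch in plain Python. It is not LangExtract's implementation or API. `GroundedExtraction`, `chunk_text`, `stand_in_model`, `ground`, and `extract_long_document` are hypothetical names. The "model" is a hard-coded stand-in for an LLM call. Chunks are split on whitespace without overlap, and grounding uses exact string matching only. The idea it shows is general: the model returns extracted strings without positions, and each string is then mapped back to its character span in the original document.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundedExtraction:
    """Hypothetical record: an extracted string plus its span in the source."""
    extraction_class: str
    text: str
    start: int  # inclusive character offset into the full document
    end: int    # exclusive character offset into the full document


def chunk_text(text: str, max_chars: int) -> list[tuple[int, str]]:
    """Split text into chunks of at most max_chars, preferring whitespace breaks.

    Returns (offset, chunk) pairs; the offset is where the chunk starts in text.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the last space so a word is not cut in half.
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunks.append((start, text[start:end]))
        start = end
    return chunks


def stand_in_model(chunk: str) -> list[tuple[str, str]]:
    """Hypothetical stand-in for an LLM call: returns (class, text) pairs.

    Like an LLM, it returns extracted strings but no character positions.
    """
    known = {"Aspirin": "medication", "Metformin": "medication"}
    return [(label, name) for name, label in known.items() if name in chunk]


def ground(offset: int, chunk: str,
           pairs: list[tuple[str, str]]) -> list[GroundedExtraction]:
    """Map each extracted string to its exact span in the full document."""
    grounded = []
    for label, text in pairs:
        pos = chunk.find(text)  # exact match only; first occurrence in chunk
        if pos == -1:
            continue  # not found verbatim, so it cannot be grounded here
        start = offset + pos
        grounded.append(GroundedExtraction(label, text, start, start + len(text)))
    return grounded


def extract_long_document(text: str, max_chars: int = 40,
                          workers: int = 4) -> list[GroundedExtraction]:
    chunks = chunk_text(text, max_chars)
    # Send every chunk to the (stand-in) model concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(lambda item: stand_in_model(item[1]), chunks))
    results = []
    for (offset, chunk), pairs in zip(chunks, outputs):
        results.extend(ground(offset, chunk, pairs))
    return sorted(results, key=lambda e: e.start)


if __name__ == "__main__":
    note = ("Patient takes Aspirin 81 mg daily. "
            "Metformin 500 mg was started last week.")
    for e in extract_long_document(note):
        # Each span indexes back into the original note.
        assert note[e.start:e.end] == e.text
        print(e)
```

Because every span is computed against the full document rather than the chunk, `note[e.start:e.end]` recovers exactly the extracted text, which is the property an HTML view needs to highlight extractions in place. A real aligner would also have to handle repeated mentions and model outputs that differ slightly from the source wording.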
Table of Contents
Introduction
Why LangExtract?
Quick Start
Installation
API Key Setup for Cloud Models
More Examples
Contributing
Testing
Development
Troubleshooting
Disclaimer