A practical walkthrough of using a locally hosted LLM (via Ollama with Gemma 2) as a zero-shot classifier for short, semantically varied free-text data. The approach avoids the need for labeled training data by defining candidate categories from domain knowledge and prompting the model to assign each entry. The pipeline covers preprocessing (token reduction, normalization, deduplication), classification with low-temperature settings for consistency, and result aggregation. The post benchmarks performance on ~7,000 security annotations, compares Gemma 2 vs Llama 3.2, and clearly outlines when this technique is and isn't appropriate compared to embeddings, regex, or supervised classifiers.
Table of contents
Why traditional clustering struggles with short free-text
LLMs as zero-shot classifiers
Building the pipeline
When this approach is not the right fit
Other applications worth trying
Getting started
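To make the pipeline in the summary concrete before the details, here is a minimal sketch of its three stages: normalize and deduplicate, classify at low temperature, then aggregate. It is an illustration under assumptions, not the post's actual code. The category list, the prompt wording, the helper names (`normalize`, `classify`, `run_pipeline`), the `gemma2` model tag, and the choice of temperature 0 are all placeholders. The request targets Ollama's `/api/generate` endpoint on its default local port.

```python
import json
import re
import urllib.request
from collections import Counter

# Hypothetical categories; in practice these come from domain knowledge.
CATEGORIES = ["false positive", "accepted risk", "needs fix", "other"]
OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint


def normalize(text):
    """Lowercase, collapse whitespace, and trim surrounding punctuation."""
    return re.sub(r"\s+", " ", text.lower()).strip(" .,;:!?")


def ollama_generate(prompt, model="gemma2"):
    """Send one non-streaming request to a local Ollama server (assumed running)."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0},  # low temperature for consistent labels
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def classify(entry, generate=ollama_generate):
    """Ask the model for exactly one category; fall back to 'other' if it strays."""
    prompt = (
        "Classify the annotation into exactly one of these categories: "
        + ", ".join(CATEGORIES)
        + ".\nAnswer with the category name only.\nAnnotation: "
        + entry
    )
    answer = normalize(generate(prompt))
    return answer if answer in CATEGORIES else "other"


def run_pipeline(entries, generate=ollama_generate):
    """Deduplicate normalized entries, classify each once, weight by frequency."""
    counts = Counter(normalize(e) for e in entries if e.strip())
    totals = Counter()
    for text, n in counts.items():
        totals[classify(text, generate)] += n
    return totals
```

Passing `generate` as a parameter keeps the model call swappable, so the same code can be pointed at a different model or exercised with a stub when no Ollama server is available. Classifying each distinct normalized entry only once is what makes deduplication pay off on repetitive data.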