Towards Data Science is a community-powered publication that showcases work in data science, machine learning and artificial intelligence. Every day newcomers, seasoned researchers and industry practitioners publish tutorials, research notes and real-world case studies that help the field move forward.

A practical walkthrough of using a locally hosted LLM (Gemma 2 served through Ollama) as a zero-shot classifier for short, semantically varied free-text data. The approach needs no labeled training data: candidate categories are defined from domain knowledge, and the model is prompted to assign each entry to one of them. The pipeline covers preprocessing (token reduction, normalization, deduplication), classification with a low temperature setting for consistent output, and aggregation of the results. The post benchmarks performance on ~7,000 security annotations, compares Gemma 2 with Llama 3.2, and outlines when this technique is appropriate and when embeddings, regex, or supervised classifiers are the better choice.

Using a Local LLM as a Zero-Shot Classifier

Why traditional clustering struggles with short free-text