CyberSecQwen-4B: Why Defensive Cyber Needs Small, Specialized, Locally-Runnable Models

CyberSecQwen-4B is a 4B-parameter LLM fine-tuned for cybersecurity threat intelligence tasks (CWE classification, CVE-to-CWE mapping, CTI Q&A) and designed to run locally on a 12 GB consumer GPU. It was trained on a single AMD Instinct MI300X using LoRA with FlashAttention-2 and ROCm 7, and it outperforms Cisco's Foundation-Sec-Instruct-8B on CTI-MCQ by 8.7 percentage points with half the parameters. The core argument is that defensive security practitioners need small, specialized, locally-runnable models: sensitive data cannot leave the premises, air-gapped environments are common, and per-call API costs are prohibitive. A companion 2B model (Gemma4Defense-2B), trained with the same recipe, achieves similar results, which suggests the gains come from the training recipe rather than from a particular base model. The model is Apache 2.0 licensed and available on Hugging Face with a live demo.
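As a rough illustration of why parameter count matters for the 12 GB target, the back-of-the-envelope arithmetic below estimates the memory needed for model weights alone. It is a sketch under stated assumptions, not a description of how the model is actually served: the nominal parameter counts (4e9 and 8e9), the storage formats, the 12 GiB budget, and the 2 GiB headroom for KV cache and activations are all illustrative choices, and the helper names are hypothetical.

```python
# Back-of-the-envelope weight-memory estimate (illustrative assumptions only).

# Assumed bytes per parameter for common storage formats; the source does
# not state which format is used at inference time.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


def fits(num_params: float, dtype: str,
         gpu_gib: float = 12.0, headroom_gib: float = 2.0) -> bool:
    """True if the weights plus an assumed headroom fit within gpu_gib.

    headroom_gib is a placeholder for KV cache, activations, and runtime
    overhead; real usage depends on context length and batch size.
    """
    needed = weight_memory_gib(num_params, BYTES_PER_PARAM[dtype])
    return needed + headroom_gib <= gpu_gib


if __name__ == "__main__":
    for label, params in [("4B", 4e9), ("8B", 8e9)]:
        for dtype in BYTES_PER_PARAM:
            gib = weight_memory_gib(params, BYTES_PER_PARAM[dtype])
            verdict = "fits" if fits(params, dtype) else "does not fit"
            print(f"{label} @ {dtype}: {gib:.2f} GiB weights, {verdict} in 12 GiB")
```

Under these assumptions, a nominal 4B model in bf16 needs about 7.5 GiB for weights and leaves room on a 12 GB card, while a nominal 8B model in bf16 needs roughly twice that and would have to be quantized to fit.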

#cyber

#deep-learning

#lora

May 08 • 9m read time • From huggingface.co

Table of contents

Why this matters
Why a small specialized model, not just a small model
A 5-minute walkthrough
Why AMD MI300X
The training data
The recipe
Companion model: same recipe, different substrate
Challenges and fixes
Try it yourself
Intended use
What's next
Closing
