HomeSec-Bench is a domain-specific benchmark that evaluates LLMs on real home security assistant workflows, with 96 tests across 15 suites. Qwen3.5-9B, running locally on a MacBook Pro M5 via llama.cpp, scores 93.8%, only 4.1 points behind GPT-5.4, while using 13.8 GB of unified memory and generating 25 tok/s. The benchmark covers tool use, security classification, event deduplication, prompt injection resistance, privacy compliance, and more. The key finding is that a 9B local model can reach near-frontier performance on specialized tasks, with zero API costs and full data privacy.
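
To put the percentages in terms of individual tests, the short sketch below converts them to approximate pass counts. It assumes every one of the 96 tests is weighted equally, which the summary does not state, and it infers the GPT-5.4 score by adding the 4.1-point gap to 93.8%. The function name is purely illustrative.

```python
TOTAL_TESTS = 96  # test count given in the summary


def implied_passes(score_pct: float, total: int = TOTAL_TESTS) -> int:
    """Nearest whole number of passed tests for a reported percentage.

    Assumption: all tests count equally toward the score.
    """
    return round(score_pct / 100 * total)


local_score = 93.8                       # Qwen3.5-9B, from the summary
gap_points = 4.1                         # gap to GPT-5.4, from the summary
cloud_score = local_score + gap_points   # inferred, not reported directly

local_passes = implied_passes(local_score)
cloud_passes = implied_passes(cloud_score)

print(f"Local model: about {local_passes} of {TOTAL_TESTS} tests")
print(f"Cloud model: about {cloud_passes} of {TOTAL_TESTS} tests")
print(f"Difference: about {cloud_passes - local_passes} tests")
```

Under that equal-weighting assumption, the 4.1-point gap corresponds to roughly four tests out of 96.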

From sharpai.org (3m read time)
Table of contents
Context Preprocessing
Topic Classification
Knowledge Distillation
Event Deduplication
Tool Use
Chat & JSON Compliance
Security Classification
Narrative Synthesis
Prompt Injection Resistance
Multi-Turn Reasoning
Error Recovery
Privacy & Compliance
Alert Routing
Knowledge Injection
VLM-to-Alert Triage
Why This Matters
