NVIDIA's AI Red Team applied grammar-constrained decoding to improve Bash command generation in small language models. Using Lark grammars generated from command help documentation, they constrained token sampling during inference to enforce syntactically valid Bash. Evaluated across 13 small models and 299 tasks, the technique improved mean pass rates from 62.5% to 75.2%, with the largest gain coming from Qwen3-0.6B, which jumped from 16.7% to 59.2%. The pipeline uses grammargen to produce grammars, llguidance to apply them during llama.cpp inference, and tree-sitter-bash for syntax validation with retry. Gains were strongest for filter/transform and recon/action tasks, while complex shell constructs such as loops and heredocs saw minimal improvement. The post also discusses security implications, noting that grammars can encode policy constraints but must be paired with other controls for safe agentic deployment.
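To make the pipeline more concrete, here is a minimal sketch of its two bookend steps, under stated assumptions: the grammar string is a hand-written, heavily simplified Lark-style grammar for a single command, only illustrating the shape of what grammargen derives from help text, and the retry loop validates candidates with the tree-sitter-bash Python bindings (assuming recent py-tree-sitter, roughly 0.22 or later, plus the tree_sitter_bash package). The hard-coded candidate list stands in for repeated grammar-constrained llama.cpp calls through llguidance; none of this is the post's actual code.

```python
# Illustrative sketch only, not the code from the NVIDIA post.
import tree_sitter_bash
from tree_sitter import Language, Parser

# Hypothetical, heavily simplified Lark-style grammar for `grep`, showing the
# kind of structure a generated grammar constrains (flags, pattern, path).
# In the real pipeline, llguidance applies such a grammar during decoding.
GREP_GRAMMAR = r"""
start: "grep" (WS flag)* WS pattern WS path
flag: "-i" | "-r" | "-n" | "-v"
pattern: /"[^"]*"/
path: /[^\s]+/
WS: /\s+/
"""

# tree-sitter-bash is used purely as a post-hoc syntax check.
BASH = Language(tree_sitter_bash.language())
parser = Parser(BASH)

def is_valid_bash(command: str) -> bool:
    """True if tree-sitter-bash parses the command with no error or missing nodes."""
    tree = parser.parse(command.encode("utf-8"))
    return not tree.root_node.has_error

def first_valid(candidates: list[str], max_attempts: int = 3) -> str | None:
    """Keep the first syntactically valid candidate, retrying up to max_attempts.
    `candidates` stands in for repeated grammar-constrained model calls."""
    for command in candidates[:max_attempts]:
        if is_valid_bash(command):
            return command
    return None

if __name__ == "__main__":
    attempts = [
        'grep -r "foo example.txt',   # malformed: unterminated quote, fails the parse
        'grep -r "foo" example.txt',  # well-formed: accepted on the retry
    ]
    print(first_valid(attempts))
```

In the article's setup the retry budget matters most for the small models: constrained decoding already guarantees grammar-level validity, so the tree-sitter check mainly catches constructs the grammar does not cover.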

11 min read · From developer.nvidia.com
Table of contents:
- Why Bash
- Generating grammars
- Applying grammars during decoding
- Measuring uplift
- Security implications
- Recommendations
- Get started
