NVIDIA's AI Red Team applied grammar-constrained decoding to improve Bash command generation in small language models. Using Lark grammars generated from command help documentation, they constrained token sampling during inference to enforce syntactically valid Bash. Evaluated across 13 small models and 299 tasks, the technique improved mean pass rates from 62.5% to 75.2%, with the largest gain coming from Qwen3-0.6B, which jumped from 16.7% to 59.2%. The pipeline uses grammargen to produce grammars, llguidance to apply them during llama.cpp inference, and tree-sitter-bash for syntax validation with retry. Gains were strongest for filter/transform and recon/action tasks, while complex shell constructs such as loops and heredocs saw minimal improvement. The post also discusses security implications, noting that grammars can encode policy constraints but must be paired with other controls for safe agentic deployment.
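To make the pipeline more concrete, here is a minimal sketch of its two bookend steps, under stated assumptions: the grammar string is a hand-written, heavily simplified Lark-style grammar for a single command, only illustrating the shape of what grammargen derives from help text, and the retry loop validates candidates with the tree-sitter-bash Python bindings (assuming recent py-tree-sitter, roughly 0.22 or later, plus the tree_sitter_bash package). The hard-coded candidate list stands in for repeated grammar-constrained llama.cpp calls through llguidance; none of this is the post's actual code.

```python
# Illustrative sketch only, not the code from the NVIDIA post.
import tree_sitter_bash
from tree_sitter import Language, Parser

# Hypothetical, heavily simplified Lark-style grammar for `grep`, showing the
# kind of structure a generated grammar constrains (flags, pattern, path).
# In the real pipeline, llguidance applies such a grammar during decoding.
GREP_GRAMMAR = r"""
start: "grep" (WS flag)* WS pattern WS path
flag: "-i" | "-r" | "-n" | "-v"
pattern: /"[^"]*"/
path: /[^\s]+/
WS: /\s+/
"""

# tree-sitter-bash is used purely as a post-hoc syntax check.
BASH = Language(tree_sitter_bash.language())
parser = Parser(BASH)

def is_valid_bash(command: str) -> bool:
    """True if tree-sitter-bash parses the command with no error or missing nodes."""
    tree = parser.parse(command.encode("utf-8"))
    return not tree.root_node.has_error

def first_valid(candidates: list[str], max_attempts: int = 3) -> str | None:
    """Keep the first syntactically valid candidate, retrying up to max_attempts.
    `candidates` stands in for repeated grammar-constrained model calls."""
    for command in candidates[:max_attempts]:
        if is_valid_bash(command):
            return command
    return None

if __name__ == "__main__":
    attempts = [
        'grep -r "foo example.txt',   # malformed: unterminated quote, fails the parse
        'grep -r "foo" example.txt',  # well-formed: accepted on the retry
    ]
    print(first_valid(attempts))
```

In the article's setup the retry budget matters most for the small models: constrained decoding already guarantees grammar-level validity, so the tree-sitter check mainly catches constructs the grammar does not cover.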

11 min read · From developer.nvidia.com
Table of contents:
- Why Bash
- Generating grammars
- Applying grammars during decoding
- Measuring uplift
- Security implications
- Recommendations
- Get started
