A critical unpatched RCE vulnerability has been discovered in SGLang, a popular LLM inference engine powering over 400,000 GPUs worldwide. The attack vector is a malicious GGUF model file containing a crafted tokenizer chat template with a Jinja2 payload. When SGLang's reranking endpoint processes any request, it renders the chat template using an unsandboxed Jinja2 environment, allowing arbitrary Python code execution on the host server. The attacker must first get the victim to load the malicious model — achievable via supply chain attacks or social engineering on model hubs like Hugging Face. Full server compromise, data exfiltration, and lateral movement are possible outcomes. The fix is straightforward: replace the default Jinja2 environment with Jinja2's SandboxedEnvironment, as demonstrated by the already-patched CVE-2024-34359 in llama.cpp.
•9m watch time
1 Comment
Sort: