Flaws replicated from Meta’s Llama Stack to Nvidia TensorRT-LLM, vLLM, SGLang, and others, exposing enterprise AI stacks to systemic risk.

InfoWorld

Critical remote code execution vulnerabilities were discovered across major AI inference frameworks including Meta's Llama Stack, Nvidia TensorRT-LLM, vLLM, and SGLang. The flaws originated from unsafe use of ZeroMQ and Python's pickle deserialization in Meta's code, then spread to other projects through copy-paste development practices. Attackers could exploit these vulnerabilities to execute arbitrary code on GPU clusters, exfiltrate data, or compromise AI infrastructure. All affected vendors have released patches, and organizations are advised to upgrade immediately and implement authentication measures for ZeroMQ communications.
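The deserialization risk described above can be illustrated with a minimal, standard-library-only sketch. It does not use ZeroMQ itself and is not taken from any of the affected projects. `Malicious`, `unsafe_receive`, `sign`, `safe_receive`, and `SHARED_KEY` are hypothetical names chosen for illustration. The first half shows why unpickling bytes from the network is dangerous: the sender, not the receiver, decides which callable runs. The second half shows one possible mitigation, assuming a pre-shared key: exchange JSON instead of pickle and reject any message without a valid HMAC tag.

```python
import hashlib
import hmac
import json
import pickle

# Hypothetical shared secret; a real deployment would load it from a secret store.
SHARED_KEY = b"example-key-not-for-production"


class Malicious:
    """An object whose unpickling calls a sender-chosen callable."""

    def __reduce__(self):
        # Harmless stand-in: an attacker would name a dangerous callable instead.
        return (int, ("42",))


def unsafe_receive(raw: bytes):
    # The risky pattern: unpickling bytes that arrived from the network.
    return pickle.loads(raw)


def sign(message: dict) -> bytes:
    # Serialize as JSON (data only, no code) and prepend an HMAC-SHA256 tag.
    body = json.dumps(message, sort_keys=True).encode("utf-8")
    tag = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest().encode("ascii")
    return tag + b"." + body


def safe_receive(raw: bytes) -> dict:
    # Verify the tag in constant time before parsing anything.
    tag, sep, body = raw.partition(b".")
    expected = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest().encode("ascii")
    if not sep or not hmac.compare_digest(tag, expected):
        raise ValueError("rejected: missing or invalid signature")
    return json.loads(body)
```

In this sketch, `unsafe_receive(pickle.dumps(Malicious()))` returns the result of the sender-chosen call rather than a `Malicious` instance, while `safe_receive` raises `ValueError` for the same bytes. Signing JSON is only one option. The summary above does not say which authentication mechanism the vendors adopted.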

Copy-paste vulnerability hits AI inference frameworks at Meta, Nvidia, and Microsoft