Vision language models (VLMs) can be manipulated through adversarial image perturbations, much like traditional image classifiers. Using techniques such as Projected Gradient Descent (PGD), attackers can craft pixel-level modifications or adversarial patches that cause VLMs to generate unexpected outputs. The article demonstrates how the same techniques used to evade image classifiers can be adapted to build adversarial images for VLMs, and how the attack can be extended further.
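
As a rough illustration of the PGD approach mentioned above, the sketch below shows an L-infinity PGD loop against a VLM. It assumes a Hugging Face-style model whose forward pass accepts `pixel_values`, `input_ids`, and `labels` and returns a cross-entropy loss over an attacker-chosen target string; the signature, hyperparameters, and pixel range are illustrative assumptions, not the article's exact implementation.

```python
# Minimal PGD sketch for crafting an adversarial image against a VLM.
# Assumption: the model's forward pass takes `pixel_values`, `input_ids`,
# and `labels`, and returns an object with a `.loss` attribute that is the
# cross-entropy of the attacker-chosen target text.
import torch

def pgd_attack(model, pixel_values, input_ids, labels,
               epsilon=8 / 255, alpha=2 / 255, steps=50):
    """L-infinity PGD: perturb pixels so the VLM becomes more likely
    to emit the target text encoded in `labels`."""
    original = pixel_values.clone().detach()
    adv = original.clone()

    for _ in range(steps):
        adv.requires_grad_(True)
        outputs = model(pixel_values=adv, input_ids=input_ids, labels=labels)
        loss = outputs.loss  # cross-entropy on the attacker-chosen target

        # Gradient of the loss with respect to the image pixels only.
        grad = torch.autograd.grad(loss, adv)[0]

        with torch.no_grad():
            # Descend the loss so the target output becomes more likely.
            adv = adv - alpha * grad.sign()
            # Project back into the epsilon ball around the original image.
            adv = original + torch.clamp(adv - original, -epsilon, epsilon)
            # Keep pixels in the assumed valid input range [0, 1].
            adv = torch.clamp(adv, 0.0, 1.0)

    return adv.detach()
```

The signed-gradient step and projection are what distinguish PGD from a single-shot attack: repeating small, bounded updates usually finds a much stronger perturbation while keeping it visually imperceptible.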

From developer.nvidia.com · 9 min read
Table of contents
- Vision language models
- Evading image classifiers
- Building adversarial images for VLMs
- The difference with VLMs
- Extending the attack
- Learn more
