promptfoo/promptfoo: Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs
Promptfoo is an open-source CLI and library for evaluating and red-teaming LLM applications. It supports automated prompt testing, side-by-side model comparison (GPT, Claude, Gemini, Llama, and others), vulnerability scanning, and CI/CD integration. Evaluations run locally, which keeps prompts and test data on your own machine, and are driven by declarative config files. The project is now part of OpenAI and remains MIT-licensed. It claims to power LLM apps serving more than 10 million users in production.
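To make the idea of declarative prompt testing and model comparison concrete, here is a minimal Python sketch. It is not promptfoo's implementation or its config schema: the `CONFIG` layout, the stub "providers" `echo_model` and `shouting_model`, and the `run_eval` helper are all hypothetical names invented for illustration. The point is only that test cases are described as data and then run against every prompt and provider combination.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for model providers. A real setup would call
# model APIs here; these stubs keep the sketch runnable offline.
def echo_model(prompt: str) -> str:
    return prompt

def shouting_model(prompt: str) -> str:
    return prompt.upper()

# Illustrative declarative config (assumed layout, not promptfoo's schema):
# prompts are templates, tests supply variables and assertions.
CONFIG: Dict = {
    "prompts": ["Repeat this word: {word}"],
    "providers": {"echo": echo_model, "shout": shouting_model},
    "tests": [
        {
            "vars": {"word": "hello"},
            "assert": [{"type": "contains", "value": "hello"}],
        },
    ],
}

def check(assertion: Dict, output: str) -> bool:
    """Evaluate one assertion against a model output."""
    if assertion["type"] == "contains":
        return assertion["value"] in output
    raise ValueError(f"unsupported assertion type: {assertion['type']}")

def run_eval(config: Dict) -> List[Dict]:
    """Run every prompt x provider x test combination and record pass/fail."""
    results = []
    for template in config["prompts"]:
        for name, provider in config["providers"].items():
            for test in config["tests"]:
                prompt = template.format(**test["vars"])
                output = provider(prompt)
                passed = all(check(a, output) for a in test["assert"])
                results.append(
                    {"provider": name, "prompt": prompt,
                     "output": output, "pass": passed}
                )
    return results

if __name__ == "__main__":
    for row in run_eval(CONFIG):
        status = "PASS" if row["pass"] else "FAIL"
        print(f"[{status}] {row['provider']}: {row['output']}")
```

Because the tests live in data rather than code, adding a provider or a test case means editing the config only. The same grid of results can then be compared across models or checked in a CI job.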