Carnegie Mellon University and Fujitsu developed three benchmarks to evaluate AI agent safety and effectiveness in enterprise environments. FieldWorkArena tests how well agents detect safety violations in logistics and manufacturing settings, while ECHO measures hallucination mitigation in vision-language models, and an