This brief commentary points to a report arguing that AI security cannot be measured simply by maximizing benchmark scores. Drawing parallels to 30 years of evolution in software security engineering (from penetration testing to BSIMM), the report suggests that software security measurement approaches will likely translate to AI, but that there is currently no reliable 'security meter' for AI systems. Progress can still be made by managing risk through good assurance processes, but vigilance remains essential.