Researchers created BinaryAudit, a benchmark that tests AI agents' ability to detect backdoors in compiled binaries using reverse-engineering tools such as Ghidra. Claude Opus 4.6 achieved a 49% detection rate on artificially injected backdoors in open-source servers. While models can now operate decompilers and perform basic binary
