The evaluation aims to measure whether AI models such as GPT-4 increase access to information about biological threat creation beyond what existing resources already provide. Participants with access to GPT-4 showed mild uplifts in accuracy and completeness, but the differences were not statistically significant. The evaluation highlights the need for more research to determine what threshold of uplift would represent a meaningful increase in risk. The methodology involved human participants, with experts having access to a research-only variant of GPT-4, and the tasks covered various stages of the biological threat creation process.