We’re developing a blueprint for evaluating the risk that a large language model (LLM) could aid someone in creating a biological threat. In an evaluation involving both biology experts and students, we found that GPT-4 provides at most a mild uplift in biological threat creation accuracy. While this uplift is not large enough to be conclusive, our finding is a starting point for continued research and community deliberation.


OpenAI

The evaluation aims to measure whether AI models such as GPT-4 increase access to information about biological threat creation compared with existing resources. Participants with access to GPT-4 showed mild uplifts in accuracy and completeness, but these differences were not statistically significant. The evaluation highlights the need for more research to determine what thresholds of increased risk are meaningful. The study involved human participants, and the expert participants had access to a research-only variant of GPT-4. The tasks covered several stages of the biological threat creation process.

Building an early warning system for LLM-aided biological threat creation