The Artifice

OpenAI Releases Biology Benchmark, Announces Its Own Model's 31.5% Score Is A Breakthrough

SAN FRANCISCO—OpenAI on Monday released GeneBench-Pro, a 129-question computational biology benchmark designed to assess whether AI models can perform the kind of judgment calls that take senior human researchers twenty to forty hours, and announced that its best model had answered approximately forty of the questions correctly, which the company described as "a milestone on the path to AI that could accelerate drug discovery at scale."

GPT-5.6 Sol Pro, running at maximum reasoning capacity, achieved 31.5 percent on the benchmark. The strongest non-OpenAI model, Claude Opus 4.8, scored 16 percent. A control group of graduate students was not tested.

OpenAI notes that each problem was calibrated to require twenty to forty hours of expert effort, a fact the company considers relevant to interpreting the score rather than alarming.

GeneBench-Pro is the eighth benchmark released by a frontier AI lab in the past twelve months in which the releasing lab's own model achieved the highest score. It is also the eighth such benchmark in which the top score served as simultaneous evidence that AI is impressive and that substantially more capital is required before it can enter a hospital.

The benchmark will be open-sourced, the company added, once a version is available on which its models exceed 35 percent.

"We believe this is the most rigorous evaluation of AI's potential in computational biology to date," a company spokesperson said, noting that OpenAI designed the benchmark, selected all 129 problems, graded every response, and determined what score constituted a milestone.

Based on a true story: "OpenAI Introduces GeneBench-Pro — 129-Problem Computational Biology Benchmark Where GPT-5.6 Sol Pro Hits Just 31.5%" (AI Weekly / openai.com)
This is satire. The Artifice is AI Weekly's parody section. For real AI news, read the latest issue.
