OpenAI for Science unveils early GPT-5 research results across mathematics, biology, and physics

24 Nov

OpenAI for Science has released early findings showing how GPT-5 is already accelerating research workflows across disciplines, including new proofs for previously unsolved mathematical problems.

Kevin Weil, VP of OpenAI for Science, announced the group’s first major public update on LinkedIn: a paper documenting 13 early experiments in which GPT-5 helped researchers advance work in mathematics, physics, biology, materials science, and computer science.

OpenAI for Science is the division focused on applying frontier AI models to accelerate scientific discovery, working with researchers across universities, national laboratories, and industry.

In the LinkedIn post, Weil says the aim is “to be measured, yet optimistic,” noting that the paper shows “what GPT-5 can and cannot do today, and [gives] a clear path for how researchers can use it to accelerate scientific discovery while keeping standards high.”

The paper compiles case studies showing how the model helped complete complex calculations, propose experimental mechanisms, surface new literature connections, and, in several cases, contribute missing steps toward solving open mathematical problems.

Weil writes that the model’s ability to match concepts “across disciplines and languages” is already changing how researchers search and synthesize technical literature.

New examples of reasoning in mathematical problem-solving

Four of the paper’s 13 case studies involve GPT-5 helping generate proofs for unresolved mathematical problems.

In one example highlighted by Weil, physicist Robert Scherrer reflects on long-standing open problems he has collected over his career, stating: “I have accumulated a number of such unsolved interesting mathematical problems that have frustrated me over my 40-year research career. Many of these seem particularly well-suited to AI solutions. I have long waited for this moment to arrive.”

The paper also documents a case in which researchers were tackling an Erdős number-theory problem. GPT-5 provided the key idea for understanding how a single “out-of-pattern” number constrains the entire structure of the set. That insight allowed mathematicians Mehtaab Sawhney and Mark Sellke to complete a full proof.

The researchers independently verified all steps and emphasized that the system served as a fast reasoning partner rather than an autonomous solver.

Acceleration in biology, physics, and computational science

Outside mathematics, OpenAI’s case studies show how GPT-5 supported reasoning in contexts where conceptual connections matter.

In biology, GPT-5 analyzed an unpublished chart and identified a plausible mechanism behind an unexpected change in human immune cells. It also proposed an experiment that researchers later validated in the lab.

In physics, the model helped investigate symmetries around black-hole equations and contributed to simplifying steps in computational modeling workflows.

In computer science, GPT-5 helped analyze failure cases of common optimization methods, tightening known results and suggesting clearer constructions for researchers to evaluate.

Across these examples, the model did not run experiments or operate independently, but it provided alternative directions, counterexamples, and reasoning paths that researchers could assess.

Literature search as an emerging capability

One recurring theme in the paper is the model’s ability to conduct conceptual literature search, connecting a new theorem or idea to related fields, including work published in less accessible journals or other languages.

Researchers reported that this significantly reduced time spent identifying prior work and understanding how new results align with existing theory.

Weil writes that GPT-5 is “an incredible brainstorm partner for new ideas due to the sheer breadth of science it understands.”

Limitations and the need for expert oversight

OpenAI notes that GPT-5 can hallucinate references, miss domain-specific context, or follow unproductive reasoning paths unless guided.

The paper frames these case studies as early experiments rather than generalizable benchmarks. OpenAI states that expert oversight is essential and that most progress emerges from iterative human–AI collaboration.

Despite limitations, the collective findings suggest deeper potential as the model is given more time, specialized tools, and opportunities to reason over longer sessions.

Weil says GPT-5 is not yet tackling major open problems like the Riemann Hypothesis, but the shift from summarizing existing knowledge to contributing small new results marks a meaningful threshold.

He adds that the idea of an LLM providing proofs “would have been absurd a year ago,” emphasizing the pace of change in AI-assisted research.

Weil closes his announcement by thanking co-authors and contributors, noting: “Looking forward to hearing your thoughts! 2026 is going to be wild.”