Paper
Olympiad-level formal mathematical reasoning with reinforcement learning
A long-standing goal of artificial intelligence (AI) is to build systems capable of complex reasoning in vast domains, a task epitomized by mathematics with its boundless concepts and demand for rigorous proof. Recent AI systems, often reliant on human data, typically lack the formal verification necessary to guarantee correctness. By contrast, formal languages such as Lean<sup>1</sup> offer an interactive environment that grounds reasoning, and reinforcement learning (RL) provides a mechanism for learning in such environments. Here we present AlphaProof, an AlphaZero-inspired<sup>2</sup> agent that learns to find formal proofs through RL by training on millions of auto-formalized problems. For the most difficult problems, it uses test-time RL, a method of generating and learning from millions of related problem variants at inference time to enable deep, problem-specific adaptation. AlphaProof substantially improves state-of-the-art results on historical mathematics competition problems. At the 2024 International Mathematical Olympiad competition, our AI system, with AlphaProof as its core reasoning engine, solved three out of the five non-geometry problems, including the competition's most difficult problem. Combined with AlphaGeometry 2<sup>3</sup>, this performance, achieved with multi-day computation, resulted in reaching a score equivalent to that of a silver medallist, marking the first time an AI system achieved any medal-level performance, to our knowledge. Our work demonstrates that learning at scale from grounded experience produces agents with complex mathematical reasoning strategies, paving the way for a reliable AI tool in complex mathematical problem solving.
Authors: Thomas Hubert · Rishi Mehta · Laurent Sartran · Miklós Z. Horváth · Goran Žužić · Eric Wieser · Aja Huang · Julian Schrittwieser · Yannick Schroecker · Hussain Masoom · Ottavia Bertolli · Tom Zahavy · Amol Mandhane · Jessica Yung · Iuliya Beloshapka · Borja Ibarz · Vivek Veeriah · Lei Yu · Oliver Nash · Paul Lezeau · Salvatore Mercuri · Calle Sönne · Bhavik Mehta · Alex Davies · Daniel Zheng · Fabian Pedregosa · Yin Li · Ingrid von Glehn · Mark Rowland · Samuel Albanie · Ameya Velingker · Simon Schmitt · Edward Lockhart · Edward Hughes · Henryk Michalewski · Nicolas Sonnerat · Demis Hassabis · Pushmeet Kohli · David Silver