OpenAI’s latest model, GPT-5.2 Pro, has demonstrated the ability to generate proofs for long‑standing mathematical problems that can then be formally verified. It has also achieved a record 31 percent success rate on the toughest tier of the FrontierMath benchmark. These results suggest that AI‑driven reasoning is beginning to rival expert mathematicians on genuine research challenges.
ChatGPT Generates First Formal Proof of Historic Problem
Proof Generation Process
Researchers led by mathematicians Barreto and Price used ChatGPT, a large language model, to construct a rigorous proof for a problem that had resisted professional mathematicians for decades. The human collaborators approved of the AI’s argument and described it as “quite nice” and “sophisticated.” To confirm its correctness, the Aristotle system translated the proof into formal code for the Lean proof assistant, which then checked every logical step automatically.
GPT-5.2 Pro Sets New Record on FrontierMath Benchmark
Benchmark Performance Details
GPT-5.2 Pro achieved a 31 percent success rate on Tier 4 of the FrontierMath benchmark, the most demanding tier, which contains the hardest problems in the collection. This is a 12‑percentage‑point gain over the previous best of 19 percent, reflecting improvements in the model’s symbolic reasoning and problem‑solving capabilities.
Implications for Mathematical Research and Formal Verification
Accelerating Discovery with AI‑Assisted Proofs
The ability to generate plausible arguments and automatically render them in a formal proof assistant could dramatically speed up the discovery pipeline. By handling routine derivations, AI frees mathematicians to focus on high‑level conceptual work, while formal assistants like Lean ensure that every result meets rigorous standards.
Future Challenges: Authorship, Credit, and Ethical Considerations
As AI systems produce proofs that pass formal verification, the community must address questions of authorship and credit. Current practice treats AI as a tool rather than a co‑author, but increasing autonomy may blur this line. Ongoing dialogue will be essential to define responsible use and attribution in AI‑augmented mathematics.
