o3 represents enormous progress in general-domain reasoning with RL — excited that we were able to announce some results today! Here’s a summary of what we shared about o3 in the livestream (1/n)
We are seeing much faster AI progress than **Paul Christiano** and **Yudkowsky** predicted (they had IMO gold in 2025 at 8% and 16% respectively), and by methods that are more general than expected
fun: 3-4 months ago I ran o3 for some academics on a set of AIME-style problems. It has taken them so long to write a summary of the results (96% iirc) that Alex solved proof & IMO in the meantime lol
Lots of folks are posting quotes from Gowers/Tao about the hardest split of FrontierMath, but our 25% score is on the full set (which is also extremely hard, with the old SOTA at 2%, but not as hard as those quotes imply).
Epoch AI are going to publish more details, but on the OpenAI side for those interested: we did not use FrontierMath data to guide the development of o1 or o3, at all. (1/n)
I sometimes get the impression that academia does not want LLMs to work or AGI to be possible. There is exuberance for negative results that are plausibly over-interpreted.