Today, we released our report on competitive programming with large reasoning models. We show how our LLMs have evolved from amateur competitive programmers to competing with the world's best:
Ahmed El-Kishky
Researcher at OpenAI
San Francisco
Joined September 2024
- My first project at OpenAI involved teaching our models to reason and use tools by improving their competitive programming skills. Back then, GPT-4 struggled with even the simplest Codeforces problems, often OOM-ing in the sandbox. It's incredible to see that just 2.5 years…
- 1/n I’m thrilled to share that our @OpenAI reasoning system scored high enough to achieve gold 🥇🥇 in one of the world’s top programming competitions - the 2025 International Olympiad in Informatics (IOI) - placing first among AI participants! 👨💻👨💻
- Replying to @ahelkky: 9/ When we inspected the chain of thought, we discovered the model had independently developed its own test-time strategies. In one interesting case, the model 1) wrote a simple brute-force solution first, then 2) used it to validate a more complex optimized approach.
- My colleagues and I will be hosting a talk and Q&A session on 'Learning to Reason with LLMs' and the new OpenAI o1 model. Join us for an insightful discussion! forum.openai.com/public/events/… #OpenAIForum
- Congratulations @FakePsyho on a nail-biting performance! Great showings as well from @bminaiev, @andresnds, and @_lorenzkuhn representing OpenAI. It’s been fantastic sponsoring AtCoder World Finals @atcoder. We’re excited to share some of the model solutions with the world.
  Quoted post: Humanity has prevailed (for now!) I'm completely exhausted. I figured, I had 10h of sleep in the last 3 days and I'm barely alive. I'll post more about the contest when I get some rest. (To be clear, those are provisional results, but my lead should be big enough)
- Excited that the world gets to see some of the incredible reasoning research we've been working on at @OpenAI. openai.com/index/learning…
- Replying to @ahelkky: 11/ Since competitive programming is just one facet of coding, o3 contributors also evaluated models on software engineering tasks. While there’s still a long way to go, it’s clear that learning to reason through RL improves SWE capabilities.
- Replying to @ahelkky: 10/ We again saw gains on uncontaminated Codeforces contests: the model’s Elo ranked in the 99.8th percentile, placing it around #175 globally.
- Replying to @ahelkky: 8/ Without any elaborate hand-crafted strategies, o3 achieved IOI gold under official contest constraints (50 submissions per problem, same time constraints).
- Replying to @ahelkky: 12/ Excited to share OpenAI's journey in exploring how our models perform in competitive programming! Check out our full report: arxiv.org/pdf/2502.06807
- Replying to @ahelkky: 3/ A major step-function improvement came with large reasoning models like OpenAI o1, trained with reinforcement learning to reason effectively in their chains of thought. We saw performance jump from the 11th percentile Elo to the 89th on held-out / uncontaminated Codeforces contests.
- Replying to @ahelkky: 7/ But progress didn't stop there. OpenAI announced OpenAI o3, trained with even more reinforcement learning. We wanted to see how far competitive programming could go without using hand-crafted test-time strategies - through RL alone.
- Replying to @ahelkky: 6/ Our hand-crafted test-time strategies were very effective! They boosted our IOI score by ~60 points and increased o1-ioi's performance on held-out Codeforces contests from the 93rd to the 98th percentile.
- Replying to @ahelkky: 1/ Read the full report here: arxiv.org/pdf/2502.06807
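
The self-validation strategy described in post 9/ (write a brute-force solution, then use it to check an optimized one) resembles what competitive programmers often call stress testing. Below is a minimal sketch of that idea, not the model's actual code. The maximum-subarray problem, the function names, and the `find_mismatch` helper are all assumptions chosen purely for illustration; the thread does not say which problems or checks the model used.

```python
import random


def max_subarray_brute(values):
    """O(n^2) reference: try every non-empty contiguous slice."""
    best = values[0]
    for start in range(len(values)):
        running = 0
        for end in range(start, len(values)):
            running += values[end]
            best = max(best, running)
    return best


def max_subarray_fast(values):
    """O(n) Kadane's algorithm: the optimized candidate under test."""
    best = current = values[0]
    for value in values[1:]:
        current = max(value, current + value)
        best = max(best, current)
    return best


def find_mismatch(brute, fast, trials=500, max_len=8, seed=0):
    """Return the first small random input where the two disagree, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.randint(-10, 10) for _ in range(rng.randint(1, max_len))]
        if brute(values) != fast(values):
            return values
    return None


if __name__ == "__main__":
    counterexample = find_mismatch(max_subarray_brute, max_subarray_fast)
    print("no mismatch found" if counterexample is None else counterexample)
```

Small random inputs keep the slow reference cheap to run, and any disagreement yields a compact counterexample to debug before spending one of a limited number of submissions.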