1/N Yesterday in Tokyo we @OpenAI ran a 10‑hour live Humans vs AI exhibition at the AtCoder World Tour Finals Heuristic. We pointed an OpenAI reasoning model at the same brutal problem the finalists tackled—no human help, same rules, same clock. Buckle up. 👇
Andre Saraiva
o1-preview, o1-mini, o1, o3-mini, o4-mini, o3...
Reasoning Researcher at OpenAI. Ex-DeepMind.
San Francisco
Joined December 2010
- Replying to @andresnds 15/N We ran the model fully autonomously for the 10h window: no human intervention, and the same submission/data/tools/time budget as everyone else. Watching it iterate live beside elite humans was electric.
- Replying to @andresnds, @bminaiev, and 9 others: @_lorenzkuhn and me in person, watching the model compete
- Replying to @andresnds 9/N Side note: @FakePsyho came in running on ~1h sleep (!!). Dude still shipped. Massive respect. What a legend. Clip (~t=7h20m):
- Replying to @andresnds 12/N Final minutes: Psyho ships again and opens what looks like a winning gap. Model nudges scores but can’t close. Humans > AI (provisional). Provisional Exhibition Standings: atcoder.jp/contests/awtf2…
- Replying to @andresnds 2/N The task (“Group Commands & Wall Planning”): 30×30 grid, K robots, each with a start→goal. Before moving you can add walls + assign robot groups, then issue group/solo moves to steer everyone home w/ fewest ops (score = ops + distance penalty). Problem:
- I feel especially proud to have helped train some of the steps of these models. Watching coding agents move from research to genuinely useful real-world scenarios has been amazing. I’ve been using Codex to try out research ideas faster and answer questions about the codebase.
  Quoted post: We’re launching a research preview of Codex: a cloud-based software engineering agent that can work on many tasks in parallel. Rolling out to Pro, Enterprise, and Team users in ChatGPT starting today. chatgpt.com/codex
- Replying to @andresnds 3/N Hours 0‑3: Model does what models do: slam a fast baseline. It adds no new walls, instead routing + grouping through the initial maze, and rockets to #1. Scoreboard looks great… but seasoned AtCoder folks know early greedy leads rarely survive human midgame builds.
- Just 2 years ago this line of work could barely pass easy CF tasks. Today our models reach IOI gold level on the official AI track, first among AI entrants. Long project with @ahelkky @_lorenzkuhn @alexwei_ @bminaiev @oleg_murk @MostafaRohani @clavera_i. See @SherylHsu02 thread.
  Quoted @SherylHsu02: 1/n I’m thrilled to share that our @OpenAI reasoning system scored high enough to achieve gold 🥇🥇 in one of the world’s top programming competitions - the 2025 International Olympiad in Informatics (IOI) - placing first among AI participants! 👨💻👨💻
- Replying to @andresnds 13/N Mini timeline:
  0h → Model #1 (no walls)
  3h → Model walls; chat loses it
  3‑7h → Model leads; humans close
  ~7h → Psyho passes, model #2
  ~8h → Model comeback → #1
  8‑9h → Model leads
  Final hr → Psyho surge; humans win (probably)
- Replying to @andresnds 5/N ~Hour 3 (~12:08 JST): WALLS!!! The model starts placing them and chat 🤯: hayashi: “Wall by OpenAI!” 北杜: “Seriously?!” tk bd: “That thing we’ve been waiting for is here.” 楽しんで学ぶ: “The tension…” よには/yoniha: "Now it might actually happen." cocoa milky: "Speechless
- Replying to @andresnds 10/N ~Hour 8: Plot twist. Model finds new ideas (better walling, better resource scheduling) → jumps back to #1 and holds into ~Hour 9. This is going down to the wire!!
- Replying to @andresnds 18/N At the end of the day this isn’t humans versus AI as a zero‑sum sport; it’s research. Every round like this helps us measure where our models are in our quest to build AI that amplifies human ingenuity and benefits humanity. @OpenAI
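
For readers who want the scoring rule from 2/N ("score = ops + distance penalty") in concrete form, here is a rough Python sketch. The thread does not give the exact formula, so the details are assumptions: the function name `exhibition_score`, the use of Manhattan distance, and the `penalty_per_cell` weight of 100 are all hypothetical placeholders, not the official AtCoder definition. Lower is better.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col) on the 30x30 grid


def exhibition_score(
    num_ops: int,
    positions: List[Cell],
    goals: List[Cell],
    penalty_per_cell: int = 100,  # assumed weight; the thread does not state it
) -> int:
    """Operations used plus a penalty for robots that are not home yet.

    Assumption: the penalty is proportional to each robot's remaining
    Manhattan distance to its goal. The real contest rule may differ.
    """
    if len(positions) != len(goals):
        raise ValueError("need exactly one goal per robot")
    remaining = sum(
        abs(r - gr) + abs(c - gc)
        for (r, c), (gr, gc) in zip(positions, goals)
    )
    return num_ops + penalty_per_cell * remaining


# Two robots, 120 operations issued; the second robot stops 2 cells short.
print(exhibition_score(120, [(0, 0), (5, 5)], [(0, 0), (5, 7)]))  # 320
```

Under a rule shaped like this, a quick baseline that gets every robot home with many operations already scores reasonably, and later gains come from cutting the operation count, which fits the arc described in 3/N and 10/N.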












