Andre Saraiva (@andresnds) / X

Andre Saraiva

@andresnds

o1-preview, o1-mini, o1, o3-mini, o4-mini, o3... Reasoning Researcher at OpenAI. Ex-DeepMind.

San Francisco

Joined December 2010

Andre Saraiva
@andresnds
Jul 17, 2025
1/N Yesterday in Tokyo we @OpenAI ran a 10‑hour live Humans vs AI exhibition at the AtCoder World Tour Finals Heuristic. We pointed an OpenAI reasoning model at the same brutal problem the finalists tackled—no human help, same rules, same clock. Buckle up. 👇
757K
Andre Saraiva
@andresnds
Jul 17, 2025
Replying to @andresnds
15/N We ran the model fully autonomously for the 10h window—no human intervention; same submission/data/tools/time budget as everyone else. Watching it iterate live beside elite humans was electric.
135K
Andre Saraiva
@andresnds
Jul 17, 2025
Replying to @andresnds @bminaiev and 9 others
@_lorenzkuhn and me watching the model compete in person
15K
Andre Saraiva
@andresnds
Jul 17, 2025
Replying to @andresnds
9/N Side note: @FakePsyho came in running on ~1h sleep (!!). Dude still shipped. Massive respect. What a legend. Clip (~t=7h20m):
27K
Andre Saraiva
@andresnds
Jul 17, 2025
Replying to @andresnds
12/N Final minutes: Psyho ships again and opens what looks like a winning gap. Model nudges scores but can’t close. Humans > AI (provisional). Provisional Exhibition Standings: atcoder.jp/contests/awtf2…
15K
Andre Saraiva
@andresnds
Jul 17, 2025
Replying to @andresnds
2/N The task (“Group Commands & Wall Planning”): 30×30 grid, K robots start→goal. Before moving you can add walls + assign robot groups, then issue group/solo moves to steer everyone home w/ fewest ops (score = ops + distance penalty). Problem:
atcoder.jp
A - Group Commands and Wall Planning
25K
Andre Saraiva
@andresnds
May 16, 2025
I feel especially proud to have helped train some of the steps of these models. Watching coding agents move from research to genuinely useful real-world scenarios has been amazing. I’ve been using Codex to try out research ideas faster and answer questions about the codebase.
OpenAI
@OpenAI
May 16, 2025
We’re launching a research preview of Codex: a cloud-based software engineering agent that can work on many tasks in parallel. Rolling out to Pro, Enterprise, and Team users in ChatGPT starting today. chatgpt.com/codex
12K
Andre Saraiva
@andresnds
Jul 17, 2025
Replying to @andresnds
3/N Hours 0‑3: Model does what models do—slam a fast baseline. It ignores new walls, instead routing + grouping through the initial maze, and rockets to #1. Scoreboard looks great… but seasoned AtCoder folks know early greedy leads rarely survive human midgame builds.
22K
Andre Saraiva
@andresnds
Aug 11, 2025
Just 2 years ago this line of work could barely pass easy CF tasks. Today our models reach IOI gold level on the official AI track, first among AI entrants. Long project with @ahelkky @_lorenzkuhn @alexwei_ @bminaiev @oleg_murk @MostafaRohani @clavera_i. See @SherylHsu02 thread.
Sheryl Hsu
@SherylHsu02
Aug 11, 2025
1/n I’m thrilled to share that our @OpenAI reasoning system scored high enough to achieve gold 🥇🥇 in one of the world’s top programming competitions - the 2025 International Olympiad in Informatics (IOI) - placing first among AI participants! 👨‍💻👨‍💻
12K
Andre Saraiva
@andresnds
Jul 17, 2025
Replying to @andresnds
13/N Mini timeline:
0h → Model #1 (no walls)
3h → Model walls; chat loses it
3‑7h → Model leads; humans close
~7h → Psyho passes, model #2
~8h → Model comeback → #1
8‑9h → Model leads
Final hr → Psyho surge; humans win (probably)
14K
Andre Saraiva
@andresnds
Jul 17, 2025
Replying to @andresnds
7/N See @asi1024's beautiful maze here:
あしぃ
@asi1024
Jul 16, 2025
GIF
23K
Andre Saraiva
@andresnds
Jul 17, 2025
Replying to @andresnds
5/N ~Hour 3 (~12:08 JST): WALLS!!! The model starts placing them and chat 🤯: hayashi: “Wall by OpenAI!” 北杜: “Seriously?!” tk bd: “That thing we’ve been waiting for is here.” 楽しんで学ぶ: “The tension…” よには/yoniha: “Now it might actually happen.” cocoa milky: “Speechless
19K
Andre Saraiva
@andresnds
Jul 17, 2025
Replying to @andresnds
10/N ~Hour 8: Plot twist. Model finds new ideas (better walling, better resource scheduling) → jumps back to #1 and holds into ~Hour 9. This is going down to the wire!!
15K
Andre Saraiva
@andresnds
Jul 17, 2025
Replying to @andresnds
18/N At the end of the day this isn’t humans versus AI as a zero‑sum sport; it’s research. Every round like this helps us measure where our models are in our quest to build AI that amplifies human ingenuity and benefits humanity. @OpenAI
12K