Bowen Baker (@bobabowen) / X

59 posts

Bowen Baker

@bobabowen

Research Scientist at @openai since 2017. Robotics, Multi-Agent Reinforcement Learning, LM Reasoning, and now Alignment.

Joined January 2017

Pinned
Bowen Baker
@bobabowen
Dec 18, 2025
To preserve or improve chain-of-thought (CoT) monitorability, we have to be able to measure it. I'm excited to announce our new research on this at OpenAI
13K
Bowen Baker
@bobabowen
Jul 15, 2025
Modern reasoning models think in plain English. Monitoring their thoughts could be a powerful, yet fragile, tool for overseeing future AI systems. I and researchers across many organizations think we should work to evaluate, preserve, and even improve CoT monitorability.
756K
Bowen Baker
@bobabowen
Mar 10, 2025
Excited to share what my team has been working on at OpenAI!
OpenAI
@OpenAI
Mar 10, 2025
Detecting misbehavior in frontier reasoning models
Chain-of-thought (CoT) reasoning models “think” in natural language understandable by humans. Monitoring their “thinking” has allowed us to detect misbehavior such as subverting tests in coding tasks, deceiving users, or giving
28K
Bowen Baker
@bobabowen
Jul 5, 2023
Excited to be working on OpenAI's new superalignment team to help solve superintelligence alignment!
OpenAI
@OpenAI
Jul 5, 2023
We need new technical breakthroughs to steer and control AI systems much smarter than us. Our new Superalignment team aims to solve this problem within 4 years, and we’re dedicating 20% of the compute we've secured to date towards this problem. Join us! openai.com/blog/introduci…
30K
Bowen Baker
@bobabowen
Jul 15, 2025
Replying to @bobabowen
CoT monitoring is already useful! In a recent OpenAI blog and paper, we showed that we could catch reward hacks in code via CoT monitoring. Others have found they can catch early signals of misalignment, prompt injections, and evaluation awareness.
Detecting misbehavior in frontier reasoning models
From openai.com
9.7K
Bowen Baker
@bobabowen
Jul 15, 2025
Replying to @bobabowen
Read the paper here: bit.ly/cot-monitorabi…
8.4K
Bowen Baker
@bobabowen
Jul 15, 2025
Replying to @bobabowen
For hard enough tasks, models may have to reason out loud and be monitorable. If actions that cause severe harm require complex reasoning, then this gives hope we could catch and stop them with CoT monitoring.
5.2K
Bowen Baker
@bobabowen
Jul 15, 2025
Replying to @bobabowen
Like any safety layer, CoT monitoring has limitations.
- If CoT reasoning isn’t required, relying on a model’s propensity to reason out loud may not be enough for a strong safety case.
- More advanced, situationally aware models may be able to deliberately obfuscate their CoT
3.5K
Bowen Baker
@bobabowen
Jul 15, 2025
Replying to @bobabowen
I am grateful to have worked closely with @tomekkorbak, @balesni, @rohinmshah and Vlad Mikulik on this paper, and I am very excited that researchers across many prominent AI institutions collaborated with us and came to consensus around this important direction.
3.5K
Bowen Baker
@bobabowen
Mar 11, 2025
One direction I'm excited to see more work on in the future is CoT monitoring as a potential scalable oversight method. In our work, we found that we could monitor a strong reasoning model (same class as o1 or o3-mini) with a weaker model (gpt-4o).
2.1K
Bowen Baker
@bobabowen
Jul 15, 2025
Replying to @bobabowen
Furthermore, the existing CoT monitorability may be extremely fragile. Higher-compute RL, alternative model architectures, certain forms of process supervision, etc. may all lead to models that obfuscate their thinking.
3.1K
Bowen Baker
@bobabowen
Jul 15, 2025
Replying to @bobabowen
We argue that researchers should study
- How to evaluate monitorability
- When CoT monitoring can be relied on as a load-bearing safety measure
- How different pieces of the training stack affect monitorability
- How to construct better monitors
2.9K
Bowen Baker
@bobabowen
Apr 17, 2025
eep
Transluce
@TransluceAI
Apr 16, 2025
We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) x.com/OpenAI/status/…
2K
Bowen Baker
@bobabowen
Nov 20, 2023
❤️
Ilya Sutskever
@ilyasut
Nov 20, 2023
I deeply regret my participation in the board's actions. I never intended to harm OpenAI. I love everything we've built together and I will do everything I can to reunite the company.
3.3K