My first paper @AnthropicAI is out!
We show that Chains-of-Thought often don’t reflect models’ true reasoning—posing challenges for safety monitoring.
It’s been an incredible 6 months pushing the frontier toward safe AGI with brilliant colleagues. Huge thanks to the team! 🙏
New Anthropic research: Do reasoning models accurately verbalize their reasoning?
Our new paper shows they don't.
This casts doubt on whether monitoring chains-of-thought (CoT) will be enough to reliably catch safety issues.