Today we’re introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks.
Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most.
openai.com/index/gdpval-v0
Introducing Claude Sonnet 4.5—the best coding model in the world.
It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.
we've all faced that mountain of paperwork and just wanted to throw in the towel.
turns out claude is a pretty great, cool-headed tool for thought when you need to dig in and stand your ground:
[I work at Stripe] I got very worried after seeing your tweet and escalated this—we don't make decisions like these lightly and I wanted to make sure we hadn't made a mistake. Unfortunately, the business has a chargeback rate far in excess of what's permissible. Card networks