
i build AI evaluation systems for a living. most people want to talk about what AI can do. i spend my days figuring out how to tell if it actually did it.
turns out that's the harder problem.
this is where i think out loud. patterns i notice at work that seem to apply elsewhere. connections between fields that probably shouldn't exist but do. the stuff that's left over after you've read all the obvious takes.
i keep coming back to a few things:
why capable systems fail in the real world. why the metrics we optimize rarely match what we want. why biology keeps arriving at solutions we thought we invented. why the most consequential decisions happen when nobody's paying attention.
i don't write on a schedule. most ideas don't survive a week of thinking. the ones that do end up here.
— dp