Sam Bowman
@sleepinyourhat
AI alignment + LLMs at Anthropic. On leave from NYU. Views not employers'. No relation to @s8mb. Into @givingwhatwecan.
San Francisco
Joined July 2011
Posts
  • As a specialist in evaluating language models, I declare that this is the best way of evaluating language models:
  • PhD admissions season is ramping up, so I feel obliged to join the chorus of voices reminding everyone that doing a PhD is, in most cases, a terrible idea.
  • 🧵✨🙏 With the new Claude Opus 4, we conducted what I think is by far the most thorough pre-launch alignment assessment to date, aimed at understanding its values, goals, and propensities. Preparing it was a wild ride. Here’s some of what we learned. 🙏✨🧵
  • I just got tenure! Wheee! Predictable-but-heartfelt gratitude thread:
  • I’m sharing a draft of a slightly-opinionated survey paper I’ve been working on for the last couple of months. It's meant for a broad audience—not just LLM researchers. (🧵)
    A paper header for "Eight things to know about large language models" by Sam Bowman.
  • I’m starting an AI safety research group at NYU. Why? (🧵)
  • I deleted the earlier tweet on whistleblowing as it was being pulled out of context. TBC: This isn't a new Claude feature and it's not possible in normal usage. It shows up in testing environments where we give it unusually free access to tools and very unusual instructions.
  • I'm hiring research engineers for several alignment/technical safety teams at Anthropic!
    A tech office scene with large plants and a sofa in the foreground.
  • Early this summer, OpenAI and Anthropic agreed to try some of our best existing tests for misalignment on each others’ models. After discussing our results privately, we’re now sharing them with the world. 🧵
    A screenshot of the title and first few lines of the mentioned blog post.
  • AI/ML faculty: A student of mine did an internship at Google, and got the resulting paper accepted to a top conference. The host team isn't willing to pay for conference registration, so I'll have to pay or else the paper won't be published, going against the norm here. Advice?
  • Everybody, please stop publishing interesting research. I'm trying to have a sabbatical.
  • You'll sometimes see the meme that NLP is solved. That's hype, and it's doing harm in the real world. But it's worth thinking about what it'd look like to actually achieve what we're aiming for. (📄 paper, thread 🧵) cims.nyu.edu/~sbowman/bowma…
    Paper header. Title: When combating hype, proceed with caution
  • I'll likely admit a couple new PhD students this year. If you're interested in NLP and you have experience either in crowdsourcing/human feedback for ML or in AI truthfulness/alignment/safety, consider @NYUDataScience!
  • A big part of my job these days is to think about what technical work Anthropic needs to do to make things go well with the development of very powerful AI. I digested my thinking on this, plus some of the Anthropic zeitgeist around it, into this piece: sleepinyourhat.github.io/checklist/
    A beautiful sunset over calm seas, but with waves, rocks, and a rusty concrete staircase in the foreground. 

Credit: Justin Kern via flickr at https://www.flickr.com/photos/justinwkern/5972229589