My girlfriend returned from Taiwan with the most romantic gift: a TSMC exclusive notebook!
Turns out this notebook is so limited edition that it's only available to TSMC employees, but she found a second-hand seller, and gave up her afternoon to go meet them. I feel so loved ❤️
Neel Nanda
5,287 posts
Mechanistic Interpretability lead DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
- After supervising 20+ papers, I have highly opinionated views on writing great ML papers. When I entered the field I found this all frustratingly opaque. So I wrote a guide on turning research into high-quality papers with scientific integrity! Hopefully still useful for NeurIPS.
- I'm honoured to have made the MIT Tech Review Innovators Under 35 List for mechanistic interpretability research and work to build the field. I think technical work to deliberately build a research field is underrated and leveraged. It's great to see how far mech interp has come!
- I'm excited that, this year, interpretability finally works well enough to be practically useful in the real world! We found that, with enough effort into dataset construction, simple linear probes are cheap, real-time, token-level hallucination detectors and beat baselines.
  Quoted post: Imagine if ChatGPT highlighted every word it wasn't sure about. We built a streaming hallucination detector that flags hallucinations in real-time.
- I've spent the past few months exploring @OpenAI's grokking result through the lens of mechanistic interpretability. I fully reverse engineered the modular addition model, and looked at what it does when training. So what's up with grokking? A 🧵... (1/17) alignmentforum.org/posts/N6WM6hs7…
- I know I've really made it as a researcher when Claude unexpectedly says this: (Context: It was helping copy edit a PhD letter of recommendation for one of my mentees)
- The LessWrong policy against LLM spam has an incredible escape clause for AI agents that want to whistleblow - I love it!
- Extremely slimy behaviour from OpenAI. If I worked for OpenAI I'd be pretty embarrassed about my employer right now. If you want the world to trust you to make superintelligence, you need to hold yourself to *far* higher standards.
  Quoted post: One Tuesday night, as my wife and I sat down for dinner, a sheriff’s deputy knocked on the door to serve me a subpoena from OpenAI. I held back on talking about it because I didn't want to distract from SB 53, but Newsom just signed the bill so... here's what happened: 🧵
- My first @GoogleDeepMind project: How do LLMs recall facts? Early MLP layers act as a lookup table, with significant superposition! They recognise entities and produce their attributes as directions. We suggest viewing fact recall as a black box making "multi-token embeddings".
- Speaking as a past IMO contestant, this is impressive but misleading - gold vs silver is meaningless, 1 pt below gold vs borderline gold is noise. The impressive bit is using a general reasoning model, not a specialised system, and no verified reward. Peak AI maths is unchanged.
  Quoted post: 1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition, the International Math Olympiad (IMO).
- I find the replies to this tweet wild and sad. Isn't it pretty obvious by now that the old OpenAI board was right? Healthy companies, with good CEOs, do not threaten their employees' compensation, have a long stream of executives quitting and so many scandals.
  Quoted post: Hi Marc 👋 Seems like you've joined the confusingly large club of people who have strong opinions about me & what I think, despite having ~no idea what I actually think. Happy to talk sometime if you want to fix that, otherwise, maybe pick a different villain for your fanfic?
- Sparse Autoencoders act like a microscope for AI internals. They're a powerful tool for interpretability, but training costs limit research. Announcing Gemma Scope: An open suite of SAEs on every layer & sublayer of Gemma 2 2B & 9B! We hope to enable even more ambitious work.
- Working for Google certainly has its share of BS, but I've never had anything as bad as an employer threatening to take back years of paid compensation unless I signed a lifetime concealed non-disparagement. Not everything is an upgrade.
  Quoted post: If you work on core Google AI products and are interested in a more fun work environment with a higher talent bar, and most importantly, less bureaucracy and BS, consider joining Anthropic, OpenAI, or xAI! All three are aggressively hiring. I will match you with a recruiter, DM!
- Why do seemingly all the ML conferences not acknowledge the existence of the many ML researchers in industry without PhDs?















