Neural networks are grown, not programmed. What does that growth process look like? Like this!
This is a small language model (3M parameters) across training, visualised with a new interpretability technique: susceptibilities. We call this handsome critter the rainbow serpent.
Daniel Murfet
Mathematician. Cofounder of Sequent Research. Formerly Timaeus, University of Melbourne. Purveyor of pelagic metaphors.
- A few months ago I resigned from my tenured position at the University of Melbourne and joined Timaeus as Director of Research. Timaeus is an AI safety non-profit research organisation. [1/n]🧵
- Elhage et al. at @AnthropicAI wrote an interesting paper in 2022 on "superposition" (transformer-circuits.pub/2022/toy_model…): the tendency of neurons in artificial neural networks to represent many independent features. They noted interesting geometry in the representations and [1/n]
- Timaeus is a new research organization, dedicated to making fundamental breakthroughs in technical AI alignment using deep ideas from mathematics and the sciences. Led by @jesse_hoogland, @FellowHominid, Stan van Wingerden and myself. lesswrong.com/posts/nN7bHuHZ… [1/n]
- Replying to @danielmurfet: The rainbow is made of tokens. Each dot is a token y in context x, coloured by pattern and represented in a 16-dimensional space by its vector of susceptibilities (one per attention head), then projected using UMAP. The baby serpent is a mess, but the mature serpent is handsome. Why?
- I think there is an aspect of the recent Nobel Prize for Chemistry, awarded in part to @demishassabis and John Jumper, that might be underrated. It is of course natural to focus on the way in which AI was involved. However note that Hassabis and Jumper are *not in academia*. 🧵
- Replying to @danielmurfet: This charming fellow is, however, too small to be really interesting. In larger models we see more complex structures, so stay tuned! To read more: arxiv.org/abs/2508.00331, joint work with @georgeyw_, @Gman5938 and Andy Gordon.
- We wrote an outline of the research agenda we're pursuing on technical AI alignment, based on Singular Learning Theory (lesswrong.com/posts/TjaeCWvL…). Interesting math, and an important problem.
- Mom: we have rainbow serpent at home. Rainbow serpent at home: rainbowserpent.dev We recently introduced an approach to interpretability for language models based on susceptibility UMAPs, and it's now available in a webapp for you to try (with some Pythia models too!)
- Replying to @danielmurfet: Compared to math, experiments may gray the hair, but the eye candy is beyond compare: we nearly fell out of our chairs when the first UMAP plots of the rainbow serpent showed up. What's kind of wild is that four different training seeds look so similar (spot the difference).
- Replying to @danielmurfet: All that is to say, I think many academics should consider spending part of their time contributing to AI safety. This is a hard, urgent and deep problem, few people are working on it, and your other intellectual activities might soon be relatively pointless (sorry!) [10/n]
- Replying to @danielmurfet: We know that the induction circuit develops over this period, and also that the change in susceptibility vectors is due to the heads in the induction circuit (arxiv.org/abs/2504.18274). That is, the pictures genuinely show the emergence of the induction circuit.
- We just concluded a week-long Primer for Singular Learning Theory, with lectures also on physics, mechanistic interpretability and AI alignment and a keynote by Sumio Watanabe himself (who has his own clear views on the need for mathematics in AI safety) youtube.com/@SLTSummit.
- Replying to @danielmurfet: I've put my time where my mouth is, and am now working on this problem with fantastic colleagues at Timaeus. If you're an academic and curious about how you could contribute, feel free to reach out (by email, or find me on Discord). [n/n]
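The rainbow-serpent picture described in the thread (each token y in context x becomes a 16-dimensional vector with one susceptibility per attention head, and these vectors are projected for plotting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the susceptibilities have already been computed and stored as a tokens × heads array, and it swaps UMAP for plain PCA via NumPy's SVD so the example runs without extra dependencies. The function names `project_susceptibilities` and `group_by_pattern` are hypothetical.

```python
import numpy as np


def project_susceptibilities(chi: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project per-token susceptibility vectors to a low-dimensional embedding.

    Assumption: chi has shape (num_tokens, num_heads), where row i holds the
    susceptibilities of token i, one column per attention head.
    PCA (via SVD) is used here as a stand-in for UMAP.
    """
    if chi.ndim != 2:
        raise ValueError("chi must be a 2-D array of shape (tokens, heads)")
    # Center each head's column so the principal directions pass through the mean.
    centered = chi - chi.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def group_by_pattern(embedding: np.ndarray, patterns: list) -> dict:
    """Group embedded points by their pattern label, e.g. to colour a scatter plot."""
    groups = {}
    for point, label in zip(embedding, patterns):
        groups.setdefault(label, []).append(point)
    return {label: np.array(points) for label, points in groups.items()}
```

With the `umap-learn` package installed, `umap.UMAP(n_components=2).fit_transform(chi)` is the closer analogue of the UMAP projection mentioned in the thread; PCA is linear, so it should not be expected to reproduce the same shapes.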