Daniel Murfet (@danielmurfet) / X

Daniel Murfet

@danielmurfet

Mathematician. Cofounder of Sequent Research. Formerly Timaeus, University of Melbourne. Purveyor of pelagic metaphors.

Melbourne, Victoria

Joined June 2012

Daniel Murfet
@danielmurfet
Aug 5, 2025
Neural networks are grown, not programmed. What does that growth process look like? Like this! This is a small language model (3M parameters) across training, visualised with a new interpretability technique: susceptibilities. We call this handsome critter the rainbow serpent.
Daniel Murfet
@danielmurfet
May 25, 2025
A few months ago I resigned from my tenured position at the University of Melbourne and joined Timaeus as Director of Research. Timaeus is an AI safety non-profit research organisation. [1/n]🧵
Daniel Murfet
@danielmurfet
Oct 14, 2023
Elhage et al. at @AnthropicAI wrote an interesting paper in 2022 on "superposition" (transformer-circuits.pub/2022/toy_model…), the tendency of neurons in artificial neural networks to represent many independent features. They noted interesting geometry in the representations and [1/n]
Daniel Murfet
@danielmurfet
Oct 22, 2023
Timaeus is a new research organization, dedicated to making fundamental breakthroughs in technical AI alignment using deep ideas from mathematics and the sciences. Led by @jesse_hoogland, @FellowHominid, Stan van Wingerden and myself. lesswrong.com/posts/nN7bHuHZ… [1/n]
Daniel Murfet
@danielmurfet
Aug 5, 2025
Replying to @danielmurfet
The rainbow is made of tokens. Each dot is a token y in context x, coloured by pattern, represented in a 16-dimensional space by its vector of susceptibilities (one per attn head), and projected using UMAP. The baby serpent is a mess, but the mature serpent is handsome. Why?
Daniel Murfet
@danielmurfet
Oct 9, 2024
I think there is an aspect of the recent Nobel Prize for Chemistry, awarded in part to @demishassabis and John Jumper, that might be underrated. It is of course natural to focus on the way in which AI was involved. However note that Hassabis and Jumper are *not in academia*. 🧵
Daniel Murfet
@danielmurfet
Aug 5, 2025
Replying to @danielmurfet
This charming fellow is, however, too small to be really interesting. In larger models we see more complex structures; stay tuned! To read more: arxiv.org/abs/2508.00331 (joint with @georgeyw_, @Gman5938 and Andy Gordon).
arxiv.org
Embryology of a Language Model
Understanding how language models develop their internal computational structure is a central problem in the science of deep learning. While susceptibilities, drawn from statistical physics, offer...
Daniel Murfet
@danielmurfet
Jul 13, 2023
We wrote an outline of the research agenda we're pursuing on technical AI alignment, based on Singular Learning Theory (lesswrong.com/posts/TjaeCWvL…). Interesting math, and an important problem.
lesswrong.com
Towards Developmental Interpretability — LessWrong
Developmental interpretability is a research agenda that has grown out of a meeting of the Singular Learning Theory (SLT) and AI alignment communitie…
Daniel Murfet
@danielmurfet
Aug 12, 2025
Mom: we have rainbow serpent at home. Rainbow serpent at home: rainbowserpent.dev. We recently introduced an approach to interpretability for language models based on susceptibility UMAPs, and it's now available in a webapp for you to try (with some Pythia models too!)
Daniel Murfet
@danielmurfet
Aug 5, 2025
Replying to @danielmurfet
Compared to math, experiments may gray the hair, but the eye candy is beyond compare: we nearly fell out of our chairs when the first UMAP plots of the rainbow serpent showed up. What’s kind of wild is that four training seeds look so similar (spot the difference).
Daniel Murfet
@danielmurfet
May 25, 2025
Replying to @danielmurfet
All that is to say, I think many academics should consider spending part of their time contributing to AI safety. This is a hard, urgent and deep problem, few people are working on it, and your other intellectual activities might soon be relatively pointless (sorry!) [10/n]
Daniel Murfet
@danielmurfet
Aug 5, 2025
Replying to @danielmurfet
We know that the induction circuit develops over this period, and also that the change in susceptibility vectors is due to the heads in the induction circuit (arxiv.org/abs/2504.18274). That is, the pictures genuinely show visually the emergence of the induction circuit.
arxiv.org
Structural Inference: Interpreting Small Language Models with...
We develop a linear response framework for interpretability that treats a neural network as a Bayesian statistical mechanical system. A small perturbation of the data distribution, for example...
Daniel Murfet
@danielmurfet
Jun 25, 2023
We just concluded a week-long Primer for Singular Learning Theory, with lectures also on physics, mechanistic interpretability and AI alignment, and a keynote by Sumio Watanabe himself (who has his own clear views on the need for mathematics in AI safety): youtube.com/@SLTSummit
Daniel Murfet
@danielmurfet
May 25, 2025
Replying to @danielmurfet
I've put my time where my mouth is, and am now working on this problem with fantastic colleagues at Timaeus. If you're an academic and curious about how you could contribute, feel free to reach out (by email, or find me on Discord). [n/n]