LMs capture many factual associations, but how do they recall them internally during inference?
In a new preprint, we find that LMs build attribute-rich subject representations, from which attention heads extract the predicted attribute.
@jasmijnbastings @fajtak @amirgloberson 🧵
Mor Geva
Assistant Professor at @TelAvivUni and Research Scientist at @Irregular; previously at @GoogleResearch, @GoogleDeepMind and @allen_ai
Joined April 2017
- What if I told you that you can *easily* control the behavior of GPT and change it in particular directions of your choice, with only a few simple and intuitive steps? Meet ✨LM-Debugger✨, an open-source interactive tool for inspection and intervention in transformer LMs 👇1/8
- "Transformer Feed-Forward Layers Are Key-Value Memories" Check out our new preprint where we analyze the role of FF layers in transformer models. arxiv.org/abs/2012.14913 With @RoeiSchuster @JonathanBerant @omerlevy_ 1/3
- What's in an attention head? 🤯 We present an efficient framework – MAPS – for inferring the functionality of attention heads in LLMs ✨directly from their parameters✨ A new preprint with @AmitElhelo 🧵 (1/10)
- ✨MLP layers have just become more interpretable than ever ✨ In a new paper: * We show a simple method for decomposing MLP activations into interpretable features * Our method uncovers hidden concept hierarchies, where sparse neuron combinations form increasingly abstract ideas
- We present StrategyQA, a question answering benchmark with *implicit* reasoning strategies, accepted to TACL, 2021. Dataset --> allenai.org/data/strategyqa Paper --> arxiv.org/abs/2101.02235 With @DanielKhashabi @eladsegal @tusharkhot @dannydanr @JonathanBerant
- New preprint! We show that training transformer models with multiple output heads leads to non-trivial interactions between the heads and emergent head behaviour that generalizes beyond the task the head was trained for. arxiv.org/abs/2104.06129 @UrikaUri Aviv BA @JonathanBerant
- Do you have a "tell" when you are about to lie? We find that LLMs have “tells” in their internal representations which allow estimating how knowledgeable a model is about an entity 𝘣𝘦𝘧𝘰𝘳𝘦 it generates even a single token. Paper: arxiv.org/abs/2406.12673… 🧵 @dhgottesman
- Numerical reasoning skills are difficult to learn from a LM objective. In our new paper, we show how to inject the skills into pre-trained LMs, such that numerical computations are performed internally by the model. arxiv.org/abs/2004.04487 @ankgup2 @JonathanBerant
- How can we interpret LLM features at scale? 🤔 Current pipelines use activating inputs, which is costly and ignores how features causally affect model outputs! We propose efficient output-centric methods that better predict how steering a feature will affect model outputs. New
- Removing certain knowledge from LLMs is hard. Our lab has been tackling this problem at the level of model parameters. Excited to have two papers on this topic accepted at #EMNLP2025 main conf: ⭐️ Precise In-Parameter Concept Erasure in Large Language Models 🚀 The first-ever parametric LLM Unlearning Benchmark! We find that current unlearning methods only modify the model's behavior without truly erasing the knowledge encoded in its parameters, presenting the ConceptVectors Benchmark, with each vector strongly tied to a specific concept. 🔗 yihuaihong.github.io/ConceptVectors…
- Replying to @goodside: Transformers have an inherent limitation in solving this task: arxiv.org/abs/2407.15160
- Check out BREAK - a new NLU benchmark for testing the ability of models to break down a question into the required steps for computing its answer. allenai.github.io/Break/ A work by Tomer Wolfson, accepted to TACL 2020. @JonathanBerant @yoavgo @ankgup2 @nlpmattg












