Mor Geva (@megamor2) / X

Mor Geva

613 posts

@megamor2

Assistant Professor at @TelAvivUni and Research Scientist at @Irregular; previously at @GoogleResearch, @GoogleDeepMind and @allen_ai

Joined April 2017

Mor Geva
@megamor2
May 1, 2023
LMs capture many factual associations, but how do they recall them internally during inference? In a new preprint, we find that LMs build attribute-rich subject representations, from which attention heads extract the predicted attribute. @jasmijnbastings @fajtak @amirgloberson 🧵
Mor Geva
@megamor2
Apr 27, 2022
What if I told you that you can *easily* control the behavior of GPT and change it in particular directions of your choice, with only a few simple and intuitive steps? Meet ✨LM-Debugger✨, an open-source interactive tool for inspection and intervention in transformer LMs 👇1/8
Mor Geva
@megamor2
Mar 29, 2022
New preprint!📣 How do transformer LMs construct predictions? We tackle this question by reverse-engineering the FFN layers in LMs and the mechanism by which they update the prediction across layers.🧵 (1/6) @clu_avi, @k3vwang, @yoavgo
Mor Geva
@megamor2
Jan 1, 2021
"Transformer Feed-Forward Layers Are Key-Value Memories" Check out our new preprint where we analyze the role of FF layers in transformer models. arxiv.org/abs/2012.14913 With @RoeiSchuster @JonathanBerant @omerlevy_ 1/3
Mor Geva
@megamor2
Dec 18, 2024
What's in an attention head? 🤯 We present an efficient framework – MAPS – for inferring the functionality of attention heads in LLMs ✨directly from their parameters✨ A new preprint with @AmitElhelo 🧵 (1/10)
Mor Geva
@megamor2
Jun 13, 2025
✨MLP layers have just become more interpretable than ever ✨ In a new paper:
* We show a simple method for decomposing MLP activations into interpretable features
* Our method uncovers hidden concept hierarchies, where sparse neuron combinations form increasingly abstract ideas
Mor Geva
@megamor2
Jan 8, 2021
We present StrategyQA, a question answering benchmark with *implicit* reasoning strategies, accepted to TACL 2021. Dataset --> allenai.org/data/strategyqa Paper --> arxiv.org/abs/2101.02235 With @DanielKhashabi @eladsegal @tusharkhot @dannydanr @JonathanBerant
Mor Geva
@megamor2
Apr 14, 2021
New preprint! We show that training transformer models with multiple output heads leads to non-trivial interactions between the heads and emergent head behaviour that generalizes beyond the task the head was trained for. arxiv.org/abs/2104.06129 @UrikaUri Aviv BA @JonathanBerant
Mor Geva
@megamor2
Jun 19, 2024
Do you have a "tell" when you are about to lie? We find that LLMs have “tells” in their internal representations which allow estimating how knowledgeable a model is about an entity 𝘣𝘦𝘧𝘰𝘳𝘦 it generates even a single token. Paper: arxiv.org/abs/2406.12673… 🧵 @dhgottesman
Mor Geva
@megamor2
Apr 10, 2020
Numerical reasoning skills are difficult to learn from an LM objective. In our new paper, we show how to inject these skills into pre-trained LMs, such that numerical computations are performed internally by the model. arxiv.org/abs/2004.04487 @ankgup2 @JonathanBerant
arxiv.org
Injecting Numerical Reasoning Skills into Language Models
Large pre-trained language models (LMs) are known to encode substantial amounts of linguistic information. However, high-level reasoning skills, such as numerical reasoning, are difficult to learn...
Mor Geva
@megamor2
Jan 15, 2025
How can we interpret LLM features at scale? 🤔 Current pipelines use activating inputs, which is costly and ignores how features causally affect model outputs! We propose efficient output-centric methods that better predict how steering a feature will affect model outputs. New
Mor Geva
@megamor2
Aug 21, 2025
Removing certain knowledge from LLMs is hard. Our lab has been tackling this problem at the level of model parameters. Excited to have two papers on this topic accepted at #EMNLP2025 main conf: ⭐️Precise In-Parameter Concept Erasure in Large Language Models
Yihuai Hong
@YihuaiH91773
Jun 20, 2024
🚀The first-ever parametric LLM Unlearning Benchmark! We find that current unlearning methods only modify a model's behavior without truly erasing the knowledge encoded in its parameters. We present the ConceptVectors Benchmark, in which each vector is strongly tied to a specific concept.🔗yihuaihong.github.io/ConceptVectors…
Mor Geva
@megamor2
Sep 2, 2024
Replying to @goodside
Transformers have an inherent limitation in solving this task: arxiv.org/abs/2407.15160
Mor Geva
@megamor2
Feb 3, 2020
Check out BREAK - a new NLU benchmark for testing the ability of models to break down a question into the required steps for computing its answer. allenai.github.io/Break/ A work by Tomer Wolfson, accepted to TACL 2020. @JonathanBerant @yoavgo @ankgup2 @nlpmattg
allenai.github.io
A Question Understanding Benchmark