Mor Geva
@megamor2
Assistant Professor at @TelAvivUni and Research Scientist at @Irregular; previously at @GoogleResearch, @GoogleDeepMind and @allen_ai
Joined April 2017
  • LMs capture many factual associations, but how do they recall them internally during inference? In a new preprint, we find that LMs build attribute-rich subject representations, from which attention heads extract the predicted attribute. @jasmijnbastings @fajtak @amirgloberson 🧵
  • What if I told you that you can *easily* control the behavior of GPT and change it in particular directions of your choice, with only a few simple and intuitive steps? Meet ✨LM-Debugger✨, an open-source interactive tool for inspection and intervention in transformer LMs 👇1/8
  • New preprint!📣 How do transformer LMs construct predictions? We tackle this question by reverse-engineering the FFN layers in LMs and the mechanism by which they update the prediction across layers.🧵 (1/6) @clu_avi, @k3vwang, @yoavgo
  • "Transformer Feed-Forward Layers Are Key-Value Memories" Check out our new preprint where we analyze the role of FF layers in transformer models. arxiv.org/abs/2012.14913 With @RoeiSchuster @JonathanBerant @omerlevy_ 1/3
  • What's in an attention head? 🤯 We present an efficient framework – MAPS – for inferring the functionality of attention heads in LLMs ✨directly from their parameters✨ A new preprint with @AmitElhelo 🧵 (1/10)
  • ✨MLP layers have just become more interpretable than ever ✨ In a new paper: * We show a simple method for decomposing MLP activations into interpretable features * Our method uncovers hidden concept hierarchies, where sparse neuron combinations form increasingly abstract ideas
  • We present StrategyQA, a question answering benchmark with *implicit* reasoning strategies, accepted to TACL, 2021. Dataset --> allenai.org/data/strategyqa Paper --> arxiv.org/abs/2101.02235 With @DanielKhashabi @eladsegal @tusharkhot @dannydanr @JonathanBerant
  • New preprint! We show that training transformer models with multiple output heads leads to non-trivial interactions between the heads and emergent head behaviour that generalizes beyond the task the head was trained for. arxiv.org/abs/2104.06129 @UrikaUri Aviv BA @JonathanBerant
  • Do you have a "tell" when you are about to lie? We find that LLMs have “tells” in their internal representations which allow estimating how knowledgeable a model is about an entity 𝘣𝘦𝘧𝘰𝘳𝘦 it generates even a single token. Paper: arxiv.org/abs/2406.12673… 🧵 @dhgottesman
  • Numerical reasoning skills are difficult to learn from a LM objective. In our new paper, we show how to inject the skills into pre-trained LMs, such that numerical computations are performed internally by the model. arxiv.org/abs/2004.04487 @ankgup2 @JonathanBerant
  • How can we interpret LLM features at scale? 🤔 Current pipelines use activating inputs, which is costly and ignores how features causally affect model outputs! We propose efficient output-centric methods that better predict how steering a feature will affect model outputs. New
  • Removing certain knowledge from LLMs is hard. Our lab has been tackling this problem at the level of model parameters. Excited to have two papers on this topic accepted at #EMNLP2025 main conf: ⭐️Precise In-Parameter Concept Erasure in Large Language Models
    🚀The first-ever parametric LLM Unlearning Benchmark! We find that current unlearning methods only modify a model's behavior without truly erasing the knowledge encoded in its parameters. We present the ConceptVectors Benchmark, in which each vector is strongly tied to a specific concept. 🔗yihuaihong.github.io/ConceptVectors…
  • Replying to @goodside
    Transformers have an inherent limitation in solving this task: arxiv.org/abs/2407.15160
  • Check out BREAK - a new NLU benchmark for testing the ability of models to break down a question into the required steps for computing its answer. allenai.github.io/Break/ A work by Tomer Wolfson, accepted to TACL 2020. @JonathanBerant @yoavgo @ankgup2 @nlpmattg