Aaron Mueller (@amuuueller) / X

Aaron Mueller

@amuuueller

Asst. Prof. in CS at @BU_Tweets ≡ {Mechanistic, causal} {interpretability, NLP}

Boston

aaronmueller.github.io

Joined September 2015

Aaron Mueller
@amuuueller
Oct 30, 2024
I'm recruiting PhD students for our new lab, coming to Boston University in Fall 2025! Our lab aims to understand, improve, and precisely control how language is learned and used in natural language systems (such as language models). Details below!
Aaron Mueller
@amuuueller
Jan 30, 2023
Announcing the BabyLM 👶 Challenge, the shared task at @conll_conf and CMCL'23! We’re calling on researchers to pre-train language models on (relatively) small datasets inspired by the input given to children learning language. babylm.github.io arxiv.org/abs/2301.11796
Aaron Mueller
@amuuueller
May 24, 2024
⭐️🏛️ Very excited to announce that in fall 2025, I’ll be starting as an Assistant Professor of Computer Science at Boston University @BU_Tweets! Looking forward to joining a wonderful group of colleagues at @BUCompSci!
Aaron Mueller
@amuuueller
Jul 22, 2024
What is cause and effect? What is a “mechanism”? And how do answers to these questions affect interpretability research? 📜 New preprint! 📜 Two key challenges for causal/mechanistic interpretability, and ways forward. To be presented at the mech interp workshop at #ICML2024:
Aaron Mueller
@amuuueller
Mar 21, 2022
We know that pre-trained seq2seq models such as T5 perform well on many downstream NLP tasks. Turns out, pre-training also teaches them about the hierarchical structure of language! 📜arxiv.org/abs/2203.09397 👨🏻‍💻github.com/sebschu/multil… w/ @bob_frank, @tallinzen, Wes Wang, @sebschu
Aaron Mueller
@amuuueller
Jun 28, 2023
Scaling LMs works well. Are more parameters and data all it takes, or do certain architectural features or language styles bring out emergent abilities sooner? Let’s investigate by seeing what it takes for syntax 🌳 to emerge! At ACL! w/ @tallinzen 📜 arxiv.org/abs/2305.19905
Aaron Mueller
@amuuueller
Aug 22, 2024
Thanks Tal! 📜 In this paper, we provide a theoretically grounded review of causal (which, imo, ⊇ mechanistic) interpretability. We argue that this gives a more cohesive narrative of the field, and makes it easier to see actionable open directions for future work! 🧵
Tal Linzen
@tallinzen
Aug 19, 2024
I very much enjoyed this survey of causal interpretability methods for neural networks from @amuuueller and many others - succinct, well organized, just opinionated enough. please write more reviews everyone arxiv.org/abs/2408.01416
Aaron Mueller
@amuuueller
Jun 24, 2021
Language models are good at subject-verb agreement, even across center-embedded structures. Which neurons are responsible for this? Depends on the syntactic structure! arxiv.org/pdf/2106.06087… w/ @mattf1n, me, @sebgehr, @shieber, @tallinzen, & @boknilev. To appear at ACL'21!
Aaron Mueller
@amuuueller
Nov 15, 2023
To those using in-context learning: LLMs behave differently on in-distribution vs. out-of-distribution examples—and chain-of-thought prompting has different effects on them! New preprint w/ @albertwebson, @jowenpetty, @tallinzen 📜 arxiv.org/abs/2311.07811
Aaron Mueller
@amuuueller
Dec 19, 2024
What can mechanistic interpretability do for computational psycholinguists? @michaelwhanna and I took a stab at this question! We investigate garden path sentence processing in LMs at the feature (circuit) level.
Michael Hanna
@michaelwhanna
Dec 19, 2024
Sentences are partially understood before they're fully read. How do LMs incrementally interpret their inputs? In a new paper @amuuueller and I use mech interp to study how LMs process structurally ambiguous sentences. We show LMs rely on both syntactic & spurious features! 1/10
Aaron Mueller
@amuuueller
Apr 3, 2024
Excited this project is out! Using sparse feature circuits, we can explain and modify how LMs arrive at a behavior. In this thread, I want to highlight open directions where computational linguists can use sparse feature circuits. 🧵
Samuel Marks
@saprmarks
Apr 3, 2024
Can we understand & edit unanticipated mechanisms in LMs? We introduce sparse feature circuits, & use them to explain LM behaviors, discover & fix LM bugs, & build an automated interpretability pipeline! Preprint w/ @can_rager, @ericjmichaud_, @boknilev, @davidbau, @amuuueller
Aaron Mueller
@amuuueller
Jun 14, 2024
On my way to #NAACL24! 🇲🇽 Friends and folks interested in evaluation, (mechanistic) interpretability, causality, robustness, psycholinguistics, and/or coffee, let’s meet up? (And if you’re interested in doing a PhD in any/all of these topics, I would love to chat!)
Aaron Mueller
@amuuueller
Mar 23, 2023
The evaluation pipeline for the BabyLM 👶 Challenge is out! We’re evaluating on BLiMP and a selection of (Super)GLUE tasks. Code 💻:
github.com/babylm/evaluation-pipeline-2023 (Evaluation pipeline for the BabyLM Challenge 2023)
Aaron Mueller
@amuuueller
Dec 6, 2022
I’ll be in Abu Dhabi starting tomorrow for EMNLP! Come see my CoNLL talk about causally probing for syntax in multilingual LMs! I’ll be around to chat about interpretability, multilingual NLP, robustness, and syntax. Get in touch via DMs or email! arxiv.org/abs/2210.14328