Sewon Min

@sewon__min

Assistant professor @Berkeley_EECS @berkeley_ai || Research scientist at @allen_ai || PhD from @uwcse @uwnlp

Seattle, WA

Joined November 2017

Pinned
Sewon Min
@sewon__min
May 8
As MoEs grow larger and sparser, they become memory-bottlenecked. What if experts were actually composable - so you only keep the subset relevant to your task? We show that this doesn't emerge in standard MoEs (their training makes this hard), but you can pre-train MoEs to
Ryan Yixiang Wang
@RyanYixiang
May 8
MoEs are everywhere in frontier models, and they are deployed as a monolith system. But many applications only need a narrow slice of capabilities, e.g., math, code, biomedical, etc. So what if "modularity" is actually the missing opportunity for MoEs? Today, we're releasing
Sewon Min
@sewon__min
Jul 30, 2024
📣 After graduating from @uwcse, I am joining @UCBerkeley as an Assistant Professor (affiliated w @berkeley_ai @BerkeleyNLP) and @allen_ai as a Research Scientist. I'm looking forward to tackling exciting challenges in NLP & generative AI together with new colleagues! 🐻✨
Sewon Min
@sewon__min
Dec 6, 2022
Most if not all language models use a softmax that gives a categorical probability distribution over a finite vocab. We introduce NPM: the first nonparametric masked LM that replaces this softmax with a nonparametric distribution over a text corpus. arxiv.org/abs/2212.01349 (1/4)
Sewon Min
@sewon__min
Feb 28, 2022
LMs can learn via inference alone through demonstrations -- but how does it work? We find that LMs do not really need correct input-output pairs. Randomly replacing labels in the demonstrations barely hurts performance, consistently over 12 models. arxiv.org/abs/2202.12837
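The label-randomization setup described in this post can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `randomize_labels` and `build_prompt`, the prompt template, and the example reviews are all hypothetical and are not taken from the paper's code.

```python
import random

def randomize_labels(demos, label_set, seed=0):
    """Return a copy of demos with each label replaced by one drawn
    uniformly at random from label_set (hypothetical helper)."""
    rng = random.Random(seed)
    return [(text, rng.choice(label_set)) for text, _ in demos]

def build_prompt(demos, query):
    """Concatenate demonstrations and the test input into one prompt.
    The 'Review/Sentiment' template is an assumption for illustration."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in demos]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

# Made-up demonstrations with gold labels.
demos = [
    ("A delightful film.", "positive"),
    ("Dull and overlong.", "negative"),
    ("I loved every minute.", "positive"),
]
labels = ["positive", "negative"]

# Same inputs, same format, but labels no longer depend on the inputs.
random_demos = randomize_labels(demos, labels)
prompt = build_prompt(random_demos, "Not worth the ticket price.")
```

Comparing a model's accuracy when prompted with `demos` versus `random_demos` is the kind of experiment the post refers to; the inputs and the label space stay fixed while the input-label mapping is broken.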
Sewon Min
@sewon__min
Aug 9, 2023
Excited to present SILO, a new nonparametric LM that * excludes copyrighted data from parameters❌ * instead stores it in a datastore and retrieves it at inference time✨ * achieves performance that is close to the model trained on all data🚀 📄arxiv.org/abs/2308.04430
Sewon Min
@sewon__min
Jul 9, 2025
It has been great working on the project with support from @allen_ai! I believe there are many meaningful ways different people and orgs can work together to build strong shared models, and data collaboration might be the most impactful form of it. 📄Paper:
Ai2
@allen_ai
Jul 9, 2025
Introducing FlexOlmo, a new paradigm for language model training that enables the co-development of AI through data collaboration. 🧵
Sewon Min
@sewon__min
May 11, 2020
I wrote PyTorch & BART-based code for closed-book QA, following @ada_rob and @colinraffel’s TF & T5-based model (arxiv.org/abs/2002.08910). github.com/shmsw25/bart-c… Code based on @huggingface's Transformers.
arxiv.org
How Much Knowledge Can You Pack Into the Parameters of a Language Model?
It has recently been observed that neural language models trained on unstructured text can implicitly store and retrieve knowledge using natural language queries. In this short paper, we measure...
Sewon Min
@sewon__min
Nov 1, 2021
Introducing ✨MetaICL✨, where an LM is trained to learn in context and is then tested, frozen, on an unseen target task. #NLProc Paper: arxiv.org/abs/2110.15943 Code: github.com/facebookresear… Demo: qa.cs.washington.edu:2021 with @ml_perception @LukeZettlemoyer @HannaHajishirzi
Sewon Min
@sewon__min
Aug 10, 2021
New paper!✨We introduce a noisy channel approach for LM prompting in few-shot text classification. Channel models are more stable (much lower variance), and better with limited data / imbalanced labels. arxiv.org/abs/2108.04106 w/ @ml_perception @HannaHajishirzi @LukeZettlemoyer
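As a rough illustration of the channel idea in this post: a direct model scores P(label | input), whereas a channel model scores P(input | label) and combines it with a label prior. The sketch below is a toy under loud assumptions: `WORD_PROBS` is a made-up unigram table standing in for a language model, and `log_p_x_given_y` and `channel_predict` are hypothetical names, not the paper's implementation.

```python
import math

# Toy stand-in for an LM: per-label unigram word probabilities.
# All numbers are invented for illustration.
WORD_PROBS = {
    "positive": {"great": 0.40, "boring": 0.05, "movie": 0.30},
    "negative": {"great": 0.05, "boring": 0.40, "movie": 0.30},
}
UNSEEN = 0.01  # fallback probability for words missing from the table

def log_p_x_given_y(text, label):
    """Log-probability of the input text conditioned on a label."""
    return sum(
        math.log(WORD_PROBS[label].get(word, UNSEEN))
        for word in text.lower().split()
    )

def channel_predict(text, prior):
    """Pick the label y maximizing log P(x | y) + log P(y)."""
    scores = {
        label: log_p_x_given_y(text, label) + math.log(p)
        for label, p in prior.items()
    }
    return max(scores, key=scores.get), scores

prior = {"positive": 0.5, "negative": 0.5}
pred, scores = channel_predict("boring movie", prior)
```

Because the channel score must account for every word of the input, the label cannot be chosen from the prior alone, which is one intuition for the stability the post reports.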
Sewon Min
@sewon__min
Apr 30, 2022
This *unintentionally* spreads the idea of which person gets the x-th place, who are the top-x, etc. Please don't rank researchers and judge them based on # of papers. I know the original tweet never meant this, but seeing this will implicitly affect young researchers like us.
Marek Rei
@MarekRei
Apr 28, 2022
Analysis of ML and NLP publication statistics from 2021. marekrei.com/blog/ml-and-nl… #machinelearning #NLProc
Sewon Min
@sewon__min
Jan 9, 2024
Excited to be hosting the workshop on Mathematical & Empirical Understanding of Foundation Models at #ICLR2024 in Vienna! Website: sites.google.com/view/me-fomo20… Paper deadline: Feb 3 We welcome unpublished/ongoing work, or work published to non-ML venues!✨
Sadhika Malladi
@SadhikaMalladi
Jan 9, 2024
Announcing the 2nd Workshop on Mathematical and Empirical Understanding of Foundation Models (ME-FoMo) at ICLR 2024! Improving our understanding helps us advance capabilities and build safer, more aligned models. Paper deadline is Feb 3! Website: sites.google.com/view/me-fomo20…
sites.google.com
ME-FoMo 2024
Update April 21, 2024: Schedule is available here! Foundation models (FMs) have revolutionized machine learning research across domains. These models are trained on extensive, highly varied datasets...
Sewon Min
@sewon__min
Jan 1, 2021
Happy new year! #NeurIPS2020 EfficientQA organizers, together with participants, wrote a paper that includes systems, analyses, and lessons learned from the competition. tinyurl.com/efficientqa-re… Thanks to everyone who took part in it!
Sewon Min
@sewon__min
May 24, 2023
Check out our new work that tries to make the evaluation of LM's factuality📘 easier & simpler🚗 w/o compromising thoroughness🔎
Kalpesh Krishna
@kalpeshk2011
May 23, 2023
Factuality in long-form generation is hard to evaluate because (1) we don't know how to assign an accuracy value when a generation has mixed pieces of true/false info, and (2) human evaluation is extremely costly. But from now on, you can use FActScore! tinyurl.com/FActScore
Sewon Min
@sewon__min
Mar 29, 2024
I agree! Evaluating factuality of long-form text in general is very difficult as some sentences are hard to decompose into independent claims and many claims are not easily verifiable. "Biography" is a *very special case* where these things are relatively easy.
Greg Durrett
@gregd_nlp
Mar 28, 2024
This is a cool method, but "superhuman" is an overclaim based on the data shown. There are better datasets than FActScore for evaluating this: ExpertQA arxiv.org/abs/2309.07852 by @cmalaviya11 +al Factcheck-GPT arxiv.org/abs/2311.09000 by Yuxia Wang +al (+ same methodology) 🧵