Percy Liang (@percyliang) / X

Percy Liang

1,306 posts

@percyliang

professor of computer science @Stanford @stanfordnlp, co-founder of @togethercompute, creator of marin.community, co-founder of @simile_ai, pianist

Stanford, CA

cs.stanford.edu/~pliang/

Joined October 2009

Pinned
Percy Liang
@percyliang
May 19, 2025
What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
Percy Liang
@percyliang
Jun 18, 2025
Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:
Percy Liang
@percyliang
Jan 26, 2025
While we celebrate @deepseek_ai's release of open-weight models that we can all play with at home, just a friendly reminder that they are not *open-source*; there’s no training / data processing code, and hardly any information about the data.
Percy Liang
@percyliang
Oct 24, 2025
You spend $1B training a model A. Someone on your team leaves and launches their own model API B. You're suspicious. Was B derived (e.g., fine-tuned) from A? But you only have blackbox access to B... With our paper, you can still tell with strong statistical guarantees
Sally Zhu
@SallyHZhu
Oct 23, 2025
🔎Did someone steal your language model? We can tell you, as long as you shuffled your training data🔀. All we need is some text from their model! Concretely, suppose Alice trains an open-weight model and Bob uses it to produce text. Can Alice prove Bob used her model?🚨
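The idea behind this kind of test can be illustrated with a minimal sketch. It assumes (our simplification, not necessarily the paper's exact statistic) that Alice knows the random position of each training example in her shuffle and can obtain a per-example score from Bob's model, for instance the log-likelihood it assigns to that example. If Bob's model is independent of Alice's, her shuffle order is just a uniformly random permutation with respect to those scores, so re-shuffling the positions yields an exact null distribution for a permutation test. The function names and toy data below are hypothetical.

```python
import random
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def provenance_p_value(train_position: Sequence[int], score: Sequence[float],
                       n_permutations: int = 999, seed: int = 0) -> float:
    """One-sided permutation p-value for "score increases with training position".

    train_position[i]: index of example i in Alice's random training shuffle.
    score[i]: hypothetical per-example signal from Bob's model (e.g. log-likelihood).
    """
    rng = random.Random(seed)
    observed = pearson(train_position, score)
    positions = list(train_position)
    hits = 0
    for _ in range(n_permutations):
        # Under H0 the real shuffle is exchangeable with any re-shuffle.
        rng.shuffle(positions)
        if pearson(positions, score) >= observed:
            hits += 1
    # Counting the observed ordering itself keeps the test valid (p is never 0).
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    n = 200
    position = list(range(n))
    rng.shuffle(position)
    # Toy "derived" model: examples seen later in training score slightly higher.
    derived = [0.01 * p + rng.gauss(0, 0.5) for p in position]
    # Toy "independent" model: scores unrelated to Alice's shuffle.
    independent = [rng.gauss(0, 0.5) for _ in position]
    print(provenance_p_value(position, derived))
    print(provenance_p_value(position, independent))
```

In this toy setup, a model derived from Alice's fits late-seen examples slightly better, which shows up as a positive correlation and a small p-value; with 999 permutations the smallest attainable p-value is 0.001.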
Percy Liang
@percyliang
Jun 11, 2024
We should call models like Llama 3, Mixtral, etc. “open-weight models”, not “open-source models”. For a model to be open-source, the code and training data need to be public (good examples: GPT-J, OLMo, RedPajama, StarCoder, K2, etc.). Weights are like an exe file, which would be
Percy Liang
@percyliang
Dec 15, 2022
📣 CRFM announces PubMedGPT, a new 2.7B language model that achieves a new SOTA on the US medical licensing exam. The recipe is simple: a standard Transformer trained from scratch on PubMed (from The Pile) using @MosaicML on the MosaicML Cloud, then fine-tuned for the QA task.
Percy Liang
@percyliang
Oct 23, 2022
Writing on a whiteboard can make it easier for students to follow compared to slides (especially for math). During the pandemic, I added a feature to sfig (my JavaScript slides library) to allow me to reveal parts of a slide using the mouse as if I were writing on a whiteboard:
Percy Liang
@percyliang
Nov 3, 2023
Myth: open foundation models are antithetical to AI safety. Fact: open foundation models are critical for AI safety. Here are three reasons why:
Percy Liang
@percyliang
Jan 29, 2023
I worry about language models being trained on test sets. Recently, we emailed [email protected] to opt out of having our (test) data be used to improve models. This isn't enough though: others running evals could still inadvertently contribute those test sets to training.
Percy Liang
@percyliang
Dec 7, 2022
RL from human feedback seems to be the main tool for alignment. Given reward hacking and the fallibility of humans, this strategy seems bound to produce agents that merely appear to be aligned, but are bad/wrong in subtle, inconspicuous ways. Is anyone else worried about this?
Percy Liang
@percyliang
Dec 6, 2024
I miss the days when we evaluated algorithms rather than models. Rather than "how well does model M do?", it should be "given data D and compute C, how well does running algorithm A on D with C do?" I don't think we can get scientific clarity unless we do the latter.
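One way to make this reframing concrete is to treat the unit of evaluation as the triple (A, D, C) rather than a trained model M. The sketch below is a hypothetical toy harness, not an existing benchmark API: an "algorithm" is any function from training data and a compute budget (assumed here to be a count of gradient steps) to a model, and evaluation runs it before scoring.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, float]         # (x, y) pair
Model = Callable[[float], float]       # trained predictor
# An "algorithm" maps data D and a compute budget C to a model.
Algorithm = Callable[[List[Example], int], Model]


def gradient_descent(learning_rate: float) -> Algorithm:
    """Toy algorithm family: fit y ~ w * x by full-batch gradient descent,
    where each unit of compute buys one gradient step (an assumption)."""
    def run(data: List[Example], compute: int) -> Model:
        w = 0.0
        for _ in range(compute):
            grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
            w -= learning_rate * grad
        return lambda x: w * x
    return run


def evaluate(algorithm: Algorithm, data: List[Example], compute: int,
             test: List[Example]) -> float:
    """Score the triple (A, D, C): train with the given budget, return test MSE."""
    model = algorithm(data, compute)
    return sum((model(x) - y) ** 2 for x, y in test) / len(test)


if __name__ == "__main__":
    data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
    test = [(x, 3.0 * x) for x in (2.5, 3.0)]
    for lr in (0.01, 0.1):
        # Same D and C for both runs; only the algorithm changes.
        print(lr, evaluate(gradient_descent(lr), data, compute=20, test=test))
```

Because D and C are held fixed, any difference in score is attributable to the algorithm; comparing two released models trained on different data with different budgets cannot make that separation.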
Percy Liang
@percyliang
Nov 17, 2022
Language models are becoming the foundation of language technologies, but when do they work or don’t work? In a new CRFM paper, we propose Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of LMs. Holistic evaluation includes three elements:
Percy Liang
@percyliang
Nov 25, 2024
This year, I have 4 exceptional students on the academic job market, and they couldn’t be more different, with research spanning AI policy, robotics, NLP, and HCI. Here’s a brief summary of their research, along with one representative work each:
Percy Liang
@percyliang
Sep 4, 2025
We did a very careful study of 10 optimizers with no horse in the race. Despite all the excitement about Muon, Mars, Kron, Soap, etc., at the end of the day, if you tune the hyperparameters rigorously and scale up, the speedup over AdamW diminishes to only 10% :-( Experiments
Kaiyue Wen
@wen_kaiyue
Sep 4, 2025
(1/n) Check out our new paper: "Fantastic Pretraining Optimizers and Where to Find Them"! >4000 models to find the fastest optimizer! 2× speedups over AdamW? Unlikely. Beware under-tuned baseline or limited scale! E.g. Muon: ~40% speedups <0.5B & only 10% at 1.2B (8× Chinchilla)!
Fantastic Pretraining Optimizers And Where to Find them · Issue #1290 · marin-community/marin
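The scale and speedup figures in this thread can be unpacked with a little arithmetic. This is a minimal sketch, assuming the commonly cited Chinchilla rule of thumb of about 20 training tokens per parameter, and assuming "speedup" means how many times fewer tokens an optimizer needs than the AdamW baseline to reach the same loss (our reading, not a quote of the paper's definition). The loss curves in the usage example are made-up toy numbers, not results from the paper.

```python
from typing import List, Tuple

CHINCHILLA_TOKENS_PER_PARAM = 20  # common rule of thumb (assumption)

Curve = List[Tuple[float, float]]  # (tokens seen, loss), loss decreasing


def training_tokens(params: float, chinchilla_multiple: float) -> float:
    """Token budget for a model trained at a multiple of the Chinchilla ratio."""
    return params * CHINCHILLA_TOKENS_PER_PARAM * chinchilla_multiple


def tokens_to_reach(curve: Curve, target_loss: float) -> float:
    """Linearly interpolate the token count at which `curve` first hits target_loss."""
    if curve[0][1] <= target_loss:
        return curve[0][0]
    for (t0, l0), (t1, l1) in zip(curve, curve[1:]):
        if l1 <= target_loss <= l0:
            return t0 + (l0 - target_loss) / (l0 - l1) * (t1 - t0)
    raise ValueError("curve never reaches the target loss")


def speedup(baseline: Curve, candidate: Curve, target_loss: float) -> float:
    """How many times fewer tokens `candidate` needs than `baseline` for the same loss."""
    return tokens_to_reach(baseline, target_loss) / tokens_to_reach(candidate, target_loss)


if __name__ == "__main__":
    # "8x Chinchilla" at 1.2B parameters: 1.2e9 * 20 * 8 = 1.92e11 tokens (192B).
    print(training_tokens(1.2e9, 8))
    # Read as 1.1x and 1.4x, the speedups save about 9% and 29% of tokens.
    for s in (1.1, 1.4):
        print(s, round(1 - 1 / s, 3))
    # Toy curves (made up for illustration only).
    adamw = [(0.0, 10.0), (100.0, 5.0), (200.0, 3.0)]
    other = [(0.0, 10.0), (100.0, 4.0), (200.0, 2.0)]
    print(speedup(adamw, other, target_loss=4.0))  # 150 / 100 = 1.5
```

Measuring speedup at a fixed target loss, rather than comparing losses at a fixed step, is one reason an under-tuned baseline can inflate the result: a weak AdamW curve needs more tokens to reach any given loss.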