Stefano Ermon (@StefanoErmon) / X

855 posts

Stefano Ermon

@StefanoErmon

AI Prof @Stanford | CEO & Cofounder @_inception_ai | Co-inventor of DDIM, FlashAttention, DPO, GAIL, and score-based/diffusion models

Stanford, CA

cs.stanford.edu/~ermon/

Joined February 2013

Pinned
Stefano Ermon
@StefanoErmon
Feb 24
Mercury 2 is live 🚀🚀 The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built. We’re just getting
1M
Stefano Ermon
@StefanoErmon
Nov 6, 2025
When we began applying diffusion to language in my lab at Stanford, many doubted it could work. That research became Mercury diffusion LLM: 10X faster, more efficient, and now the foundation of @_inception_ai. Proud to raise $50M with support from top investors.
Inception
@_inception_ai
Nov 6, 2025
Today’s LLMs are painfully slow and expensive. They are autoregressive and spit out words sequentially. One. At. A. Time. Our dLLMs generate text in parallel, delivering answers up to 10X faster. Now we’ve raised $50M to scale them. Full story from @russellbrandom in
201K
Stefano Ermon
@StefanoErmon
Oct 29, 2025
Tired of chasing references across dozens of papers? This monograph distills it all: the principles, intuition, and math behind diffusion models. Thrilled to share!
Chieh-Hsin (Jesse) Lai
@JCJesseLai
Oct 29, 2025
Tired of going back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We’re excited to release 《The Principles of Diffusion Models》, with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
127K
Stefano Ermon
@StefanoErmon
Feb 26, 2025
Excited to share that I’ve been working on scaling up diffusion language models at Inception. A new generation of LLMs with unprecedented capabilities is coming!
Inception
@_inception_ai
Feb 26, 2025
We are excited to introduce Mercury, the first commercial-grade diffusion large language model (dLLM)! dLLMs push the frontier of intelligence and speed with parallel, coarse-to-fine text generation.
50K
Stefano Ermon
@StefanoErmon
Jul 9, 2020
Super proud of my student Aditya who successfully defended his #PhD dissertation today! He has done awesome work on unsupervised learning with generative models. Congrats, Dr. @adityagrover_ 👏🎊🎉
Stefano Ermon
@StefanoErmon
Nov 7, 2025
Replying to @elonmusk and @_inception_ai
Totally agree. Diffusion works on any bitstream, and once you remove the sequential bottleneck, you unlock a new regime of speed and fidelity. For text, the small delay to first sentence is often outweighed by massive gains in coherence and global planning. And for video and
57K
Stefano Ermon
@StefanoErmon
Sep 29, 2019
If all training images for a GAN/VAE/PixelCNN have 2 objects, will they only generate images with 2 objects? If trained on (🔵,💙,🔴), will they also generate ❤️? Find out in @shengjia_zhao's blog post on generalization and bias for generative models. 👉ermongroup.github.io/blog/bias-and-…
Stefano Ermon
@StefanoErmon
Jul 25, 2025
Replying to @shengjia_zhao
So proud to see this, Shengjia. It’s been a joy to be your PhD advisor and watch your path evolve. Excited to see where you and the team take things next!
32K
Stefano Ermon
@StefanoErmon
Apr 23, 2022
Thrilled to share that our paper "Comparing Distributions by Measuring Differences that Affect Decision Making" wins the #ICLR2022 Outstanding Paper Award 🎉 blog.iclr.cc/2022/04/20/ann… Congratulations to my awesome students @shengjia_zhao @a7b2_3 @electronickale Aidan @baaadas 👏
Stefano Ermon
@StefanoErmon
Jul 23, 2024
Diffusion models are state-of-the-art for continuous data generation (images, videos, etc.). Can they also beat autoregressive models on text generation? Check out our ICML paper tomorrow to find out how. Congrats to my students @aaron_lou @chenlin_meng for the best paper award!
ICML Conference
@icmlconf
Jul 23, 2024
Congratulations to the best paper award winners
33K
Stefano Ermon
@StefanoErmon
Mar 1, 2024
Very excited about this work: diffusion models finally bridging the gap with autoregressive models on language!
Aaron Lou
@aaron_lou
Feb 29, 2024
Announcing Score Entropy Discrete Diffusion (SEDD) w/ @chenlin_meng @StefanoErmon. SEDD challenges the autoregressive language paradigm, beating GPT-2 on perplexity and quality! Arxiv: arxiv.org/abs/2310.16834 Code: github.com/louaaron/Score… Blog: aaronlou.com/blog/discrete-… 🧵1/n
59K
Stefano Ermon
@StefanoErmon
Feb 20, 2020
Want to recharge your electric vehicle in 10 minutes? Check out our @Nature paper on optimizing battery charging protocols with machine learning 👉 nature.com/articles/s4158… 🔋 battery testing times slashed by nearly 15-fold news.stanford.edu/?p=32364 via @Stanford
Stefano Ermon
@StefanoErmon
Dec 1, 2017
Vintage AI hype: NYTimes on the perceptron (1958)
Stefano Ermon
@StefanoErmon
Mar 26, 2024
A paper blatantly plagiarized our CTM paper (see some of their verbatim copy&paste below). Feeling bad for my junior collaborators @gimdong58085414 and @JCJesseLai who worked so hard on this.
Dongjun Kim
@gimdong58085414
Mar 25, 2024
We sadly found out our CTM paper (ICLR24) was plagiarized by TCD! It's unbelievable 😢 They not only stole our idea of trajectory consistency but also committed "verbatim plagiarism," literally copying our proofs word for word! Please help me spread this.
70K