Stefano Ermon
Inception
@StefanoErmon
AI Prof @Stanford | CEO & Cofounder @_inception_ai | Co-inventor of DDIM, FlashAttention, DPO, GAIL, and score-based/diffusion models
Stanford, CA
Joined February 2013
Posts
  • Pinned
    Mercury 2 is live 🚀🚀 The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built. We’re just getting
  • When we began applying diffusion to language in my lab at Stanford, many doubted it could work. That research became Mercury diffusion LLM: 10X faster, more efficient, and now the foundation of @_inception_ai. Proud to raise $50M with support from top investors.
    Today’s LLMs are painfully slow and expensive. They are autoregressive and spit out words sequentially. One. At. A. Time. Our dLLMs generate text in parallel, delivering answers up to 10X faster. Now we’ve raised $50M to scale them. Full story from @russellbrandom in
  • Tired of chasing references across dozens of papers? This monograph distills it all: the principles, intuition, and math behind diffusion models. Thrilled to share!
    Tired of going back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We’re excited to release 《The Principles of Diffusion Models》, with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
  • Excited to share that I’ve been working on scaling up diffusion language models at Inception. A new generation of LLMs with unprecedented capabilities is coming!
    We are excited to introduce Mercury, the first commercial-grade diffusion large language model (dLLM)! dLLMs push the frontier of intelligence and speed with parallel, coarse-to-fine text generation.
  • Super proud of my student Aditya who successfully defended his #PhD dissertation today! He has done awesome work on unsupervised learning with generative models. Congrats, Dr. @adityagrover_ 👏🎊🎉
  • Replying to @elonmusk and @_inception_ai
    Totally agree. Diffusion works on any bitstream, and once you remove the sequential bottleneck, you unlock a new regime of speed and fidelity. For text, the small delay to first sentence is often outweighed by massive gains in coherence and global planning. And for video and
  • If all training images for a GAN/VAE/PixelCNN have 2 objects, will they only generate images with 2 objects? If trained on (🔵,💙,🔴), will they also generate ❤️? Find out in @shengjia_zhao's blog post on generalization and bias for generative models. 👉ermongroup.github.io/blog/bias-and-…
  • Replying to @shengjia_zhao
    So proud to see this, Shengjia. It’s been a joy to be your PhD advisor and watch your path evolve. Excited to see where you and the team take things next!
  • Thrilled to share that our paper "Comparing Distributions by Measuring Differences that Affect Decision Making" wins #ICLR2022 Outstanding Paper Award 🎉 blog.iclr.cc/2022/04/20/ann… Congratulations to my awesome students @shengjia_zhao @a7b2_3 @electronickale Aidan @baaadas 👏
  • Diffusion models are state-of-the-art for continuous data generation (images, videos, etc.). Can they also beat autoregressive models on text generation? Check out our ICML paper tomorrow to find out how. Congrats to my students @aaron_lou @chenlin_meng for the best paper award!
    Congratulations to the best paper award winners
  • Very excited about this work: diffusion models finally bridging the gap with autoregressive models on language!
    Announcing Score Entropy Discrete Diffusion (SEDD) w/ @chenlin_meng @StefanoErmon. SEDD challenges the autoregressive language paradigm, beating GPT-2 on perplexity and quality! Arxiv: arxiv.org/abs/2310.16834 Code: github.com/louaaron/Score… Blog: aaronlou.com/blog/discrete-… 🧵1/n
  • Want to recharge your electric vehicle in 10 minutes? Check out our @Nature paper on optimizing battery charging protocols with machine learning 👉 nature.com/articles/s4158… 🔋 battery testing times slashed by nearly 15-fold news.stanford.edu/?p=32364 via @Stanford
  • Vintage AI hype: NYTimes on the perceptron (1958)
  • A paper blatantly plagiarized our CTM paper (see some of their verbatim copy&paste below). Feeling bad for my junior collaborators @gimdong58085414 and @JCJesseLai who worked so hard on this.
    We sadly found out our CTM paper (ICLR24) was plagiarized by TCD! It's unbelievable😢: they not only stole our idea of trajectory consistency but also committed "verbatim plagiarism," literally copying our proofs word for word! Please help me spread this.