Ludwig Schmidt (@lschmidt3) / X

Ludwig Schmidt

246 posts

@lschmidt3

Assistant professor at @Stanford and member of the technical staff at @AnthropicAI.

Palo Alto, CA

people.csail.mit.edu/ludwigs/

Joined August 2009

Ludwig Schmidt
@lschmidt3
Jun 5, 2025
Very excited to finally release our paper for OpenThoughts! After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.
Ludwig Schmidt
@lschmidt3
Apr 28, 2023
Super excited to finally release DataComp! There is still a lot we don't understand about Internet-scale datasets. DataComp makes research on datasets more accessible and leads to better training sets. The results so far are very encouraging and there is much more to explore!
Gabriel Ilharco
@gabriel_ilharco
Apr 28, 2023
Introducing DataComp, a new benchmark for multimodal datasets! We release 12.8B image-text pairs, 300+ experiments and a 1.4B subset that outcompetes compute-matched CLIP runs from OpenAI & LAION 📜 arxiv.org/abs/2304.14108 🖥️ github.com/mlfoundations/… 🌐 datacomp.ai
Ludwig Schmidt
@lschmidt3
Jun 23, 2025
I'm a big fan of the approach to research funding @andykonwinski and the Laude team are taking! Working with them on terminal-bench has been fantastic (thanks @alexgshaw!) and I'm excited that they're going to support more open, impact-oriented research.
Andy Konwinski
@andykonwinski
Jun 23, 2025
Today, I’m launching a deeply personal project. I’m betting $100M that we can help computer scientists create more upside impact for humanity. Built for and by researchers, including @JeffDean & @jpineau1 on the board, @LaudeInstitute catalyzes research with real-world impact.
Ludwig Schmidt
@lschmidt3
May 20, 2025
Very excited about our new agent benchmark! I think it's a nice way of evaluating how well agents can do complex tasks in terminal (command line) environments.
Mike A. Merrill
@Mike_A_Merrill
May 19, 2025
Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? We’re excited to introduce Terminal-Bench: An evaluation environment and benchmark for AI agents on real-world terminal tasks. Tl;dr
Ludwig Schmidt
@lschmidt3
Jun 19, 2024
Very excited about this! DCLM already led to a great training set for language models, and there is (much) more to understand + more room for improvement here.
Vaishaal Shankar
@Vaishaal
Jun 18, 2024
I am really excited to introduce DataComp for Language Models (DCLM), our new testbed for controlled dataset experiments aimed at improving language models. 1/x
Ludwig Schmidt
@lschmidt3
Jan 28, 2025
Very excited about this!
Ryan Marten
@ryanmart3n
Jan 28, 2025
Announcing the Open Thoughts project. We are building the best reasoning datasets out in the open. Building off our work with Stratos, today we are releasing OpenThoughts-114k and OpenThinker-7B.
Ludwig Schmidt
@lschmidt3
Apr 30, 2019
If you are working on empirical phenomena in deep learning, consider submitting to our ICML workshop "Identifying and Understanding Deep Learning Phenomena" (deep-phenomena.org). The deadline is May 5, but relevant work that was already published elsewhere is still welcome!
Ludwig Schmidt
@lschmidt3
Jul 16, 2024
I learned a lot about the nuances of language model scaling laws from this project. Also the checkpoints are available now:
Tomer Porian
@tomerporian
Jul 2, 2024
🧵1/8 We resolve the discrepancy between the compute-optimal scaling laws of Kaplan et al. (exponent 0.88, Figure 14, left) and Hoffmann et al. ("Chinchilla", exponent 0.5). Paper: arxiv.org/abs/2406.19146 Data + Code: github.com/formll/resolvi…
formll/resolving-scaling-law-discrepancies · Hugging Face
Ludwig Schmidt
@lschmidt3
Jun 5, 2025
Replying to @lschmidt3
Similar to previous DataComp projects, we systematically experiment with every step of the data generation pipeline to build a state-of-the-art training set. Overall we conducted more than 1,000 individual experiments.
Ludwig Schmidt
@lschmidt3
Jun 5, 2025
Replying to @lschmidt3
More details on openthoughts.ai/blog/ot3, Ryan’s thread below, and the paper itself arxiv.org/abs/2506.04178
Ryan Marten
@ryanmart3n
Jun 5, 2025
Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals. We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data
OpenThoughts3 - A new SOTA Reasoning Data Recipe
Ludwig Schmidt
@lschmidt3
Jun 7, 2025
Replying to @giffmana
Thanks for the kind words, Lucas! I hope we get a chance to work together someday; I'm a big fan of your work. BTW my lab is always looking for good postdocs. Comp is probably worse than OpenAI, but long-time lab members get to go on runs with @Vaishaal's dog Kaya. He's great!
Ludwig Schmidt
@lschmidt3
Feb 12, 2025
Very nice community progress on open-data reasoning models since the R1 release!
Negin Raoof
@NeginRaoof_
Feb 12, 2025
Announcing OpenThinker-32B: the best open-data reasoning model distilled from DeepSeek-R1. Our results show that large, carefully curated datasets with verified R1 annotations produce SoTA reasoning models. Our 32B model outperforms all 32B models including
Ludwig Schmidt
@lschmidt3
Jun 5, 2025
Replying to @lschmidt3
Together with the paper we also release our new dataset OpenThoughts3-1.2M and the corresponding model OpenThinker3-7B, which is currently the best open-data 7B reasoning model.
Ludwig Schmidt
@lschmidt3
Dec 8, 2020
Replying to @beenwrekt and @HazyResearch
Congrats! Do you know "Benjamen Recht"? He won the 2017 test of time award nips.cc/Conferences/20… Could be related?