Yuntian Deng (@yuntiandeng) / X

1,295 posts

yuntiandeng.com

Joined September 2016

Pinned
Yuntian Deng
@yuntiandeng
Jun 5
Article
From AI as Problem Solver to AI as Tool Builder
Imagine a future "SoftwareGPT". You buy a new household cleaning robot. You take a few pictures of your rooms. You describe a few rules: avoid the cables in that corner. Then SoftwareGPT generates a...
8.2K
Yuntian Deng
@yuntiandeng
Sep 17, 2024
Is OpenAI's o1 a good calculator? We tested it on up to 20x20 multiplication—o1 solves up to 9x9 multiplication with decent accuracy, while gpt-4o struggles beyond 4x4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1/4
2.5M
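A per-cell accuracy grid like the one described in this post could be computed along the following lines. This is a minimal sketch, not the evaluation code behind the post: `solver` is a hypothetical stand-in for a call to the model under test, and the default of 40 examples per cell is borrowed from the later o3-mini post, since the count used for the o1 run is not stated here.

```python
import random


def random_operand(digits, rng):
    """Draw a random integer with exactly `digits` decimal digits."""
    low = 10 ** (digits - 1)
    high = 10 ** digits - 1
    return rng.randint(low, high)


def cell_accuracy(solver, digits_a, digits_b, n_examples=40, seed=0):
    """Fraction of random (digits_a x digits_b) products that `solver` gets right.

    `solver` is a hypothetical callable taking two ints and returning an int;
    in a real run it would wrap a request to the model being evaluated.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_examples):
        a = random_operand(digits_a, rng)
        b = random_operand(digits_b, rng)
        if solver(a, b) == a * b:
            correct += 1
    return correct / n_examples


def accuracy_grid(solver, max_digits=20, n_examples=40):
    """Accuracy for every (rows, cols) digit combination up to max_digits."""
    return {
        (da, db): cell_accuracy(solver, da, db, n_examples)
        for da in range(1, max_digits + 1)
        for db in range(1, max_digits + 1)
    }
```

Exact-match scoring against Python's arbitrary-precision `a * b` keeps the check simple; any partial-credit scheme would be a separate design choice.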
Yuntian Deng
@yuntiandeng
May 29, 2024
Can we teach LMs to internalize chain-of-thought (CoT) reasoning steps? We found a simple method: start with an LM trained with CoT, gradually remove CoT steps and finetune, forcing the LM to internalize reasoning. Paper: bit.ly/internalize_st… Done w/ @YejinChoinka @pmphlt 1/5
127K
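The "gradually remove CoT steps and finetune" idea can be pictured as a curriculum over training targets. The sketch below only shows how the target sequence might shrink from stage to stage; removing tokens from the front, the `tokens_per_stage` value, and the function names are illustrative assumptions, not details given in the post.

```python
def remaining_cot(cot_tokens, stage, tokens_per_stage=2):
    """Return the CoT tokens still kept at a given curriculum stage.

    Assumption: tokens are dropped from the start of the chain, a fixed
    number per stage, until none are left.
    """
    removed = min(len(cot_tokens), stage * tokens_per_stage)
    return cot_tokens[removed:]


def training_target(cot_tokens, answer_tokens, stage, tokens_per_stage=2):
    """Build the sequence the model is finetuned to produce at this stage."""
    return remaining_cot(cot_tokens, stage, tokens_per_stage) + answer_tokens


# Stage 0 keeps the full chain; later stages keep less, then only the answer.
cot = ["s1", "s2", "s3", "s4"]
answer = ["ans"]
for stage in range(4):
    print(stage, training_target(cot, answer, stage))
```

At the final stage the target is the answer alone, which is the point at which the reasoning has to happen inside the model rather than in emitted tokens.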
Yuntian Deng
@yuntiandeng
Feb 12, 2025
For those curious about how o3-mini performs on multi-digit multiplication, here's the result. It does much better than o1 but still struggles past 13×13. (Same evaluation setup as before, but with 40 test examples per cell.)
1.2M
Yuntian Deng
@yuntiandeng
Jul 19, 2024
We trained GPT2 to predict the product of two numbers up to 🌟20🌟 digits w/o intermediate reasoning steps, surpassing our previous 15-digit demo! How does a 12-layer LM solve 20-digit multiplication w/o CoT?🤯 Try our demo: huggingface.co/spaces/yuntian… Paper: bit.ly/internalize_st…
157K
Yuntian Deng
@yuntiandeng
Sep 17, 2024
Replying to @yuntiandeng
Lastly, this task is solvable even by a small language model: Implicit CoT with Stepwise Internalization can solve up to 20x20 multiplication with 99.5% accuracy, using a gpt-2 small architecture (117M parameters). 4/4 x.com/yuntiandeng/st…
75K
Yuntian Deng
@yuntiandeng
Jul 26, 2025
Today I learned a student of mine from China gave up waiting for his Canadian visa after over a year without updates:
1. He was a Vector Scholarship awardee.
2. He had to set aside $20K under the Direct Stream (for faster visa processing), despite being my funded student.
3. He
74K
Yuntian Deng
@yuntiandeng
Nov 20, 2023
I am hiring NLP/ML PhD students at UWaterloo, home to 5 NLP professors! Apply by Dec 1. Strong consideration will be given to those who can tackle the challenge below: Can we use an LM's hidden states to reason about multiple problems simultaneously? Retweets/shares appreciated 🥰
139K
Yuntian Deng
@yuntiandeng
Nov 7, 2023
Can LMs solve reasoning tasks without showing their work? "Implicit Chain of Thought Reasoning via Knowledge Distillation" teaches LMs to reason internally to solve tasks like 5×5 multiplication. Here's how we bypass human-like step-by-step reasoning: bit.ly/implicitCoT 1/6
94K
Yuntian Deng
@yuntiandeng
Oct 21, 2024
How many reasoning tokens does OpenAI o1 use? It turns out they are almost always multiples of 64 (99+% of the time in 100K collected turns)🤔Could it be that the model only uses multiples of 64 tokens to think? Or maybe OpenAI rounds the token count in the returned usage? 1/4
63K
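The "multiples of 64" observation is easy to check on any list of reported reasoning-token counts. A minimal sketch, assuming the counts have already been collected into a plain list of integers (the sample values below are made up for illustration and are not from the 100K collected turns):

```python
def fraction_multiple_of(counts, base=64):
    """Share of token counts that are exact multiples of `base`."""
    if not counts:
        raise ValueError("counts must be non-empty")
    hits = sum(1 for c in counts if c % base == 0)
    return hits / len(counts)


# Hypothetical sample: three of the four counts are multiples of 64.
sample_counts = [64, 128, 100, 192]
print(fraction_multiple_of(sample_counts))
```

A high fraction alone cannot distinguish between the two explanations raised in the post (the model thinking in 64-token blocks versus the reported usage being rounded).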
Yuntian Deng
@yuntiandeng
Sep 17, 2024
Replying to @yuntiandeng
Interestingly, the number of private reasoning tokens grows sublinearly with problem size, but is beyond what human-written CoT requires. For example, for 20x20, o1 uses ~3600 reasoning tokens, but human CoT needs ~400 for partial products and ~400 for sums, totaling ~800. 2/4
75K
Yuntian Deng
@yuntiandeng
Jun 26, 2023
Excited to share that I'm joining @UWCheritonCS as an Assistant Professor and @VectorInst as a Faculty Affiliate in Fall '24. Before that, I'm doing a postdoc at @allen_ai with @YejinChoinka. Immensely grateful to my PhD advisors @srush_nlp and @pmphlt. This journey wouldn't have
74K
Yuntian Deng
@yuntiandeng
Mar 29, 2023
Ever wondered how nondeterministic GPT-4 is even with greedy decoding (T=0)? I built a website that asks GPT-4 to draw a unicorn every hour and tracks if the results stay consistent over time (spoiler alert: they don't! 🦄). Explore the findings: openaiwatch.com
78K
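One simple way to track whether repeated responses "stay consistent" is to hash each response and count how many distinct outputs appear. This is a sketch under stated assumptions: the responses are assumed to be already-collected strings, and exact-match hashing is a choice made here for illustration, not a description of how the website compares results.

```python
import hashlib
from collections import Counter


def consistency_report(responses):
    """Summarize how often repeated responses agree exactly.

    Returns the number of distinct outputs and the share of responses
    that match the most common output.
    """
    if not responses:
        raise ValueError("responses must be non-empty")
    digests = [hashlib.sha256(r.encode("utf-8")).hexdigest() for r in responses]
    counts = Counter(digests)
    most_common_count = counts.most_common(1)[0][1]
    return {
        "distinct": len(counts),
        "agreement": most_common_count / len(responses),
    }
```

A fully deterministic system would give `distinct == 1` and `agreement == 1.0` for a fixed prompt.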
Yuntian Deng
@yuntiandeng
Aug 11, 2025
🚀 New dataset release: WildChat-4.8M
4.8M real user-ChatGPT conversations collected from our public chatbots:
- 122K from reasoning models (o1-preview, o1-mini): these represent real uses in the wild and are very costly to collect
- 2.5M from GPT-4o
🔗 hf.co/datasets/allen… (1/4)
Yuntian Deng
@yuntiandeng
May 3, 2024
Thrilled to see WildChat featured by @_akhaliq, just as predicted by AKSelectionPredictor!😊 Explore 1 million user-ChatGPT conversations, plus details like country, state, timestamp, hashed IP, and request headers here: huggingface.co/datasets/allen…
allenai/WildChat-4.8M · Datasets at Hugging Face
From huggingface.co
44K