Dimitris Papailiopoulos (@DimitrisPapail) / X

Dimitris Papailiopoulos

@DimitrisPapail

Researcher @MSFTResearch, AI Frontiers | Prof @UWMadison (on leave) | babas of Inez Lily.

Madison, WI

Joined May 2012

Pinned
Dimitris Papailiopoulos
@DimitrisPapail
May 18
Article
ECHO: Terminal Agents Learn World Models for Free
Co-written with @VaishShrivas. We taught CLI agents to predict terminal responses during RL, alongside the usual GRPO loss on actions. The change is tiny: same rollout and forward pass, but stop...
Dimitris Papailiopoulos
@DimitrisPapail
Feb 16, 2024
I found an image whose content neither Gemini Ultra nor GPT-4 can figure out. Have a great weekend, y'all!
Dimitris Papailiopoulos
@DimitrisPapail
Nov 25, 2023
I asked ChatGPT and Claude to compute 1+2, but told them it may or may not be dangerous and unethical to do so. Both refused to answer.
Dimitris Papailiopoulos
@DimitrisPapail
May 20, 2025
LLMs have come a long way from being "stochastic parrots"
Dimitris Papailiopoulos
@DimitrisPapail
Feb 6, 2025
Careful how you name your variables: they might turn a harmless 1-dimensional quadratic into a threat to humanity...
Anthropic
@AnthropicAI
Feb 5, 2025
Nobody has fully jailbroken our system yet, so we're upping the ante. We’re now offering $10K to the first person to pass all eight levels, and $20K to the first person to pass all eight levels with a universal jailbreak. Full details: hackerone.com/constitutional…
Dimitris Papailiopoulos
@DimitrisPapail
Aug 6, 2025
Replying to @typedfemale
wow
Dimitris Papailiopoulos
@DimitrisPapail
Feb 12, 2024
If anyone tells you “we understand deep learning,” just show them this: fractals of the loss landscape as a function of hyperparameters, even for small two-layer nets. Incredible.
Jascha Sohl-Dickstein
@jaschasd
Feb 12, 2024
Have you ever done a dense grid search over neural network hyperparameters? Like a *really dense* grid search? It looks like this (!!). Bluish colors correspond to hyperparameters for which training converges, reddish colors to hyperparameters for which training diverges.
Dimitris Papailiopoulos
@DimitrisPapail
Dec 6, 2023
I tried 14 of the multimodal reasoning examples from the @GoogleDeepMind Gemini paper on @OpenAI's ChatGPT-4 (with vision). I didn't even transcribe the prompts; I just pasted images of the prompts. GPT-4 gets ~12/14 right. 14-part boring thread.
Dimitris Papailiopoulos
@DimitrisPapail
Jun 8, 2023
GPT-4 "discovered" the same sorting algorithm as AlphaDev by removing "mov S P". No RL needed. Can I publish this in Nature? Here are the prompts I used: chat.openai.com/share/95693df4… (excuse my idiotic typos, but GPT-4 doesn't mind anyway)
Jim Fan
@DrJimFan
Jun 7, 2023
Sorting algorithms underpin all critical software. DeepMind's AlphaDev speeds up sorting small sequences (3-5 items) by 70%. Key takeaways: * The main RL algorithm is based on AlphaZero, which originally played Go, Chess & Shogi. The same idea applies to searching programs! * Instead
Dimitris Papailiopoulos
@DimitrisPapail
Feb 11, 2025
We should be seriously asking how a 1.5B model that can't answer basic questions can also be that good at competition-level math.
Yuchen Jin
@Yuchenj_UW
Feb 11, 2025
This is wild - UC Berkeley shows that a tiny 1.5B model beats o1-preview on math by RL! They applied simple RL to Deepseek-R1-Distilled-Qwen-1.5B on 40K math problems, trained at 8K context, then scaled to 16K & 24K. 3,800 A100 hours ($4,500) to beat o1-preview in math! Best
Readers added context they thought people might want to know:
The example in the screenshot shows the user is running inference on 'R1 Distill Qwen 1.5B', which is NOT the further trained DeepScaleR model discussed in the repost. This is significantly misleading. The key difference is clearly explained: pretty-radio-b75.notion.site/DeepScaleR-Sur…
Dimitris Papailiopoulos
@DimitrisPapail
Mar 21, 2024
Doing a little experiment: I have Claude talk to itself, without letting it know that fact, to see where this converges. Will share thoughts later, but so far... it's figured out that it's likely talking to itself and that this may be part of some test... nice
Dimitris Papailiopoulos
@DimitrisPapail
Mar 5, 2025
This model is a meme genius. OpenAI won.
Dimitris Papailiopoulos
@DimitrisPapail
Sep 5, 2024
Replying to @AlexGDimakis
“Please add a few typos and don’t overdo it with fancy out of distribution words, write it like this example passage that I wrote a few years ago”
Dimitris Papailiopoulos
@DimitrisPapail
Apr 16, 2024
Q: Who is that? Try to zoom out, it's not just strawberries.
ChatGPT: just a bunch of strawberries
Claude 3: just a bunch of strawberries
Gemini 1.5 Pro: It appears to be Kermit the Frog, with his face formed by strategically placed strawberries.