Rafael Rafailov @ NeurIPS

1,255 posts

@rm_rafailov

I work on RL at @thinkymachines. Previously at @StanfordAILab @GoogleDeepMind @UCBerkeley

Stanford, CA

Joined May 2023

Pinned
Rafael Rafailov @ NeurIPS
@rm_rafailov
Dec 2, 2025
I will be at NeurIPS this week! If you want to talk about research, RL, life at @thinkymachines or get some Tinker credits reach out!
19K
Rafael Rafailov @ NeurIPS
@rm_rafailov
Jan 9, 2025
We have a new position paper on "inference time compute" and what we have been working on in the last few months! We present some theory on why it is necessary, how it works, and what it means for "super" intelligence.
181K
Rafael Rafailov @ NeurIPS
@rm_rafailov
Apr 19, 2024
We have a new preprint out - your language model is not a reward, it’s a Q function! 1. The likelihood of the preferred answer must go down - it’s a policy divergence 2. MCTS guided decoding on language is equivalent to likelihood search on DPO 3. DPO learns credit assignment
100K
Rafael Rafailov @ NeurIPS
@rm_rafailov
Aug 13, 2024
Super excited to announce what we have been working on in the last six months - Agent Q is out now! This is a framework for self-supervised agent reasoning and search that can self-correct and autonomously improve by self-play and RL on real tasks on the real internet! 👇
166K
Rafael Rafailov @ NeurIPS
@rm_rafailov
Sep 29, 2025
The most surprising thing working on this was that RL with LoRA completely matches full training and develops the same extended reasoning patterns. I think this is a great sign for custom agent training.
Thinking Machines
@thinkymachines
Sep 29, 2025
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.
45K
Rafael Rafailov @ NeurIPS
@rm_rafailov
Nov 30, 2023
Excited to announce DPO has gone multi-modal! New paper out on RLHF for text-to-image diffusion models! We obtain large-scale state of the art results with 70% win rates against Stable Diffusion XL on human evals! Deep dive below 🧵
234K
Rafael Rafailov @ NeurIPS
@rm_rafailov
Aug 26, 2024
My Bet: Strawberry is algorithm distillation/procedural cloning. Everyone right now is coming up with ways to distill System 2 into System 1, but that will always be limited. We need to train the model to run the algorithms, not just outputs (and post-train with RL of course).
124K
Rafael Rafailov @ NeurIPS
@rm_rafailov
Nov 29, 2023
I saw this challenge aimoprize.com to develop an AI that can win a gold medal at the IMO. I competed at that level a couple of times (only silver medals though) and have been working on RL and LLMs for a bit. Here are my thoughts on what the challenges are: 1/N
161K
Rafael Rafailov @ NeurIPS
@rm_rafailov
May 11, 2024
Not to mention that most students don’t even have access to that cluster. I don’t have access to any A100s myself. It is becoming increasingly hard to even do research, and that is at Stanford; other places have it even worse.
Tsarathustra
@tsarnick
May 10, 2024
Fei-Fei Li says Stanford's Natural Language computing lab has only 64 GPUs and academia is "falling off a cliff" relative to industry
106K
Rafael Rafailov @ NeurIPS
@rm_rafailov
Oct 7, 2024
Excited to announce our latest work on generative reward models that unify RLHF and RLAIF approaches! We begin with a standard LLM-as-a-judge RLAIF framework and use further RL tuning to align the judge model's evaluations with the preference dataset.
66K
Rafael Rafailov @ NeurIPS
@rm_rafailov
Oct 2, 2025
I actually believe Tinker could be the most advanced ML system in the world. It optimizes everything from the kernel level to a distributed system that can process millions of simultaneous requests with near 100% reliability and insane throughput efficiency.
Myle Ott
@myleott
Oct 1, 2025
So excited about this! Tinker provides a simple+powerful interface for post-training/RL research. It also manages all the infrastructure so that users can focus on data and environments. Hidden behind that simple interface is a ton of interesting and complex ML systems challenges!
46K
Rafael Rafailov @ NeurIPS
@rm_rafailov
Sep 12, 2024
Fn nailed it - tree search distillation + RL post training!
Rafael Rafailov @ NeurIPS
@rm_rafailov
Aug 26, 2024
My Bet: Strawberry is algorithm distillation/procedural cloning. Everyone right now is coming up with ways to distill System 2 into System 1, but that will always be limited. We need to train the model to run the algorithms, not just outputs (and post-train with RL of course).
39K
Rafael Rafailov @ NeurIPS
@rm_rafailov
Oct 1, 2025
Very excited to share what I have been working on with a great team of people at @thinkymachines. Tinker is a whole new way to train and customize models all the way up to frontier scale. Most importantly, it allows everyone to use their own code, data, tools and environments,
Thinking Machines
@thinkymachines
Oct 1, 2025
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
52K
Rafael Rafailov @ NeurIPS
@rm_rafailov
Mar 6, 2025
This is a really cool project where we trained a multi-agent system of 3 LLMs to do cooperative problem-solving end-to-end with reinforcement learning! MARL holds a lot of promise to teach models to be more cooperative with real collaborators! Check out @sumeetrm's thread below!
Sumeet Motwani
@sumeetrm
Mar 6, 2025
Introducing MALT: Improving Reasoning with Multi-Agent LLM Training🫡 We present a new multi-agent post-training method that uses credit assigned synthetic data to improve the reasoning capabilities and self-correction rates of a generator, critic, and refinement model working
56K