Deep (Learning) Focus
GRPO++: Tricks for Making RL Actually Work
How to go from the vanilla GRPO algorithm to functional RL training at scale...
Most Popular
Decoder-Only Transformers: The Workhorse of Generative LLMs
Mar 4, 2024 • Cameron R. Wolfe, Ph.D.
Demystifying Reasoning Models
Feb 18, 2025 • Cameron R. Wolfe, Ph.D.
AI Agents from First Principles
Jun 9, 2025 • Cameron R. Wolfe, Ph.D.
Understanding and Using Supervised Fine-Tuning (SFT) for Language Models
Sep 11, 2023 • Cameron R. Wolfe, Ph.D.
Latest
Olmo 3 and the Open LLM Renaissance
Fully-open artifacts with the potential to make LLM research a reality for anyone...
Dec 15, 2025 • Cameron R. Wolfe, Ph.D.
Group Relative Policy Optimization (GRPO)
How the algorithm that teaches LLMs to reason actually works...
Nov 24, 2025 • Cameron R. Wolfe, Ph.D.
PPO for LLMs: A Guide for Normal People
Understanding the complex RL algorithm that gave us modern LLMs…
Oct 27, 2025 • Cameron R. Wolfe, Ph.D.
REINFORCE: Easy Online RL for LLMs
How to get the benefits of online RL without the complexity of PPO...
Sep 29, 2025 • Cameron R. Wolfe, Ph.D.
Online versus Offline RL for LLMs
A deep dive into the online-offline performance gap in LLM alignment...
Sep 8, 2025 • Cameron R. Wolfe, Ph.D.
GPT-oss from the Ground Up
Everything you should know about OpenAI's new open-weight language models...
Aug 18, 2025 • Cameron R. Wolfe, Ph.D.
Direct Preference Optimization (DPO)
How to align LLMs with limited hardware and minimal complexity...
Jul 28, 2025 • Cameron R. Wolfe, Ph.D.
Deep (Learning) Focus
I contextualize and explain important topics in AI research.
Recommendations
Javarevisited Newsletter
javinpaul
AI by Hand ✍️
Prof. Tom Yeh
The VC Corner
Ruben Dominguez
AI Newsletter
elvis
Interconnects
Nathan Lambert