Maxime Labonne
DeepSeek V4: ten teachers, one student
On-policy distillation replaced the RL stage
Recent posts
Nemotron 3 Ultra: what distillation can't fix
Ten specialist teachers distilled into one open 550B model
Jun 8 • Maxime Labonne
Nemotron Cascade 2: On-policy distillation is back!
Multi-domain On-Policy Distillation, open datasets, and gold medals
Mar 23 • Maxime Labonne
Nemotron 3 Super: NVIDIA's gpt-oss killer?
120B parameters, 12B active, 512 experts, and 25 trillion tokens of NVFP4 pretraining
Mar 12 • Maxime Labonne
Kimi K2.5: Still Worth It After Two Weeks?
Agent swarm and early fusion for better vision capabilities
Feb 19 • Maxime Labonne
Qwen3.5: Nobody Agrees on Attention Anymore
Bigger than MiniMax-M2.5, sparser than GLM-5, as good as Kimi K2.5?
Feb 16 • Maxime Labonne
Top posts
Qwen3.5: Nobody Agrees on Attention Anymore
Feb 16 • Maxime Labonne
MiniMax-M2.5: The $1/hour Frontier Model
Feb 13 • Maxime Labonne
Nemotron 3 Super: NVIDIA's gpt-oss killer?
Mar 12 • Maxime Labonne
Nemotron 3 Ultra: what distillation can't fix
Jun 8 • Maxime Labonne
Kimi K2.5: Still Worth It After Two Weeks?
Feb 19 • Maxime Labonne