Maxime Labonne



DeepSeek V4: ten teachers, one student

On-policy distillation replaced the RL stage

Recent posts

Nemotron 3 Ultra: what distillation can't fix
Ten specialist teachers distilled into one open 550B model
Jun 8 • Maxime Labonne
Nemotron Cascade 2: On-policy distillation is back!
Multi-domain On-Policy Distillation, open datasets, and gold medals
Mar 23 • Maxime Labonne
Nemotron 3 Super: NVIDIA's gpt-oss killer?
120B parameters, 12B active, 512 experts, and 25 trillion tokens of NVFP4 pretraining
Mar 12 • Maxime Labonne
Kimi K2.5: Still Worth It After Two Weeks?
Agent swarm and early fusion for better vision capabilities
Feb 19 • Maxime Labonne
Qwen3.5: Nobody Agrees on Attention Anymore
Bigger than MiniMax-M2.5, sparser than GLM-5, as good as Kimi K2.5?
Feb 16 • Maxime Labonne
Top posts
Qwen3.5: Nobody Agrees on Attention Anymore
Feb 16 • Maxime Labonne
MiniMax-M2.5: The $1/hour Frontier Model
Feb 13 • Maxime Labonne
Nemotron 3 Super: NVIDIA's gpt-oss killer?
Mar 12 • Maxime Labonne
Nemotron 3 Ultra: what distillation can't fix
Jun 8 • Maxime Labonne
Kimi K2.5: Still Worth It After Two Weeks?
Feb 19 • Maxime Labonne
© 2026 Maxime Labonne