🧵 1/11 Everyone's doing on-policy distillation now (Qwen3, DeepSeek V4, GLM-5).
But here's what nobody's asking: for a given token, or for a given question and teacher, when does the teacher's guidance actually help, and when does it quietly make things worse?
We found a way to










