Summary
Deferred from PR #2508 (feat(xtoken): cross-tokenizer off-policy distillation). That PR supports only TP=1, CP=1, a
configuration already covered by an existing functional test, so adding a nightly there would catch few accuracy regressions
while making the nightly suite heavier. As agreed in a #2508 comment (@yuki-97 / @RayenTian / @avenkateshha), the meaningful
nightly, one that exercises a heterogeneous teacher/student TP/CP parallel plan, is tracked here and will land with the
TP/CP-support PR.
Background
TP/CP-sharded logits and per-rank teacher IPC export are being added on branch ruit/xtoken-tp-cp-l1b (the follow-up). That
code path is where heterogeneous TP/CP (teacher TP/CP ≠ student TP/CP, possibly with different DP degrees) becomes testable.
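To make "heterogeneous" concrete, here is a minimal sketch of how the DP degree falls out of each model's plan. It assumes teacher and student each span the same single node with eight GPUs (presumably what "1n8g" in the existing nightly's name refers to) and that PP=1 unless stated. ParallelPlan, dp, and is_heterogeneous are hypothetical names for illustration only, not APIs from this repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelPlan:
    """Hypothetical description of one model's parallel layout."""
    tp: int = 1  # tensor parallel size
    cp: int = 1  # context parallel size
    pp: int = 1  # pipeline parallel size

    def dp(self, world_size: int) -> int:
        """Data-parallel degree implied by this plan on `world_size` GPUs."""
        model_parallel = self.tp * self.cp * self.pp
        if world_size % model_parallel != 0:
            raise ValueError(
                f"world_size={world_size} is not divisible by TP*CP*PP={model_parallel}"
            )
        return world_size // model_parallel


def is_heterogeneous(teacher: ParallelPlan, student: ParallelPlan) -> bool:
    """True when teacher and student shard logits differently (TP or CP differs)."""
    return (teacher.tp, teacher.cp) != (student.tp, student.cp)


WORLD_SIZE = 8  # assumption: one node with 8 GPUs used by both models
teacher = ParallelPlan(tp=2, cp=1)
student = ParallelPlan(tp=1, cp=1)
print(is_heterogeneous(teacher, student), teacher.dp(WORLD_SIZE), student.dp(WORLD_SIZE))
# prints: True 4 8
```

With the example plan from the acceptance criteria (teacher TP2/CP1, student TP1/CP1), the two models end up with different DP degrees, which is exactly the case a TP=1/CP=1 run never produces.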
A TP=1/CP=1 nightly cannot catch regressions in the sharding/reduction logic that the follow-up introduces; a
heterogeneous-parallel nightly can.
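Why TP=1 is blind to these bugs can be seen in a pure-Python simulation of a vocab-parallel log-softmax. This sketch assumes logits are split along the vocabulary dimension across TP ranks (a common Megatron-style layout; the actual layout on ruit/xtoken-tp-cp-l1b may differ) and replaces the cross-rank all-reduces with plain max/sum. With TP=1, a version that skips the reduction is indistinguishable from a correct one; with TP=2 it fails. All function names are hypothetical.

```python
import math
from typing import Callable, List

Shards = List[List[float]]


def log_softmax(logits: List[float]) -> List[float]:
    """Reference log-softmax over the full (unsharded) vocabulary."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def shard_vocab(logits: List[float], tp: int) -> Shards:
    """Split the vocab dimension into `tp` contiguous shards, one per TP rank."""
    if len(logits) % tp != 0:
        raise ValueError("vocab size must be divisible by tp")
    size = len(logits) // tp
    return [logits[r * size:(r + 1) * size] for r in range(tp)]


def vocab_parallel_log_softmax(shards: Shards) -> Shards:
    """Each rank sees only its shard, so the max and the sum of exponentials
    must be reduced across ranks (simulated here with Python max/sum)."""
    global_max = max(max(s) for s in shards)  # stands in for all-reduce(MAX)
    global_sum = sum(sum(math.exp(x - global_max) for x in s) for s in shards)  # all-reduce(SUM)
    lse = global_max + math.log(global_sum)
    return [[x - lse for x in s] for s in shards]


def buggy_log_softmax(shards: Shards) -> Shards:
    """Deliberately wrong: normalizes each shard locally, skipping the reduction."""
    return [log_softmax(s) for s in shards]


def matches_reference(logits: List[float], tp: int,
                      impl: Callable[[Shards], Shards], tol: float = 1e-9) -> bool:
    """Flatten the sharded result and compare it with the unsharded reference."""
    got = [x for s in impl(shard_vocab(logits, tp)) for x in s]
    want = log_softmax(logits)
    return len(got) == len(want) and all(abs(a - b) <= tol for a, b in zip(got, want))


logits = [2.0, -1.0, 0.5, 3.0]
for tp in (1, 2):
    print(tp, matches_reference(logits, tp, vocab_parallel_log_softmax),
          matches_reference(logits, tp, buggy_log_softmax))
# 1 True True
# 2 True False
```

CP shards the sequence rather than the vocabulary, so it would need its own check; this sketch only covers the TP case.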
Scope / Acceptance criteria
Add a nightly test following the conventions in .claude/skills/testing / skills:
- Recipe YAML under examples/configs/recipes/llm/ exercising a heterogeneous TP/CP plan (e.g. teacher TP2/CP1 → student
  TP1/CP1, or a TP×CP combo that triggers cross-tokenizer logit sharding + reduction).
- Driver script under tests/test_suites/llm/ with the matching base name + .sh, sourcing common.env and invoking the
  xtoken distillation entrypoint via uv run ... --config .
- Register the driver path in tests/test_suites/nightly.txt, under the existing # Distillation section.
- Keep it lightweight (a short run, in the spirit of the existing
  distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh) so as not to bloat the nightly.
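The naming and registration rules in the acceptance criteria can be checked mechanically. This minimal sketch assumes recipes use the .yaml extension, that nightly.txt lists one repo-relative driver path per line, and that sections are introduced by "# " comment lines such as # Distillation. The recipe name and the nightly.txt excerpt are made up, and driver_for_recipe and registered_in_section are hypothetical helpers, not part of the test suite.

```python
from pathlib import PurePosixPath

RECIPE_DIR = PurePosixPath("examples/configs/recipes/llm")
DRIVER_DIR = PurePosixPath("tests/test_suites/llm")


def driver_for_recipe(recipe: str) -> PurePosixPath:
    """Map a recipe YAML to its driver script: same base name, .sh suffix."""
    path = PurePosixPath(recipe)
    if path.parent != RECIPE_DIR or path.suffix != ".yaml":
        raise ValueError(f"unexpected recipe path: {recipe}")
    return DRIVER_DIR / (path.stem + ".sh")


def registered_in_section(nightly_txt: str, driver: str,
                          section: str = "# Distillation") -> bool:
    """True if `driver` is listed under the `section` header of nightly.txt.
    Assumes each section starts with a line beginning with '# '."""
    current = None
    for raw in nightly_txt.splitlines():
        line = raw.strip()
        if line.startswith("# "):
            current = line  # entering a new section
        elif line and current == section and line == driver:
            return True
    return False


# Hypothetical recipe name and a made-up nightly.txt excerpt.
recipe = "examples/configs/recipes/llm/xtoken-distillation-example-1n8g.yaml"
driver = str(driver_for_recipe(recipe))
nightly = "\n".join([
    "# Distillation",
    "tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh",
    driver,
])
print(driver)
print(registered_in_section(nightly, driver))
# tests/test_suites/llm/xtoken-distillation-example-1n8g.sh
# True
```

A check like this could run in a unit test so that a renamed recipe or a driver registered under the wrong section fails fast instead of silently dropping out of the nightly.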
References