chore(distill): default to MODEL-1 7B teacher + SPEC-DISTILL-001 §86 (PMAT-701 follow-up) by noahgift · Pull Request #1871 · paiml/aprender

noahgift · 2026-05-22T07:14:57Z

Summary

Now that PMAT-701 Bug A (#1863) and Bug B (#1869) have landed, the MODEL-1 7B teacher (`paiml/qwen2.5-coder-7b-apache-q4k-v1`) is feasible on Grace Blackwell GB10. This PR is the small `chore` change that flips the dispatch script default + records the why in SPEC-DISTILL-001 §86.

Changes

* `scripts/dispatch-distill-phase-3-gx10.sh`: `TEACHER_REPO` default changes from `Qwen/Qwen2.5-Coder-0.5B-Instruct` (smoke fallback) → `paiml/qwen2.5-coder-7b-apache-q4k-v1` (the spec's intended teacher). Smoke-only callers override with `TEACHER_REPO=...`. The old comment about the 1.5B Block-0 OOM is replaced with the PMAT-701 fix references.
* `docs/specifications/aprender-train/distillation-epic-spec.md`: new §86 amendment documenting the 5-whys, the two fixed bugs, the new falsifier `F-DISTILL-V2-001-TEACHER-DIVERGENCE` (preflight reject when STEPS>=5000 and teacher==student without an explicit override), and the discharge of the prior Stage D 50K + 10K runs as no-KD. Spec version bumped 1.1.0 → 1.2.0.
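The two changes above can be sketched in shell as follows. This is an illustrative sketch, not the contents of the actual dispatch script: the function names and the `ALLOW_TEACHER_EQ_STUDENT` override variable are hypothetical, since the PR text only says that an explicit override exists. `TEACHER_REPO`, `STEPS`, and the two repo names come from the description above.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the real dispatch-distill-phase-3-gx10.sh.

# Resolve the teacher repo: default to the 7B MODEL-1 teacher,
# but let callers override it through the TEACHER_REPO env var.
resolve_teacher() {
  printf '%s\n' "${TEACHER_REPO:-paiml/qwen2.5-coder-7b-apache-q4k-v1}"
}

# Preflight in the spirit of F-DISTILL-V2-001-TEACHER-DIVERGENCE.
# Args: <teacher> <student> <steps>. Returns 0 to proceed, 1 to reject.
# ALLOW_TEACHER_EQ_STUDENT is a hypothetical name for the explicit override.
preflight_teacher_divergence() {
  local teacher="$1" student="$2" steps="$3"
  if [ "$steps" -ge 5000 ] && [ "$teacher" = "$student" ] \
     && [ "${ALLOW_TEACHER_EQ_STUDENT:-0}" != "1" ]; then
    echo "reject: teacher == student ($teacher) at STEPS=$steps" >&2
    return 1
  fi
  return 0
}
```

Under these assumptions, a smoke run would keep working by setting the env var explicitly, for example `TEACHER_REPO=Qwen/Qwen2.5-Coder-0.5B-Instruct STEPS=100 ./scripts/dispatch-distill-phase-3-gx10.sh`, while a long run with identical teacher and student would be rejected before any compute is spent.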

Why this matters

The Phase 4 Stage D 50K (25 h) and 10K (5 h) runs on 2026-05-20/21 silently inherited the Phase 3 smoke workaround of TEACHER_REPO == STUDENT_INIT == 0.5B. The KD signal was ~zero, because the KL divergence between identical distributions is zero; 30 hours of compute fine-tuned the base model toward gibberish on a synthetic-ish corpus. The §86 amendment makes that mistake hard to repeat.
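To make the "~zero KD signal" claim concrete, here is a short derivation. It assumes the KD term is the usual forward KL divergence between teacher and student softmax outputs at temperature 1; the PR does not state the exact loss, so treat the symbols below ($p_T$, $p_S$, student logits $z$) as illustrative.

$$
D_{\mathrm{KL}}(p_T \,\|\, p_S) = \sum_i p_T(i)\,\log\frac{p_T(i)}{p_S(i)}
$$

When the teacher and student are the same model, $p_T = p_S$, so every log ratio is $\log 1 = 0$ and the divergence vanishes. With $p_S = \mathrm{softmax}(z)$, the gradient with respect to the student logits is

$$
\frac{\partial D_{\mathrm{KL}}}{\partial z_j} = p_S(j) - p_T(j) = 0 \quad \text{when } p_T = p_S .
$$

So at initialization the teacher contributes no gradient, and once the student drifts under any other loss term, the KD term can only pull it back toward the base model it started from.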

Test plan

* `bash -n scripts/dispatch-distill-phase-3-gx10.sh`: syntax-ok
* Spec markdown renders cleanly
* CI: `ci / gate` + `workspace-test` green
* Operator: when ready, re-dispatch Stage D with the new defaults: `STEPS=50000 ./scripts/dispatch-distill-phase-3-gx10.sh`. Compute estimate ~50 h on GB10 (slower than the previous 0.5B-teacher run because realizar's 7B forward is heavier; this is acceptable given the falsifier-quality gain).

🤖 Generated with Claude Code

…(PMAT-701 follow-up)

The Phase 4 Stage D 50K + 10K runs (2026-05-20/21) silently inherited the Phase 3 smoke workaround of TEACHER_REPO == STUDENT_INIT == 0.5B. Result: no KD signal, 30 h of compute that fine-tuned the base model toward gibberish on a small corpus. Documented in `evidence/distill-7b-teacher-loadtest-gx10/findings.json` + this spec amendment.

Now that PMAT-701 Bug A (PR #1863) and Bug B (PR #1869) have landed, the 7B Q4K teacher is feasible on Grace Blackwell GB10:

* PR #1863: trueno-gpu allocator autodetects unified-memory devices (Grace, Tegra) and routes to cuMemAllocManaged so the full 128 GB pool is reachable.
* PR #1869: new RealizarQ4KTeacher keeps Q4K teacher weights quantized on the GPU (no F32 dequant at upload), eliminating the OOM-kill that was killing the first training step.

This PR flips the dispatch script's default and codifies the why in spec §86:

* `scripts/dispatch-distill-phase-3-gx10.sh`: TEACHER_REPO default changes from `Qwen/Qwen2.5-Coder-0.5B-Instruct` (smoke fallback) to `paiml/qwen2.5-coder-7b-apache-q4k-v1` (the MODEL-1 teacher the spec was designed around). Smoke-only callers override with the env var.
* `docs/specifications/aprender-train/distillation-epic-spec.md`: adds §86 documenting the 5-whys, the fix references, and a new falsifier F-DISTILL-V2-001-TEACHER-DIVERGENCE that rejects future Phase-4-class dispatches where teacher == student unless an explicit override is set.
* Spec version bumped to 1.2.0 with changelog entry.

The §86 amendment also notes that the existing 50K + 10K Stage D runs do NOT count toward AC-DISTILL-003: they're discharged as no-KD baselines, and a re-dispatched 50K run with the 7B teacher is required for a real Phase 4 verdict.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…g turn

Adds a §87 amendment to SPEC-DISTILL-001 documenting the root cause of the PMAT-704 cascade fix: PR #1869 (Bug B / RealizarQ4KTeacher) was a wrong turn. The realizar `_cuda` forward path is CPU-bound and unusable as a distillation teacher on Grace Blackwell GB10. The 7B vocab-aligned 500-step validation hung at step 0 for 1.5 h with GPU at 0% utilization, which is empirical proof of the defect.

The amendment includes:

* Full five-whys chain (cuMemAlloc 30 GB ceiling vs phantom OOM-killer SIGKILL on the explicit-managed path), with file/line citations pointing to the CPU-heavy ops in crates/aprender-serve/src/gguf/cuda/cuda.rs:18
* Root cause: conflated two failures, missed the cheap dispatch-flip experiment that would have rejected Bug B's hypothesis in 5 minutes.
* Fix references: PR #1879 (PMAT-704): cuBLAS default, RealizarQ4KTeacher demoted to APR_DISTILL_TEACHER_BACKEND=realizar-q4k opt-in fallback.
* Contract changes: new `apr-distill-teacher-backend-selection-v1.yaml`, `cuda-q4k-frozen-teacher-v1.yaml` demoted (not retracted).
* Methodology lesson: cheap-experiment-before-design discipline.
* Cascade closure table covering PRs #1863, #1869, #1871, #1874, #1877, #1879.

Spec version bumped 1.1.0 → 1.3.0 with changelog entries for both §86 (via PR #1871, also pending merge) and §87 (this PR). The amendment notes the §86 cross-reference and explains the order-of-operations in case readers see this on a build of main that predates #1871.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
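The backend selection described in that commit (cuBLAS by default, `realizar-q4k` as an opt-in fallback) could look roughly like the sketch below. Only the variable name `APR_DISTILL_TEACHER_BACKEND` and the value `realizar-q4k` appear in the commit message; the function name and the `cublas` value string are assumptions made for illustration, and the real selection may well live in Rust rather than in a shell script.

```shell
#!/usr/bin/env bash
# Sketch only: the "cublas" value name and this function are hypothetical.

# Print the teacher backend to use. The cuBLAS path is the default;
# realizar-q4k must be requested explicitly through the env var.
select_teacher_backend() {
  local requested="${APR_DISTILL_TEACHER_BACKEND:-cublas}"
  case "$requested" in
    cublas|realizar-q4k)
      printf '%s\n' "$requested"
      ;;
    *)
      # Fail loudly on typos instead of silently falling back.
      echo "unknown teacher backend: $requested" >&2
      return 1
      ;;
  esac
}
```

Rejecting unknown values rather than defaulting is a deliberate choice in this sketch: a silent fallback is the same class of mistake that let the teacher == student runs go unnoticed.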

noahgift enabled auto-merge (squash) May 22, 2026 07:15

noahgift mentioned this pull request May 22, 2026

fix(eval): apr eval no longer reports fake pass@1=1.0 on broken models (PMAT-702) #1874

Closed

6 tasks

Merge branch 'main' into chore/distill-phase4-7b-teacher-default

a693a6d

This was referenced May 22, 2026

docs(spec): SPEC-DISTILL-001 §87 — PMAT-704 post-mortem on Bug B wrong turn #1880

Closed

feat(distill): wire ProgressCallback into Pipeline — close training-monitoring gap (PMAT-705) #1881

Closed

Merge branch 'main' into chore/distill-phase4-7b-teacher-default

e068ddd

noahgift merged commit 97d9b80 into main May 22, 2026
10 checks passed

noahgift deleted the chore/distill-phase4-7b-teacher-default branch May 22, 2026 14:06

noahgift mentioned this pull request May 22, 2026

release: v0.35.0 (subsumes #1873, #1884, #1887, #1890) #1894

Merged

5 tasks

noahgift merged 3 commits into main from chore/distill-phase4-7b-teacher-default
