Skip to content

fix(distill): re-land APR_DISTILL_MAX_STEPS smoke gate dropped by #1888 (PMAT-706)#1911

Merged
noahgift merged 2 commits into
mainfrom
fix/distill-smoke-gate-reland-pmat-706
Jun 10, 2026
Merged

fix(distill): re-land APR_DISTILL_MAX_STEPS smoke gate dropped by #1888 (PMAT-706)#1911
noahgift merged 2 commits into
mainfrom
fix/distill-smoke-gate-reland-pmat-706

Conversation

@noahgift

Copy link
Copy Markdown
Contributor

Problem

#1888 (`52650c60c`, "feat(distill): APR_DISTILL_MAX_STEPS smoke-validation mode (PMAT-706)") announced the 60-second distillation smoke gate, but the squash landed only `contracts/apr-distill-smoke-validation-v1.yaml`. The `pipeline.rs` implementation the commit message described in detail never existed in source:

```
$ git show 52650c6 --stat
contracts/apr-distill-smoke-validation-v1.yaml | 177 +++++++++++++++++++++++++
1 file changed, 177 insertions(+)
$ git grep APR_DISTILL_MAX_STEPS # HEAD before this PR
CHANGELOG.md, the contract, the resume doc — zero .rs files
```

So the fail-fast gate the entire distillation critical path is supposed to run before a 30-50h Stage D job was absent. (Surfaced by a repo-wide roadmap analysis as the first broken link in the model-science critical path.)

Fix — implement against the contract (the behavioral source of truth)

  • `Pipeline.smoke_max_steps: Option` — read once from `APR_DISTILL_MAX_STEPS` at construction (CLI dispatch picks it up unchanged), overridable via `with_max_steps()` so falsifiers never touch process env (avoids the env-var-race flake class this repo has fought).
  • `train()` — early-breaks the inner loop after exactly N steps (`step >= N`, no off-by-one), rejects `N=0` with a clear `ConfigValue` error, guards the 0-steps load-failure case, and prints the two `[SMOKE]` summary lines only on the early-break path.
  • `execute()` — skips export in smoke mode: no `model.safetensors`/`output.apr`, so `apr eval`/`apr run` can't consume a smoke result by accident.
  • `smoke_summary_lines()` — pure/total helper so the exact wire format is unit-tested without capturing stdout.
  • `scripts/dispatch-distill-stage-d.sh` — forwards `APR_DISTILL_MAX_STEPS` across the ssh/`env` boundary (+ echo, JSON manifest, usage). The documented `APR_DISTILL_MAX_STEPS=10 ./scripts/dispatch-distill-stage-d.sh` was previously a silent no-op over ssh.

Verification

Falsifier (contract `evidence:` path) Checks
`pmat_706_smoke` FT-SMOKE-001 — exactly N steps (not the full epoch budget)
`pmat_706_no_regression` FT-SMOKE-002 — unset = full schedule + export still happens
`pmat_706_summary_format` FT-SMOKE-003 — two grep/awk-parseable lines
`pmat_706_no_output_in_smoke` FT-SMOKE-004 — no artifact written
`pmat_706_zero_steps_is_clear_error` KANI-SMOKE-002 analog
`pmat_706_parse_max_steps_value` env-var precondition (invalid → unset)
  • ✅ 70/70 `aprender-train-distill` lib tests pass
  • ✅ `pipeline.rs` clippy-clean; `cargo fmt --all --check` clean
  • ✅ `pv validate contracts/apr-distill-smoke-validation-v1.yaml` → 0 errors, 0 warnings
  • ✅ 1392 `aprender-contracts` tests pass
  • ✅ `bashrs lint` 0 errors; `bash -n` OK

Post-merge verify (per `feedback_squash_merge_post_verify`):
`git show origin/main:crates/aprender-train-distill/src/pipeline.rs | grep APR_DISTILL_MAX_STEPS` must return ≥1.

🤖 Generated with Claude Code

… (PMAT-706)

#1888 (commit 52650c6) announced the PMAT-706 smoke-validation mode but the
squash landed ONLY contracts/apr-distill-smoke-validation-v1.yaml — the
pipeline.rs implementation described in the commit message never existed in
source (`git grep APR_DISTILL_MAX_STEPS` at HEAD hit only CHANGELOG, the
contract, and the resume doc; the commit's --stat was "1 file changed"). The
60s fail-fast gate the entire distillation critical path is supposed to run
before a 30-50h Stage D job was therefore absent.

Re-land against the existing contract (the behavioral source of truth):

- `Pipeline` gains a `smoke_max_steps: Option<u64>` field, read once from
  `APR_DISTILL_MAX_STEPS` at construction (so the CLI dispatch picks it up
  with no changes) and overridable via `with_max_steps()` — a test seam so
  the falsifiers never touch process env (this repo has repeatedly been bitten
  by env-var races in parallel tests).
- `train()`: early-breaks the inner loop after exactly N steps (step >= N, no
  off-by-one), rejects N=0 with a clear ConfigValue error, guards the
  zero-steps-completed load-failure case, and prints the two `[SMOKE]` summary
  lines ONLY on the early-break path.
- `execute()`: skips the export side-effect in smoke mode — no
  model.safetensors / output.apr so `apr eval`/`apr run` can't consume a
  smoke result by accident.
- `smoke_summary_lines()` factored out as a pure, total function so the exact
  wire format is unit-tested without capturing stdout.
- `scripts/dispatch-distill-stage-d.sh`: forward APR_DISTILL_MAX_STEPS across
  the ssh/`env` boundary, echo it, and add it to the JSON manifest +
  usage header — the documented `APR_DISTILL_MAX_STEPS=10 ./scripts/...` was
  previously a silent no-op over ssh.

Falsifiers (all passing, paths match the contract's `evidence:` exactly):
  pipeline::tests::pmat_706_smoke               (FT-SMOKE-001: exactly N steps)
  pipeline::tests::pmat_706_no_regression       (FT-SMOKE-002: unset = full run + export)
  pipeline::tests::pmat_706_summary_format      (FT-SMOKE-003: two parseable lines)
  pipeline::tests::pmat_706_no_output_in_smoke  (FT-SMOKE-004: no artifact written)
  pipeline::tests::pmat_706_zero_steps_is_clear_error  (KANI-SMOKE-002 analog)
  pipeline::tests::pmat_706_parse_max_steps_value      (env-var precondition)

70/70 distill lib tests pass, pipeline.rs clippy-clean, pv validate OK,
1392 aprender-contracts tests pass. Found via the repo-roadmap analysis
(critical-path first link). See feedback_squash_merge_post_verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) June 10, 2026 11:56
@noahgift noahgift merged commit ff37aa5 into main Jun 10, 2026
10 checks passed
@noahgift noahgift deleted the fix/distill-smoke-gate-reland-pmat-706 branch June 10, 2026 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant