falsify(apr-cli-distill-train-v1): TRAIN-006 PARTIAL_ALGORITHM_LEVEL — train cache-resume idempotency by noahgift · Pull Request #1439 · paiml/aprender

noahgift · 2026-05-03T19:46:39Z

Summary

Closes contract drift between task #196 (which claimed FALSIFY-APR-DISTILL-TRAIN-006 PARTIAL_ALGORITHM_LEVEL on 2026-04-30) and the YAML, which had no `algorithm_evidence` block. Mirrors PR #1438's pattern for TRAIN-005.

Contract: `apr-cli-distill-train-v1.yaml` → TRAIN-006 gains `algorithm_evidence` (status `PARTIAL_ALGORITHM_LEVEL`, last_verified 2026-05-03).

Falsifier tests (both pass)

`falsify_apr_distill_train_006_train_errors_without_precompute_cache` — negative half: stage train MUST error when `manifest.json` is absent; asserts `CliError::ValidationFailed` with "Precompute" in message.
`falsify_apr_distill_train_006_train_does_not_error_when_cache_present` — positive half: after precompute drops `manifest.json`, stage train MUST NOT error with the cache-missing message (proves manifest is actually consulted).
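The two halves described above can be sketched as follows. This is a minimal illustration, not the code in `distill_include_01.rs`: `run_train_stage`, `scratch_dir`, the shape of `CliError`, and the exact error wording are all assumptions made for the example. Only the `manifest.json` file name, the `ValidationFailed` variant, and the "Precompute" keyword come from the PR text.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the CLI error type; only the variant
/// named in the PR description is modeled here.
#[derive(Debug)]
pub enum CliError {
    ValidationFailed(String),
}

/// Hypothetical train-stage entry point (an assumption for this sketch).
/// It refuses to run unless the precompute stage left a manifest behind.
pub fn run_train_stage(cache_dir: &Path) -> Result<(), CliError> {
    let manifest = cache_dir.join("manifest.json");
    if !manifest.is_file() {
        // The "Precompute" keyword tells the user which stage to run next.
        return Err(CliError::ValidationFailed(format!(
            "Precompute cache not found at {}; run the precompute stage first",
            manifest.display()
        )));
    }
    // Read the manifest so it is actually consulted, not only stat-checked.
    let body = fs::read_to_string(&manifest)
        .map_err(|e| CliError::ValidationFailed(format!("cannot read manifest: {e}")))?;
    if body.trim().is_empty() {
        return Err(CliError::ValidationFailed("manifest.json is empty".to_string()));
    }
    Ok(())
}

/// Creates a fresh scratch directory under the system temp dir.
pub fn scratch_dir(tag: &str) -> PathBuf {
    let dir = std::env::temp_dir().join(format!("train006_{}_{}", tag, std::process::id()));
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(&dir).expect("create scratch dir");
    dir
}

#[cfg(test)]
mod tests {
    use super::*;

    /// Negative half: no manifest means a ValidationFailed mentioning "Precompute".
    #[test]
    fn train_errors_without_precompute_cache() {
        let dir = scratch_dir("negative");
        match run_train_stage(&dir) {
            Err(CliError::ValidationFailed(msg)) => assert!(msg.contains("Precompute")),
            other => panic!("expected ValidationFailed, got {:?}", other),
        }
    }

    /// Positive half: with a manifest present, the cache-missing error must not appear.
    #[test]
    fn train_does_not_error_when_cache_present() {
        let dir = scratch_dir("positive");
        fs::write(dir.join("manifest.json"), "{}").expect("write manifest");
        if let Err(CliError::ValidationFailed(msg)) = run_train_stage(&dir) {
            assert!(!msg.contains("Precompute"), "cache-missing error despite manifest");
        }
    }
}
```

The positive half only asserts the absence of the cache-missing message rather than full success, which matches the PR's wording: the stage may still fail for other reasons while the real training path is unimplemented.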

Five Whys

Why bind now? Spec §42.7 (b) MODEL-2 distill-train track; task #196 (Qwen2.5-Coder-0.5B MVP certification blocked: 4 conversion pipeline defects) claimed PARTIAL but the YAML had no algorithm_evidence. This is the same drift pattern as TRAIN-005 (PR #1438).
Why two halves? Cache-resume has two failure modes: (a) train silently skips manifest check, (b) train ignores manifest after seeing it. Both must be tested.
Why test the error message? Per feedback_apr_trace_not_eprintln, surfacing the "Precompute" keyword IS the user-facing contract — if it regresses to "missing file", users don't know what to run next.
Why not test logits equivalence? Real logits-on-disk require real teacher forward (§35 missing implementation). Algorithm-level discharge holds until that lands.
Why bounded? ~80 LOC test scaffolding + 14 LOC contract amendment. No production code change.

Net effect

Coverage tally: 15+34 → 15+35 (+1 PARTIAL_ALGORITHM_LEVEL)
MODEL-2 ship %: 55% → 56%
Stacks on PR #1438 (TRAIN-005 PARTIAL_ALGORITHM_LEVEL, precompute byte-determinism); no merge conflict expected (independent contract block)
`pv validate` exits 0

Test plan

`cargo test -p apr-cli --lib falsify_apr_distill_train_006` (2 pass)
`pv validate contracts/apr-cli-distill-train-v1.yaml` exit 0
CI green on required gates

🤖 Generated with Claude Code

…— train cache-resume idempotency

Adds 2 unit tests in distill_include_01.rs::tests that algorithm-bind FALSIFY-APR-DISTILL-TRAIN-006 (stage train can resume from precompute cache):

- falsify_apr_distill_train_006_train_errors_without_precompute_cache: negative half. Stage train MUST error when manifest.json is absent; asserts CliError::ValidationFailed with "Precompute" in message.
- falsify_apr_distill_train_006_train_does_not_error_when_cache_present: positive half. After precompute drops manifest.json, stage train MUST NOT error with the cache-missing message (proves the manifest is actually consulted, not just stat-checked).

Contract apr-cli-distill-train-v1.yaml: TRAIN-006 gains algorithm_evidence (status: PARTIAL_ALGORITHM_LEVEL, last_verified 2026-05-03, two test_locations + notes documenting that DISCHARGED requires real teacher forward + real student forward that actually loads logits-on-disk and compares to a baseline that re-ran precompute proving no recomputation happened).

Five Whys

1. Why bind TRAIN-006 now? Spec §42.7 (b) MODEL-2 distill-train track; task #196 claimed PARTIAL on 2026-04-30 but the YAML had no algorithm_evidence, the same pattern of contract drift as TRAIN-005 (PR #1438).
2. Why two halves, not one? The cache-resume invariant has two failure modes: (a) train silently skips manifest check and runs anyway, (b) train ignores manifest after seeing it. Both must be tested for the gate to be meaningful.
3. Why test the error message specifically? Per feedback_apr_trace_not_eprintln, surfacing the "Precompute" keyword in the error message is the user-facing contract: if it regresses to "missing file" with no remediation hint, users won't know what to run next.
4. Why not test logits content equivalence? Real logits-on-disk require real teacher forward, which is the §35 missing real-training implementation. Algorithm-level discharge holds until that lands.
5. Why bounded? ~80 LOC test scaffolding + 14 LOC contract amendment. No production code change. Coverage uplift only.

Net effect

- Coverage tally: 15+34 → 15+35 (+1 PARTIAL_ALGORITHM_LEVEL).
- MODEL-2 ship %: 55% → 56% (cache-resume idempotency locked in).
- Stacks on PR #1438 (TRAIN-005); no merge conflict expected.
- pv validate exits 0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…osine helper for FALSIFY-CPU-GPU-005 part b (#1441)

Canonical record of today's split-track cycle (PRs #1438-#1440). Maintains the §41/§42 amendment cadence: each /loop iteration that lands ≥3 PRs gets a single audit story.

Chain landed:

- #1438: FALSIFY-APR-DISTILL-TRAIN-005 PARTIAL_ALGORITHM_LEVEL (precompute byte-determinism, 2 unit tests, local + remote-stub branches)
- #1439: FALSIFY-APR-DISTILL-TRAIN-006 PARTIAL_ALGORITHM_LEVEL (train cache-resume idempotency, 2 unit tests, negative + positive halves)
- #1440: cpu_vs_gpu_cosine_similarity helper at module scope + 3 tests (parallel=1, orthogonal=0, fail-closed; cosine math now callable without --features cuda for the future part b wgpu cosine gate)

§43 documents: what landed (table), coverage flips (TRAIN-005, TRAIN-006 unbound → PARTIAL_ALGORITHM_LEVEL), why for MODEL-1+MODEL-2 (parallel contract drift closure + part b infrastructure), Five Whys, ship % effects (MODEL-1 87→88, MODEL-2 54→56), and next-session pickup options (CPU-GPU-005 part b OR distill-train real implementation).

Coverage tally: 15+33 → 15+35 (+2 PARTIAL_ALGORITHM_LEVEL closed).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
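The three cases listed for the #1440 helper (parallel=1, orthogonal=0, fail-closed) can be illustrated with a small sketch. The signature of `cpu_vs_gpu_cosine_similarity` is not shown in this thread, so the function below uses a different name, `cosine_similarity_sketch`, and its `Option<f32>` return type is an assumption. "Fail-closed" is also interpreted here as an assumption: any input that cannot produce a meaningful cosine yields `None`, so a caller has to treat it as a gate failure.

```rust
/// Hypothetical sketch of a cosine-similarity helper with fail-closed
/// behavior. Returns None for empty, mismatched, zero-norm, or
/// non-finite inputs instead of a number that could pass a threshold.
pub fn cosine_similarity_sketch(cpu: &[f32], gpu: &[f32]) -> Option<f32> {
    if cpu.is_empty() || cpu.len() != gpu.len() {
        return None;
    }
    // Accumulate in f64 to limit rounding error on long vectors.
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&a, &b) in cpu.iter().zip(gpu) {
        let (a, b) = (a as f64, b as f64);
        dot += a * b;
        norm_a += a * a;
        norm_b += b * b;
    }
    let denom = norm_a.sqrt() * norm_b.sqrt();
    // A zero or non-finite denominator (including NaN inputs) fails closed.
    if !denom.is_finite() || denom == 0.0 {
        return None;
    }
    let cos = dot / denom;
    if cos.is_finite() {
        Some(cos.clamp(-1.0, 1.0) as f32)
    } else {
        None
    }
}
```

With this shape, a gate would compare `Some(value)` against its threshold and count `None` as a failure, which avoids a silent pass on degenerate output buffers.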

…VEL + TRAIN-009 BLOCKER_FIXTURE_ABSENT (#1443)

Closes the last three contract drifts in apr-cli-distill-train-v1 (tasks #218 + #247 claimed PARTIAL but YAML had no algorithm_evidence blocks). Same fix-pattern as TRAIN-005/006 (PRs #1438, #1439).

TRAIN-007 (pv validate exits 0): PARTIAL_ALGORITHM_LEVEL
Live verification 2026-05-04 on this branch: pv validate exits 0 with "0 error(s), 0 warning(s)". Meta-discharge via the pre-commit hook + manual operator runs that have validated every amendment since v1.0.0 PROPOSED.

TRAIN-008 (3-surface drift cli + registry + test): PARTIAL_ALGORITHM_LEVEL
Live verification 2026-05-04: cargo test -p apr-cli --test cli_commands registered_commands → "1 passed; 0 failed". The test_no_unregistered_commands integration test walks the live clap parser and enforces every Subcommand variant matches apr-cli-commands-v1.yaml, binding the invariant from feedback_cli_subcommand_three_surface_drift.

TRAIN-009 (end-to-end smoke beats from-scratch baseline): BLOCKER_FIXTURE_ABSENT
Honest blocker note: discharge requires the missing real-training implementation per §35 (apr distill --stage train is currently a stub). Without gradient descent there is no val_loss to compare. Path to DISCHARGED documented in the algorithm_evidence notes.

Five Whys

1. Why bind these now? Tasks #218/#247 claimed PARTIAL on 2026-04-30 but the YAML had no algorithm_evidence. Same drift pattern as TRAIN-005/006 (PRs #1438, #1439); closing it gives the contract a complete provability surface.
2. Why mark TRAIN-009 BLOCKER_FIXTURE_ABSENT instead of unbound? It has a clear test design (tests/distill_smoke.rs) and a clear blocker (real training implementation per §35). Marking it as a blocker rather than leaving it untyped makes the dependency explicit so a future PR cannot accidentally promote it without the real-training prerequisite.
3. Why two PARTIAL + one BLOCKER, not three PARTIAL? PARTIAL implies an existing test exercises the invariant. TRAIN-009 has no test today (no `tests/distill_smoke.rs`) and cannot have one until §35 lands. Honest classification beats false PARTIAL claims.
4. Why all three in one PR? They're the last three falsifiers in this contract; bundling them produces a single audit story (9/9 falsifiers now have status). Per Toyota Way each falsifier is a distinct binding decision but they share the same review surface.
5. Why bounded? ~45 LOC of YAML, no production code change, no new tests (uses existing cargo test + pv validate). pv validate exits 0 verified locally.

Net effect

- All 9 TRAIN-* falsifiers in apr-cli-distill-train-v1 now have algorithm_evidence blocks (8× PARTIAL_ALGORITHM_LEVEL + 1× BLOCKER_FIXTURE_ABSENT).
- Contract drift between task list (#218/#247) and YAML closed.
- Coverage tally: 15+35 → 15+37 (+2 PARTIAL_ALGORITHM_LEVEL closed, TRAIN-009 explicitly blocked not counted).
- MODEL-2 ship %: 56% → 57% (last falsifier-binding gap closed for the distill contract; real-training implementation per §35 is the only remaining MODEL-2 lever).
- pv validate exits 0.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
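The surface-drift invariant bound by TRAIN-008 can be pictured as a set comparison between the command names the parser exposes and the names the registry YAML lists. The sketch below is only an illustration under stated assumptions: `surface_drift` is a hypothetical helper, the inputs are plain string slices, and the real `test_no_unregistered_commands` test walks the live clap parser and reads `apr-cli-commands-v1.yaml`, neither of which is reproduced here.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: returns the command names that appear on exactly
/// one of the two surfaces, in sorted order. An empty result means the
/// parser and the registry agree.
pub fn surface_drift(parser_cmds: &[&str], registry_cmds: &[&str]) -> Vec<String> {
    let parser: BTreeSet<&str> = parser_cmds.iter().copied().collect();
    let registry: BTreeSet<&str> = registry_cmds.iter().copied().collect();
    parser
        .symmetric_difference(&registry)
        .map(|name| (*name).to_string())
        .collect()
}
```

A test built on such a helper would assert that the returned list is empty and print the offending names otherwise, so a subcommand added on one surface but not the other fails loudly.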

noahgift enabled auto-merge (squash) May 3, 2026 19:46

noahgift added 2 commits May 3, 2026 22:13

Merge branch 'main' into falsify/apr-distill-train-006-cache-resume

d9dd303

Merge branch 'main' into falsify/apr-distill-train-006-cache-resume

8d6c20d

noahgift mentioned this pull request May 3, 2026

spec(ship-two-models): v2.88.0 — §43 distill-train algorithm-bind + cosine helper for FALSIFY-CPU-GPU-005 part b #1441

Merged

1 task

noahgift merged commit 8b579db into main May 3, 2026
10 checks passed

noahgift deleted the falsify/apr-distill-train-006-cache-resume branch May 3, 2026 21:08

noahgift mentioned this pull request May 3, 2026

falsify(apr-cli-distill-train-v1): TRAIN-007/008 PARTIAL_ALGORITHM_LEVEL + TRAIN-009 BLOCKER_FIXTURE_ABSENT #1443

Merged

3 tasks

noahgift merged 3 commits into main from falsify/apr-distill-train-006-cache-resume