falsify(apr-cli-distill-train-v1): TRAIN-005 PARTIAL_ALGORITHM_LEVEL — precompute byte-determinism#1438

Merged: noahgift merged 5 commits into main from falsify/apr-distill-train-005-precompute-determinism on May 3, 2026
@noahgift noahgift commented May 3, 2026

Summary

Closes contract drift between task #195 (which claimed FALSIFY-APR-DISTILL-TRAIN-005 PARTIAL_ALGORITHM_LEVEL on 2026-04-30) and the YAML, which had no algorithm_evidence block. Adds 2 unit tests and a contract amendment.

Contract: apr-cli-distill-train-v1.yaml → TRAIN-005 gains algorithm_evidence (status PARTIAL_ALGORITHM_LEVEL, last_verified 2026-05-03).

Falsifier tests (both pass)

  • falsify_apr_distill_train_005_precompute_is_byte_deterministic — local-teacher branch: 2 precompute runs against an identical fake teacher dir produce byte-identical manifest.json.
  • falsify_apr_distill_train_005_precompute_remote_teacher_stub_is_deterministic — remote-stub branch: 2 runs against an unresolved HF model_id produce byte-identical pending_download manifest.
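
As a rough illustration of what a byte-determinism falsifier of this kind checks, here is a minimal sketch using only the standard library. It is not the code in distill_include_01.rs: `write_manifest` and `precompute_twice` are hypothetical names, and the manifest layout is an assumption made for the example. The idea is that two runs over the same fake teacher dir must yield identical bytes, which in practice means sorted iteration and no timestamps or random ids in the output.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the precompute stage (NOT the real
/// `run_config_precompute`): lists the teacher dir and writes a manifest.
fn write_manifest(teacher_dir: &Path, out_dir: &Path) -> io::Result<PathBuf> {
    // BTreeMap iterates in sorted key order, so the output does not depend
    // on the order in which the filesystem returns directory entries.
    let mut files: BTreeMap<String, u64> = BTreeMap::new();
    for entry in fs::read_dir(teacher_dir)? {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();
        files.insert(name, entry.metadata()?.len());
    }
    // Only file names and sizes are emitted: no timestamps, absolute paths,
    // or random ids. File names are not JSON-escaped in this sketch.
    let body: Vec<String> = files
        .iter()
        .map(|(name, size)| format!("    {{\"name\": \"{}\", \"bytes\": {}}}", name, size))
        .collect();
    let json = format!("{{\n  \"files\": [\n{}\n  ]\n}}\n", body.join(",\n"));
    fs::create_dir_all(out_dir)?;
    let path = out_dir.join("manifest.json");
    fs::write(&path, json)?;
    Ok(path)
}

/// Builds one fake teacher dir under `root`, runs the stand-in twice into
/// separate output dirs, and returns the raw bytes of both manifests.
fn precompute_twice(root: &Path) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let teacher = root.join("teacher");
    fs::create_dir_all(&teacher)?;
    fs::write(teacher.join("config.json"), b"{}")?;
    fs::write(teacher.join("model.safetensors"), b"fake-weights")?;
    let a = fs::read(write_manifest(&teacher, &root.join("run_a"))?)?;
    let b = fs::read(write_manifest(&teacher, &root.join("run_b"))?)?;
    Ok((a, b))
}
```

A test built on this would call `precompute_twice` on a scratch directory and assert that the two byte vectors are equal. Comparing raw bytes rather than parsed JSON is the stricter check, since it also catches key-ordering and whitespace drift.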

Five Whys

  1. Why bind TRAIN-005 now? Per spec §42.7 next-session pickup (b); task #195 claimed PARTIAL but the YAML had no algorithm_evidence, which is contract drift to close.
  2. Why algorithm-bind, not full discharge? run_config_precompute is a stub today (no real teacher forward). DISCHARGED requires the missing real-training implementation per §35; separate larger PR per Toyota Way.
  3. Why two tests, not one? Local-teacher and remote-stub take different code paths (inspect_dir_files vs pending_download). Both must be deterministic.
  4. Why test the manifest bytes, not logits? Current impl emits NO logits. The manifest IS the only deterministic output today. When real-forward is added, the test extends to assert byte-identical logits files (DISCHARGED gate).
  5. Why bounded? ~70 LOC test scaffolding + 13 LOC contract amendment. No production code change. Coverage uplift only.

Net effect on shipping

  • Coverage tally: 15+33 → 15+34 (+1 PARTIAL_ALGORITHM_LEVEL closed)
  • MODEL-2 ship %: 54% → 55%
  • Contract drift: closed between task list and YAML
  • pv validate: exits 0

Test plan

  • cargo test -p apr-cli --lib falsify_apr_distill_train_005 (2 pass)
  • pv validate contracts/apr-cli-distill-train-v1.yaml exit 0
  • CI green on required gates

🤖 Generated with Claude Code

…— precompute byte-determinism

Adds 2 unit tests in distill_include_01.rs::tests that algorithm-bind
FALSIFY-APR-DISTILL-TRAIN-005 (precompute is byte-deterministic):

- falsify_apr_distill_train_005_precompute_is_byte_deterministic:
  local-teacher branch, two precompute runs over identical fake
  teacher dir produce byte-identical manifest.json.
- falsify_apr_distill_train_005_precompute_remote_teacher_stub_is_deterministic:
  remote-stub branch, two runs against unresolved HF model_id produce
  byte-identical pending_download manifest.json.

Contract apr-cli-distill-train-v1.yaml: TRAIN-005 gains
algorithm_evidence (status: PARTIAL_ALGORITHM_LEVEL, last_verified
2026-05-03, two test_locations + notes documenting that DISCHARGED
requires real teacher forward with logits-on-disk).

Five Whys
1. Why bind TRAIN-005 now? Per spec §42.7 next-session pickup (b);
   TRAIN-005 was claimed PARTIAL via task #195 but the YAML had no
   algorithm_evidence — this is contract drift to close.
2. Why algorithm-bind, not full discharge? run_config_precompute is
   currently a stub (writes a manifest, no real teacher forward).
   Discharging requires the missing real-training implementation per
   §35; that's a separate, larger PR. Per Toyota Way, focused PRs.
3. Why two tests, not one? The local-teacher and remote-stub branches
   take different code paths (inspect_dir_files vs pending_download
   stub). Both must be deterministic.
4. Why test the manifest bytes, not just diff some logits? The
   current impl emits NO logits — it's a stub. The manifest IS the
   only deterministic output today. When real-forward is added, the
   test extends to assert byte-identical logits files alongside the
   manifest (DISCHARGED gate).
5. Why bounded? ~70 LOC test scaffolding + 13 LOC contract
   amendment. No production code change. Coverage uplift only.

Net effect
- Coverage tally: 15+33 → 15+34 (+1 PARTIAL_ALGORITHM_LEVEL).
- MODEL-2 ship %: 54% → 55% (one more falsifier locked in).
- Contract drift between task list (#195) and YAML closed.
- pv validate exits 0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 3, 2026
…— train cache-resume idempotency (#1439)

Adds 2 unit tests in distill_include_01.rs::tests that algorithm-bind
FALSIFY-APR-DISTILL-TRAIN-006 (stage train can resume from precompute
cache):

- falsify_apr_distill_train_006_train_errors_without_precompute_cache:
  negative half — stage train MUST error when manifest.json is absent;
  asserts CliError::ValidationFailed with "Precompute" in message.
- falsify_apr_distill_train_006_train_does_not_error_when_cache_present:
  positive half — after precompute drops manifest.json, stage train
  MUST NOT error with the cache-missing message (proves the manifest
  is actually consulted, not just stat-checked).
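
The two halves above can be sketched roughly as follows. This is an illustrative stand-in, not the apr-cli implementation: only the `CliError::ValidationFailed` variant name and the "Precompute" keyword come from this commit message, while `stage_train`, its signature, and the exact message wording are assumptions.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical error type; only the `ValidationFailed` variant name is
/// taken from the commit message above.
#[derive(Debug, PartialEq)]
enum CliError {
    ValidationFailed(String),
}

/// Hypothetical stand-in for `--stage train`. Returns the manifest size in
/// bytes on success so callers can see that the file was actually read.
fn stage_train(cache_dir: &Path) -> Result<usize, CliError> {
    let manifest = cache_dir.join("manifest.json");
    // Reading the file (rather than only checking that it exists) means the
    // manifest contents are consulted, which is what the positive half probes.
    let text = fs::read_to_string(&manifest).map_err(|_| {
        CliError::ValidationFailed(format!(
            "Precompute cache not found at {}; run the precompute stage first",
            manifest.display()
        ))
    })?;
    if text.trim().is_empty() {
        return Err(CliError::ValidationFailed(
            "Precompute manifest is empty; re-run the precompute stage".to_string(),
        ));
    }
    Ok(text.len())
}
```

The negative half would call `stage_train` on an empty directory and assert that the error message contains "Precompute"; the positive half would write a manifest.json first and assert that the call no longer fails with that message.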

Contract apr-cli-distill-train-v1.yaml: TRAIN-006 gains
algorithm_evidence (status: PARTIAL_ALGORITHM_LEVEL, last_verified
2026-05-03, two test_locations + notes documenting that DISCHARGED
requires a real teacher forward and a real student forward that
actually loads logits-on-disk, compared against a baseline that
re-ran precompute, to prove no recomputation happened).

Five Whys
1. Why bind TRAIN-006 now? Spec §42.7 (b) MODEL-2 distill-train track;
   task #196 claimed PARTIAL on 2026-04-30 but the YAML had no
   algorithm_evidence — same pattern of contract drift as TRAIN-005
   (PR #1438).
2. Why two halves, not one? The cache-resume invariant has two failure
   modes: (a) train silently skips manifest check and runs anyway,
   (b) train ignores manifest after seeing it. Both must be tested
   for the gate to be meaningful.
3. Why test the error message specifically? Per
   feedback_apr_trace_not_eprintln, surfacing the "Precompute"
   keyword in the error message is the user-facing contract — if it
   regresses to "missing file" with no remediation hint, users won't
   know what to run next.
4. Why not test logits content equivalence? Real logits-on-disk
   require real teacher forward, which is the §35 missing
   real-training implementation. Algorithm-level discharge holds
   until that lands.
5. Why bounded? ~80 LOC test scaffolding + 14 LOC contract
   amendment. No production code change. Coverage uplift only.

Net effect
- Coverage tally: 15+34 → 15+35 (+1 PARTIAL_ALGORITHM_LEVEL).
- MODEL-2 ship %: 55% → 56% (cache-resume idempotency locked in).
- Stacks on PR #1438 (TRAIN-005); no merge conflict expected.
- pv validate exits 0.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit afb1d25 into main May 3, 2026
10 checks passed
@noahgift noahgift deleted the falsify/apr-distill-train-005-precompute-determinism branch May 3, 2026 22:05
noahgift added a commit that referenced this pull request May 3, 2026
…osine helper for FALSIFY-CPU-GPU-005 part b (#1441)

Canonical record of today's split-track cycle (PRs #1438-#1440).
Maintains the §41/§42 amendment cadence — each /loop iteration that
lands ≥3 PRs gets a single audit story.

Chain landed:
- #1438: FALSIFY-APR-DISTILL-TRAIN-005 PARTIAL_ALGORITHM_LEVEL
  (precompute byte-determinism, 2 unit tests, local + remote-stub
  branches)
- #1439: FALSIFY-APR-DISTILL-TRAIN-006 PARTIAL_ALGORITHM_LEVEL
  (train cache-resume idempotency, 2 unit tests, negative + positive
  halves)
- #1440: cpu_vs_gpu_cosine_similarity helper at module scope + 3 tests
  (parallel=1, orthogonal=0, fail-closed; cosine math now callable
  without --features cuda for the future part b wgpu cosine gate)
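
For readers unfamiliar with the three properties listed for #1440, a minimal sketch of a fail-closed cosine helper follows. The real `cpu_vs_gpu_cosine_similarity` signature is not shown in this PR, so the function names, the `Option<f32>` return type, and the interpretation of "fail-closed" (degenerate input can never pass a threshold gate) are all assumptions.

```rust
/// Hypothetical sketch of a cosine-similarity helper. Returns `None` for any
/// input that has no well-defined cosine (empty, mismatched lengths, zero
/// norm, NaN or infinite values), so a gate built on it fails closed.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    // Accumulate in f64 to reduce rounding error on long vectors.
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b.iter()) {
        let (x, y) = (x as f64, y as f64);
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    let denom = norm_a.sqrt() * norm_b.sqrt();
    if !denom.is_finite() || denom == 0.0 {
        return None;
    }
    let cos = dot / denom;
    if cos.is_finite() {
        Some(cos as f32)
    } else {
        None
    }
}

/// Hypothetical gate: passes only when a cosine exists and meets `threshold`.
fn passes_gate(a: &[f32], b: &[f32], threshold: f32) -> bool {
    matches!(cosine_similarity(a, b), Some(c) if c >= threshold)
}
```

Parallel vectors give a cosine of 1 up to rounding, orthogonal vectors give 0, and anything malformed yields `None`, which `passes_gate` treats as a failure rather than a pass.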

§43 documents: what landed (table), coverage flips (TRAIN-005, TRAIN-006
unbound → PARTIAL_ALGORITHM_LEVEL), why for MODEL-1+MODEL-2 (parallel
contract drift closure + part b infrastructure), Five Whys, ship %
effects (MODEL-1 87→88, MODEL-2 54→56), and next-session pickup
options (CPU-GPU-005 part b OR distill-train real implementation).

Coverage tally: 15+33 → 15+35 (+2 PARTIAL_ALGORITHM_LEVEL closed).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 3, 2026
…VEL + TRAIN-009 BLOCKER_FIXTURE_ABSENT (#1443)

Closes the last three contract drifts in apr-cli-distill-train-v1
(tasks #218 + #247 claimed PARTIAL but YAML had no algorithm_evidence
blocks). Same fix-pattern as TRAIN-005/006 (PRs #1438, #1439).

TRAIN-007 (pv validate exits 0) — PARTIAL_ALGORITHM_LEVEL
  Live verification 2026-05-04 on this branch: pv validate exits 0
  with "0 error(s), 0 warning(s)". Meta-discharge via the pre-commit
  hook + manual operator runs that have validated every amendment
  since v1.0.0 PROPOSED.

TRAIN-008 (3-surface drift cli + registry + test) — PARTIAL_ALGORITHM_LEVEL
  Live verification 2026-05-04: cargo test -p apr-cli --test
  cli_commands registered_commands → "1 passed; 0 failed". The
  test_no_unregistered_commands integration test walks the live clap
  parser and enforces every Subcommand variant matches apr-cli-commands-v1.yaml,
  binding the invariant from feedback_cli_subcommand_three_surface_drift.
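
The drift check described for TRAIN-008 amounts to a set comparison between the commands the parser exposes and the commands the registry lists. A minimal sketch, assuming both surfaces have already been reduced to lists of names (the real test walks the clap parser and reads apr-cli-commands-v1.yaml; `Drift` and `command_drift` are hypothetical names):

```rust
use std::collections::BTreeSet;

/// Hypothetical drift report between two command surfaces.
#[derive(Debug, PartialEq)]
struct Drift {
    /// Present in the CLI parser but missing from the registry.
    unregistered: Vec<String>,
    /// Present in the registry but missing from the CLI parser.
    stale: Vec<String>,
}

/// Compares command names from the CLI against names from the registry.
/// BTreeSet keeps both result lists sorted, so failure output is stable.
fn command_drift(cli: &[&str], registry: &[&str]) -> Drift {
    let cli: BTreeSet<&str> = cli.iter().copied().collect();
    let reg: BTreeSet<&str> = registry.iter().copied().collect();
    Drift {
        unregistered: cli.difference(&reg).map(|s| (*s).to_string()).collect(),
        stale: reg.difference(&cli).map(|s| (*s).to_string()).collect(),
    }
}
```

A test in this style would assert that both lists are empty, so adding a subcommand without registering it (or the reverse) fails the build.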

TRAIN-009 (end-to-end smoke beats from-scratch baseline) — BLOCKER_FIXTURE_ABSENT
  Honest blocker note: discharge requires the missing real-training
  implementation per §35 (apr distill --stage train is currently a
  stub). Without gradient descent there is no val_loss to compare.
  Path to DISCHARGED documented in the algorithm_evidence notes.

Five Whys
1. Why bind these now? Tasks #218/#247 claimed PARTIAL on 2026-04-30
   but the YAML had no algorithm_evidence. Same drift pattern as
   TRAIN-005/006 (PRs #1438, #1439) — closing it gives the contract a
   complete provability surface.
2. Why mark TRAIN-009 BLOCKER_FIXTURE_ABSENT instead of unbound? It
   has a clear test design (tests/distill_smoke.rs) and a clear
   blocker (real training implementation per §35). Marking it as a
   blocker rather than leaving it untyped makes the dependency
   explicit so a future PR cannot accidentally promote it without
   the real-training prerequisite.
3. Why two PARTIAL + one BLOCKER, not three PARTIAL? PARTIAL implies
   an existing test exercises the invariant. TRAIN-009 has no test
   today (no `tests/distill_smoke.rs`) and cannot have one until §35
   lands. Honest classification beats false PARTIAL claims.
4. Why all three in one PR? They're the last three falsifiers in this
   contract; bundling them produces a single audit story (9/9
   falsifiers now have status). Per Toyota Way each falsifier is a
   distinct binding decision but they share the same review surface.
5. Why bounded? ~45 LOC of YAML, no production code change, no new
   tests (uses existing cargo test + pv validate). pv validate exits
   0 verified locally.

Net effect
- All 9 TRAIN-* falsifiers in apr-cli-distill-train-v1 now have
  algorithm_evidence blocks (8× PARTIAL_ALGORITHM_LEVEL +
  1× BLOCKER_FIXTURE_ABSENT).
- Contract drift between task list (#218/#247) and YAML closed.
- Coverage tally: 15+35 → 15+37 (+2 PARTIAL_ALGORITHM_LEVEL closed,
  TRAIN-009 explicitly blocked not counted).
- MODEL-2 ship %: 56% → 57% (last falsifier-binding gap closed for
  the distill contract; real-training implementation per §35 is the
  only remaining MODEL-2 lever).
- pv validate exits 0.
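
The tally arithmetic above can be checked with a small sketch. The enum and helper names are hypothetical, and the "6 PARTIAL before this PR" figure is inferred from the 8 + 1 split stated above rather than quoted from the contract. Only PARTIAL_ALGORITHM_LEVEL entries move the tally; a BLOCKER_FIXTURE_ABSENT entry is recorded but not counted.

```rust
/// Hypothetical model of the two evidence statuses used in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EvidenceStatus {
    PartialAlgorithmLevel,
    BlockerFixtureAbsent,
}

/// Counts the entries that contribute to the coverage tally.
fn partial_count(statuses: &[EvidenceStatus]) -> usize {
    statuses
        .iter()
        .filter(|s| **s == EvidenceStatus::PartialAlgorithmLevel)
        .count()
}

/// Change in the tally between two snapshots of one contract.
fn tally_delta(before: &[EvidenceStatus], after: &[EvidenceStatus]) -> usize {
    partial_count(after).saturating_sub(partial_count(before))
}
```

With 6 PARTIAL entries before and 8 PARTIAL plus 1 BLOCKER after, `tally_delta` returns 2, matching the 15+35 → 15+37 step.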

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>