contract(apr-pretrain-arch-polymorphic-v1): v1.3 → v1.4 — bind FALSIFY-APR-PRETRAIN-INIT-CUDA-001 + drift-prevention test by noahgift · Pull Request #1502 · paiml/aprender

noahgift · 2026-05-05T05:34:58Z

Summary

Drift-prevention bind for `FALSIFY-APR-PRETRAIN-INIT-CUDA-001`. Before this bump, the falsifier id was referenced in the v1.2.0 changelog and verification_summary but was not formally registered as a falsification_test entry. The fail-fast guard at `crates/apr-cli/src/commands/pretrain.rs::drive_real` returns Err with this id when `init_arch.is_some() && device.is_cuda()`, but no test pinned the citation. A future refactor could silently drop it, which would fall into the §28 SHIP-007 "silent gibberish" defect class.

What ships

Contract apr-pretrain-arch-polymorphic-v1 v1.3.0 → v1.4.0 FUNCTIONAL:

- Adds FALSIFY-APR-PRETRAIN-INIT-CUDA-001 as formal falsification_test (PARTIAL_ALGORITHM_LEVEL).
- 10 → 11 falsifiers, all PASS.

Source:

- Extracted error message into `pub(crate) const FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG` so the const itself can be unit-tested without a `--features cuda` build.

Test:

`drive_real_cuda_init_path_fail_fasts_with_falsifier_citation` pins:
- (a) falsifier id appears
- (b) "not yet wired for --device cuda" phrase appears
- (c) "step 5f.5 follow-up" reference appears
- (d) both workarounds (`--device cpu` OR omit `--init`) are suggested
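
A minimal sketch of what such a const-pinning test could look like. Only the const name, the test name, and the four pinned fragments come from this PR description; the exact message wording and the `missing_fragments` helper are assumptions made for illustration, not the code that shipped.

```rust
// Hypothetical message text: only the pinned fragments below are taken from
// the PR description; the surrounding wording is an assumption.
pub(crate) const FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG: &str =
    "FALSIFY-APR-PRETRAIN-INIT-CUDA-001: --init is not yet wired for --device cuda \
     (step 5f.5 follow-up). Workarounds: use --device cpu, or omit --init.";

/// Hypothetical helper: returns every pinned fragment that `msg` lacks.
fn missing_fragments(msg: &str) -> Vec<&'static str> {
    const PINNED: [&str; 5] = [
        "FALSIFY-APR-PRETRAIN-INIT-CUDA-001", // (a) falsifier id
        "not yet wired for --device cuda",    // (b) phrase
        "step 5f.5 follow-up",                // (c) follow-up reference
        "--device cpu",                       // (d) workaround 1
        "--init",                             // (d) workaround 2 (omit --init)
    ];
    PINNED
        .iter()
        .copied()
        .filter(|fragment| !msg.contains(*fragment))
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    // Runs on any CI matrix entry: no `--features cuda` build is needed
    // because only the string constant is inspected.
    #[test]
    fn drive_real_cuda_init_path_fail_fasts_with_falsifier_citation() {
        assert!(missing_fragments(FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG).is_empty());
    }
}
```

Checking the const rather than the runtime error keeps the test independent of hardware, at the cost of not proving that `drive_real` actually returns the const.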

Why drift-prevention

Promotion of CUDA-001 to DISCHARGED requires §50.4 step 5f.5 LIVE (CUDA wireup landed + GPU smoke). That's multi-PR scope (refactor `upload_blocks` + new constructor + wire CLI). Until then, the fail-fast guard is the only safeguard. Without a formal falsifier + test, that guard can silently regress if anyone refactors `drive_real`.
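
The guard condition described in this PR can be sketched as follows. The `Device` and `InitArch` types, the `check_init_device` function, and the abbreviated `GUARD_MSG` are hypothetical stand-ins; only the condition `init_arch.is_some() && device.is_cuda()` and the falsifier id come from the description.

```rust
/// Hypothetical stand-in for the CLI's device selection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Device {
    Cpu,
    Cuda,
}

impl Device {
    fn is_cuda(self) -> bool {
        matches!(self, Device::Cuda)
    }
}

/// Hypothetical stand-in for the architecture parsed from `--init`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct InitArch(String);

// Abbreviated, assumed message; the real text lives in the extracted const.
const GUARD_MSG: &str = "FALSIFY-APR-PRETRAIN-INIT-CUDA-001: not yet wired for --device cuda";

/// Fail fast when `--init` is combined with a CUDA device, instead of
/// letting the combination fall through to an unwired code path.
fn check_init_device(init_arch: Option<&InitArch>, device: Device) -> Result<(), String> {
    if init_arch.is_some() && device.is_cuda() {
        return Err(GUARD_MSG.to_string());
    }
    Ok(())
}
```

Under this sketch, only the `(Some(_), Device::Cuda)` combination is rejected; CPU with `--init` and CUDA without `--init` both pass through.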

Five Whys

1. Why bind a free-floating falsifier id? Because the id is ALREADY cited in source + contract changelog; without a formal falsification_test entry, drift between code and contract is unobservable.
2. Why now (during the 5g.1 wait)? Productive use of the 17hr corpus-retokenize wall; small scope, doesn't block 5g.1, doesn't move ship-% but reduces operational risk.
3. Why a const-based test instead of an end-to-end runtime test? A runtime test would need a `--features cuda` build + CUDA-capable host. Const-extraction works on every CI matrix entry.
4. Why not also delete the guard once 5f.5 lands? The guard logic is still needed at the dispatch layer; the BODY of the guard changes (currently fail-fast, post-5f.5 calls `build_shared_cuda_trainer_with_init`). The const can be retired then.
5. Why isn't this work in §57? §57 is reserved for the next 5g.x finding. CUDA-001 binding is hygiene; no spec amendment needed.

Net effects

- Contract v1.3.0 → v1.4.0 FUNCTIONAL (11 falsifiers, all PASS).
- 1 new drift-prevention test.
- 1 source const extraction.
- MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57%.

Test plan

- `pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml` exits 0
- PMAT pre-commit quality gates pass
- New drift-prevention test passes
- CI gate green (workspace-test, ci/gate)
- Auto-merge fires on green CI

🤖 Generated with Claude Code

…Y-APR-PRETRAIN-INIT-CUDA-001 + drift-prevention test

Pre-this-bump, the falsifier id `FALSIFY-APR-PRETRAIN-INIT-CUDA-001` was REFERENCED in the v1.2.0 changelog and verification_summary BUT was not formally registered as a falsification_test entry. The fail-fast guard at `crates/apr-cli/src/commands/pretrain.rs::drive_real` (post-#1494 5f.4 wireup) returns Err with this id when `init_arch.is_some() && device.is_cuda()`, but no test pinned the citation. A future refactor could silently drop the citation OR let CUDA + --init fall through → §28 SHIP-007 "silent gibberish" defect class.

## What ships

Contract apr-pretrain-arch-polymorphic-v1 v1.3.0 → v1.4.0 FUNCTIONAL:
- Adds FALSIFY-APR-PRETRAIN-INIT-CUDA-001 as formal falsification_test (PARTIAL_ALGORITHM_LEVEL).
- 10 → 11 falsifiers, all PASS.

Source:
- Extracted error message into `pub(crate) const FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG` so the const itself can be unit-tested without a `--features cuda` build.

Test:
- `drive_real_cuda_init_path_fail_fasts_with_falsifier_citation` pins:
  (a) falsifier id appears
  (b) "not yet wired for --device cuda" phrase appears
  (c) "step 5f.5 follow-up" reference appears
  (d) both workarounds (--device cpu OR omit --init) are suggested

## Why drift-prevention matters

Promotion of CUDA-001 to DISCHARGED requires §50.4 step 5f.5 LIVE (CUDA wireup landed + GPU smoke). That's multi-PR scope (refactor upload_blocks + new constructor + wire CLI). Until then, the fail-fast guard is the only safety. Without a formal falsifier + test, that guard silently regresses if anyone refactors drive_real.

## Net effects

- Contract apr-pretrain-arch-polymorphic-v1 v1.3.0 → v1.4.0 FUNCTIONAL.
- 11 falsifiers total (10 → 11), all PASS.
- 1 new drift-prevention test.
- 1 source const extraction (lockup).
- MODEL-1 ship % unchanged at 91%.
- MODEL-2 ship % unchanged at 57% until 5g.3.

This is a quality-and-hygiene PR while the 5g.1 17hr corpus retokenize runs in the background. Doesn't move ship-% but reduces drift risk + binds a previously-free-floating falsifier reference.

Refs: SPEC-SHIP-TWO-001 §50.4 step 5f.5, contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.4.0

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…-005/006 test-reference drift (#1505)

Same drift class as PR #1504 caught in apr-pretrain-from-init-v1. Test names cited in v1.1.0 changelog never matched the actual tests PR #1476 authored. Drift survived three intervening bumps (v1.1→v1.2→v1.3→v1.4) because each focused on adding new falsifiers, not auditing existing bindings.

## Drift inventory

| Falsifier | v1.4.0 cited test | Exists? | Actual test |
|---|---|---|---|
| FALSIFY-005 | preflight_qwen_vocab_passes_with_qwen_init | ❌ | preflight_qwen_vocab_passes_with_qwen_target |
| FALSIFY-006 | preflight_qwen_vocab_fails_without_init | ❌ | preflight_qwen_vocab_fails_with_llama_target |

## Resolution

Update the `test:` field for FALSIFY-005 and FALSIFY-006 to reference the actual tests authored by PR #1476. No falsifier semantics change. No new tests added.

## Verification

    $ cargo test -p apr-cli --lib -- commands::pretrain::tests::preflight_qwen_vocab_passes_with_qwen_target
    test result: ok. 1 passed; ...
    $ cargo test -p apr-cli --lib -- commands::pretrain::tests::preflight_qwen_vocab_fails_with_llama_target
    test result: ok. 1 passed; ...
    $ pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml
    0 error(s), 0 warning(s)

## Five Whys

1. Why did the drift survive 3 bumps? Each bump (v1.2/v1.3/v1.4) focused on ADDING new content (CUDA-001, relaxed bound, etc.); none audited existing bindings.
2. Why didn't the §50.4 cascade catch this? The cascade authored tests; the contract was authored separately. Names diverged at the boundary; no cross-check landed.
3. Why is this a contract-only fix (no source change)? The tests exist and pass; the IMPL is correct. Only the contract's text reference needed correction.
4. Why bump to v1.5.0 (not v1.4.1 patch)? Same logic as PR #1504: the test-binding INVARIANT (every cited test exists) was broken in v1.4.0. v1.5.0 restores it.
5. Why is this important if the impl is correct? Per feedback_no_guessing.md, contracts that cite non-existent tests are unfalsifiable: future agents reading the contract get a false signal that the falsifier is bound. PV-VER-001 lint will catch this; better to fix it than wait for the lint engine to flag.

## Net effects

- Contract v1.4.0 → v1.5.0 FUNCTIONAL.
- 11 falsifiers, all PASS: same count, but FALSIFY-005/006 now reference tests that actually exist.
- MODEL-1 ship % unchanged at 91%.
- MODEL-2 ship % unchanged at 57% until 5g.3.

This is hygiene work while 5g.1 (~12hr) corpus retokenize runs. Same defect class as PR #1504; together they close the test-reference drift across both pretrain contracts.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade (PRs #1473-#1494, #1502), contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.5.0, contracts/apr-pretrain-from-init-v1.yaml v1.2.0 (PR #1504, sibling fix)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…oughput characterization (#1508)

§56 closed with 5g.1 full-corpus retokenization dispatched (PID 2767124, ~17hr wall projected). §57 records the parallel drift-sweep work that landed during the 5g.1 wait + throughput characterization of 5g.1 mid-run.

## Drift sweep (4 PRs)

While 5g.1 ran in the background, a sweep of the §50.4 cascade contracts surfaced THE SAME drift class across multiple contracts: cited test names that didn't match what the impl PR actually authored.

PR | Contract | v_old → v_new | Drift
--- | --- | --- | ---
#1502 | apr-pretrain-arch-polymorphic-v1 | v1.3 → v1.4 | CUDA-001 was REFERENCED in changelog but had no formal falsification_test entry
#1504 | apr-pretrain-from-init-v1 | v1.1 → v1.2 | 7 of 8 cited test names didn't exist; re-aligned to existing tests
#1505 | apr-pretrain-arch-polymorphic-v1 | v1.4 → v1.5 | FALSIFY-005/006 cited names diverged from PR #1476's actual authoring
#1506 | apr-cli-tokenize-import-hf-v1 | v1.0 → v1.1 | FALSIFY-001 cited "or equivalent" (no real test name)

After PR #1506 lands, `pv lint contracts/` reports 0 PV-VER-001 errors across all 870+ contracts. The drift class is fully closed.

## 5g.1 throughput (real-time mid-run)

Shard | Closed at | Δ from prev
0 | 07:08 | (start)
1 | 07:24 | 16 min
2 | 07:39 | 15 min
3 | 07:55 | 16 min
...
12 | 10:16 | (in progress)

Mean wall: 16.3 min/shard. Linear projection: 57 shards × 16.3 min = 929 min = ~15.5 hr total → ETA ~22:30Z (slightly under §56's 17hr smoke estimate).

## Methodology takeaway

When a contract is authored in PR_A alongside its impl, AND the impl's test names are stamped in the contract's `test:` field BEFORE the impl PR finalizes the names, the names diverge at the cascade boundary. Happened in 3 of 4 §50.4 cascade contracts.

Prevention rule: when authoring a new contract that cites tests, EITHER reference tests that already exist on main, OR mark them `PENDING_PR_<N>:` with the impl PR ref so PV-VER-001 lint can flag dangling refs at contract-merge time.

A future spec amendment could codify a `pv lint --strict-test-binding` enforcement that blocks contract merge when any `test:` field doesn't resolve to an existing test invocation. Out of §57 scope.

## Net effects

- Spec v3.01.0 → v3.02.0.
- Three contract bumps land cleanly (apr-pretrain-arch-polymorphic-v1 v1.3→v1.4→v1.5, apr-pretrain-from-init-v1 v1.1→v1.2, apr-cli-tokenize-import-hf-v1 v1.0→v1.1).
- pv lint contracts/ 0 PV-VER-001 errors across 870+ contracts.
- 5g.1 full corpus run progressing at 16.3 min/shard; ETA ~22:30Z.
- MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57% until step 5g.3 produces val_loss < 9.38.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PRs #1502/#1504/#1505/#1506 (drift sweep), apr-cookbook spec v5.1.0 (companion update: operator-facing recipe)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
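
The linear throughput projection quoted in this commit message (57 shards × 16.3 min/shard) can be reproduced with a short sketch. The function names are hypothetical, and treating shard 0's 07:08 timestamp as the run's start time is an assumption read from the "(start)" annotation in the shard table.

```rust
/// Linear projection: total wall time in minutes for `shards` shards.
fn projected_total_minutes(shards: u32, mean_min_per_shard: f64) -> f64 {
    f64::from(shards) * mean_min_per_shard
}

/// Converts minutes to hours.
fn minutes_to_hours(minutes: f64) -> f64 {
    minutes / 60.0
}

/// Projects a wall-clock finish time (hour, minute) from a start time,
/// rounding the projected duration to the nearest whole minute.
/// Wraps around midnight if the run crosses a day boundary.
fn projected_eta(start_hour: u32, start_minute: u32, total_minutes: f64) -> (u32, u32) {
    let start = start_hour * 60 + start_minute;
    let end = (start + total_minutes.round() as u32) % (24 * 60);
    (end / 60, end % 60)
}
```

With the figures from the commit message, the projection comes to about 929 minutes, or roughly 15.5 hours. Starting from 07:08, that lands at 22:37, consistent with the stated ETA of ~22:30Z.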

…-003/004/007 drift (round 2) (#1509)

* contract(apr-pretrain-arch-polymorphic-v1): v1.5 → v1.6 — fix FALSIFY-003/004/007 drift (round 2)

Second-round test-reference drift correction. §57's drift sweep (this contract's v1.4 → v1.5 bump in PR #1505) caught FALSIFY-005/006 but a more thorough audit (cross-referencing every `test:` field against the source-code function-name registry) surfaced three additional dangling references.

## Drift inventory (round 2)

| Falsifier | v1.5.0 cited test | Exists? | Actual test |
| --- | --- | --- | --- |
| 003 | build_transformer_config_qwen_init_matches_constructor | ❌ | build_transformer_config_qwen_init_matches_input |
| 004 | transformer::attention::tests::gqa_7_to_1_matches_full_mha | ❌ | transformer::model::tests::falsify_apr_pretrain_arch_004_* |
| 007 | build_transformer_config_encoder_init_errors | ❌ | validate_pretrain_init_arch_rejects_encoder |

## Why §57 (PR #1505) didn't catch these

§57's grep audited test-name SUFFIXES and FRAGMENTS, which produced false-negatives on:
- `_init_matches_constructor` vs `_init_matches_input`: both end in `_matches_<word>` so a fragment grep counted the contract's name as "not dangling"
- `transformer::attention::tests::` vs `transformer::model::tests::`: module-path drift not just function-name drift; only fully-qualified path comparison catches this
- `_encoder_init_errors` vs `validate_pretrain_init_arch_rejects_encoder`: the contract's name was a guess at the impl name; impl PR #1479 chose a completely different convention

## How this round was found

Used a stricter audit: for every `cargo test ... ::tests::<name>` in contracts, grep `fn <name>` in the actual source tree. If the fn doesn't exist, drift. This catches drift that PR #1505's fragment-based audit missed.

## Resolution

Update FALSIFY-003/004/007 `test:` fields to the actual function names. No falsifier semantics change. 11 falsifiers all PASS; contract status remains FUNCTIONAL.

## Verification

    $ cargo test -p aprender-train --lib -- build_transformer_config_qwen_init_matches_input
    test result: ok. 1 passed
    $ cargo test -p aprender-train --lib -- falsify_apr_pretrain_arch_004_gqa_7_1_forward_pass_smoke
    test result: ok. 1 passed
    $ cargo test -p aprender-train --lib -- validate_pretrain_init_arch_rejects_encoder
    test result: ok. 1 passed
    $ pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml
    0 error(s), 0 warning(s)

## Five Whys

1. Why did §57's sweep miss these? Used name-fragment grep (`::tests::[a-z_]+`) which counted false-negatives on suffix-close names like `_constructor` ↔ `_input`.
2. Why is module-path drift a separate class? Because grep against the `[a-z_]+` regex captures the FUNCTION name, not the `::module::tests::` path. A function with the right name in the wrong module passes that audit but fails actual test invocation.
3. Why fix in a separate PR rather than amending PR #1505? PR #1505 already merged. Per `feedback_falsifier_first_cascade_pattern.md` the cleanest cadence is one-bump-per-PR.
4. Why bump to v1.6.0? Same pattern as PR #1505's v1.4 → v1.5: the test-binding INVARIANT was broken in v1.5.0 (residual drift) and v1.6.0 restores it.
5. Why now (during 5g.1 wait)? Productive use of the 5g.1 (~10hr remaining) compute-bound idle time. Each drift fix is small (~30 LOC), reduces drift risk for future agents, and restores the falsifier-binding invariant. The alternative (manufacture bigger work) would risk introducing defects the contract base doesn't catch yet.

## Net effects

- Contract v1.5.0 → v1.6.0 FUNCTIONAL.
- 11 falsifiers, all PASS: same count, but FALSIFY-003/004/007 now reference tests that actually exist.
- MODEL-1 ship % unchanged at 91%.
- MODEL-2 ship % unchanged at 57% until 5g.3.

This is the SECOND round of drift sweep on this contract. Together with PRs #1502/#1504/#1505/#1506 (round 1), all known test-reference drift is closed across the §50.4 cascade contracts. A future spec amendment could codify a `pv lint --strict-test-binding` enforcement that prevents drift at contract-merge time.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade, contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.6.0, PR #1505 (round 1 partial fix), PR #1502/#1504/#1506 (sibling fixes)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* contract(apr-pretrain-arch-polymorphic-v1): also fix FALSIFY-001 (round 2.5 — surfaced by PR #1511)

Round 2 (initial commit on this branch) fixed FALSIFY-003/004/007. Sub-agent PR #1511 (`pv lint --strict-test-binding`) surfaced a 4th drift in this same contract: FALSIFY-001 cited `qwen2_0_5b_matches_hf_config` → does NOT exist on main. Actual: `qwen2_0_5b_matches_hf_config_2026_05_04` (date-suffix added by impl PR #1474 / commit 9af6e71, May 4).

The earlier round-2 audit (which focused on suffix + module-path drift) didn't catch this because the test name has a DATE-SUFFIX drift class (function name + `_<date>` is a real Rust test, but the contract truncated to the prefix).

Updates:
- FALSIFY-001 test ref: append `_2026_05_04` suffix.
- v1.6.0 changelog updated to record 4 fixes (was 3).
- Verified: cargo test qwen2_0_5b_matches_hf_config_2026_05_04 PASS.
- pv lint --strict-test-binding contracts/apr-pretrain-arch-polymorphic-v1.yaml: 0 PV-VER-002 (down from 4 pre-fix).

This consolidates round 2 into a single commit on the same branch + PR (#1509) rather than spawning a round-3 PR for one extra fix. The lint hardening in #1511 is what made finding the 4th drift trivial; future drift will be caught at contract-merge time once #1511 lands.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PR #1511 (sub-agent's pv lint --strict-test-binding), Issue #1510

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
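
The stricter audit described above (take each cited test name, then look for an exact `fn <name>` in the source) could be sketched roughly like this. It is an illustration using only the standard library, not the `pv lint` implementation; all function names here are hypothetical. It checks function names only, so module-path drift of the FALSIFY-004 kind would still need a fully-qualified path comparison.

```rust
/// Returns the last `::`-separated segment of a cited test path.
fn fn_name(test_path: &str) -> &str {
    test_path.rsplit("::").next().unwrap_or(test_path)
}

/// True if `source` defines a function named exactly `name`.
/// A longer identifier that merely starts with `name` (for example a
/// date-suffixed variant) does not count as a match.
fn defines_fn(source: &str, name: &str) -> bool {
    let needle = format!("fn {name}");
    source.match_indices(needle.as_str()).any(|(idx, _)| {
        let rest = &source[idx + needle.len()..];
        // The character after the name must not continue the identifier.
        !rest
            .chars()
            .next()
            .map_or(false, |c| c.is_alphanumeric() || c == '_')
    })
}

/// Returns every cited test path whose function name has no exact
/// definition in `source`.
fn dangling<'a>(cited: &[&'a str], source: &str) -> Vec<&'a str> {
    cited
        .iter()
        .copied()
        .filter(|path| !defines_fn(source, fn_name(path)))
        .collect()
}
```

Because the match must end at an identifier boundary, a contract citing `qwen2_0_5b_matches_hf_config` is flagged even when the source defines `qwen2_0_5b_matches_hf_config_2026_05_04`, which is the date-suffix case a prefix or fragment match would let through.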

noahgift enabled auto-merge (squash) May 5, 2026 05:35

Merge branch 'main' into feat/cuda-001-falsifier-binding

4d686a6

noahgift merged commit 88806b3 into main May 5, 2026
10 checks passed

noahgift deleted the feat/cuda-001-falsifier-binding branch May 5, 2026 06:27

noahgift mentioned this pull request May 5, 2026

docs(M61-M63): record §50.4 cascade aprender PRs #1500/#1501/#1502 SHIPPED paiml/claude-code-parity-apr#49

Merged

5 tasks

noahgift merged 2 commits into main from feat/cuda-001-falsifier-binding
