
contract(apr-pretrain-arch-polymorphic-v1): v1.3 → v1.4 — bind FALSIFY-APR-PRETRAIN-INIT-CUDA-001 + drift-prevention test (#1502)

Merged: noahgift merged 2 commits into main from feat/cuda-001-falsifier-binding on May 5, 2026

@noahgift noahgift commented May 5, 2026

Contributor

Summary

Drift-prevention bind for `FALSIFY-APR-PRETRAIN-INIT-CUDA-001`. Before this bump, the falsifier id was referenced in the v1.2.0 changelog and verification_summary but was not formally registered as a falsification_test entry. The fail-fast guard at `crates/apr-cli/src/commands/pretrain.rs::drive_real` returns Err with this id when `init_arch.is_some() && device.is_cuda()`, but no test pinned the citation. A future refactor could silently drop it, which would fall into the §28 SHIP-007 "silent gibberish" defect class.

What ships

Contract apr-pretrain-arch-polymorphic-v1 v1.3.0 → v1.4.0 FUNCTIONAL:

  • Adds FALSIFY-APR-PRETRAIN-INIT-CUDA-001 as formal falsification_test (PARTIAL_ALGORITHM_LEVEL).
  • 10 → 11 falsifiers, all PASS.

Source:

  • Extracted error message into `pub(crate) const FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG` so the const itself can be unit-tested without a `--features cuda` build.

Test:

  • `drive_real_cuda_init_path_fail_fasts_with_falsifier_citation` pins:
    • (a) falsifier id appears
    • (b) "not yet wired for --device cuda" phrase appears
    • (c) "step 5f.5 follow-up" reference appears
    • (d) both workarounds (`--device cpu` OR omit `--init`) are suggested
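
A minimal sketch of the const-extraction pattern described above, under stated assumptions: the message wording, the helper `init_cuda_guard`, and its two boolean parameters are hypothetical stand-ins (the real guard lives inside `drive_real` and its exact text is not shown here). Only the const name and the test name come from this PR.

```rust
/// Hypothetical wording; the real message text is not reproduced in this PR.
pub(crate) const FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG: &str =
    "FALSIFY-APR-PRETRAIN-INIT-CUDA-001: --init is not yet wired for --device cuda \
     (step 5f.5 follow-up). Workarounds: use --device cpu, or omit --init.";

/// Hypothetical stand-in for the guard in `drive_real`. The booleans model
/// `init_arch.is_some()` and `device.is_cuda()`.
pub(crate) fn init_cuda_guard(init_arch_is_some: bool, device_is_cuda: bool) -> Result<(), String> {
    if init_arch_is_some && device_is_cuda {
        // Fail fast and cite the falsifier id instead of falling through.
        return Err(FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG.to_string());
    }
    Ok(())
}

/// Pins the four properties (a) to (d) on the const alone, so it runs
/// without a `--features cuda` build or a CUDA-capable host.
#[test]
fn drive_real_cuda_init_path_fail_fasts_with_falsifier_citation() {
    let msg = FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG;
    assert!(msg.contains("FALSIFY-APR-PRETRAIN-INIT-CUDA-001")); // (a)
    assert!(msg.contains("not yet wired for --device cuda")); // (b)
    assert!(msg.contains("step 5f.5 follow-up")); // (c)
    assert!(msg.contains("--device cpu") && msg.contains("omit --init")); // (d)
}
```

Because the test reads only the const, it compiles on every CI matrix entry; the trade-off is that it pins the citation text, not the runtime dispatch itself.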

Why drift-prevention

Promotion of CUDA-001 to DISCHARGED requires §50.4 step 5f.5 LIVE (CUDA wireup landed + GPU smoke). That is multi-PR scope (refactor `upload_blocks` + new constructor + wire CLI). Until then, the fail-fast guard is the only safeguard. Without a formal falsifier + test, that guard can silently regress if anyone refactors `drive_real`.

Five Whys

  1. Why bind a free-floating falsifier id? Because the id is ALREADY cited in source + contract changelog; without a formal falsification_test entry, drift between code and contract is unobservable.
  2. Why now (during the 5g.1 wait)? Productive use of the 17hr corpus-retokenize wall; small scope, doesn't block 5g.1, doesn't move ship-% but reduces operational risk.
  3. Why a const-based test instead of an end-to-end runtime test? Runtime test would need `--features cuda` build + CUDA-capable host. Const-extraction works on every CI matrix entry.
  4. Why not also delete the guard once 5f.5 lands? The guard logic is still needed at the dispatch layer; the BODY of the guard changes (currently fail-fast, post-5f.5 calls `build_shared_cuda_trainer_with_init`). The const can be retired then.
  5. Why isn't this work in §57? §57 is reserved for the next 5g.x finding. CUDA-001 binding is hygiene — no spec amendment needed.

Net effects

  • Contract v1.3.0 → v1.4.0 FUNCTIONAL (11 falsifiers, all PASS).
  • 1 new drift-prevention test.
  • 1 source const extraction.
  • MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57%.

Test plan

  • `pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml` exits 0
  • PMAT pre-commit quality gates pass
  • New drift-prevention test passes
  • CI gate green (workspace-test, ci/gate)
  • Auto-merge fires on green CI

🤖 Generated with Claude Code

…Y-APR-PRETRAIN-INIT-CUDA-001 + drift-prevention test

Pre-this-bump, the falsifier id `FALSIFY-APR-PRETRAIN-INIT-CUDA-001`
was REFERENCED in the v1.2.0 changelog and verification_summary BUT
was not formally registered as a falsification_test entry. The
fail-fast guard at `crates/apr-cli/src/commands/pretrain.rs::drive_real`
(post-#1494 5f.4 wireup) returns Err with this id when
`init_arch.is_some() && device.is_cuda()`, but no test pinned the
citation. A future refactor could silently drop the citation OR let
CUDA + --init fall through → §28 SHIP-007 "silent gibberish" defect class.

## What ships

Contract apr-pretrain-arch-polymorphic-v1 v1.3.0 → v1.4.0 FUNCTIONAL:
- Adds FALSIFY-APR-PRETRAIN-INIT-CUDA-001 as formal falsification_test
  (PARTIAL_ALGORITHM_LEVEL).
- 10 → 11 falsifiers, all PASS.

Source:
- Extracted error message into `pub(crate) const
  FALSIFY_APR_PRETRAIN_INIT_CUDA_001_MSG` so the const itself can be
  unit-tested without a `--features cuda` build.

Test:
- `drive_real_cuda_init_path_fail_fasts_with_falsifier_citation` pins:
  (a) falsifier id appears
  (b) "not yet wired for --device cuda" phrase appears
  (c) "step 5f.5 follow-up" reference appears
  (d) both workarounds (--device cpu OR omit --init) are suggested

## Why drift-prevention matters

Promotion of CUDA-001 to DISCHARGED requires §50.4 step 5f.5 LIVE
(CUDA wireup landed + GPU smoke). That's multi-PR scope (refactor
upload_blocks + new constructor + wire CLI). Until then, the
fail-fast guard is the only safety. Without a formal falsifier +
test, that guard silently regresses if anyone refactors drive_real.

## Net effects

- Contract apr-pretrain-arch-polymorphic-v1 v1.3.0 → v1.4.0 FUNCTIONAL.
- 11 falsifiers total (10 → 11), all PASS.
- 1 new drift-prevention test.
- 1 source const extraction (lockup).
- MODEL-1 ship % unchanged at 91%.
- MODEL-2 ship % unchanged at 57% until 5g.3.

This is a quality-and-hygiene PR while the 5g.1 17hr corpus retokenize
runs in the background. Doesn't move ship-% but reduces drift risk +
binds a previously-free-floating falsifier reference.

Refs: SPEC-SHIP-TWO-001 §50.4 step 5f.5,
      contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.4.0

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) May 5, 2026 05:35
@noahgift noahgift merged commit 88806b3 into main May 5, 2026
10 checks passed
@noahgift noahgift deleted the feat/cuda-001-falsifier-binding branch May 5, 2026 06:27
noahgift added a commit that referenced this pull request May 5, 2026
…-005/006 test-reference drift (#1505)

Same drift class as PR #1504 caught in apr-pretrain-from-init-v1.
Test names cited in v1.1.0 changelog never matched the actual tests
PR #1476 authored. Drift survived three intervening bumps
(v1.1→v1.2→v1.3→v1.4) because each focused on adding new falsifiers,
not auditing existing bindings.

## Drift inventory

| Falsifier | v1.4.0 cited test | Exists? | Actual test |
|---|---|---|---|
| FALSIFY-005 | preflight_qwen_vocab_passes_with_qwen_init | ❌ | preflight_qwen_vocab_passes_with_qwen_target |
| FALSIFY-006 | preflight_qwen_vocab_fails_without_init | ❌ | preflight_qwen_vocab_fails_with_llama_target |

## Resolution

Update the `test:` field for FALSIFY-005 and FALSIFY-006 to reference
the actual tests authored by PR #1476. No falsifier semantics change.
No new tests added.

## Verification

  $ cargo test -p apr-cli --lib -- commands::pretrain::tests::preflight_qwen_vocab_passes_with_qwen_target
    test result: ok. 1 passed; ...
  $ cargo test -p apr-cli --lib -- commands::pretrain::tests::preflight_qwen_vocab_fails_with_llama_target
    test result: ok. 1 passed; ...
  $ pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml
    0 error(s), 0 warning(s)

## Five Whys

1. Why did the drift survive 3 bumps? Each bump (v1.2/v1.3/v1.4)
   focused on ADDING new content (CUDA-001, relaxed bound, etc.);
   none audited existing bindings.
2. Why didn't the §50.4 cascade catch this? The cascade authored
   tests; the contract was authored separately. Names diverged at
   the boundary; no cross-check landed.
3. Why is this a contract-only fix (no source change)? The tests
   exist and pass — the IMPL is correct. Only the contract's text
   reference needed correction.
4. Why bump to v1.5.0 (not v1.4.1 patch)? Same logic as PR #1504:
   the test-binding INVARIANT (every cited test exists) was broken
   in v1.4.0. v1.5.0 restores it.
5. Why is this important if the impl is correct? Per
   feedback_no_guessing.md, contracts that cite non-existent tests
   are unfalsifiable — future agents reading the contract get a
   false signal that the falsifier is bound. PV-VER-001 lint will
   catch this; better to fix it than wait for the lint engine to
   flag.

## Net effects

- Contract v1.4.0 → v1.5.0 FUNCTIONAL.
- 11 falsifiers, all PASS — same count, but FALSIFY-005/006 now
  reference tests that actually exist.
- MODEL-1 ship % unchanged at 91%.
- MODEL-2 ship % unchanged at 57% until 5g.3.

This is hygiene work while 5g.1 (~12hr) corpus retokenize runs.
Same defect class as PR #1504; together they close the
test-reference drift across both pretrain contracts.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade (PRs #1473-#1494, #1502),
      contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.5.0,
      contracts/apr-pretrain-from-init-v1.yaml v1.2.0 (PR #1504, sibling fix)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 5, 2026
…oughput characterization (#1508)

§56 closed with 5g.1 full-corpus retokenization dispatched (PID
2767124, ~17hr wall projected). §57 records the parallel drift-sweep
work that landed during the 5g.1 wait + throughput characterization
of 5g.1 mid-run.

## Drift sweep (4 PRs)

While 5g.1 ran in the background, a sweep of the §50.4 cascade
contracts surfaced THE SAME drift class across multiple contracts:
cited test names that didn't match what the impl PR actually authored.

  PR     | Contract                              | v_old → v_new | Drift
  ---    | ---                                   | ---           | ---
  #1502  | apr-pretrain-arch-polymorphic-v1      | v1.3 → v1.4   | CUDA-001 was REFERENCED in changelog but had no formal falsification_test entry
  #1504  | apr-pretrain-from-init-v1             | v1.1 → v1.2   | 7 of 8 cited test names didn't exist; re-aligned to existing tests
  #1505  | apr-pretrain-arch-polymorphic-v1      | v1.4 → v1.5   | FALSIFY-005/006 cited names diverged from PR #1476's actual authoring
  #1506  | apr-cli-tokenize-import-hf-v1         | v1.0 → v1.1   | FALSIFY-001 cited "or equivalent" — no real test name

After PR #1506 lands, `pv lint contracts/` reports 0 PV-VER-001
errors across all 870+ contracts. The drift class is fully closed.

## 5g.1 throughput (real-time mid-run)

  Shard | Closed at | Δ from prev
  0     | 07:08    | (start)
  1     | 07:24    | 16 min
  2     | 07:39    | 15 min
  3     | 07:55    | 16 min
  ...
  12    | 10:16    | (in progress)

Mean wall: 16.3 min/shard. Linear projection: 57 shards × 16.3 min =
929 min = ~15.5 hr total → ETA ~22:30Z (slightly under §56's 17hr
smoke estimate).
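
The linear projection above can be checked with a few lines of Rust. This is only a worked-arithmetic sketch; the function names are illustrative and not part of any tool mentioned here.

```rust
/// Linear projection of total wall time, in minutes.
fn projected_minutes(shards: u32, mean_min_per_shard: f64) -> f64 {
    f64::from(shards) * mean_min_per_shard
}

/// Converts minutes to hours.
fn minutes_to_hours(minutes: f64) -> f64 {
    minutes / 60.0
}
```

With the figures from the table, `projected_minutes(57, 16.3)` is about 929 minutes, and `minutes_to_hours` of that is about 15.5 hours, matching the estimate above.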

## Methodology takeaway

When a contract is authored in PR_A alongside its impl, AND the
impl's test names are stamped in the contract's `test:` field BEFORE
the impl PR finalizes the names, the names diverge at the cascade
boundary. Happened in 3 of 4 §50.4 cascade contracts.

Prevention rule: when authoring a new contract that cites tests,
EITHER reference tests that already exist on main, OR mark them
`PENDING_PR_<N>:` with the impl PR ref so PV-VER-001 lint can flag
dangling refs at contract-merge time.
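
A sketch of how the prevention rule could be checked mechanically. The exact field syntax `PENDING_PR_<N>: <name>`, the `Binding` enum, and `classify` are assumptions made for illustration; they are not the behavior of `pv lint`.

```rust
/// Outcome of checking one `test:` field (hypothetical classification).
#[derive(Debug, PartialEq)]
enum Binding {
    /// The cited test already exists.
    Bound,
    /// The field is explicitly marked as waiting on an impl PR.
    Pending { pr: u32 },
    /// Neither bound nor marked pending.
    Dangling,
}

/// Classifies a `test:` field against the set of test names that exist.
fn classify(test_field: &str, existing: &[&str]) -> Binding {
    if let Some(rest) = test_field.strip_prefix("PENDING_PR_") {
        // Expect "<N>: <name>"; a malformed marker counts as dangling.
        if let Some((num, _name)) = rest.split_once(':') {
            if let Ok(pr) = num.trim().parse::<u32>() {
                return Binding::Pending { pr };
            }
        }
        return Binding::Dangling;
    }
    if existing.iter().any(|name| *name == test_field.trim()) {
        Binding::Bound
    } else {
        Binding::Dangling
    }
}
```

Under this sketch, a contract merged before its impl PR carries `Pending` entries that a later pass can resolve to `Bound`, while anything `Dangling` is flagged immediately.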

A future spec amendment could codify a `pv lint --strict-test-binding`
enforcement that blocks contract merge when any `test:` field doesn't
resolve to an existing test invocation. Out of §57 scope.

## Net effects

- Spec v3.01.0 → v3.02.0.
- Four contract bumps across three contracts land cleanly
  (apr-pretrain-arch-polymorphic-v1 v1.3→v1.4→v1.5,
  apr-pretrain-from-init-v1 v1.1→v1.2,
  apr-cli-tokenize-import-hf-v1 v1.0→v1.1).
- pv lint contracts/ 0 PV-VER-001 errors across 870+ contracts.
- 5g.1 full corpus run progressing at 16.3 min/shard; ETA ~22:30Z.
- MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57%
  until step 5g.3 produces val_loss < 9.38.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade,
      PRs #1502/#1504/#1505/#1506 (drift sweep),
      apr-cookbook spec v5.1.0 (companion update — operator-facing recipe)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 5, 2026
…-003/004/007 drift (round 2) (#1509)

* contract(apr-pretrain-arch-polymorphic-v1): v1.5 → v1.6 — fix FALSIFY-003/004/007 drift (round 2)

Second-round test-reference drift correction. §57's drift sweep
(this contract's v1.4 → v1.5 bump in PR #1505) caught FALSIFY-005/006
but a more thorough audit (cross-referencing every `test:` field
against the source-code function-name registry) surfaced three
additional dangling references.

## Drift inventory (round 2)

  | Falsifier | v1.5.0 cited test                                           | Exists? | Actual test                                                  |
  | ---       | ---                                                         | ---     | ---                                                          |
  | 003       | build_transformer_config_qwen_init_matches_constructor      | ❌       | build_transformer_config_qwen_init_matches_input             |
  | 004       | transformer::attention::tests::gqa_7_to_1_matches_full_mha  | ❌       | transformer::model::tests::falsify_apr_pretrain_arch_004_*   |
  | 007       | build_transformer_config_encoder_init_errors                | ❌       | validate_pretrain_init_arch_rejects_encoder                  |

## Why §57 (PR #1505) didn't catch these

§57's grep audited test-name SUFFIXES and FRAGMENTS, which produced
false-negatives on:
  - `_init_matches_constructor` vs `_init_matches_input` — both end
    in `_matches_<word>` so a fragment grep counted the contract's
    name as "not dangling"
  - `transformer::attention::tests::` vs `transformer::model::tests::` —
    module-path drift not just function-name drift; only fully-
    qualified path comparison catches this
  - `_encoder_init_errors` vs `validate_pretrain_init_arch_rejects_encoder` —
    the contract's name was a guess at the impl name; impl PR #1479
    chose a completely different convention
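
The first false negative above can be reproduced with a small sketch. `fragment_match` is an assumed model of a fragment-based audit (it is not the grep that §57 actually ran), and `exact_match` is the strict comparison that flags the drift.

```rust
/// Loose check (assumed model): drop the last `_word` of the cited name and
/// accept any existing test whose name contains the remaining fragment.
fn fragment_match(cited: &str, existing: &[&str]) -> bool {
    let fragment = match cited.rfind('_') {
        Some(idx) => &cited[..idx],
        None => cited,
    };
    existing.iter().any(|name| name.contains(fragment))
}

/// Strict check: the cited name (or fully qualified path) must equal an
/// existing one exactly.
fn exact_match(cited: &str, existing: &[&str]) -> bool {
    existing.iter().any(|name| *name == cited)
}
```

Given only `build_transformer_config_qwen_init_matches_input` on main, the loose check accepts the cited `build_transformer_config_qwen_init_matches_constructor`, while the strict check rejects it. Passing fully qualified paths to `exact_match` would also surface module-path drift.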

## How this round was found

Used a stricter audit: for every `cargo test ... ::tests::<name>`
in contracts, grep `fn <name>` in the actual source tree. If the
fn doesn't exist, drift. This catches drift that PR #1505's
fragment-based audit missed.

## Resolution

Update FALSIFY-003/004/007 `test:` fields to the actual function
names. No falsifier semantics change. 11 falsifiers all PASS;
contract status remains FUNCTIONAL.

## Verification

  $ cargo test -p aprender-train --lib -- build_transformer_config_qwen_init_matches_input
    test result: ok. 1 passed
  $ cargo test -p aprender-train --lib -- falsify_apr_pretrain_arch_004_gqa_7_1_forward_pass_smoke
    test result: ok. 1 passed
  $ cargo test -p aprender-train --lib -- validate_pretrain_init_arch_rejects_encoder
    test result: ok. 1 passed
  $ pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml
    0 error(s), 0 warning(s)

## Five Whys

1. Why did §57's sweep miss these? It used a name-fragment grep
   (`::tests::[a-z_]+`), which produced false negatives on names
   that differ only in the final word, such as `_constructor` ↔ `_input`.
2. Why is module-path drift a separate class? Because grep against
   the `[a-z_]+` regex captures the FUNCTION name, not the
   `::module::tests::` path. A function with the right name in the
   wrong module passes that audit but fails actual test invocation.
3. Why fix in a separate PR rather than amending PR #1505? PR #1505
   already merged. Per `feedback_falsifier_first_cascade_pattern.md`
   the cleanest cadence is one-bump-per-PR.
4. Why bump to v1.6.0? Same pattern as PR #1505's v1.4 → v1.5: the
   test-binding INVARIANT was broken in v1.5.0 (residual drift) and
   v1.6.0 restores it.
5. Why now (during 5g.1 wait)? Productive use of the 5g.1 (~10hr
   remaining) compute-bound idle time. Each drift fix is small
   (~30 LOC), reduces drift risk for future agents, and restores
   the falsifier-binding invariant. The alternative (manufacture
   bigger work) would risk introducing defects the contract base
   doesn't catch yet.

## Net effects

- Contract v1.5.0 → v1.6.0 FUNCTIONAL.
- 11 falsifiers, all PASS — same count, but FALSIFY-003/004/007
  now reference tests that actually exist.
- MODEL-1 ship % unchanged at 91%.
- MODEL-2 ship % unchanged at 57% until 5g.3.

This is the SECOND round of drift sweep on this contract. Together
with PRs #1502/#1504/#1505/#1506 (round 1), all known
test-reference drift is closed across the §50.4 cascade contracts.
A future spec amendment could codify a `pv lint --strict-test-binding`
enforcement that prevents drift at contract-merge time.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade,
      contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.6.0,
      PR #1505 (round 1 partial fix), PR #1502/#1504/#1506 (sibling fixes)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* contract(apr-pretrain-arch-polymorphic-v1): also fix FALSIFY-001 (round 2.5 — surfaced by PR #1511)

Round 2 (initial commit on this branch) fixed FALSIFY-003/004/007.
Sub-agent PR #1511 (`pv lint --strict-test-binding`) surfaced a 4th
drift in this same contract:

  FALSIFY-001 cited `qwen2_0_5b_matches_hf_config`
    → does NOT exist on main.
  Actual: `qwen2_0_5b_matches_hf_config_2026_05_04`
    (date-suffix added by impl PR #1474 / commit 9af6e71 — May 4).

The earlier round-2 audit (which focused on suffix + module-path
drift) didn't catch this because the test name has a DATE-SUFFIX
drift class (function name + `_<date>` is a real Rust test, but
the contract truncated to the prefix).
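
A sketch of how this date-suffix class could be detected: list existing tests whose name is the cited name plus an extra `_suffix`. The function name is hypothetical and this is not how PR #1511 implements its lint.

```rust
/// Existing test names that extend `cited` with an extra `_<suffix>`,
/// i.e. likely targets of a reference that was truncated to its prefix.
fn truncated_prefix_candidates<'a>(cited: &str, existing: &[&'a str]) -> Vec<&'a str> {
    existing
        .iter()
        .copied()
        .filter(|name| {
            name.strip_prefix(cited)
                .map_or(false, |rest| rest.starts_with('_'))
        })
        .collect()
}
```

For the cited `qwen2_0_5b_matches_hf_config`, this returns `qwen2_0_5b_matches_hf_config_2026_05_04` when that test exists, which turns a dangling reference into a concrete fix suggestion.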

Updates:
- FALSIFY-001 test ref: append `_2026_05_04` suffix.
- v1.6.0 changelog updated to record 4 fixes (was 3).
- Verified: cargo test qwen2_0_5b_matches_hf_config_2026_05_04 PASS.
- pv lint --strict-test-binding contracts/apr-pretrain-arch-polymorphic-v1.yaml: 0 PV-VER-002 (down from 4 pre-fix).

This consolidates round 2 into a single commit on the same branch
+ PR (#1509) rather than spawning a round-3 PR for one extra fix.
The lint hardening in #1511 is what made finding the 4th drift
trivial; future drift will be caught at contract-merge time once
#1511 lands.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade,
      PR #1511 (sub-agent's pv lint --strict-test-binding),
      Issue #1510

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>