contract(apr-pretrain-arch-polymorphic-v1): v1.5 → v1.6 — fix FALSIFY-003/004/007 drift (round 2) by noahgift · Pull Request #1509 · paiml/aprender

noahgift · 2026-05-05T09:04:19Z

Summary

Second-round test-reference drift correction. §57's drift sweep (PR #1505 v1.4 → v1.5) caught FALSIFY-005/006 but a more thorough audit surfaced three additional dangling references.

Drift inventory (round 2)

Falsifier	v1.5.0 cited test	Exists?	Actual test
003	`build_transformer_config_qwen_init_matches_constructor`	❌	`build_transformer_config_qwen_init_matches_input`
004	`transformer::attention::tests::gqa_7_to_1_matches_full_mha_via_repeat`	❌	`transformer::model::tests::falsify_apr_pretrain_arch_004_gqa_7_1_forward_pass_smoke`
007	`build_transformer_config_encoder_init_errors`	❌	`validate_pretrain_init_arch_rejects_encoder`

Why §57 (PR #1505) didn't catch these

§57's grep audited test-name SUFFIXES and FRAGMENTS, which produced false-negatives on:

`_init_matches_constructor` vs `_init_matches_input` — both end in `matches`
`transformer::attention::tests::` vs `transformer::model::tests::` — module-path drift, not function-name drift
`_encoder_init_errors` vs `validate_pretrain_init_arch_rejects_encoder` — completely different convention

Stricter audit (grep fn <name> for every cited test) catches these.

Verification

`cargo test -p aprender-train --lib -- build_transformer_config_qwen_init_matches_input` PASS
`cargo test -p aprender-train --lib -- falsify_apr_pretrain_arch_004_gqa_7_1_forward_pass_smoke` PASS
`cargo test -p aprender-train --lib -- validate_pretrain_init_arch_rejects_encoder` PASS
`pv validate` 0 errors

Five Whys

Why did §57 miss these? Name-fragment grep (`::tests::[a-z_]+`) gave false-negatives on suffix-close names.
Why is module-path drift a separate class? Function-name regex captures the function, not the `::module::tests::` path.
Why a separate PR? PR contract(apr-pretrain-arch-polymorphic-v1): v1.4 → v1.5 — fix FALSIFY-005/006 test-reference drift #1505 already merged; one-bump-per-PR cadence per `feedback_falsifier_first_cascade_pattern.md`.
Why bump to v1.6.0? Same pattern as PR contract(apr-pretrain-arch-polymorphic-v1): v1.4 → v1.5 — fix FALSIFY-005/006 test-reference drift #1505: invariant was broken in v1.5.0 (residual drift); v1.6.0 restores it.
Why now? Productive use of 5g.1 wait (~10hr remaining); small scope (~30 LOC); reduces drift risk.

Net effects

Contract v1.5.0 → v1.6.0 FUNCTIONAL.
11 falsifiers, all PASS — FALSIFY-003/004/007 now reference real tests.
MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57%.

Together with PRs #1502/#1504/#1505/#1506 (round 1), all known test-reference drift is closed across §50.4 cascade contracts.

Test plan

🤖 Generated with Claude Code

…-003/004/007 drift (round 2) Second-round test-reference drift correction. §57's drift sweep (this contract's v1.4 → v1.5 bump in PR #1505) caught FALSIFY-005/006 but a more thorough audit (cross-referencing every `test:` field against the source-code function-name registry) surfaced three additional dangling references. ## Drift inventory (round 2) | Falsifier | v1.5.0 cited test | Exists? | Actual test | | --- | --- | --- | --- | | 003 | build_transformer_config_qwen_init_matches_constructor | ❌ | build_transformer_config_qwen_init_matches_input | | 004 | transformer::attention::tests::gqa_7_to_1_matches_full_mha | ❌ | transformer::model::tests::falsify_apr_pretrain_arch_004_* | | 007 | build_transformer_config_encoder_init_errors | ❌ | validate_pretrain_init_arch_rejects_encoder | ## Why §57 (PR #1505) didn't catch these §57's grep audited test-name SUFFIXES and FRAGMENTS, which produced false-negatives on: - `_init_matches_constructor` vs `_init_matches_input` — both end in `_matches_<word>` so a fragment grep counted the contract's name as "not dangling" - `transformer::attention::tests::` vs `transformer::model::tests::` — module-path drift not just function-name drift; only fully- qualified path comparison catches this - `_encoder_init_errors` vs `validate_pretrain_init_arch_rejects_encoder` — the contract's name was a guess at the impl name; impl PR #1479 chose a completely different convention ## How this round was found Used a stricter audit: for every `cargo test ... ::tests::<name>` in contracts, grep `fn <name>` in the actual source tree. If the fn doesn't exist, drift. This catches drift that PR #1505's fragment-based audit missed. ## Resolution Update FALSIFY-003/004/007 `test:` fields to the actual function names. No falsifier semantics change. 11 falsifiers all PASS; contract status remains FUNCTIONAL. ## Verification $ cargo test -p aprender-train --lib -- build_transformer_config_qwen_init_matches_input test result: ok. 1 passed $ cargo test -p aprender-train --lib -- falsify_apr_pretrain_arch_004_gqa_7_1_forward_pass_smoke test result: ok. 1 passed $ cargo test -p aprender-train --lib -- validate_pretrain_init_arch_rejects_encoder test result: ok. 1 passed $ pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml 0 error(s), 0 warning(s) ## Five Whys 1. Why did §57's sweep miss these? Used name-fragment grep (`::tests::[a-z_]+`) which counted false-negatives on suffix- close names like `_constructor` ↔ `_input`. 2. Why is module-path drift a separate class? Because grep against the `[a-z_]+` regex captures the FUNCTION name, not the `::module::tests::` path. A function with the right name in the wrong module passes that audit but fails actual test invocation. 3. Why fix in a separate PR rather than amending PR #1505? PR #1505 already merged. Per `feedback_falsifier_first_cascade_pattern.md` the cleanest cadence is one-bump-per-PR. 4. Why bump to v1.6.0? Same pattern as PR #1505's v1.4 → v1.5: the test-binding INVARIANT was broken in v1.5.0 (residual drift) and v1.6.0 restores it. 5. Why now (during 5g.1 wait)? Productive use of the 5g.1 (~10hr remaining) compute-bound idle time. Each drift fix is small (~30 LOC), reduces drift risk for future agents, and restores the falsifier-binding invariant. The alternative (manufacture bigger work) would risk introducing defects the contract base doesn't catch yet. ## Net effects - Contract v1.5.0 → v1.6.0 FUNCTIONAL. - 11 falsifiers, all PASS — same count, but FALSIFY-003/004/007 now reference tests that actually exist. - MODEL-1 ship % unchanged at 91%. - MODEL-2 ship % unchanged at 57% until 5g.3. This is the SECOND round of drift sweep on this contract. Together with PRs #1502/#1504/#1505/#1506 (round 1), all known test-reference drift is closed across the §50.4 cascade contracts. A future spec amendment could codify a `pv lint --strict-test-binding` enforcement that prevents drift at contract-merge time. Refs: SPEC-SHIP-TWO-001 §50.4 cascade, contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.6.0, PR #1505 (round 1 partial fix), PR #1502/#1504/#1506 (sibling fixes) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…-003-004-007-drift

…nd 2.5 — surfaced by PR #1511) Round 2 (initial commit on this branch) fixed FALSIFY-003/004/007. Sub-agent PR #1511 (`pv lint --strict-test-binding`) surfaced a 4th drift in this same contract: FALSIFY-001 cited `qwen2_0_5b_matches_hf_config` → does NOT exist on main. Actual: `qwen2_0_5b_matches_hf_config_2026_05_04` (date-suffix added by impl PR #1474 / commit 9af6e71 — May 4). The earlier round-2 audit (which focused on suffix + module-path drift) didn't catch this because the test name has a DATE-SUFFIX drift class (function name + `_<date>` is a real Rust test, but the contract truncated to the prefix). Updates: - FALSIFY-001 test ref: append `_2026_05_04` suffix. - v1.6.0 changelog updated to record 4 fixes (was 3). - Verified: cargo test qwen2_0_5b_matches_hf_config_2026_05_04 PASS. - pv lint --strict-test-binding contracts/apr-pretrain-arch-polymorphic-v1.yaml: 0 PV-VER-002 (down from 4 pre-fix). This consolidates round 2 into a single commit on the same branch + PR (#1509) rather than spawning a round-3 PR for one extra fix. The lint hardening in #1511 is what made finding the 4th drift trivial; future drift will be caught at contract-merge time once #1511 lands. Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PR #1511 (sub-agent's pv lint --strict-test-binding), Issue #1510 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…-003-004-007-drift

noahgift enabled auto-merge (squash) May 5, 2026 09:04

noahgift mentioned this pull request May 5, 2026

pv lint: add --strict-test-binding to catch test-reference drift (PV-VER-001 false-negatives) #1510

Closed

Merge branch 'main' into fix/apr-pretrain-arch-polymorphic-v1-falsify…

7ec3b74

…-003-004-007-drift

noahgift mentioned this pull request May 5, 2026

feat(pv-lint): add --strict-test-binding flag — catches dangling test refs (Closes #1510) #1511

Merged

7 tasks

noahgift and others added 2 commits May 5, 2026 11:57

Merge branch 'main' into fix/apr-pretrain-arch-polymorphic-v1-falsify…

0919d35

…-003-004-007-drift

noahgift merged commit bd92bd4 into main May 5, 2026
10 checks passed

noahgift deleted the fix/apr-pretrain-arch-polymorphic-v1-falsify-003-004-007-drift branch May 5, 2026 10:15

noahgift mentioned this pull request May 5, 2026

docs(M70): record apr-pretrain-arch-polymorphic-v1 v1.5 → v1.6 round-2 drift fix paiml/claude-code-parity-apr#56

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

contract(apr-pretrain-arch-polymorphic-v1): v1.5 → v1.6 — fix FALSIFY-003/004/007 drift (round 2)#1509

contract(apr-pretrain-arch-polymorphic-v1): v1.5 → v1.6 — fix FALSIFY-003/004/007 drift (round 2)#1509
noahgift merged 4 commits into
mainfrom
fix/apr-pretrain-arch-polymorphic-v1-falsify-003-004-007-drift

noahgift commented May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

noahgift commented May 5, 2026

Summary

Drift inventory (round 2)

Why §57 (PR #1505) didn't catch these

Verification

Five Whys

Net effects

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant