contract(apr-pretrain-arch-polymorphic-v1): v1.5 → v1.6: fix FALSIFY-003/004/007 drift (round 2) #1509
Merged
noahgift merged 4 commits on May 5, 2026
…-003/004/007 drift (round 2)

Second-round test-reference drift correction. §57's drift sweep (this contract's v1.4 → v1.5 bump in PR #1505) caught FALSIFY-005/006, but a more thorough audit (cross-referencing every `test:` field against the source-code function-name registry) surfaced three additional dangling references.

## Drift inventory (round 2)

| Falsifier | v1.5.0 cited test | Exists? | Actual test |
| --- | --- | --- | --- |
| 003 | build_transformer_config_qwen_init_matches_constructor | ❌ | build_transformer_config_qwen_init_matches_input |
| 004 | transformer::attention::tests::gqa_7_to_1_matches_full_mha | ❌ | transformer::model::tests::falsify_apr_pretrain_arch_004_* |
| 007 | build_transformer_config_encoder_init_errors | ❌ | validate_pretrain_init_arch_rejects_encoder |

## Why §57 (PR #1505) didn't catch these

§57's grep audited test-name SUFFIXES and FRAGMENTS, which produced false-negatives on:

- `_init_matches_constructor` vs `_init_matches_input`: both end in `_matches_<word>`, so a fragment grep counted the contract's name as "not dangling"
- `transformer::attention::tests::` vs `transformer::model::tests::`: module-path drift, not just function-name drift; only fully-qualified path comparison catches this
- `_encoder_init_errors` vs `validate_pretrain_init_arch_rejects_encoder`: the contract's name was a guess at the impl name; impl PR #1479 chose a completely different convention

## How this round was found

Used a stricter audit: for every `cargo test ... ::tests::<name>` in contracts, grep `fn <name>` in the actual source tree. If the fn doesn't exist, drift. This catches drift that PR #1505's fragment-based audit missed.

## Resolution

Update FALSIFY-003/004/007 `test:` fields to the actual function names. No falsifier semantics change. 11 falsifiers all PASS; contract status remains FUNCTIONAL.

## Verification

    $ cargo test -p aprender-train --lib -- build_transformer_config_qwen_init_matches_input
    test result: ok. 1 passed
    $ cargo test -p aprender-train --lib -- falsify_apr_pretrain_arch_004_gqa_7_1_forward_pass_smoke
    test result: ok. 1 passed
    $ cargo test -p aprender-train --lib -- validate_pretrain_init_arch_rejects_encoder
    test result: ok. 1 passed
    $ pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml
    0 error(s), 0 warning(s)

## Five Whys

1. Why did §57's sweep miss these? Used name-fragment grep (`::tests::[a-z_]+`) which counted false-negatives on suffix-close names like `_constructor` ↔ `_input`.
2. Why is module-path drift a separate class? Because grep against the `[a-z_]+` regex captures the FUNCTION name, not the `::module::tests::` path. A function with the right name in the wrong module passes that audit but fails actual test invocation.
3. Why fix in a separate PR rather than amending PR #1505? PR #1505 already merged. Per `feedback_falsifier_first_cascade_pattern.md` the cleanest cadence is one-bump-per-PR.
4. Why bump to v1.6.0? Same pattern as PR #1505's v1.4 → v1.5: the test-binding INVARIANT was broken in v1.5.0 (residual drift) and v1.6.0 restores it.
5. Why now (during 5g.1 wait)? Productive use of the 5g.1 (~10hr remaining) compute-bound idle time. Each drift fix is small (~30 LOC), reduces drift risk for future agents, and restores the falsifier-binding invariant. The alternative (manufacture bigger work) would risk introducing defects the contract base doesn't catch yet.

## Net effects

- Contract v1.5.0 → v1.6.0 FUNCTIONAL.
- 11 falsifiers, all PASS: same count, but FALSIFY-003/004/007 now reference tests that actually exist.
- MODEL-1 ship % unchanged at 91%.
- MODEL-2 ship % unchanged at 57% until 5g.3.

This is the SECOND round of drift sweep on this contract. Together with PRs #1502/#1504/#1505/#1506 (round 1), all known test-reference drift is closed across the §50.4 cascade contracts. A future spec amendment could codify a `pv lint --strict-test-binding` enforcement that prevents drift at contract-merge time.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade, contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.6.0, PR #1505 (round 1 partial fix), PR #1502/#1504/#1506 (sibling fixes)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nd 2.5, surfaced by PR #1511)

Round 2 (initial commit on this branch) fixed FALSIFY-003/004/007. Sub-agent PR #1511 (`pv lint --strict-test-binding`) surfaced a 4th drift in this same contract:

FALSIFY-001 cited `qwen2_0_5b_matches_hf_config` → does NOT exist on main.
Actual: `qwen2_0_5b_matches_hf_config_2026_05_04` (date-suffix added by impl PR #1474 / commit 9af6e71, May 4).

The earlier round-2 audit (which focused on suffix + module-path drift) didn't catch this because the test name has a DATE-SUFFIX drift class (function name + `_<date>` is a real Rust test, but the contract truncated to the prefix).

Updates:

- FALSIFY-001 test ref: append `_2026_05_04` suffix.
- v1.6.0 changelog updated to record 4 fixes (was 3).
- Verified: cargo test qwen2_0_5b_matches_hf_config_2026_05_04 PASS.
- pv lint --strict-test-binding contracts/apr-pretrain-arch-polymorphic-v1.yaml: 0 PV-VER-002 (down from 4 pre-fix).

This consolidates round 2 into a single commit on the same branch + PR (#1509) rather than spawning a round-3 PR for one extra fix. The lint hardening in #1511 is what made finding the 4th drift trivial; future drift will be caught at contract-merge time once #1511 lands.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PR #1511 (sub-agent's pv lint --strict-test-binding), Issue #1510

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
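
The date-suffix drift class described in this commit (the contract cites a prefix of the real test name) could be flagged separately from a plain miss, roughly as in the sketch below. Everything here is an assumption made for illustration: the `Binding` enum and `classify` function are hypothetical and are not taken from `pv lint` or PR #1511. The idea is simply to report actual test names that begin with the cited name plus `_`, so a reviewer sees the likely intended target.

```rust
/// How a cited test name relates to the test names that actually exist
/// (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Binding<'a> {
    /// The cited name exists exactly as written.
    Exact,
    /// The cited name does not exist, but these actual names extend it
    /// with `_<something>` (for example a date suffix).
    TruncatedPrefix(Vec<&'a str>),
    /// No exact match and no name that extends the cited one.
    Missing,
}

/// Classifies `cited` against the list of actual test function names.
fn classify<'a>(cited: &str, actual: &[&'a str]) -> Binding<'a> {
    if actual.iter().any(|a| *a == cited) {
        return Binding::Exact;
    }
    // Require the underscore so `foo` does not pair with an unrelated `foobar`.
    let prefix = format!("{cited}_");
    let candidates: Vec<&'a str> = actual
        .iter()
        .copied()
        .filter(|a| a.starts_with(prefix.as_str()))
        .collect();
    if candidates.is_empty() {
        Binding::Missing
    } else {
        Binding::TruncatedPrefix(candidates)
    }
}
```

With the names from this PR, classifying `qwen2_0_5b_matches_hf_config` against a list containing `qwen2_0_5b_matches_hf_config_2026_05_04` yields `TruncatedPrefix` with that one candidate, while a guessed name such as `build_transformer_config_encoder_init_errors` yields `Missing`.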
Summary
Second-round test-reference drift correction. §57's drift sweep (PR #1505 v1.4 → v1.5) caught FALSIFY-005/006 but a more thorough audit surfaced three additional dangling references.
Drift inventory (round 2)
Why §57 (PR #1505) didn't catch these
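§57's grep audited test-name SUFFIXES and FRAGMENTS, which produced false-negatives on:

- `_init_matches_constructor` vs `_init_matches_input`
- `transformer::attention::tests::` vs `transformer::model::tests::`
- `_encoder_init_errors` vs `validate_pretrain_init_arch_rejects_encoder`

Stricter audit (grep `fn <name>` for every cited test) catches these.

Verification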
Five Whys
Net effects
Together with PRs #1502/#1504/#1505/#1506 (round 1), all known test-reference drift is closed across §50.4 cascade contracts.
Test plan
🤖 Generated with Claude Code