Skip to content

contract(apr-pretrain-arch-polymorphic-v1): v1.5 → v1.6 — fix FALSIFY-003/004/007 drift (round 2)#1509

Merged
noahgift merged 4 commits into
mainfrom
fix/apr-pretrain-arch-polymorphic-v1-falsify-003-004-007-drift
May 5, 2026
Merged

contract(apr-pretrain-arch-polymorphic-v1): v1.5 → v1.6 — fix FALSIFY-003/004/007 drift (round 2)#1509
noahgift merged 4 commits into
mainfrom
fix/apr-pretrain-arch-polymorphic-v1-falsify-003-004-007-drift

Conversation

@noahgift

@noahgift noahgift commented May 5, 2026

Copy link
Copy Markdown
Contributor

Summary

Second-round test-reference drift correction. §57's drift sweep (PR #1505 v1.4 → v1.5) caught FALSIFY-005/006 but a more thorough audit surfaced three additional dangling references.

Drift inventory (round 2)

Falsifier v1.5.0 cited test Exists? Actual test
003 `build_transformer_config_qwen_init_matches_constructor` `build_transformer_config_qwen_init_matches_input`
004 `transformer::attention::tests::gqa_7_to_1_matches_full_mha_via_repeat` `transformer::model::tests::falsify_apr_pretrain_arch_004_gqa_7_1_forward_pass_smoke`
007 `build_transformer_config_encoder_init_errors` `validate_pretrain_init_arch_rejects_encoder`

Why §57 (PR #1505) didn't catch these

§57's grep audited test-name SUFFIXES and FRAGMENTS, which produced false-negatives on:

  • `_init_matches_constructor` vs `_init_matches_input` — both end in `matches`
  • `transformer::attention::tests::` vs `transformer::model::tests::` — module-path drift, not function-name drift
  • `_encoder_init_errors` vs `validate_pretrain_init_arch_rejects_encoder` — completely different convention

Stricter audit (grep fn <name> for every cited test) catches these.

Verification

  • `cargo test -p aprender-train --lib -- build_transformer_config_qwen_init_matches_input` PASS
  • `cargo test -p aprender-train --lib -- falsify_apr_pretrain_arch_004_gqa_7_1_forward_pass_smoke` PASS
  • `cargo test -p aprender-train --lib -- validate_pretrain_init_arch_rejects_encoder` PASS
  • `pv validate` 0 errors

Five Whys

  1. Why did §57 miss these? Name-fragment grep (`::tests::[a-z_]+`) gave false-negatives on suffix-close names.
  2. Why is module-path drift a separate class? Function-name regex captures the function, not the `::module::tests::` path.
  3. Why a separate PR? PR contract(apr-pretrain-arch-polymorphic-v1): v1.4 → v1.5 — fix FALSIFY-005/006 test-reference drift #1505 already merged; one-bump-per-PR cadence per `feedback_falsifier_first_cascade_pattern.md`.
  4. Why bump to v1.6.0? Same pattern as PR contract(apr-pretrain-arch-polymorphic-v1): v1.4 → v1.5 — fix FALSIFY-005/006 test-reference drift #1505: invariant was broken in v1.5.0 (residual drift); v1.6.0 restores it.
  5. Why now? Productive use of 5g.1 wait (~10hr remaining); small scope (~30 LOC); reduces drift risk.

Net effects

  • Contract v1.5.0 → v1.6.0 FUNCTIONAL.
  • 11 falsifiers, all PASS — FALSIFY-003/004/007 now reference real tests.
  • MODEL-1 ship % unchanged at 91%; MODEL-2 ship % unchanged at 57%.

Together with PRs #1502/#1504/#1505/#1506 (round 1), all known test-reference drift is closed across §50.4 cascade contracts.

Test plan

  • `pv validate` 0 errors
  • PMAT pre-commit gates pass
  • All 3 cited tests pass
  • CI gate green
  • Auto-merge fires on green CI

🤖 Generated with Claude Code

…-003/004/007 drift (round 2)

Second-round test-reference drift correction. §57's drift sweep
(this contract's v1.4 → v1.5 bump in PR #1505) caught FALSIFY-005/006
but a more thorough audit (cross-referencing every `test:` field
against the source-code function-name registry) surfaced three
additional dangling references.

## Drift inventory (round 2)

  | Falsifier | v1.5.0 cited test                                           | Exists? | Actual test                                                  |
  | ---       | ---                                                         | ---     | ---                                                          |
  | 003       | build_transformer_config_qwen_init_matches_constructor      | ❌       | build_transformer_config_qwen_init_matches_input             |
  | 004       | transformer::attention::tests::gqa_7_to_1_matches_full_mha  | ❌       | transformer::model::tests::falsify_apr_pretrain_arch_004_*   |
  | 007       | build_transformer_config_encoder_init_errors                | ❌       | validate_pretrain_init_arch_rejects_encoder                  |

## Why §57 (PR #1505) didn't catch these

§57's grep audited test-name SUFFIXES and FRAGMENTS, which produced
false-negatives on:
  - `_init_matches_constructor` vs `_init_matches_input` — both end
    in `_matches_<word>` so a fragment grep counted the contract's
    name as "not dangling"
  - `transformer::attention::tests::` vs `transformer::model::tests::` —
    module-path drift not just function-name drift; only fully-
    qualified path comparison catches this
  - `_encoder_init_errors` vs `validate_pretrain_init_arch_rejects_encoder` —
    the contract's name was a guess at the impl name; impl PR #1479
    chose a completely different convention

## How this round was found

Used a stricter audit: for every `cargo test ... ::tests::<name>`
in contracts, grep `fn <name>` in the actual source tree. If the
fn doesn't exist, drift. This catches drift that PR #1505's
fragment-based audit missed.

## Resolution

Update FALSIFY-003/004/007 `test:` fields to the actual function
names. No falsifier semantics change. 11 falsifiers all PASS;
contract status remains FUNCTIONAL.

## Verification

  $ cargo test -p aprender-train --lib -- build_transformer_config_qwen_init_matches_input
    test result: ok. 1 passed
  $ cargo test -p aprender-train --lib -- falsify_apr_pretrain_arch_004_gqa_7_1_forward_pass_smoke
    test result: ok. 1 passed
  $ cargo test -p aprender-train --lib -- validate_pretrain_init_arch_rejects_encoder
    test result: ok. 1 passed
  $ pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml
    0 error(s), 0 warning(s)

## Five Whys

1. Why did §57's sweep miss these? Used name-fragment grep
   (`::tests::[a-z_]+`) which counted false-negatives on suffix-
   close names like `_constructor` ↔ `_input`.
2. Why is module-path drift a separate class? Because grep against
   the `[a-z_]+` regex captures the FUNCTION name, not the
   `::module::tests::` path. A function with the right name in the
   wrong module passes that audit but fails actual test invocation.
3. Why fix in a separate PR rather than amending PR #1505? PR #1505
   already merged. Per `feedback_falsifier_first_cascade_pattern.md`
   the cleanest cadence is one-bump-per-PR.
4. Why bump to v1.6.0? Same pattern as PR #1505's v1.4 → v1.5: the
   test-binding INVARIANT was broken in v1.5.0 (residual drift) and
   v1.6.0 restores it.
5. Why now (during 5g.1 wait)? Productive use of the 5g.1 (~10hr
   remaining) compute-bound idle time. Each drift fix is small
   (~30 LOC), reduces drift risk for future agents, and restores
   the falsifier-binding invariant. The alternative (manufacture
   bigger work) would risk introducing defects the contract base
   doesn't catch yet.

## Net effects

- Contract v1.5.0 → v1.6.0 FUNCTIONAL.
- 11 falsifiers, all PASS — same count, but FALSIFY-003/004/007
  now reference tests that actually exist.
- MODEL-1 ship % unchanged at 91%.
- MODEL-2 ship % unchanged at 57% until 5g.3.

This is the SECOND round of drift sweep on this contract. Together
with PRs #1502/#1504/#1505/#1506 (round 1), all known
test-reference drift is closed across the §50.4 cascade contracts.
A future spec amendment could codify a `pv lint --strict-test-binding`
enforcement that prevents drift at contract-merge time.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade,
      contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.6.0,
      PR #1505 (round 1 partial fix), PR #1502/#1504/#1506 (sibling fixes)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift and others added 2 commits May 5, 2026 11:57
…nd 2.5 — surfaced by PR #1511)

Round 2 (initial commit on this branch) fixed FALSIFY-003/004/007.
Sub-agent PR #1511 (`pv lint --strict-test-binding`) surfaced a 4th
drift in this same contract:

  FALSIFY-001 cited `qwen2_0_5b_matches_hf_config`
    → does NOT exist on main.
  Actual: `qwen2_0_5b_matches_hf_config_2026_05_04`
    (date-suffix added by impl PR #1474 / commit 9af6e71 — May 4).

The earlier round-2 audit (which focused on suffix + module-path
drift) didn't catch this because the test name has a DATE-SUFFIX
drift class (function name + `_<date>` is a real Rust test, but
the contract truncated to the prefix).

Updates:
- FALSIFY-001 test ref: append `_2026_05_04` suffix.
- v1.6.0 changelog updated to record 4 fixes (was 3).
- Verified: cargo test qwen2_0_5b_matches_hf_config_2026_05_04 PASS.
- pv lint --strict-test-binding contracts/apr-pretrain-arch-polymorphic-v1.yaml: 0 PV-VER-002 (down from 4 pre-fix).

This consolidates round 2 into a single commit on the same branch
+ PR (#1509) rather than spawning a round-3 PR for one extra fix.
The lint hardening in #1511 is what made finding the 4th drift
trivial; future drift will be caught at contract-merge time once
#1511 lands.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade,
      PR #1511 (sub-agent's pv lint --strict-test-binding),
      Issue #1510

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit bd92bd4 into main May 5, 2026
10 checks passed
@noahgift noahgift deleted the fix/apr-pretrain-arch-polymorphic-v1-falsify-003-004-007-drift branch May 5, 2026 10:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant