feat(pv-lint): add --strict-test-binding flag — catches dangling test refs (Closes #1510)#1511
Merged
Merged
Conversation
…ences (Refs #1510) Add a new opt-in lint gate (Gate 9, PV-VER-002) that cross-checks every `falsification_tests[].test` cargo invocation in a contract against the `#[test]` (or `#[tokio::test]`, etc.) function set in the source tree. Catches drift classes the existing PV-VER-001 misses because it only matches `test_*` / `prop_*` prefixes. Drift classes caught (per Issue #1510): 1. Function-name suffix drift (`_init_matches_constructor` vs `_init_matches_input`) 2. Convention drift (`_encoder_init_errors` vs `validate_pretrain_init_arch_rejects_encoder`) 3. "Or equivalent" prose-style placeholders that aren't real cargo invocations 4. Shell-pipe / redirect false-positives (parser truncates at `|` and `2>&1` so the grep pattern isn't interpreted as a test name) Skipped categories: - `LIVE-PENDING:` markers (deferred-live, not cargo tests) - `pv validate ...` invocations (meta-validation, not cargo tests) Default mode emits Warning so existing CI is not broken; combine with `--strict` to promote to Error and block merges. Implementation: - New module `crates/aprender-contracts/src/lint/strict_test_binding.rs` with the gate, parser, harvester, and 19 unit/integration tests covering the 4 drift classes plus skip cases and strict-mode behavior. - New rule PV-VER-002 in the catalog (Verify category, default Warning). - New `--strict-test-binding` flag on `pv lint`. - `pv lint --explain PV-VER-002` documents intent and remediation. Verified: `pv lint --strict-test-binding contracts/` exits 0 by default (warnings only, 24 real dangling refs surfaced); exits 1 with --strict. The contracts the issue called out as regression-prevention targets (apr-pretrain-from-init-v1, apr-cli-tokenize-import-hf-v1) emit zero PV-VER-002 findings.
noahgift
added a commit
that referenced
this pull request
May 5, 2026
…nd 2.5 — surfaced by PR #1511) Round 2 (initial commit on this branch) fixed FALSIFY-003/004/007. Sub-agent PR #1511 (`pv lint --strict-test-binding`) surfaced a 4th drift in this same contract: FALSIFY-001 cited `qwen2_0_5b_matches_hf_config` → does NOT exist on main. Actual: `qwen2_0_5b_matches_hf_config_2026_05_04` (date-suffix added by impl PR #1474 / commit 9af6e71 — May 4). The earlier round-2 audit (which focused on suffix + module-path drift) didn't catch this because the test name has a DATE-SUFFIX drift class (function name + `_<date>` is a real Rust test, but the contract truncated to the prefix). Updates: - FALSIFY-001 test ref: append `_2026_05_04` suffix. - v1.6.0 changelog updated to record 4 fixes (was 3). - Verified: cargo test qwen2_0_5b_matches_hf_config_2026_05_04 PASS. - pv lint --strict-test-binding contracts/apr-pretrain-arch-polymorphic-v1.yaml: 0 PV-VER-002 (down from 4 pre-fix). This consolidates round 2 into a single commit on the same branch + PR (#1509) rather than spawning a round-3 PR for one extra fix. The lint hardening in #1511 is what made finding the 4th drift trivial; future drift will be caught at contract-merge time once #1511 lands. Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PR #1511 (sub-agent's pv lint --strict-test-binding), Issue #1510 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
May 5, 2026
…-003/004/007 drift (round 2) (#1509) * contract(apr-pretrain-arch-polymorphic-v1): v1.5 → v1.6 — fix FALSIFY-003/004/007 drift (round 2) Second-round test-reference drift correction. §57's drift sweep (this contract's v1.4 → v1.5 bump in PR #1505) caught FALSIFY-005/006 but a more thorough audit (cross-referencing every `test:` field against the source-code function-name registry) surfaced three additional dangling references. ## Drift inventory (round 2) | Falsifier | v1.5.0 cited test | Exists? | Actual test | | --- | --- | --- | --- | | 003 | build_transformer_config_qwen_init_matches_constructor | ❌ | build_transformer_config_qwen_init_matches_input | | 004 | transformer::attention::tests::gqa_7_to_1_matches_full_mha | ❌ | transformer::model::tests::falsify_apr_pretrain_arch_004_* | | 007 | build_transformer_config_encoder_init_errors | ❌ | validate_pretrain_init_arch_rejects_encoder | ## Why §57 (PR #1505) didn't catch these §57's grep audited test-name SUFFIXES and FRAGMENTS, which produced false-negatives on: - `_init_matches_constructor` vs `_init_matches_input` — both end in `_matches_<word>` so a fragment grep counted the contract's name as "not dangling" - `transformer::attention::tests::` vs `transformer::model::tests::` — module-path drift not just function-name drift; only fully- qualified path comparison catches this - `_encoder_init_errors` vs `validate_pretrain_init_arch_rejects_encoder` — the contract's name was a guess at the impl name; impl PR #1479 chose a completely different convention ## How this round was found Used a stricter audit: for every `cargo test ... ::tests::<name>` in contracts, grep `fn <name>` in the actual source tree. If the fn doesn't exist, drift. This catches drift that PR #1505's fragment-based audit missed. ## Resolution Update FALSIFY-003/004/007 `test:` fields to the actual function names. No falsifier semantics change. 11 falsifiers all PASS; contract status remains FUNCTIONAL. ## Verification $ cargo test -p aprender-train --lib -- build_transformer_config_qwen_init_matches_input test result: ok. 1 passed $ cargo test -p aprender-train --lib -- falsify_apr_pretrain_arch_004_gqa_7_1_forward_pass_smoke test result: ok. 1 passed $ cargo test -p aprender-train --lib -- validate_pretrain_init_arch_rejects_encoder test result: ok. 1 passed $ pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml 0 error(s), 0 warning(s) ## Five Whys 1. Why did §57's sweep miss these? Used name-fragment grep (`::tests::[a-z_]+`) which counted false-negatives on suffix- close names like `_constructor` ↔ `_input`. 2. Why is module-path drift a separate class? Because grep against the `[a-z_]+` regex captures the FUNCTION name, not the `::module::tests::` path. A function with the right name in the wrong module passes that audit but fails actual test invocation. 3. Why fix in a separate PR rather than amending PR #1505? PR #1505 already merged. Per `feedback_falsifier_first_cascade_pattern.md` the cleanest cadence is one-bump-per-PR. 4. Why bump to v1.6.0? Same pattern as PR #1505's v1.4 → v1.5: the test-binding INVARIANT was broken in v1.5.0 (residual drift) and v1.6.0 restores it. 5. Why now (during 5g.1 wait)? Productive use of the 5g.1 (~10hr remaining) compute-bound idle time. Each drift fix is small (~30 LOC), reduces drift risk for future agents, and restores the falsifier-binding invariant. The alternative (manufacture bigger work) would risk introducing defects the contract base doesn't catch yet. ## Net effects - Contract v1.5.0 → v1.6.0 FUNCTIONAL. - 11 falsifiers, all PASS — same count, but FALSIFY-003/004/007 now reference tests that actually exist. - MODEL-1 ship % unchanged at 91%. - MODEL-2 ship % unchanged at 57% until 5g.3. This is the SECOND round of drift sweep on this contract. Together with PRs #1502/#1504/#1505/#1506 (round 1), all known test-reference drift is closed across the §50.4 cascade contracts. A future spec amendment could codify a `pv lint --strict-test-binding` enforcement that prevents drift at contract-merge time. Refs: SPEC-SHIP-TWO-001 §50.4 cascade, contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.6.0, PR #1505 (round 1 partial fix), PR #1502/#1504/#1506 (sibling fixes) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * contract(apr-pretrain-arch-polymorphic-v1): also fix FALSIFY-001 (round 2.5 — surfaced by PR #1511) Round 2 (initial commit on this branch) fixed FALSIFY-003/004/007. Sub-agent PR #1511 (`pv lint --strict-test-binding`) surfaced a 4th drift in this same contract: FALSIFY-001 cited `qwen2_0_5b_matches_hf_config` → does NOT exist on main. Actual: `qwen2_0_5b_matches_hf_config_2026_05_04` (date-suffix added by impl PR #1474 / commit 9af6e71 — May 4). The earlier round-2 audit (which focused on suffix + module-path drift) didn't catch this because the test name has a DATE-SUFFIX drift class (function name + `_<date>` is a real Rust test, but the contract truncated to the prefix). Updates: - FALSIFY-001 test ref: append `_2026_05_04` suffix. - v1.6.0 changelog updated to record 4 fixes (was 3). - Verified: cargo test qwen2_0_5b_matches_hf_config_2026_05_04 PASS. - pv lint --strict-test-binding contracts/apr-pretrain-arch-polymorphic-v1.yaml: 0 PV-VER-002 (down from 4 pre-fix). This consolidates round 2 into a single commit on the same branch + PR (#1509) rather than spawning a round-3 PR for one extra fix. The lint hardening in #1511 is what made finding the 4th drift trivial; future drift will be caught at contract-merge time once #1511 lands. Refs: SPEC-SHIP-TWO-001 §50.4 cascade, PR #1511 (sub-agent's pv lint --strict-test-binding), Issue #1510 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
noahgift
added a commit
that referenced
this pull request
May 5, 2026
…h + 4 release-engineering defects closed (#1520) §58 records the parallel release-engineering track that landed during the §57 drift-sweep wait: the v0.32.0 user-facing-crate cascade publish (Issue #1514 CLOSED) and the four hidden defects it surfaced + closed. User-facing crates now live on crates.io at v0.32.0: - aprender = "0.32.0" - aprender-rag = "0.32.0" - aprender-core = "0.32.0" - apr-cli = "0.32.0" Defects closed during cascade (each in its own PR): - #1512 aprender-rag [lib] name = "trueno_rag" → "aprender_rag" BREAKING - #1513 aprender-orchestrate cmd_code 7→8 arg drift on emit_trace addition - #1515 aprender-core path-only dev-deps (publish-time cycle break) - #1517 aprender-core permissive version + path (clean-room sed-strip robustness) - #1518 apr-cli aliases.yaml in-crate copy (include_str scope fix) Plus PR #1511 (pv-lint --strict-test-binding) closes §57.4's foreshadowed prevention rule. 5g.1 corpus retokenize (PID 2767124) at 62 shards / 16h19m wall (manifest pending). Ship-% unchanged: MODEL-1=91%, MODEL-2=57%. §58 is the third hygiene amendment in a row; §59 will record 5g.1 completion when manifest emits. Refs: #1514 Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #1510. Adds opt-in lint gate
pv lint --strict-test-binding(PV-VER-002) that cross-checks everyfalsification_tests[].testcargo invocation in a contract against the actual#[test](or#[tokio::test], etc.) function set in the source tree.The existing PV-VER-001 only matches names starting with
test_orprop_, leaving the modern convention used by ~all real contracts (e.g.pretrain_init_missing_file_errors) silently unverified. Six dangling refs accumulated across PRs #1502/#1504/#1505/#1506/#1509 caught manually; this gate codifies the check.Drift classes caught
_init_matches_constructor; source has_init_matches_input→ emit warning._encoder_init_errors; source hasvalidate_pretrain_init_arch_rejects_encoder→ emit warning."or equivalent"text,bounds.,MiB., etc. → parser rejects vialooks_like_rust_identso they aren't false-positives.cargo test ... 2>&1 | grep "..."→ parser truncates before the pipe so the grep pattern isn't interpreted as a test name.Documented design choice (drift class 2 in issue: "module-path drift"): the gate matches by bare fn name only, not full module path. A test fn that exists somewhere in the workspace under the cited bare name is accepted regardless of which module it lives in. Catching wrong-module references would require parsing real
modtrees and is out of scope for this PR. Seedrift_class_2_module_path_no_warning_when_fn_exists_anywheretest for the contract.Skipped categories (not flagged)
LIVE-PENDING:prefix — explicit deferred-live markerspv validate ...— meta-validation invocations, not cargo testsDefault mode = WARNING (two-stage rollout per Toyota Way kaizen)
pv lint --strict-test-binding contracts/→ exits 0, emits warnings (existing CI not broken).pv lint --strict-test-binding --strict contracts/→ exits 1 if any ref is missing (block merge).This matches the issue's two-stage plan: (a) add flag, (b) fix all warnings, (c) flip default later.
Files changed (count: 7)
crates/aprender-contracts/src/lint/strict_test_binding.rs(~700 LOC: gate + parser + harvester + 19 tests)crates/aprender-contracts/src/lint/mod.rs— wire Gate 9 +strict_test_binding: boolconfig fieldcrates/aprender-contracts/src/lint/rules.rs— addRuleCategory::Verify+PV-VER-002rulecrates/aprender-contracts-cli/src/cli.rs— add--strict-test-bindingflag declarationcrates/aprender-contracts-cli/src/commands/lint.rs— thread flag throughrun/build_config/run_watchcrates/aprender-contracts-cli/src/commands/lint_render.rs—--explain PV-VER-002long-formcrates/aprender-contracts-cli/src/main.rs— bind new flag in dispatchTests added: 19
8 unit tests for the parser/harvester (cargo invocation parsing, shell-pipe truncation, prose-token rejection, attribute harvesting).
7 end-to-end gate tests covering the 4 drift classes from the issue + skip cases (
LIVE-PENDING,pv validate) + strict-mode pass/fail behavior.Full lint module test suite: 191 passed; 0 failed (including all pre-existing tests, none broken).
Regression-prevention check
The issue calls out three contracts (apr-pretrain-arch-polymorphic-v1, apr-pretrain-from-init-v1, apr-cli-tokenize-import-hf-v1) recently fixed in PRs #1505/#1509. With the new gate enabled:
qwen2_0_5b_matches_hf_configcited but the source test was renamed toqwen2_0_5b_matches_hf_config_2026_05_04post-merge). The gate is doing exactly what the issue asked: catching newly-introduced drift that the existing lint silently passed. These are tracked separately and should be the next contract-bump cycle.Across all 760 contracts, the gate surfaces 24 PV-VER-002 findings — all are real drift, not parser false-positives.
Default-mode safety
Five Whys
pv lintcatch it? The PV-VER-001 check validates onlytest_*/prop_*-prefixed names — but virtually no real test in the codebase uses that convention.pv lintand would break if the new check were ERROR-by-default. Two-stage rollout: (a) add flag, (b) fix all warnings, (c) flip default. Per Toyota Way kaizen.Test plan
cargo fmt --all -- --checkclean on changed files.pv lint contracts/(without flag) still exits 0 — backward compat verified.pv lint --strict-test-binding contracts/exits 0;... --strictexits 1.pv lint --explain PV-VER-002shows category, severity, and remediation guidance.pv lint --helpdocuments the flag.🤖 Generated with Claude Code