Skip to content

feat(pv-lint): add --strict-test-binding flag — catches dangling test refs (Closes #1510)#1511

Merged
noahgift merged 3 commits into
mainfrom
feat/pv-lint-strict-test-binding
May 5, 2026
Merged

feat(pv-lint): add --strict-test-binding flag — catches dangling test refs (Closes #1510)#1511
noahgift merged 3 commits into
mainfrom
feat/pv-lint-strict-test-binding

Conversation

@noahgift

@noahgift noahgift commented May 5, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes #1510. Adds opt-in lint gate pv lint --strict-test-binding (PV-VER-002) that cross-checks every falsification_tests[].test cargo invocation in a contract against the actual #[test] (or #[tokio::test], etc.) function set in the source tree.

The existing PV-VER-001 only matches names starting with test_ or prop_, leaving the modern convention used by ~all real contracts (e.g. pretrain_init_missing_file_errors) silently unverified. Six dangling refs accumulated across PRs #1502/#1504/#1505/#1506/#1509 caught manually; this gate codifies the check.

Drift classes caught

  1. Suffix drift: contract cites _init_matches_constructor; source has _init_matches_input → emit warning.
  2. Convention drift: contract cites _encoder_init_errors; source has validate_pretrain_init_arch_rejects_encoder → emit warning.
  3. Prose-style placeholders: "or equivalent" text, bounds., MiB., etc. → parser rejects via looks_like_rust_ident so they aren't false-positives.
  4. Shell-pipe residue: cargo test ... 2>&1 | grep "..." → parser truncates before the pipe so the grep pattern isn't interpreted as a test name.

Documented design choice (drift class 2 in issue: "module-path drift"): the gate matches by bare fn name only, not full module path. A test fn that exists somewhere in the workspace under the cited bare name is accepted regardless of which module it lives in. Catching wrong-module references would require parsing real mod trees and is out of scope for this PR. See drift_class_2_module_path_no_warning_when_fn_exists_anywhere test for the contract.

Skipped categories (not flagged)

  • LIVE-PENDING: prefix — explicit deferred-live markers
  • pv validate ... — meta-validation invocations, not cargo tests

Default mode = WARNING (two-stage rollout per Toyota Way kaizen)

  • pv lint --strict-test-binding contracts/ → exits 0, emits warnings (existing CI not broken).
  • pv lint --strict-test-binding --strict contracts/ → exits 1 if any ref is missing (block merge).

This matches the issue's two-stage plan: (a) add flag, (b) fix all warnings, (c) flip default later.

Files changed (count: 7)

  • New crates/aprender-contracts/src/lint/strict_test_binding.rs (~700 LOC: gate + parser + harvester + 19 tests)
  • crates/aprender-contracts/src/lint/mod.rs — wire Gate 9 + strict_test_binding: bool config field
  • crates/aprender-contracts/src/lint/rules.rs — add RuleCategory::Verify + PV-VER-002 rule
  • crates/aprender-contracts-cli/src/cli.rs — add --strict-test-binding flag declaration
  • crates/aprender-contracts-cli/src/commands/lint.rs — thread flag through run/build_config/run_watch
  • crates/aprender-contracts-cli/src/commands/lint_render.rs--explain PV-VER-002 long-form
  • crates/aprender-contracts-cli/src/main.rs — bind new flag in dispatch

Tests added: 19

8 unit tests for the parser/harvester (cargo invocation parsing, shell-pipe truncation, prose-token rejection, attribute harvesting).

7 end-to-end gate tests covering the 4 drift classes from the issue + skip cases (LIVE-PENDING, pv validate) + strict-mode pass/fail behavior.

running 19 tests
test lint::strict_test_binding::tests::detect_basic_cargo_test ... ok
test lint::strict_test_binding::tests::extract_dashed_filter_after_separator ... ok
test lint::strict_test_binding::tests::extract_compound_invocation ... ok
test lint::strict_test_binding::tests::extract_rejects_prose_tokens ... ok
test lint::strict_test_binding::tests::extract_skips_features_arg_value ... ok
test lint::strict_test_binding::tests::extract_skips_shell_pipe_residue ... ok
test lint::strict_test_binding::tests::extract_simple_filter ... ok
test lint::strict_test_binding::tests::harvest_attribute_marked_tests ... ok
test lint::strict_test_binding::tests::harvest_legacy_prefix_tests ... ok
test lint::strict_test_binding::tests::harvest_with_intervening_attribute ... ok
test lint::strict_test_binding::tests::skip_live_pending_marker ... ok
test lint::strict_test_binding::tests::skip_pv_validate_invocation ... ok
test lint::strict_test_binding::tests::happy_path_existing_test_no_warning ... ok
test lint::strict_test_binding::tests::drift_class_1_suffix_drift_emits_warning ... ok
test lint::strict_test_binding::tests::drift_class_2_module_path_no_warning_when_fn_exists_anywhere ... ok
test lint::strict_test_binding::tests::drift_class_3_convention_drift_emits_warning ... ok
test lint::strict_test_binding::tests::live_pending_skip_no_warning ... ok
test lint::strict_test_binding::tests::pv_validate_invocation_skipped ... ok
test lint::strict_test_binding::tests::strict_mode_gate_fails_when_refs_missing ... ok

test result: ok. 19 passed; 0 failed

Full lint module test suite: 191 passed; 0 failed (including all pre-existing tests, none broken).

Regression-prevention check

The issue calls out three contracts (apr-pretrain-arch-polymorphic-v1, apr-pretrain-from-init-v1, apr-cli-tokenize-import-hf-v1) recently fixed in PRs #1505/#1509. With the new gate enabled:

  • apr-pretrain-from-init-v1: 0 PV-VER-002 findings (clean) ✓
  • apr-cli-tokenize-import-hf-v1: 0 PV-VER-002 findings (clean) ✓
  • apr-pretrain-arch-polymorphic-v1: 4 findings — these are new drift introduced AFTER contract(apr-pretrain-arch-polymorphic-v1): v1.5 → v1.6 — fix FALSIFY-003/004/007 drift (round 2) #1509 merged (e.g. qwen2_0_5b_matches_hf_config cited but the source test was renamed to qwen2_0_5b_matches_hf_config_2026_05_04 post-merge). The gate is doing exactly what the issue asked: catching newly-introduced drift that the existing lint silently passed. These are tracked separately and should be the next contract-bump cycle.

Across all 760 contracts, the gate surfaces 24 PV-VER-002 findings — all are real drift, not parser false-positives.

Default-mode safety

$ pv lint --strict-test-binding contracts/ ; echo $?
... 24 PV-VER-002 warnings ...
Result: PASS
0

$ pv lint --strict-test-binding --strict contracts/ ; echo $?
... 24 PV-VER-002 errors ...
Result: FAIL
1

$ pv lint contracts/ ; echo $?  # backward compat: no flag = no PV-VER-002 emitted
0

Five Whys

  1. Why did this drift accumulate? Contracts authored alongside their impl PRs stamped anticipated test names; impl PRs landed with different conventions; no automated cross-check ran at merge time.
  2. Why didn't pv lint catch it? The PV-VER-001 check validates only test_* / prop_*-prefixed names — but virtually no real test in the codebase uses that convention.
  3. Why does referent verification matter? A contract citing a non-existent test is unfalsifiable — operators reading the contract think the falsifier is bound when it isn't.
  4. Why now? Five sequential drift PRs in 24h is a smell that the lint is too lax; codifying the strict check prevents recurrence.
  5. Why a separate flag rather than promoting to default error? Backward compatibility — existing CI runs pv lint and would break if the new check were ERROR-by-default. Two-stage rollout: (a) add flag, (b) fix all warnings, (c) flip default. Per Toyota Way kaizen.

Test plan

  • Unit + integration tests for parser, harvester, and end-to-end gate (19 tests, all passing).
  • cargo fmt --all -- --check clean on changed files.
  • No new clippy errors in changed files (remaining workspace clippy errors are pre-existing on main).
  • pv lint contracts/ (without flag) still exits 0 — backward compat verified.
  • pv lint --strict-test-binding contracts/ exits 0; ... --strict exits 1.
  • pv lint --explain PV-VER-002 shows category, severity, and remediation guidance.
  • Help text on pv lint --help documents the flag.

🤖 Generated with Claude Code

…ences (Refs #1510)

Add a new opt-in lint gate (Gate 9, PV-VER-002) that cross-checks every
`falsification_tests[].test` cargo invocation in a contract against the
`#[test]` (or `#[tokio::test]`, etc.) function set in the source tree.
Catches drift classes the existing PV-VER-001 misses because it only
matches `test_*` / `prop_*` prefixes.

Drift classes caught (per Issue #1510):
  1. Function-name suffix drift (`_init_matches_constructor` vs
     `_init_matches_input`)
  2. Convention drift (`_encoder_init_errors` vs
     `validate_pretrain_init_arch_rejects_encoder`)
  3. "Or equivalent" prose-style placeholders that aren't real cargo
     invocations
  4. Shell-pipe / redirect false-positives (parser truncates at `|` and
     `2>&1` so the grep pattern isn't interpreted as a test name)

Skipped categories:
  - `LIVE-PENDING:` markers (deferred-live, not cargo tests)
  - `pv validate ...` invocations (meta-validation, not cargo tests)

Default mode emits Warning so existing CI is not broken; combine with
`--strict` to promote to Error and block merges.

Implementation:
  - New module `crates/aprender-contracts/src/lint/strict_test_binding.rs`
    with the gate, parser, harvester, and 19 unit/integration tests
    covering the 4 drift classes plus skip cases and strict-mode behavior.
  - New rule PV-VER-002 in the catalog (Verify category, default Warning).
  - New `--strict-test-binding` flag on `pv lint`.
  - `pv lint --explain PV-VER-002` documents intent and remediation.

Verified: `pv lint --strict-test-binding contracts/` exits 0 by default
(warnings only, 24 real dangling refs surfaced); exits 1 with --strict.
The contracts the issue called out as regression-prevention targets
(apr-pretrain-from-init-v1, apr-cli-tokenize-import-hf-v1) emit zero
PV-VER-002 findings.
@noahgift noahgift enabled auto-merge (squash) May 5, 2026 09:48
noahgift added a commit that referenced this pull request May 5, 2026
…nd 2.5 — surfaced by PR #1511)

Round 2 (initial commit on this branch) fixed FALSIFY-003/004/007.
Sub-agent PR #1511 (`pv lint --strict-test-binding`) surfaced a 4th
drift in this same contract:

  FALSIFY-001 cited `qwen2_0_5b_matches_hf_config`
    → does NOT exist on main.
  Actual: `qwen2_0_5b_matches_hf_config_2026_05_04`
    (date-suffix added by impl PR #1474 / commit 9af6e71 — May 4).

The earlier round-2 audit (which focused on suffix + module-path
drift) didn't catch this because the test name has a DATE-SUFFIX
drift class (function name + `_<date>` is a real Rust test, but
the contract truncated to the prefix).

Updates:
- FALSIFY-001 test ref: append `_2026_05_04` suffix.
- v1.6.0 changelog updated to record 4 fixes (was 3).
- Verified: cargo test qwen2_0_5b_matches_hf_config_2026_05_04 PASS.
- pv lint --strict-test-binding contracts/apr-pretrain-arch-polymorphic-v1.yaml: 0 PV-VER-002 (down from 4 pre-fix).

This consolidates round 2 into a single commit on the same branch
+ PR (#1509) rather than spawning a round-3 PR for one extra fix.
The lint hardening in #1511 is what made finding the 4th drift
trivial; future drift will be caught at contract-merge time once
#1511 lands.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade,
      PR #1511 (sub-agent's pv lint --strict-test-binding),
      Issue #1510

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 5, 2026
…-003/004/007 drift (round 2) (#1509)

* contract(apr-pretrain-arch-polymorphic-v1): v1.5 → v1.6 — fix FALSIFY-003/004/007 drift (round 2)

Second-round test-reference drift correction. §57's drift sweep
(this contract's v1.4 → v1.5 bump in PR #1505) caught FALSIFY-005/006
but a more thorough audit (cross-referencing every `test:` field
against the source-code function-name registry) surfaced three
additional dangling references.

## Drift inventory (round 2)

  | Falsifier | v1.5.0 cited test                                           | Exists? | Actual test                                                  |
  | ---       | ---                                                         | ---     | ---                                                          |
  | 003       | build_transformer_config_qwen_init_matches_constructor      | ❌       | build_transformer_config_qwen_init_matches_input             |
  | 004       | transformer::attention::tests::gqa_7_to_1_matches_full_mha  | ❌       | transformer::model::tests::falsify_apr_pretrain_arch_004_*   |
  | 007       | build_transformer_config_encoder_init_errors                | ❌       | validate_pretrain_init_arch_rejects_encoder                  |

## Why §57 (PR #1505) didn't catch these

§57's grep audited test-name SUFFIXES and FRAGMENTS, which produced
false-negatives on:
  - `_init_matches_constructor` vs `_init_matches_input` — both end
    in `_matches_<word>` so a fragment grep counted the contract's
    name as "not dangling"
  - `transformer::attention::tests::` vs `transformer::model::tests::` —
    module-path drift not just function-name drift; only fully-
    qualified path comparison catches this
  - `_encoder_init_errors` vs `validate_pretrain_init_arch_rejects_encoder` —
    the contract's name was a guess at the impl name; impl PR #1479
    chose a completely different convention

## How this round was found

Used a stricter audit: for every `cargo test ... ::tests::<name>`
in contracts, grep `fn <name>` in the actual source tree. If the
fn doesn't exist, drift. This catches drift that PR #1505's
fragment-based audit missed.

## Resolution

Update FALSIFY-003/004/007 `test:` fields to the actual function
names. No falsifier semantics change. 11 falsifiers all PASS;
contract status remains FUNCTIONAL.

## Verification

  $ cargo test -p aprender-train --lib -- build_transformer_config_qwen_init_matches_input
    test result: ok. 1 passed
  $ cargo test -p aprender-train --lib -- falsify_apr_pretrain_arch_004_gqa_7_1_forward_pass_smoke
    test result: ok. 1 passed
  $ cargo test -p aprender-train --lib -- validate_pretrain_init_arch_rejects_encoder
    test result: ok. 1 passed
  $ pv validate contracts/apr-pretrain-arch-polymorphic-v1.yaml
    0 error(s), 0 warning(s)

## Five Whys

1. Why did §57's sweep miss these? Used name-fragment grep
   (`::tests::[a-z_]+`) which counted false-negatives on suffix-
   close names like `_constructor` ↔ `_input`.
2. Why is module-path drift a separate class? Because grep against
   the `[a-z_]+` regex captures the FUNCTION name, not the
   `::module::tests::` path. A function with the right name in the
   wrong module passes that audit but fails actual test invocation.
3. Why fix in a separate PR rather than amending PR #1505? PR #1505
   already merged. Per `feedback_falsifier_first_cascade_pattern.md`
   the cleanest cadence is one-bump-per-PR.
4. Why bump to v1.6.0? Same pattern as PR #1505's v1.4 → v1.5: the
   test-binding INVARIANT was broken in v1.5.0 (residual drift) and
   v1.6.0 restores it.
5. Why now (during 5g.1 wait)? Productive use of the 5g.1 (~10hr
   remaining) compute-bound idle time. Each drift fix is small
   (~30 LOC), reduces drift risk for future agents, and restores
   the falsifier-binding invariant. The alternative (manufacture
   bigger work) would risk introducing defects the contract base
   doesn't catch yet.

## Net effects

- Contract v1.5.0 → v1.6.0 FUNCTIONAL.
- 11 falsifiers, all PASS — same count, but FALSIFY-003/004/007
  now reference tests that actually exist.
- MODEL-1 ship % unchanged at 91%.
- MODEL-2 ship % unchanged at 57% until 5g.3.

This is the SECOND round of drift sweep on this contract. Together
with PRs #1502/#1504/#1505/#1506 (round 1), all known
test-reference drift is closed across the §50.4 cascade contracts.
A future spec amendment could codify a `pv lint --strict-test-binding`
enforcement that prevents drift at contract-merge time.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade,
      contracts/apr-pretrain-arch-polymorphic-v1.yaml v1.6.0,
      PR #1505 (round 1 partial fix), PR #1502/#1504/#1506 (sibling fixes)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* contract(apr-pretrain-arch-polymorphic-v1): also fix FALSIFY-001 (round 2.5 — surfaced by PR #1511)

Round 2 (initial commit on this branch) fixed FALSIFY-003/004/007.
Sub-agent PR #1511 (`pv lint --strict-test-binding`) surfaced a 4th
drift in this same contract:

  FALSIFY-001 cited `qwen2_0_5b_matches_hf_config`
    → does NOT exist on main.
  Actual: `qwen2_0_5b_matches_hf_config_2026_05_04`
    (date-suffix added by impl PR #1474 / commit 9af6e71 — May 4).

The earlier round-2 audit (which focused on suffix + module-path
drift) didn't catch this because the test name has a DATE-SUFFIX
drift class (function name + `_<date>` is a real Rust test, but
the contract truncated to the prefix).

Updates:
- FALSIFY-001 test ref: append `_2026_05_04` suffix.
- v1.6.0 changelog updated to record 4 fixes (was 3).
- Verified: cargo test qwen2_0_5b_matches_hf_config_2026_05_04 PASS.
- pv lint --strict-test-binding contracts/apr-pretrain-arch-polymorphic-v1.yaml: 0 PV-VER-002 (down from 4 pre-fix).

This consolidates round 2 into a single commit on the same branch
+ PR (#1509) rather than spawning a round-3 PR for one extra fix.
The lint hardening in #1511 is what made finding the 4th drift
trivial; future drift will be caught at contract-merge time once
#1511 lands.

Refs: SPEC-SHIP-TWO-001 §50.4 cascade,
      PR #1511 (sub-agent's pv lint --strict-test-binding),
      Issue #1510

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit ff2e0b6 into main May 5, 2026
10 checks passed
@noahgift noahgift deleted the feat/pv-lint-strict-test-binding branch May 5, 2026 10:45
noahgift added a commit that referenced this pull request May 5, 2026
…h + 4 release-engineering defects closed (#1520)

§58 records the parallel release-engineering track that landed during the §57 drift-sweep wait:
the v0.32.0 user-facing-crate cascade publish (Issue #1514 CLOSED) and the four hidden defects
it surfaced + closed.

User-facing crates now live on crates.io at v0.32.0:
  - aprender = "0.32.0"
  - aprender-rag = "0.32.0"
  - aprender-core = "0.32.0"
  - apr-cli = "0.32.0"

Defects closed during cascade (each in its own PR):
  - #1512 aprender-rag [lib] name = "trueno_rag" → "aprender_rag" BREAKING
  - #1513 aprender-orchestrate cmd_code 7→8 arg drift on emit_trace addition
  - #1515 aprender-core path-only dev-deps (publish-time cycle break)
  - #1517 aprender-core permissive version + path (clean-room sed-strip robustness)
  - #1518 apr-cli aliases.yaml in-crate copy (include_str scope fix)

Plus PR #1511 (pv-lint --strict-test-binding) closes §57.4's foreshadowed prevention rule.

5g.1 corpus retokenize (PID 2767124) at 62 shards / 16h19m wall (manifest pending).
Ship-% unchanged: MODEL-1=91%, MODEL-2=57%. §58 is the third hygiene amendment in a row;
§59 will record 5g.1 completion when manifest emits.

Refs: #1514

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

pv lint: add --strict-test-binding to catch test-reference drift (PV-VER-001 false-negatives)

1 participant