ci(fork-sync): add fork-boundary mock baseline + clean stale allowlist (#2436) #2445

Merged
alexey-pelykh merged 1 commit into main from ci/2436-fork-boundary-mock-baseline on Apr 20, 2026
@alexey-pelykh
Summary

Extends scripts/check-stub-debt.mjs (ADR 0005 H8) with a second baseline-gated counter: vi.mock/vi.doMock calls targeting src/agents/ or src/middleware/. Closes the test-side gap in the #2408-class defense that H7 (production-side AST classifier) cannot see. Also cleans up a stale .throwing-stub-callers-allowlist entry for resolveAgentRuntimeOrThrow (restored to real implementation in commit d4a01b9190).

Why this counter

Tests that mock fork-boundary modules can mask production throwing-stubs. The test exercises the mock; production hits a broken stub. This is the underlying cause of #2408 and the #2337 cascade.
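The masking effect can be sketched in plain JavaScript without a test framework. This is an illustration only: `productionAgents`, `resolveRuntime`, and `handleRequest` are hypothetical names, and dependency injection stands in for what `vi.mock` does at the module level.

```javascript
// Hypothetical fork-boundary module whose real implementation is a throwing stub.
const productionAgents = {
  resolveRuntime() {
    throw new Error("not implemented in fork");
  },
};

// Code under test. The dependency is passed in so a test can swap it,
// which approximates replacing the module with vi.mock.
function handleRequest(agents) {
  return `runtime=${agents.resolveRuntime()}`;
}

// Test path: the mock replaces the stub, so the assertion would pass.
const mockedAgents = { resolveRuntime: () => "mock-runtime" };
const testResult = handleRequest(mockedAgents);

// Production path: the same call reaches the stub and throws.
let productionError = null;
try {
  handleRequest(productionAgents);
} catch (err) {
  productionError = err;
}

console.log(testResult);
console.log(productionError.message);
```

The test path returns a value while the production path throws, and the test suite never observes the difference.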

H7 (PR #2444) catches production-side throwing stubs at PR time via 4-signal AST classification. It cannot see test files. H8 creates a test-side friction point: new vi.mocks targeting fork-boundary modules bump the baseline and must be categorized in the PR description:

| Category | When it applies |
| --- | --- |
| isolation | Side-effect-heavy dependency; unit-test the logic in isolation |
| performance | Expensive real implementation; mock for test speed |
| stub-placeholder | RED FLAG. Mocking because the real impl is a throwing-stub; open a tracking issue |

See CONTRIBUTING.md § Fork-boundary mocks for detailed guidance.

Re-scope history

Original #2436 proposed three baselines. 360 evaluation on 2026-04-20 (post-H7 ship):

  • Rejected .never-return-baseline: grep-inflated (24 raw hits, of which only ~6-8 are real functions returning never); mostly tracks legitimate typed error-throw helpers; H7 catches the malicious subset
  • Rejected .fork-boundary-stub-baseline: H7's forkMessagePattern calibration signal already catches fork-attributed throw stubs; redundant dual-ceremony
  • Accepted .fork-boundary-mock-baseline: unique value; test-side signal neither H7 nor H9 can see

HQ ADR 0005 H8 description updated in parallel (hq commit 33d248d) with a sunset-review trigger: 6 months post-ship, evaluate whether the mock-reason categorization is providing real signal or has become rubber-stamped.

Changes

  • scripts/check-stub-debt.mjs: dual-counter refactor; new helpers (resolveForkBoundaryMock, findForkBoundaryMocks, readBaseline, reportCounter). Catches both vi.mock and vi.doMock. Scans test files AND test-adjacent fixture files (.test-helpers.ts, .test-mocks.ts, .mocks.ts, .e2e-mocks.ts).
  • .fork-boundary-mock-baseline: seeded at 134 (current count).
  • .throwing-stub-callers-allowlist: remove stale resolveAgentRuntimeOrThrow entry. Gate was explicitly reporting it as stale; the function is now a real typed implementation. Removing the entry revokes a silent pre-approval for future regression.
  • CONTRIBUTING.md: new § Fork-boundary mocks section.
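A rough sketch of the detection steps named in this PR (regex match on `vi.mock`/`vi.doMock`, `.js`→`.ts` resolution, fork-boundary prefix match). It is not the script's actual code: `findMockTargets`, `resolveFrom`, and the regex are hypothetical, and the sketch assumes mock specifiers are relative paths with a string-literal first argument.

```javascript
// Fork-boundary prefixes, taken from the PR summary.
const FORK_BOUNDARY_PREFIXES = ["src/agents/", "src/middleware/"];

// Matches vi.mock("...") and vi.doMock('...') with a string-literal first argument.
const MOCK_CALL = /\bvi\.(?:mock|doMock)\(\s*(["'`])([^"'`]+)\1/g;

// Minimal POSIX-style join and normalize, so the sketch needs no imports.
function resolveFrom(fileDir, specifier) {
  const parts = [];
  for (const segment of `${fileDir}/${specifier}`.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  return parts.join("/");
}

// Returns the repo-relative targets of fork-boundary mocks found in one file.
function findMockTargets(filePath, source) {
  const fileDir = filePath.split("/").slice(0, -1).join("/");
  const targets = [];
  for (const match of source.matchAll(MOCK_CALL)) {
    const specifier = match[2];
    if (!specifier.startsWith(".")) continue; // bare package specifiers are skipped in this sketch
    const resolved = resolveFrom(fileDir, specifier).replace(/\.js$/, ".ts");
    if (FORK_BOUNDARY_PREFIXES.some((prefix) => resolved.startsWith(prefix))) {
      targets.push(resolved);
    }
  }
  return targets;
}

const sample = [
  'vi.mock("../agents/runtime.js", () => ({}));',
  "vi.doMock('../middleware/auth.js');",
  'vi.mock("../utils/logger.js");',
  'vi.mock("node:fs");',
].join("\n");

const found = findMockTargets("src/gateway/server.test.ts", sample);
console.log(found);
```

A regex scan like this is cheap but will miss mocks whose specifier is computed at runtime; that trade-off is an assumption of the sketch, not a statement about the real script.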

Verification

  • node scripts/check-stub-debt.mjs → stub-debt check passed: 12 == baseline 12. + fork-boundary-mock check passed: 134 == baseline 134.
  • node scripts/check-throwing-stub-callers.mjs --inventory → 0 violations + 0 stale (post-cleanup)
  • pnpm check → clean (format + typecheck + lint + custom gates)
  • Fresh-context adversarial validation (session 93f1a611): initial FINDINGS on 3 gaps (ratchet-down filename, vi.doMock regex miss, .mocks.ts/.e2e-mocks.ts suffix miss) — all fixed, re-validated CLEAN
  • Fresh-context polish (session be0ec76b): CLEAN after 2 iterations applying DRY / idiomatic refactors (matchAll, .some(), removed dead absolute-path branch, spread operator for inventory field propagation). Output unchanged (12/134 still pass).
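The baseline gate behind the "12 == baseline 12" output can be approximated as below. This is a sketch under stated assumptions: `checkCounter` is a hypothetical name, the failure messages are invented for illustration, and the sketch assumes a count below the baseline also fails until the baseline file is lowered. The real script may handle the ratchet-down case differently.

```javascript
// Compares a live count against its committed baseline.
// Assumption: drift in either direction fails, so the baseline file
// always matches the current state of the repository.
function checkCounter(name, count, baseline) {
  if (count === baseline) {
    return { ok: true, message: `${name} check passed: ${count} == baseline ${baseline}.` };
  }
  if (count > baseline) {
    return {
      ok: false,
      message: `${name} check failed: ${count} > baseline ${baseline}. Remove the new occurrences or raise the baseline and justify it in the PR.`,
    };
  }
  return {
    ok: false,
    message: `${name} check failed: ${count} < baseline ${baseline}. Lower the baseline to lock in the improvement.`,
  };
}

const results = [
  checkCounter("stub-debt", 12, 12),
  checkCounter("fork-boundary-mock", 134, 134),
];
for (const result of results) console.log(result.message);

// A CLI wrapper would exit non-zero if any counter failed.
const exitCode = results.every((result) => result.ok) ? 0 : 1;
```

Failing on a lower count keeps the baseline honest: an improvement cannot quietly leave headroom for a later regression.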

Cascade impact

H8 delivered. Unblocks Phase 3 (#2441 composite gate) once #2437 (H9) also lands.

Test plan

  • Both counters pass locally
  • Ratchet-up / ratchet-down behavior verified via baseline probes
  • vi.doMock now caught (2 previously-missed calls in inventory)
  • Test-adjacent files .mocks.ts / .e2e-mocks.ts now scanned (3 previously-missed mocks in inventory)
  • pnpm check clean
  • Pre-commit hooks pass
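The widened scan scope from the test plan can be expressed as a suffix filter. A minimal sketch: the fixture suffixes are the ones listed in this PR, while `isScannedFile` and the `.test.ts` suffix for test files proper are assumptions.

```javascript
// Test-adjacent fixture suffixes, as listed in the PR description.
const FIXTURE_SUFFIXES = [".test-helpers.ts", ".test-mocks.ts", ".mocks.ts", ".e2e-mocks.ts"];

// Assumed suffix for test files proper; the real script may use a different set.
const TEST_SUFFIXES = [".test.ts"];

// True when a file should be scanned for fork-boundary mocks.
function isScannedFile(filePath) {
  return [...TEST_SUFFIXES, ...FIXTURE_SUFFIXES].some((suffix) => filePath.endsWith(suffix));
}

console.log(isScannedFile("src/gateway/server.e2e-mocks.ts"));
console.log(isScannedFile("src/gateway/server.ts"));
```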

Closes #2436.

ci(fork-sync): add fork-boundary mock baseline + clean stale allowlist (#2436)

Extends scripts/check-stub-debt.mjs with a second baseline-gated counter
alongside the existing @ts-expect-error counter (ADR 0005 H5). The new
counter tracks vi.mock/vi.doMock calls in test files that target modules
under src/agents/ or src/middleware/ — the test-side cause of
#2408-class regressions that H7's AST classifier cannot see (H7 only
examines production code).

Rationale: tests that mock fork-boundary modules can mask production
throwing-stubs — the test exercises the mock while production hits a
broken stub. The baseline creates a friction point at PR time where new
mocks must be categorized in the PR description: isolation, performance,
or stub-placeholder (the last being a RED FLAG).

Re-scope history: original #2436 proposed three baselines. 360 evaluation
on 2026-04-20 rejected the other two as redundant with H7 or noise-heavy:

  Rejected: .never-return-baseline — mostly tracks legitimate typed
            error-throw helpers; H7 already catches the malicious subset
  Rejected: .fork-boundary-stub-baseline — H7's forkMessagePattern
            calibration signal already catches these
  Accepted: .fork-boundary-mock-baseline — unique value, test-side signal
            neither H7 nor H9 can see

Also in this PR: remove stale .throwing-stub-callers-allowlist entry for
resolveAgentRuntimeOrThrow. That function was restored to a real typed
implementation in commit d4a01b9; the allowlist entry had become a
silent pre-approval for future regression. Gate was explicitly reporting
it as stale via --inventory.

Changes:
  - scripts/check-stub-debt.mjs: dual-counter refactor + vi.mock detection
    via regex, .js→.ts resolution, fork-boundary prefix match
  - .fork-boundary-mock-baseline: seeded at 134 (current count)
  - .throwing-stub-callers-allowlist: remove stale #2408 entry
  - CONTRIBUTING.md: new § Fork-boundary mocks documenting the three
    categorization buckets with references to ADR 0005 H8

Verification:
  - node scripts/check-stub-debt.mjs           -> 12 and 134 pass, exit 0
  - node scripts/check-throwing-stub-callers.mjs --inventory -> 0 stale
  - pnpm check                                 -> clean
  - Fresh-context adversarial validation (session 93f1a611) found 3
    blocking findings; all fixed + re-validated CLEAN
  - Fresh-context polish (session be0ec76b) applied DRY/idiomatic
    refactors across 2 iterations, output unchanged

HQ ADR 0005 H8 description updated in parallel (hq commit 33d248d)
with sunset-review trigger: 6 months post-ship, evaluate whether the
mock-reason categorization is providing real signal or has become
rubber-stamped. If rubber-stamped, defer to H7+H9 alone and remove this
gate.

alexey-pelykh enabled auto-merge (squash) April 20, 2026 13:47
alexey-pelykh merged commit 448a36d into main Apr 20, 2026
13 checks passed
alexey-pelykh deleted the ci/2436-fork-boundary-mock-baseline branch April 20, 2026 13:53