feat: incremental invalidation + discharge cache (#222 Phase B) by aallan · Pull Request #717 · aallan/vera

aallan · 2026-06-10T15:38:49Z

Summary

Phase B of the #222 plan: incremental verification behind Phase A's unchanged API. VerificationSession now caches each top-level function's verification output and replays it when the function's invalidation key is unchanged — only invalidated functions re-enter Z3. This PR does not finish #222 (Phases C–E follow; the issue stays open).

The soundness model (`vera/obligations/cache.py`)

A function's cache key covers everything its verification reads:

Component	Covers	Rule it implements
`fn_structural_hash`	the full `FnDecl` subtree including spans	byte-identical-at-same-position replay only — cached diagnostics carry `location`/`source_line`, obligations carry line/col, so a moved function is a deliberate miss
`callee_component`	direct callees' interfaces (signature + contracts + type params)	callee contract change → callers re-verify (call sites check pre / assume post); callee body change → callers replay (bodies are never read across the boundary)
`program_context_hash`	non-fn declarations, imported-module contracts, timeout, file	ADT / alias / effect / import / timeout change → invalidate everything (coarse, conservative — per the plan: when in doubt, invalidate more)

Hard rail honoured: entries containing any timeout-status obligation are never cached (DischargeCache.put refuses them) — solver-timeout outcomes are load-dependent.

A bug the oracle caught before it shipped

The first structural hash was sha256(repr(decl)) — but Node.span is declared repr=False, so the hash was position-blind: a function shifted down one line would have replayed cached output with stale line numbers. The targeted span-shift test failed immediately; the hash now folds every node's span in via a generic walk_nodes digest (a dataclass-field walker — no isinstance chain to maintain, the #597 walker-completeness lesson applied structurally).

Implementation

vera/obligations/cache.py (new): key derivation, walk_nodes, FnCacheEntry, bounded-FIFO DischargeCache.
vera/verifier.py: public register_program() seam (verify_program delegates — behaviour identical); the session drives per-function verification selectively while registration stays whole-program.
vera/obligations/session.py: incremental drive in declaration order, so replayed and fresh slices interleave into the exact cold output stream. SessionRunStats (replayed/verified counts) added — additive observability; API otherwise unchanged.

Verification

Corpus-wide oracle, now exercising replay: the existing warm-twice differential test runs the second pass entirely from cache across all 35 examples + every verify/run-level conformance program — replay == re-verify == cold verify() on diagnostics, summary, and the obligation stream.
Targeted invalidation-rule tests (TestIncrementalInvalidation, +8): identical-source full replay; callee-body-edit re-verifies only the callee while the caller replays; callee-contract-edit invalidates the caller — and the tightened precondition surfaces a violated call_pre at the call site, proving genuine re-verification; span-shift and ADT-edit conservative invalidation; cross-program isolation; timeout-never-cached (monkeypatched solver, second run re-verifies); FIFO eviction bound.
Hard rail: test_verifier.py + test_verifier_coverage.py pass unchanged (237/237). mypy vera/ strict clean (63 files); ruff S clean; full non-stress suite green via pre-commit on both commits; doc counts consistent (4,220 tests).

Release prep (in this PR per workflow)

v0.0.162 across all tracked sites, CHANGELOG cut, HISTORY.md single-sentence row, site assets regenerated, tag on branch.

Part of #222 (Phase B of the phased plan — C–E follow in their own PRs).

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Incremental verification that replays per-function cached results when unchanged and a registration-only pass to support incremental runs.
- Bounded discharge cache with FIFO eviction; solver-timeout outcomes are never replayed and are re-verified.
Tests
- Added tests covering incremental replay, targeted invalidation rules, timeout handling and cache eviction behaviour.
Documentation
- Updated changelog, history, README, roadmap and testing docs; release bumped to v0.0.162.

Behind Phase A's unchanged API: VerificationSession now caches each top-level function's verification output (obligations, diagnostics, summary deltas) and replays it when the function's invalidation key is unchanged, re-entering Z3 only for invalidated functions. New vera/obligations/cache.py: - fn_cache_key = H(structural hash) + H(direct-callee interfaces) + H(program context). Soundness model documented in the module docstring: callee CONTRACT/signature changes invalidate callers (preconditions checked at call sites, postconditions assumed); callee BODY changes do not (bodies never read across the boundary); ADT / alias / effect / import / timeout / file changes invalidate everything (coarse, conservative). - Structural hash folds SPANS in explicitly: Node.span is repr=False, so repr(decl) alone is position-blind — a shifted function would replay stale line numbers. Caught by the new span-shift test before it ever shipped; spans now participate via a walk_nodes digest. - walk_nodes: generic dataclass-field AST walker (no isinstance chain to maintain — the #597 walker-completeness lesson applied structurally). - DischargeCache: bounded FIFO; put() refuses entries containing any timeout-status obligation (hard rail: solver-timeout outcomes are load-dependent and must never replay). vera/verifier.py: public register_program() seam (verify_program now delegates to it); the session drives per-function verification selectively while registration stays whole-program. vera/obligations/session.py: incremental drive in declaration order so replayed and fresh slices interleave into the exact cold output stream; SessionRunStats (replayed_fns / verified_fns) added for cache observability — additive, API otherwise unchanged. Tests (tests/test_obligations.py, +8): - The existing corpus-wide warm-twice oracle now exercises the replay path on all 35 examples + every verify/run conformance program for free (warm2 is all-replay and must equal warm1 == cold). - TestIncrementalInvalidation pins each rule: identical-source full replay; callee-body-edit re-verifies only the callee while the caller replays; callee-contract-edit invalidates the caller (and the tightened precondition surfaces a violated call_pre at the call site — proof of genuine re-verification); span-shift and ADT-edit conservative invalidation; cross-program isolation; timeout-status never cached (monkeypatched solver, second run re-verifies); FIFO eviction bound. Validation: 259 obligation tests + 237 hard-rail verifier tests green; mypy strict clean (63 files); ruff S clean; doc counts consistent (4,220 tests). Part of #222 (Phase B). Release prep follows before the PR. Co-Authored-By: Claude <noreply@anthropic.invalid>

Version bump across all tracked sites, CHANGELOG cut for 0.0.162, HISTORY.md single-sentence row, footer count, regenerated site assets. Skip-changelog: the CHANGELOG cut for 0.0.162 IS this commit's changelog change. Co-Authored-By: Claude <noreply@anthropic.invalid>

coderabbitai · 2026-06-10T15:39:08Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3bff51d8-2af3-4c24-9aad-b49f7de0f967

📥 Commits

Reviewing files that changed from the base of the PR and between 1b85c73 and 5badfa5.

📒 Files selected for processing (2)

TESTING.md
tests/test_obligations.py

📝 Walkthrough

Walkthrough

Implements Phase B incremental verification: per-function discharge cache with deterministic invalidation keys (function structure, direct-callee inputs, program context). VerificationSession iterates top-level functions, replays cached diagnostics/obligations/deltas on key hits, verifies-and-caches on misses. Adds tests and updates release/docs and package versions for v0.0.162.

Changes

Phase B Incremental Verification Caching

Layer / File(s)	Summary
Discharge cache: hashing, keys, and storage `vera/obligations/cache.py`	Deterministic per-function cache keys via `fn_structural_hash` (function + spans), `callee_component` (direct callees' verification inputs excluding bodies), and `program_context_hash` (non-function decls, timeout, file, resolved-module keys). Adds `FnCacheEntry` and `DischargeCache` with FIFO eviction and a rule refusing to cache entries containing any obligation with `status == "timeout"`.
Session integration: incremental per-function verification `vera/obligations/session.py`, `vera/obligations/__init__.py`, `vera/verifier.py`	`VerificationSession.verify_source` refactored to compute a context hash, iterate top-level functions, compute per-function invalidation keys, replay cached `FnCacheEntry` on hits (accumulating diagnostics, obligations, and summary deltas) and verify/cache on misses. Adds `SessionRunStats`, initialises a per-session `DischargeCache`, and exposes `SessionRunStats` through `vera.obligations`. `ContractVerifier.register_program()` provides the registration seam.
Comprehensive cache replay and invalidation tests `tests/test_obligations.py`	Adds `TestIncrementalInvalidation` covering warm-vs-cold equivalence and targeted invalidation scenarios: identical-source replay, callee-body reverify, callee-contract invalidation of callers, conservative span-shift invalidation, ADT invalidation, file-name isolation, timeout non-replay, and DischargeCache FIFO eviction.
Version and documentation updates `CHANGELOG.md`, `HISTORY.md`, `README.md`, `ROADMAP.md`, `TESTING.md`, `pyproject.toml`, `vera/__init__.py`	Bumps version to `0.0.162` and updates release notes and project/test metrics to reflect the new release and Phase B feature.

Sequence Diagram(s)

sequenceDiagram
  participant VerificationSession
  participant DischargeCache
  participant ContractVerifier
  VerificationSession->>ContractVerifier: register_program(program)
  loop each FnDecl
    VerificationSession->>DischargeCache: get(fn_cache_key)
    alt cache hit
      DischargeCache-->>VerificationSession: FnCacheEntry
      VerificationSession->>VerificationSession: append cached diagnostics/obligations/deltas
    else cache miss
      VerificationSession->>ContractVerifier: _verify_fn(decl)
      ContractVerifier-->>VerificationSession: diagnostics, obligations, summary delta
      VerificationSession->>DischargeCache: put(key, entry)
    end
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

aallan/vera#716: Phase A baseline for obligations/session warm re-verification reused and extended by this Phase B incremental cache work.

Suggested labels

compiler, tests, docs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 62.07% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and precisely summarizes the main feature: incremental invalidation and discharge cache implementation (Phase B), matching the substantial core changes across cache.py, session.py, and verifier.py.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/222-phase-b-incremental

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-06-10T15:43:11Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.96%. Comparing base (becb3d9) to head (5badfa5).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #717      +/-   ##
==========================================
+ Coverage   90.92%   90.96%   +0.03%     
==========================================
  Files          63       64       +1     
  Lines       24216    24318     +102     
  Branches      292      292              
==========================================
+ Hits        22019    22121     +102     
  Misses       2190     2190              
  Partials        7        7

Flag	Coverage Δ
javascript	`61.40% <ø> (ø)`
python	`94.70% <100.00%> (+0.02%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_obligations.py`:
- Around line 441-465: The tests are too weak: in
test_adt_edit_invalidates_everything and test_cross_program_no_stale_bleed
ensure no warm replay and that edited runs match a cold run. After calling
VerificationSession().verify_source(edited) (and verify_source(other)), assert
session.last_run_stats.replayed_fns == 0 (not just verified_fns checks) and call
self._assert_matches_cold(edited, result) / self._assert_matches_cold(other,
result) respectively to compare edited/other run output against a cold run;
reference VerificationSession, session.verify_source,
last_run_stats.replayed_fns, last_run_stats.verified_fns, and
self._assert_matches_cold to locate where to add these assertions.

In `@vera/obligations/session.py`:
- Around line 175-182: The current context hash uses tuple(repr(m) for m in
resolved_modules) which can embed non-deterministic elements (absolute Path,
differing cwd/symlinks and full source text) and cause cache churn; update the
code that calls program_context_hash to build a deterministic key from each
ResolvedModule by normalising/canonicalising the module path (e.g.
resolved/canonical path or relative repo-root path) and including a stable
digest of the module source (e.g. hash of m.source) rather than repr(m). Locate
uses of resolved_modules and the program_context_hash invocation in session.py
(symbols: ResolvedModule, resolved_modules, program_context_hash) and replace
the repr-based tuple with a tuple of stable primitives like (normalized_path,
source_digest) for each module.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 857abe57-0faa-4d85-874b-78f61eba41f0

📥 Commits

Reviewing files that changed from the base of the PR and between becb3d9 and 0e02aef.

⛔ Files ignored due to path filters (5)

docs/index.html is excluded by !docs/**
docs/index.md is excluded by !docs/**
docs/llms-full.txt is excluded by !docs/**
docs/llms.txt is excluded by !docs/**
uv.lock is excluded by !**/*.lock, !uv.lock

📒 Files selected for processing (12)

CHANGELOG.md
HISTORY.md
README.md
ROADMAP.md
TESTING.md
pyproject.toml
tests/test_obligations.py
vera/__init__.py
vera/obligations/__init__.py
vera/obligations/cache.py
vera/obligations/session.py
vera/verifier.py

Both findings valid: - Stable module keys in the context hash: repr(ResolvedModule) baked in the absolute file_path (canonicalisation-sensitive across cwd/symlinks) and the full parsed AST repr (expensive, derived from source anyway). Now (module path, source sha256) pairs — cheaper and deterministic; the source digest covers every contract/signature change a caller could read. - Test tightening: test_adt_edit_invalidates_everything now checks the edited warm output against cold; test_cross_program_no_stale_bleed rewritten to match its name — identical source under a different file name must fully re-verify (file is baked into cached diagnostic locations), asserting verified==2/replayed==0 plus per-file location correctness, instead of the previous related-program edit that legitimately allowed one replay. Co-Authored-By: Claude <noreply@anthropic.invalid>

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

vera/obligations/session.py (1)

154-158: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reset last_run_stats on the type-check short-circuit.

A failing typecheck() returns before this run assigns fresh stats, so VerificationSession.last_run_stats leaks replay/verify counts from the previous successful call. Anything reading cache observability after an error will see stale data.

Proposed fix

         if any(d.severity == "error" for d in check_diags):
             # Type errors: skip verification, matching cmd_verify.
+            self.last_run_stats = SessionRunStats()
             return SessionVerifyResult(
                 ok=False, check_diagnostics=check_diags,
             )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vera/obligations/session.py` around lines 154 - 158, In the typecheck()
short-circuit where you return a SessionVerifyResult on error (the if
any(d.severity == "error" for d in check_diags) branch), reset
VerificationSession.last_run_stats to an empty/default stats value (e.g. zeroed
replay/verify counts or None) before returning so stale counts aren’t leaked;
update the branch that returns SessionVerifyResult(...) to set
self.last_run_stats = <default stats> (or None) prior to the return.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_obligations.py`:
- Around line 458-473: The final assertion is vacuous because
result.verify_diagnostics is empty for a clean fixture; update
test_cross_program_no_stale_bleed so the file-name check is meaningful by
ensuring there is at least one diagnostic before verifying its file: either
assert result.verify_diagnostics is non-empty (e.g., assert
len(result.verify_diagnostics) > 0) or call session.verify_source with input
that produces a diagnostic for "prog_b.vera", then loop over
result.verify_diagnostics to assert d.location.file == "prog_b.vera"; reference
the test function name test_cross_program_no_stale_bleed, the
session.verify_source calls, result.verify_diagnostics, and
last_run_stats.replayed_fns when making the change.

---

Outside diff comments:
In `@vera/obligations/session.py`:
- Around line 154-158: In the typecheck() short-circuit where you return a
SessionVerifyResult on error (the if any(d.severity == "error" for d in
check_diags) branch), reset VerificationSession.last_run_stats to an
empty/default stats value (e.g. zeroed replay/verify counts or None) before
returning so stale counts aren’t leaked; update the branch that returns
SessionVerifyResult(...) to set self.last_run_stats = <default stats> (or None)
prior to the return.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 59aa8c03-17c7-451d-b624-0a929f11a1c2

📥 Commits

Reviewing files that changed from the base of the PR and between 0e02aef and 1b85c73.

📒 Files selected for processing (4)

TESTING.md
tests/test_obligations.py
vera/obligations/cache.py
vera/obligations/session.py

CodeRabbit round 2 (valid): the fixture verified cleanly, so the all(...location.file...) assertion passed over an empty diagnostics list. The source now carries a violated postcondition so both runs produce at least one diagnostic (asserted non-empty), and the location check genuinely proves the second run's output carries prog_b.vera — a stale replay from prog_a would fail it. Co-Authored-By: Claude <noreply@anthropic.invalid>

aallan and others added 2 commits June 10, 2026 16:14

coderabbitai Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread tests/test_obligations.py Outdated

Comment thread vera/obligations/session.py

coderabbitai Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread tests/test_obligations.py

aallan merged commit 19abc7e into main Jun 10, 2026
27 checks passed

aallan deleted the feat/222-phase-b-incremental branch June 10, 2026 17:58

This was referenced Jun 10, 2026

feat: vera/speculativeEdit proof-delta — completes the LSP server (#222 Phase E) #720

Merged

LSP server #222

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: incremental invalidation + discharge cache (#222 Phase B)#717

feat: incremental invalidation + discharge cache (#222 Phase B)#717
aallan merged 4 commits into
mainfrom
feat/222-phase-b-incremental

aallan commented Jun 10, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 10, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented Jun 10, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aallan commented Jun 10, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

The soundness model (vera/obligations/cache.py)

A bug the oracle caught before it shipped

Implementation

Verification

Release prep (in this PR per workflow)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aallan commented Jun 10, 2026 •

edited by coderabbitai Bot

Loading

The soundness model (`vera/obligations/cache.py`)

coderabbitai Bot commented Jun 10, 2026 •

edited

Loading

codecov Bot commented Jun 10, 2026 •

edited

Loading