Skip to content

feat: incremental invalidation + discharge cache (#222 Phase B)#717

Merged
aallan merged 4 commits into
mainfrom
feat/222-phase-b-incremental
Jun 10, 2026
Merged

feat: incremental invalidation + discharge cache (#222 Phase B)#717
aallan merged 4 commits into
mainfrom
feat/222-phase-b-incremental

Conversation

@aallan

@aallan aallan commented Jun 10, 2026

Copy link
Copy Markdown
Owner

Summary

Phase B of the #222 plan: incremental verification behind Phase A's unchanged API. VerificationSession now caches each top-level function's verification output and replays it when the function's invalidation key is unchanged — only invalidated functions re-enter Z3. This PR does not finish #222 (Phases C–E follow; the issue stays open).

The soundness model (vera/obligations/cache.py)

A function's cache key covers everything its verification reads:

Component Covers Rule it implements
fn_structural_hash the full FnDecl subtree including spans byte-identical-at-same-position replay only — cached diagnostics carry location/source_line, obligations carry line/col, so a moved function is a deliberate miss
callee_component direct callees' interfaces (signature + contracts + type params) callee contract change → callers re-verify (call sites check pre / assume post); callee body change → callers replay (bodies are never read across the boundary)
program_context_hash non-fn declarations, imported-module contracts, timeout, file ADT / alias / effect / import / timeout change → invalidate everything (coarse, conservative — per the plan: when in doubt, invalidate more)

Hard rail honoured: entries containing any timeout-status obligation are never cached (DischargeCache.put refuses them) — solver-timeout outcomes are load-dependent.

A bug the oracle caught before it shipped

The first structural hash was sha256(repr(decl)) — but Node.span is declared repr=False, so the hash was position-blind: a function shifted down one line would have replayed cached output with stale line numbers. The targeted span-shift test failed immediately; the hash now folds every node's span in via a generic walk_nodes digest (a dataclass-field walker — no isinstance chain to maintain, the #597 walker-completeness lesson applied structurally).

Implementation

  • vera/obligations/cache.py (new): key derivation, walk_nodes, FnCacheEntry, bounded-FIFO DischargeCache.
  • vera/verifier.py: public register_program() seam (verify_program delegates — behaviour identical); the session drives per-function verification selectively while registration stays whole-program.
  • vera/obligations/session.py: incremental drive in declaration order, so replayed and fresh slices interleave into the exact cold output stream. SessionRunStats (replayed/verified counts) added — additive observability; API otherwise unchanged.

Verification

  • Corpus-wide oracle, now exercising replay: the existing warm-twice differential test runs the second pass entirely from cache across all 35 examples + every verify/run-level conformance program — replay == re-verify == cold verify() on diagnostics, summary, and the obligation stream.
  • Targeted invalidation-rule tests (TestIncrementalInvalidation, +8): identical-source full replay; callee-body-edit re-verifies only the callee while the caller replays; callee-contract-edit invalidates the caller — and the tightened precondition surfaces a violated call_pre at the call site, proving genuine re-verification; span-shift and ADT-edit conservative invalidation; cross-program isolation; timeout-never-cached (monkeypatched solver, second run re-verifies); FIFO eviction bound.
  • Hard rail: test_verifier.py + test_verifier_coverage.py pass unchanged (237/237). mypy vera/ strict clean (63 files); ruff S clean; full non-stress suite green via pre-commit on both commits; doc counts consistent (4,220 tests).

Release prep (in this PR per workflow)

v0.0.162 across all tracked sites, CHANGELOG cut, HISTORY.md single-sentence row, site assets regenerated, tag on branch.

Part of #222 (Phase B of the phased plan — C–E follow in their own PRs).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Incremental verification that replays per-function cached results when unchanged and a registration-only pass to support incremental runs.
    • Bounded discharge cache with FIFO eviction; solver-timeout outcomes are never replayed and are re-verified.
  • Tests

    • Added tests covering incremental replay, targeted invalidation rules, timeout handling and cache eviction behaviour.
  • Documentation

    • Updated changelog, history, README, roadmap and testing docs; release bumped to v0.0.162.

aallan and others added 2 commits June 10, 2026 16:14
Behind Phase A's unchanged API: VerificationSession now caches each
top-level function's verification output (obligations, diagnostics,
summary deltas) and replays it when the function's invalidation key is
unchanged, re-entering Z3 only for invalidated functions.

New vera/obligations/cache.py:
- fn_cache_key = H(structural hash) + H(direct-callee interfaces) +
  H(program context). Soundness model documented in the module
  docstring: callee CONTRACT/signature changes invalidate callers
  (preconditions checked at call sites, postconditions assumed);
  callee BODY changes do not (bodies never read across the boundary);
  ADT / alias / effect / import / timeout / file changes invalidate
  everything (coarse, conservative).
- Structural hash folds SPANS in explicitly: Node.span is repr=False,
  so repr(decl) alone is position-blind — a shifted function would
  replay stale line numbers. Caught by the new span-shift test before
  it ever shipped; spans now participate via a walk_nodes digest.
- walk_nodes: generic dataclass-field AST walker (no isinstance chain
  to maintain — the #597 walker-completeness lesson applied
  structurally).
- DischargeCache: bounded FIFO; put() refuses entries containing any
  timeout-status obligation (hard rail: solver-timeout outcomes are
  load-dependent and must never replay).

vera/verifier.py: public register_program() seam (verify_program now
delegates to it); the session drives per-function verification
selectively while registration stays whole-program.

vera/obligations/session.py: incremental drive in declaration order so
replayed and fresh slices interleave into the exact cold output stream;
SessionRunStats (replayed_fns / verified_fns) added for cache
observability — additive, API otherwise unchanged.

Tests (tests/test_obligations.py, +8):
- The existing corpus-wide warm-twice oracle now exercises the replay
  path on all 35 examples + every verify/run conformance program for
  free (warm2 is all-replay and must equal warm1 == cold).
- TestIncrementalInvalidation pins each rule: identical-source full
  replay; callee-body-edit re-verifies only the callee while the
  caller replays; callee-contract-edit invalidates the caller (and the
  tightened precondition surfaces a violated call_pre at the call
  site — proof of genuine re-verification); span-shift and ADT-edit
  conservative invalidation; cross-program isolation; timeout-status
  never cached (monkeypatched solver, second run re-verifies); FIFO
  eviction bound.

Validation: 259 obligation tests + 237 hard-rail verifier tests green;
mypy strict clean (63 files); ruff S clean; doc counts consistent
(4,220 tests).

Part of #222 (Phase B). Release prep follows before the PR.

Co-Authored-By: Claude <noreply@anthropic.invalid>
Version bump across all tracked sites, CHANGELOG cut for 0.0.162,
HISTORY.md single-sentence row, footer count, regenerated site assets.

Skip-changelog: the CHANGELOG cut for 0.0.162 IS this commit's changelog change.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3bff51d8-2af3-4c24-9aad-b49f7de0f967

📥 Commits

Reviewing files that changed from the base of the PR and between 1b85c73 and 5badfa5.

📒 Files selected for processing (2)
  • TESTING.md
  • tests/test_obligations.py

📝 Walkthrough

Walkthrough

Implements Phase B incremental verification: per-function discharge cache with deterministic invalidation keys (function structure, direct-callee inputs, program context). VerificationSession iterates top-level functions, replays cached diagnostics/obligations/deltas on key hits, verifies-and-caches on misses. Adds tests and updates release/docs and package versions for v0.0.162.

Changes

Phase B Incremental Verification Caching

Layer / File(s) Summary
Discharge cache: hashing, keys, and storage
vera/obligations/cache.py
Deterministic per-function cache keys via fn_structural_hash (function + spans), callee_component (direct callees' verification inputs excluding bodies), and program_context_hash (non-function decls, timeout, file, resolved-module keys). Adds FnCacheEntry and DischargeCache with FIFO eviction and a rule refusing to cache entries containing any obligation with status == "timeout".
Session integration: incremental per-function verification
vera/obligations/session.py, vera/obligations/__init__.py, vera/verifier.py
VerificationSession.verify_source refactored to compute a context hash, iterate top-level functions, compute per-function invalidation keys, replay cached FnCacheEntry on hits (accumulating diagnostics, obligations, and summary deltas) and verify/cache on misses. Adds SessionRunStats, initialises a per-session DischargeCache, and exposes SessionRunStats through vera.obligations. ContractVerifier.register_program() provides the registration seam.
Comprehensive cache replay and invalidation tests
tests/test_obligations.py
Adds TestIncrementalInvalidation covering warm-vs-cold equivalence and targeted invalidation scenarios: identical-source replay, callee-body reverify, callee-contract invalidation of callers, conservative span-shift invalidation, ADT invalidation, file-name isolation, timeout non-replay, and DischargeCache FIFO eviction.
Version and documentation updates
CHANGELOG.md, HISTORY.md, README.md, ROADMAP.md, TESTING.md, pyproject.toml, vera/__init__.py
Bumps version to 0.0.162 and updates release notes and project/test metrics to reflect the new release and Phase B feature.

Sequence Diagram(s)

sequenceDiagram
  participant VerificationSession
  participant DischargeCache
  participant ContractVerifier
  VerificationSession->>ContractVerifier: register_program(program)
  loop each FnDecl
    VerificationSession->>DischargeCache: get(fn_cache_key)
    alt cache hit
      DischargeCache-->>VerificationSession: FnCacheEntry
      VerificationSession->>VerificationSession: append cached diagnostics/obligations/deltas
    else cache miss
      VerificationSession->>ContractVerifier: _verify_fn(decl)
      ContractVerifier-->>VerificationSession: diagnostics, obligations, summary delta
      VerificationSession->>DischargeCache: put(key, entry)
    end
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

  • aallan/vera#716: Phase A baseline for obligations/session warm re-verification reused and extended by this Phase B incremental cache work.

Suggested labels

compiler, tests, docs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.07% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and precisely summarizes the main feature: incremental invalidation and discharge cache implementation (Phase B), matching the substantial core changes across cache.py, session.py, and verifier.py.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/222-phase-b-incremental

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.96%. Comparing base (becb3d9) to head (5badfa5).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #717      +/-   ##
==========================================
+ Coverage   90.92%   90.96%   +0.03%     
==========================================
  Files          63       64       +1     
  Lines       24216    24318     +102     
  Branches      292      292              
==========================================
+ Hits        22019    22121     +102     
  Misses       2190     2190              
  Partials        7        7              
Flag Coverage Δ
javascript 61.40% <ø> (ø)
python 94.70% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_obligations.py`:
- Around line 441-465: The tests are too weak: in
test_adt_edit_invalidates_everything and test_cross_program_no_stale_bleed
ensure no warm replay and that edited runs match a cold run. After calling
VerificationSession().verify_source(edited) (and verify_source(other)), assert
session.last_run_stats.replayed_fns == 0 (not just verified_fns checks) and call
self._assert_matches_cold(edited, result) / self._assert_matches_cold(other,
result) respectively to compare edited/other run output against a cold run;
reference VerificationSession, session.verify_source,
last_run_stats.replayed_fns, last_run_stats.verified_fns, and
self._assert_matches_cold to locate where to add these assertions.

In `@vera/obligations/session.py`:
- Around line 175-182: The current context hash uses tuple(repr(m) for m in
resolved_modules) which can embed non-deterministic elements (absolute Path,
differing cwd/symlinks and full source text) and cause cache churn; update the
code that calls program_context_hash to build a deterministic key from each
ResolvedModule by normalising/canonicalising the module path (e.g.
resolved/canonical path or relative repo-root path) and including a stable
digest of the module source (e.g. hash of m.source) rather than repr(m). Locate
uses of resolved_modules and the program_context_hash invocation in session.py
(symbols: ResolvedModule, resolved_modules, program_context_hash) and replace
the repr-based tuple with a tuple of stable primitives like (normalized_path,
source_digest) for each module.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 857abe57-0faa-4d85-874b-78f61eba41f0

📥 Commits

Reviewing files that changed from the base of the PR and between becb3d9 and 0e02aef.

⛔ Files ignored due to path filters (5)
  • docs/index.html is excluded by !docs/**
  • docs/index.md is excluded by !docs/**
  • docs/llms-full.txt is excluded by !docs/**
  • docs/llms.txt is excluded by !docs/**
  • uv.lock is excluded by !**/*.lock, !uv.lock
📒 Files selected for processing (12)
  • CHANGELOG.md
  • HISTORY.md
  • README.md
  • ROADMAP.md
  • TESTING.md
  • pyproject.toml
  • tests/test_obligations.py
  • vera/__init__.py
  • vera/obligations/__init__.py
  • vera/obligations/cache.py
  • vera/obligations/session.py
  • vera/verifier.py

Comment thread tests/test_obligations.py Outdated
Comment thread vera/obligations/session.py
Both findings valid:

- Stable module keys in the context hash: repr(ResolvedModule) baked in
  the absolute file_path (canonicalisation-sensitive across
  cwd/symlinks) and the full parsed AST repr (expensive, derived from
  source anyway). Now (module path, source sha256) pairs — cheaper and
  deterministic; the source digest covers every contract/signature
  change a caller could read.

- Test tightening: test_adt_edit_invalidates_everything now checks the
  edited warm output against cold; test_cross_program_no_stale_bleed
  rewritten to match its name — identical source under a different
  file name must fully re-verify (file is baked into cached diagnostic
  locations), asserting verified==2/replayed==0 plus per-file location
  correctness, instead of the previous related-program edit that
  legitimately allowed one replay.

Co-Authored-By: Claude <noreply@anthropic.invalid>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
vera/obligations/session.py (1)

154-158: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reset last_run_stats on the type-check short-circuit.

A failing typecheck() returns before this run assigns fresh stats, so VerificationSession.last_run_stats leaks replay/verify counts from the previous successful call. Anything reading cache observability after an error will see stale data.

Proposed fix
         if any(d.severity == "error" for d in check_diags):
             # Type errors: skip verification, matching cmd_verify.
+            self.last_run_stats = SessionRunStats()
             return SessionVerifyResult(
                 ok=False, check_diagnostics=check_diags,
             )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vera/obligations/session.py` around lines 154 - 158, In the typecheck()
short-circuit where you return a SessionVerifyResult on error (the if
any(d.severity == "error" for d in check_diags) branch), reset
VerificationSession.last_run_stats to an empty/default stats value (e.g. zeroed
replay/verify counts or None) before returning so stale counts aren’t leaked;
update the branch that returns SessionVerifyResult(...) to set
self.last_run_stats = <default stats> (or None) prior to the return.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_obligations.py`:
- Around line 458-473: The final assertion is vacuous because
result.verify_diagnostics is empty for a clean fixture; update
test_cross_program_no_stale_bleed so the file-name check is meaningful by
ensuring there is at least one diagnostic before verifying its file: either
assert result.verify_diagnostics is non-empty (e.g., assert
len(result.verify_diagnostics) > 0) or call session.verify_source with input
that produces a diagnostic for "prog_b.vera", then loop over
result.verify_diagnostics to assert d.location.file == "prog_b.vera"; reference
the test function name test_cross_program_no_stale_bleed, the
session.verify_source calls, result.verify_diagnostics, and
last_run_stats.replayed_fns when making the change.

---

Outside diff comments:
In `@vera/obligations/session.py`:
- Around line 154-158: In the typecheck() short-circuit where you return a
SessionVerifyResult on error (the if any(d.severity == "error" for d in
check_diags) branch), reset VerificationSession.last_run_stats to an
empty/default stats value (e.g. zeroed replay/verify counts or None) before
returning so stale counts aren’t leaked; update the branch that returns
SessionVerifyResult(...) to set self.last_run_stats = <default stats> (or None)
prior to the return.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 59aa8c03-17c7-451d-b624-0a929f11a1c2

📥 Commits

Reviewing files that changed from the base of the PR and between 0e02aef and 1b85c73.

📒 Files selected for processing (4)
  • TESTING.md
  • tests/test_obligations.py
  • vera/obligations/cache.py
  • vera/obligations/session.py

Comment thread tests/test_obligations.py
CodeRabbit round 2 (valid): the fixture verified cleanly, so the
all(...location.file...) assertion passed over an empty diagnostics
list. The source now carries a violated postcondition so both runs
produce at least one diagnostic (asserted non-empty), and the location
check genuinely proves the second run's output carries prog_b.vera —
a stale replay from prog_a would fail it.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@aallan aallan merged commit 19abc7e into main Jun 10, 2026
27 checks passed
@aallan aallan deleted the feat/222-phase-b-incremental branch June 10, 2026 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant