feat: incremental invalidation + discharge cache (#222 Phase B)#717
Conversation
Behind Phase A's unchanged API: VerificationSession now caches each top-level function's verification output (obligations, diagnostics, summary deltas) and replays it when the function's invalidation key is unchanged, re-entering Z3 only for invalidated functions. New vera/obligations/cache.py: - fn_cache_key = H(structural hash) + H(direct-callee interfaces) + H(program context). Soundness model documented in the module docstring: callee CONTRACT/signature changes invalidate callers (preconditions checked at call sites, postconditions assumed); callee BODY changes do not (bodies never read across the boundary); ADT / alias / effect / import / timeout / file changes invalidate everything (coarse, conservative). - Structural hash folds SPANS in explicitly: Node.span is repr=False, so repr(decl) alone is position-blind — a shifted function would replay stale line numbers. Caught by the new span-shift test before it ever shipped; spans now participate via a walk_nodes digest. - walk_nodes: generic dataclass-field AST walker (no isinstance chain to maintain — the #597 walker-completeness lesson applied structurally). - DischargeCache: bounded FIFO; put() refuses entries containing any timeout-status obligation (hard rail: solver-timeout outcomes are load-dependent and must never replay). vera/verifier.py: public register_program() seam (verify_program now delegates to it); the session drives per-function verification selectively while registration stays whole-program. vera/obligations/session.py: incremental drive in declaration order so replayed and fresh slices interleave into the exact cold output stream; SessionRunStats (replayed_fns / verified_fns) added for cache observability — additive, API otherwise unchanged. Tests (tests/test_obligations.py, +8): - The existing corpus-wide warm-twice oracle now exercises the replay path on all 35 examples + every verify/run conformance program for free (warm2 is all-replay and must equal warm1 == cold). - TestIncrementalInvalidation pins each rule: identical-source full replay; callee-body-edit re-verifies only the callee while the caller replays; callee-contract-edit invalidates the caller (and the tightened precondition surfaces a violated call_pre at the call site — proof of genuine re-verification); span-shift and ADT-edit conservative invalidation; cross-program isolation; timeout-status never cached (monkeypatched solver, second run re-verifies); FIFO eviction bound. Validation: 259 obligation tests + 237 hard-rail verifier tests green; mypy strict clean (63 files); ruff S clean; doc counts consistent (4,220 tests). Part of #222 (Phase B). Release prep follows before the PR. Co-Authored-By: Claude <noreply@anthropic.invalid>
Version bump across all tracked sites, CHANGELOG cut for 0.0.162, HISTORY.md single-sentence row, footer count, regenerated site assets. Skip-changelog: the CHANGELOG cut for 0.0.162 IS this commit's changelog change. Co-Authored-By: Claude <noreply@anthropic.invalid>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughImplements Phase B incremental verification: per-function discharge cache with deterministic invalidation keys (function structure, direct-callee inputs, program context). VerificationSession iterates top-level functions, replays cached diagnostics/obligations/deltas on key hits, verifies-and-caches on misses. Adds tests and updates release/docs and package versions for v0.0.162. ChangesPhase B Incremental Verification Caching
Sequence Diagram(s)sequenceDiagram
participant VerificationSession
participant DischargeCache
participant ContractVerifier
VerificationSession->>ContractVerifier: register_program(program)
loop each FnDecl
VerificationSession->>DischargeCache: get(fn_cache_key)
alt cache hit
DischargeCache-->>VerificationSession: FnCacheEntry
VerificationSession->>VerificationSession: append cached diagnostics/obligations/deltas
else cache miss
VerificationSession->>ContractVerifier: _verify_fn(decl)
ContractVerifier-->>VerificationSession: diagnostics, obligations, summary delta
VerificationSession->>DischargeCache: put(key, entry)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~65 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #717 +/- ##
==========================================
+ Coverage 90.92% 90.96% +0.03%
==========================================
Files 63 64 +1
Lines 24216 24318 +102
Branches 292 292
==========================================
+ Hits 22019 22121 +102
Misses 2190 2190
Partials 7 7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_obligations.py`:
- Around line 441-465: The tests are too weak: in
test_adt_edit_invalidates_everything and test_cross_program_no_stale_bleed
ensure no warm replay and that edited runs match a cold run. After calling
VerificationSession().verify_source(edited) (and verify_source(other)), assert
session.last_run_stats.replayed_fns == 0 (not just verified_fns checks) and call
self._assert_matches_cold(edited, result) / self._assert_matches_cold(other,
result) respectively to compare edited/other run output against a cold run;
reference VerificationSession, session.verify_source,
last_run_stats.replayed_fns, last_run_stats.verified_fns, and
self._assert_matches_cold to locate where to add these assertions.
In `@vera/obligations/session.py`:
- Around line 175-182: The current context hash uses tuple(repr(m) for m in
resolved_modules) which can embed non-deterministic elements (absolute Path,
differing cwd/symlinks and full source text) and cause cache churn; update the
code that calls program_context_hash to build a deterministic key from each
ResolvedModule by normalising/canonicalising the module path (e.g.
resolved/canonical path or relative repo-root path) and including a stable
digest of the module source (e.g. hash of m.source) rather than repr(m). Locate
uses of resolved_modules and the program_context_hash invocation in session.py
(symbols: ResolvedModule, resolved_modules, program_context_hash) and replace
the repr-based tuple with a tuple of stable primitives like (normalized_path,
source_digest) for each module.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 857abe57-0faa-4d85-874b-78f61eba41f0
⛔ Files ignored due to path filters (5)
docs/index.htmlis excluded by!docs/**docs/index.mdis excluded by!docs/**docs/llms-full.txtis excluded by!docs/**docs/llms.txtis excluded by!docs/**uv.lockis excluded by!**/*.lock,!uv.lock
📒 Files selected for processing (12)
CHANGELOG.mdHISTORY.mdREADME.mdROADMAP.mdTESTING.mdpyproject.tomltests/test_obligations.pyvera/__init__.pyvera/obligations/__init__.pyvera/obligations/cache.pyvera/obligations/session.pyvera/verifier.py
Both findings valid: - Stable module keys in the context hash: repr(ResolvedModule) baked in the absolute file_path (canonicalisation-sensitive across cwd/symlinks) and the full parsed AST repr (expensive, derived from source anyway). Now (module path, source sha256) pairs — cheaper and deterministic; the source digest covers every contract/signature change a caller could read. - Test tightening: test_adt_edit_invalidates_everything now checks the edited warm output against cold; test_cross_program_no_stale_bleed rewritten to match its name — identical source under a different file name must fully re-verify (file is baked into cached diagnostic locations), asserting verified==2/replayed==0 plus per-file location correctness, instead of the previous related-program edit that legitimately allowed one replay. Co-Authored-By: Claude <noreply@anthropic.invalid>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
vera/obligations/session.py (1)
154-158:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReset
last_run_statson the type-check short-circuit.A failing
typecheck()returns before this run assigns fresh stats, soVerificationSession.last_run_statsleaks replay/verify counts from the previous successful call. Anything reading cache observability after an error will see stale data.Proposed fix
if any(d.severity == "error" for d in check_diags): # Type errors: skip verification, matching cmd_verify. + self.last_run_stats = SessionRunStats() return SessionVerifyResult( ok=False, check_diagnostics=check_diags, )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@vera/obligations/session.py` around lines 154 - 158, In the typecheck() short-circuit where you return a SessionVerifyResult on error (the if any(d.severity == "error" for d in check_diags) branch), reset VerificationSession.last_run_stats to an empty/default stats value (e.g. zeroed replay/verify counts or None) before returning so stale counts aren’t leaked; update the branch that returns SessionVerifyResult(...) to set self.last_run_stats = <default stats> (or None) prior to the return.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_obligations.py`:
- Around line 458-473: The final assertion is vacuous because
result.verify_diagnostics is empty for a clean fixture; update
test_cross_program_no_stale_bleed so the file-name check is meaningful by
ensuring there is at least one diagnostic before verifying its file: either
assert result.verify_diagnostics is non-empty (e.g., assert
len(result.verify_diagnostics) > 0) or call session.verify_source with input
that produces a diagnostic for "prog_b.vera", then loop over
result.verify_diagnostics to assert d.location.file == "prog_b.vera"; reference
the test function name test_cross_program_no_stale_bleed, the
session.verify_source calls, result.verify_diagnostics, and
last_run_stats.replayed_fns when making the change.
---
Outside diff comments:
In `@vera/obligations/session.py`:
- Around line 154-158: In the typecheck() short-circuit where you return a
SessionVerifyResult on error (the if any(d.severity == "error" for d in
check_diags) branch), reset VerificationSession.last_run_stats to an
empty/default stats value (e.g. zeroed replay/verify counts or None) before
returning so stale counts aren’t leaked; update the branch that returns
SessionVerifyResult(...) to set self.last_run_stats = <default stats> (or None)
prior to the return.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 59aa8c03-17c7-451d-b624-0a929f11a1c2
📒 Files selected for processing (4)
TESTING.mdtests/test_obligations.pyvera/obligations/cache.pyvera/obligations/session.py
CodeRabbit round 2 (valid): the fixture verified cleanly, so the all(...location.file...) assertion passed over an empty diagnostics list. The source now carries a violated postcondition so both runs produce at least one diagnostic (asserted non-empty), and the location check genuinely proves the second run's output carries prog_b.vera — a stale replay from prog_a would fail it. Co-Authored-By: Claude <noreply@anthropic.invalid>
Summary
Phase B of the #222 plan: incremental verification behind Phase A's unchanged API.
VerificationSessionnow caches each top-level function's verification output and replays it when the function's invalidation key is unchanged — only invalidated functions re-enter Z3. This PR does not finish #222 (Phases C–E follow; the issue stays open).The soundness model (
vera/obligations/cache.py)A function's cache key covers everything its verification reads:
fn_structural_hashFnDeclsubtree including spanslocation/source_line, obligations carry line/col, so a moved function is a deliberate misscallee_componentprogram_context_hashHard rail honoured: entries containing any timeout-status obligation are never cached (
DischargeCache.putrefuses them) — solver-timeout outcomes are load-dependent.A bug the oracle caught before it shipped
The first structural hash was
sha256(repr(decl))— butNode.spanis declaredrepr=False, so the hash was position-blind: a function shifted down one line would have replayed cached output with stale line numbers. The targeted span-shift test failed immediately; the hash now folds every node's span in via a genericwalk_nodesdigest (a dataclass-field walker — no isinstance chain to maintain, the #597 walker-completeness lesson applied structurally).Implementation
vera/obligations/cache.py(new): key derivation,walk_nodes,FnCacheEntry, bounded-FIFODischargeCache.vera/verifier.py: publicregister_program()seam (verify_programdelegates — behaviour identical); the session drives per-function verification selectively while registration stays whole-program.vera/obligations/session.py: incremental drive in declaration order, so replayed and fresh slices interleave into the exact cold output stream.SessionRunStats(replayed/verified counts) added — additive observability; API otherwise unchanged.Verification
verify()on diagnostics, summary, and the obligation stream.TestIncrementalInvalidation, +8): identical-source full replay; callee-body-edit re-verifies only the callee while the caller replays; callee-contract-edit invalidates the caller — and the tightened precondition surfaces a violatedcall_preat the call site, proving genuine re-verification; span-shift and ADT-edit conservative invalidation; cross-program isolation; timeout-never-cached (monkeypatched solver, second run re-verifies); FIFO eviction bound.test_verifier.py+test_verifier_coverage.pypass unchanged (237/237).mypy vera/strict clean (63 files); ruff S clean; full non-stress suite green via pre-commit on both commits; doc counts consistent (4,220 tests).Release prep (in this PR per workflow)
v0.0.162 across all tracked sites, CHANGELOG cut, HISTORY.md single-sentence row, site assets regenerated, tag on branch.
Part of #222 (Phase B of the phased plan — C–E follow in their own PRs).
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests
Documentation