feat: reify proof obligations + warm Z3 session (#222 Phase A)#716
Conversation
Phase A of the LSP / obligation-core plan on #222: obligations become first-class records and a long-lived session reuses one z3.Solver across a whole program. Behavior-preserving by construction and by test: reification is observational (records created at the existing discharge sites, in discharge order, never altering solver state), and the new differential oracle pins warm == cold across the full corpus. New package vera/obligations/: - core.py: ProofObligation dataclass — identity (fn, kind, expr text, span, content_key() sha256 digest) + outcome (status, counterexample, error code). Kinds: requires / ensures / decreases / nat_sub / call_pre. Status vocabulary mirrors the verifier's tier bookkeeping (verified <-> tier1; tier3+timeout <-> tier3_runtime; violated <-> error diagnostic, excluded from totals). - session.py: VerificationSession — the Phase A daemon object. Owns one SmtContext lazily; verify_source() runs parse -> typecheck -> ContractVerifier(shared_smt=...) with reset() between functions and ADT-registry resync between programs; caches last_program for Phase B's diffing. Single-file project model (imports resolve from disk); not thread-safe by design (Phase D serialises through one worker). - __init__.py: core exported eagerly; session symbols lazily via PEP 562 to break the verifier <-> session import cycle. vera/verifier.py: - VerifyResult gains obligations (default empty — source-compatible). - ContractVerifier gains shared_smt hook; cold path (None) is byte-identical to before: fresh SmtContext per function. - All six discharge sites record obligations: generic-fn early return (E520), requires (assumed; E521 fallback), ensures (E522/E523 fallbacks, E524 timeout, violations with counterexample), decreases (E525 fallback), @Nat-@Nat sites (E502 violations), call-site precondition violations (Phase A records call_pre only on violation; successful call-pre checks discharge silently inside the SMT layer's _translate_call — enumerating successes is a Phase B extension). vera/smt.py: - reset() hardened for actual use (it previously had zero callers): now also clears _path_conditions, _index_fns, _array_element_sorts, and _length_fns (re-seeded with the Int default), and re-applies the solver timeout. The _length_fns clear is load-bearing: get_rank_fn asserts its ForAll rank(x) >= 0 axiom only at dict-miss, so a surviving cache entry after solver.reset() would silently skip the axiom and ADT-measure decreases checks would diverge warm-vs-cold. Found by inspection while wiring the warm path; pinned by the oracle. tests/test_obligations.py (250 tests): - Differential oracle: warm VerificationSession == cold verify() on diagnostics (content + order), summary (field-for-field), and the obligation stream (identity + outcome), plus warm-twice determinism, parametrized over all 35 examples and every verify/run-level conformance program. - Summary <-> obligation consistency on every corpus program. - Unit tests per kind/status: violated ensures carries counterexample; violated nat_sub carries E502; generic-fn contracts fall to tier3 with E520; trivial contracts enumerate as verified; content keys are stable across runs and distinguish identical text at two sites; session reuses one solver, caches last_program, short-circuits on type errors, and resyncs the ADT registry between programs. Existing suites pass unchanged (hard rail): test_verifier.py + test_verifier_coverage.py 237/237. mypy strict clean (62 files). Doc counts updated (3,961 -> 4,211 tests; 32 -> 33 files; new TESTING.md row); site assets regenerated. Part of #222 (Phase A). Release prep and PR follow in the next commit. Co-Authored-By: Claude <noreply@anthropic.invalid>
Version bump across all tracked sites (vera/__init__.py, pyproject.toml, README.md, docs/index.html, uv.lock), CHANGELOG cut for 0.0.161 (Phase A Added entry + reset() staleness Fixed entry; the pending codecov Security entry from the previous cycle rides along), HISTORY.md single-sentence row, footer counts, regenerated site assets. Skip-changelog: the CHANGELOG cut for 0.0.161 IS this commit's changelog change. Co-Authored-By: Claude <noreply@anthropic.invalid>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds Phase A proof-obligations: reified ProofObligation records, VerifyResult.obligations, obligation recording at all discharge sites, a warm VerificationSession that reuses a single SmtContext/solver with safe reset logic, comprehensive tests validating warm vs cold equivalence and session invariants, plus version/docs updates. ChangesPhase A Proof Obligations Infrastructure
Sequence DiagramssequenceDiagram
participant Client
participant VerificationSession
participant SmtContext
participant ContractVerifier
participant Z3Solver
Client->>VerificationSession: verify_source(source_1)
VerificationSession->>SmtContext: _acquire_smt() [lazy init]
SmtContext->>Z3Solver: create()
SmtContext->>SmtContext: reset() [clear caches, reapply timeout]
VerificationSession->>VerificationSession: parse, transform, typecheck
VerificationSession->>ContractVerifier: __init__(shared_smt=SmtContext)
ContractVerifier->>Z3Solver: verify_program() [reuse solver]
Note over ContractVerifier: record obligations: verified / violated / tier3 / timeout
VerificationSession->>Client: SessionVerifyResult(ok, diagnostics, obligations)
Client->>VerificationSession: verify_source(source_2)
Note over SmtContext: reset() [clear per-function caches, preserve ADT registries]
VerificationSession->>ContractVerifier: __init__(shared_smt=SmtContext)
ContractVerifier->>Z3Solver: verify_program() [reuse same solver]
VerificationSession->>Client: SessionVerifyResult(ok, diagnostics, obligations)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #716 +/- ##
==========================================
+ Coverage 90.84% 90.92% +0.08%
==========================================
Files 60 63 +3
Lines 24086 24216 +130
Branches 292 292
==========================================
+ Hits 21880 22019 +139
+ Misses 2199 2190 -9
Partials 7 7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
vera/verifier.py (1)
772-784:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftPreserve the actual decreases result in the recorded obligation.
_verify_decreases()currently returns onlybool, so every non-verified decreases check is recorded here astier3/E525. Ifcheck_valid()found a concrete non-decreasing recursive call, thatviolatedoutcome and its counterexample are lost before the obligation is emitted. The new API then misreports the discharge result even though the solver already knew more.Suggested direction
- def _verify_decreases(...) -> bool: + def _verify_decreases( + ... + ) -> tuple[ObligationStatus, dict[str, str] | None]: @@ - if result.status != "verified": - return False + if result.status != "verified": + return result.status, result.counterexample @@ - return True + return "verified", NoneThen map
verified/violated/timeoutat the call site when recording the obligation. If you want to keep the existing warning-only UX for now, you can still do that separately; the obligation stream should not flatten the solver outcome.Based on the Phase A contract in the PR summary, these records are meant to capture per-obligation discharge outcomes, not just “verified vs everything else”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@vera/verifier.py` around lines 772 - 784, The call site flattens every non-verified result from _verify_decreases into a tier3/E525 record, losing concrete violated/timeout outcomes and counterexamples; change _verify_decreases to return a richer outcome (e.g., enum/string "verified"|"violated"|"timeout" plus optional counterexample/trace) and then update this call site to switch on that outcome: increment summary.tier1_verified and record "verified" when outcome == "verified"; increment the appropriate counter (e.g., summary.tier3_runtime or a new summary.tier3_violated) and call _record_obligation with "violated" and include the counterexample when outcome == "violated"; map solver timeout to "timeout" with its own record and error_code if needed; leave UX warnings separate but ensure _record_obligation receives the actual outcome and any counterexample from _verify_decreases (use symbols _verify_decreases, _record_obligation, and any check_valid-produced counterexample object).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@vera/verifier.py`:
- Around line 659-662: The recorded obligation is using v.precondition (the
callee contract text/span) which collapses distinct call-site obligations;
change the _record_obligation invocation in verifier.py to key the obligation by
the call-site expression/span (use the field on v that represents the call site,
e.g. v.call_expr or v.call_span) instead of v.precondition so
ProofObligation.content_key() sees the call-site span and disambiguates calls,
and also include the original violation code emitted by _report_call_violation
(E501 or v.code) when calling _record_obligation so the E501 code is not
dropped.
---
Outside diff comments:
In `@vera/verifier.py`:
- Around line 772-784: The call site flattens every non-verified result from
_verify_decreases into a tier3/E525 record, losing concrete violated/timeout
outcomes and counterexamples; change _verify_decreases to return a richer
outcome (e.g., enum/string "verified"|"violated"|"timeout" plus optional
counterexample/trace) and then update this call site to switch on that outcome:
increment summary.tier1_verified and record "verified" when outcome ==
"verified"; increment the appropriate counter (e.g., summary.tier3_runtime or a
new summary.tier3_violated) and call _record_obligation with "violated" and
include the counterexample when outcome == "violated"; map solver timeout to
"timeout" with its own record and error_code if needed; leave UX warnings
separate but ensure _record_obligation receives the actual outcome and any
counterexample from _verify_decreases (use symbols _verify_decreases,
_record_obligation, and any check_valid-produced counterexample object).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f02a9a3f-0798-4103-b4b6-158a1740b1aa
⛔ Files ignored due to path filters (6)
docs/index.htmlis excluded by!docs/**docs/index.mdis excluded by!docs/**docs/llms-full.txtis excluded by!docs/**docs/llms.txtis excluded by!docs/**docs/sitemap.xmlis excluded by!docs/**uv.lockis excluded by!**/*.lock,!uv.lock
📒 Files selected for processing (13)
CHANGELOG.mdHISTORY.mdREADME.mdROADMAP.mdTESTING.mdpyproject.tomltests/test_obligations.pyvera/__init__.pyvera/obligations/__init__.pyvera/obligations/core.pyvera/obligations/session.pyvera/smt.pyvera/verifier.py
Two findings, both valid: - ruff S110 (security lint, CI lint job): expr_text_for wrapped format_expr in try/except-pass as a defensive guard. format_expr is total (it ends in a '<expr>' fallback), so the guard was dead code — removed it rather than noqa'ing it. The class-name fallback stays for the non-Expr contract shape (Invariant, #686, which never reaches the function-contract path today). - CodeRabbit (Major): call_pre obligations were keyed on the callee's contract node, so two call sites violating the same precondition inherited the contract's span and collapsed to one content_key — breaking Phase B cache identity — and dropped the E501 code that _report_call_violation emits. _record_obligation gains a span_node override; call_pre now records the CALL SITE span, the callee's contract text, and error_code=E501. New test pins the scenario: two violating calls -> two obligations, same expr_text, distinct content_keys, both E501, spans at the caller not the contract. Validation: ruff S clean, mypy clean (62 files), 251 obligation tests + 237 hard-rail verifier tests green, doc counts updated (4,212). Co-Authored-By: Claude <noreply@anthropic.invalid>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_obligations.py`:
- Around line 320-325: The current assertions only compare two results and
exclude line 2 but don't pin the actual callee text or the precise call-site
line; update the test to assert the concrete expected callee contract text and
the exact caller call-site line to prevent false positives: add an assertion
that call_pres[0].expr_text (and by extension call_pres[1].expr_text) equals the
known expected callee contract string (use a local variable like
expected_callee_text) and assert that every o in call_pres has o.line ==
expected_call_site_line (use a local expected_call_site_line integer), keeping
the existing content_key() inequality check.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5f570690-849c-442f-bef5-312c419d1133
📒 Files selected for processing (6)
README.mdROADMAP.mdTESTING.mdtests/test_obligations.pyvera/obligations/core.pyvera/verifier.py
CodeRabbit round 2 (valid): the call-site keying test asserted only
relational properties (texts equal each other, line != 2), which an
incorrect mapping could still satisfy. Now pins the exact rendered
contract text ('@Int.0 > 0' — format_expr renders bare, without the
backticks CodeRabbit's suggested diff guessed), the exact caller line
(14) for both obligations, and distinct columns for the two call sites.
Co-Authored-By: Claude <noreply@anthropic.invalid>
Summary
Phase A of the #222 LSP / obligation-core plan (per the implementation plan): proof obligations become first-class records, and a warm
VerificationSessionre-verifies full programs on one long-lived Z3 solver. Behavior-preserving by construction and pinned by a full-corpus differential oracle. This PR does not finish #222 — Phases B–E follow; the issue stays open.What's in it
New
vera/obligations/packagecore.py—ProofObligation: identity (owning fn, kind, expression text, span, sha256content_key()) + outcome (status, counterexample, error code). Kinds:requires/ensures/decreases/nat_sub/call_pre. Status vocabulary mirrors the verifier's tier bookkeeping exactly (verified↔ tier-1;tier3+timeout↔ tier-3;violated↔ error diagnostic, excluded from totals per the existingsummary.total -= 1convention).session.py—VerificationSession, the Phase A daemon: owns oneSmtContextlazily,reset()between functions, ADT-registry resync between programs, cacheslast_programfor Phase B's diffing. Single-file project model; not thread-safe by design (Phase D serialises through one worker).__init__.py—coreeager, session symbols lazy via PEP 562 (breaks the verifier ↔ session import cycle).vera/verifier.py—VerifyResult.obligations(default-empty, source-compatible);ContractVerifier(shared_smt=...)warm hook (cold path byte-identical: fresh context per function); all six discharge sites record obligations observationally, in discharge order, never touching solver state.vera/smt.py—reset()hardened for its first-ever caller: now clears_path_conditions/_index_fns/_array_element_sorts/_length_fns(re-seeded) and re-applies the solver timeout. The_length_fnsclear is load-bearing:get_rank_fnasserts itsForAll rank(x) >= 0axiom only at dict-miss, so a surviving cache entry aftersolver.reset()silently dropped the axiom and ADT-measuredecreaseschecks diverged warm-vs-cold. Latent-only until now (zero callers); found while wiring the warm path; pinned by the oracle.Design note: reification granularity
The plan phrase "enumerate then discharge" is implemented at per-obligation granularity (construct record → discharge → record outcome), not as a global two-pass (translate all goals, then check all). A global two-pass cannot be behavior-preserving here:
@Nat-sub obligations discharge against transientsmt._path_conditionsmid-walk, call-site preconditions are checked insidetranslate_expr, and ensures translation mutatesset_result_var. Per-site reification preserves discharge order byte-for-byte while giving Phases B/E exactly what they need: an enumerable obligation stream with stable identities and outcomes. Phase A recordscall_preonly on violation (successful call-site checks discharge silently inside the SMT layer); enumerating successes is a Phase B extension where the cache needs them.Verification
tests/test_obligations.py, 250 tests): warm session == coldverify()on diagnostics (content + order), summary (field-for-field), and obligation stream (identity + outcome), plus warm-twice determinism — across all 35 examples and every verify/run-level conformance program. Runs in ~2.5 s.test_verifier.py+test_verifier_coverage.pypass unchanged (237/237).nat_subcarries E502; generic-fn contracts → tier3/E520; trivial contracts enumerate as verified; content keys stable across runs and distinct for identical text at two sites; session reuses one solver, short-circuits on type errors, resyncs ADT registry between programs.mypy vera/strict clean (62 files); full non-stress suite green via pre-commit on both commits; doc counts consistent (3,961 → 4,211 tests, 32 → 33 files).Release prep (in this PR per workflow)
v0.0.161 across all tracked sites, CHANGELOG cut (Phase A entry +
reset()fix; the pending codecov### Securityentry from the previous cycle rides along), HISTORY.md single-sentence row, site assets regenerated, tag on branch.Part of #222 (Phase A of the phased plan — B–E follow in their own PRs).
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores