Skip to content

feat: reify proof obligations + warm Z3 session (#222 Phase A)#716

Merged
aallan merged 4 commits into
mainfrom
feat/222-phase-a-obligation-core
Jun 10, 2026
Merged

feat: reify proof obligations + warm Z3 session (#222 Phase A)#716
aallan merged 4 commits into
mainfrom
feat/222-phase-a-obligation-core

Conversation

@aallan

@aallan aallan commented Jun 10, 2026

Copy link
Copy Markdown
Owner

Summary

Phase A of the #222 LSP / obligation-core plan (per the implementation plan): proof obligations become first-class records, and a warm VerificationSession re-verifies full programs on one long-lived Z3 solver. Behavior-preserving by construction and pinned by a full-corpus differential oracle. This PR does not finish #222 — Phases B–E follow; the issue stays open.

What's in it

New vera/obligations/ package

  • core.pyProofObligation: identity (owning fn, kind, expression text, span, sha256 content_key()) + outcome (status, counterexample, error code). Kinds: requires / ensures / decreases / nat_sub / call_pre. Status vocabulary mirrors the verifier's tier bookkeeping exactly (verified ↔ tier-1; tier3+timeout ↔ tier-3; violated ↔ error diagnostic, excluded from totals per the existing summary.total -= 1 convention).
  • session.pyVerificationSession, the Phase A daemon: owns one SmtContext lazily, reset() between functions, ADT-registry resync between programs, caches last_program for Phase B's diffing. Single-file project model; not thread-safe by design (Phase D serialises through one worker).
  • __init__.pycore eager, session symbols lazy via PEP 562 (breaks the verifier ↔ session import cycle).

vera/verifier.pyVerifyResult.obligations (default-empty, source-compatible); ContractVerifier(shared_smt=...) warm hook (cold path byte-identical: fresh context per function); all six discharge sites record obligations observationally, in discharge order, never touching solver state.

vera/smt.pyreset() hardened for its first-ever caller: now clears _path_conditions / _index_fns / _array_element_sorts / _length_fns (re-seeded) and re-applies the solver timeout. The _length_fns clear is load-bearing: get_rank_fn asserts its ForAll rank(x) >= 0 axiom only at dict-miss, so a surviving cache entry after solver.reset() silently dropped the axiom and ADT-measure decreases checks diverged warm-vs-cold. Latent-only until now (zero callers); found while wiring the warm path; pinned by the oracle.

Design note: reification granularity

The plan phrase "enumerate then discharge" is implemented at per-obligation granularity (construct record → discharge → record outcome), not as a global two-pass (translate all goals, then check all). A global two-pass cannot be behavior-preserving here: @Nat-sub obligations discharge against transient smt._path_conditions mid-walk, call-site preconditions are checked inside translate_expr, and ensures translation mutates set_result_var. Per-site reification preserves discharge order byte-for-byte while giving Phases B/E exactly what they need: an enumerable obligation stream with stable identities and outcomes. Phase A records call_pre only on violation (successful call-site checks discharge silently inside the SMT layer); enumerating successes is a Phase B extension where the cache needs them.

Verification

  • Differential oracle (tests/test_obligations.py, 250 tests): warm session == cold verify() on diagnostics (content + order), summary (field-for-field), and obligation stream (identity + outcome), plus warm-twice determinism — across all 35 examples and every verify/run-level conformance program. Runs in ~2.5 s.
  • Hard rail: test_verifier.py + test_verifier_coverage.py pass unchanged (237/237).
  • Summary ↔ obligation consistency asserted on every corpus program (tier counters must equal obligation-status counts).
  • Unit coverage per kind/status: violated ensures carries counterexample; violated nat_sub carries E502; generic-fn contracts → tier3/E520; trivial contracts enumerate as verified; content keys stable across runs and distinct for identical text at two sites; session reuses one solver, short-circuits on type errors, resyncs ADT registry between programs.
  • mypy vera/ strict clean (62 files); full non-stress suite green via pre-commit on both commits; doc counts consistent (3,961 → 4,211 tests, 32 → 33 files).

Release prep (in this PR per workflow)

v0.0.161 across all tracked sites, CHANGELOG cut (Phase A entry + reset() fix; the pending codecov ### Security entry from the previous cycle rides along), HISTORY.md single-sentence row, site assets regenerated, tag on branch.

Part of #222 (Phase A of the phased plan — B–E follow in their own PRs).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added first‑class proof‑obligation tracking and a warm verification session to reuse solver state across runs.
  • Bug Fixes

    • Fixed stale solver state and ensured solver timeout is reapplied when reusing the solver.
  • Tests

    • Added ~250 tests covering obligation recording, warm-vs-cold determinism, session caching and resynchronisation.
  • Chores

    • Bumped package version and updated changelog/history/README/roadmap/testing summaries.

aallan and others added 2 commits June 10, 2026 13:11
Phase A of the LSP / obligation-core plan on #222: obligations become
first-class records and a long-lived session reuses one z3.Solver
across a whole program. Behavior-preserving by construction and by
test: reification is observational (records created at the existing
discharge sites, in discharge order, never altering solver state), and
the new differential oracle pins warm == cold across the full corpus.

New package vera/obligations/:
- core.py: ProofObligation dataclass — identity (fn, kind, expr text,
  span, content_key() sha256 digest) + outcome (status, counterexample,
  error code). Kinds: requires / ensures / decreases / nat_sub /
  call_pre. Status vocabulary mirrors the verifier's tier bookkeeping
  (verified <-> tier1; tier3+timeout <-> tier3_runtime; violated <->
  error diagnostic, excluded from totals).
- session.py: VerificationSession — the Phase A daemon object. Owns one
  SmtContext lazily; verify_source() runs parse -> typecheck ->
  ContractVerifier(shared_smt=...) with reset() between functions and
  ADT-registry resync between programs; caches last_program for Phase
  B's diffing. Single-file project model (imports resolve from disk);
  not thread-safe by design (Phase D serialises through one worker).
- __init__.py: core exported eagerly; session symbols lazily via PEP
  562 to break the verifier <-> session import cycle.

vera/verifier.py:
- VerifyResult gains obligations (default empty — source-compatible).
- ContractVerifier gains shared_smt hook; cold path (None) is
  byte-identical to before: fresh SmtContext per function.
- All six discharge sites record obligations: generic-fn early return
  (E520), requires (assumed; E521 fallback), ensures (E522/E523
  fallbacks, E524 timeout, violations with counterexample), decreases
  (E525 fallback), @Nat-@Nat sites (E502 violations), call-site
  precondition violations (Phase A records call_pre only on violation;
  successful call-pre checks discharge silently inside the SMT layer's
  _translate_call — enumerating successes is a Phase B extension).

vera/smt.py:
- reset() hardened for actual use (it previously had zero callers):
  now also clears _path_conditions, _index_fns, _array_element_sorts,
  and _length_fns (re-seeded with the Int default), and re-applies the
  solver timeout. The _length_fns clear is load-bearing: get_rank_fn
  asserts its ForAll rank(x) >= 0 axiom only at dict-miss, so a
  surviving cache entry after solver.reset() would silently skip the
  axiom and ADT-measure decreases checks would diverge warm-vs-cold.
  Found by inspection while wiring the warm path; pinned by the oracle.

tests/test_obligations.py (250 tests):
- Differential oracle: warm VerificationSession == cold verify() on
  diagnostics (content + order), summary (field-for-field), and the
  obligation stream (identity + outcome), plus warm-twice determinism,
  parametrized over all 35 examples and every verify/run-level
  conformance program.
- Summary <-> obligation consistency on every corpus program.
- Unit tests per kind/status: violated ensures carries counterexample;
  violated nat_sub carries E502; generic-fn contracts fall to tier3
  with E520; trivial contracts enumerate as verified; content keys are
  stable across runs and distinguish identical text at two sites;
  session reuses one solver, caches last_program, short-circuits on
  type errors, and resyncs the ADT registry between programs.

Existing suites pass unchanged (hard rail): test_verifier.py +
test_verifier_coverage.py 237/237. mypy strict clean (62 files).
Doc counts updated (3,961 -> 4,211 tests; 32 -> 33 files; new
TESTING.md row); site assets regenerated.

Part of #222 (Phase A). Release prep and PR follow in the next commit.

Co-Authored-By: Claude <noreply@anthropic.invalid>
Version bump across all tracked sites (vera/__init__.py, pyproject.toml,
README.md, docs/index.html, uv.lock), CHANGELOG cut for 0.0.161 (Phase A
Added entry + reset() staleness Fixed entry; the pending codecov Security
entry from the previous cycle rides along), HISTORY.md single-sentence
row, footer counts, regenerated site assets.

Skip-changelog: the CHANGELOG cut for 0.0.161 IS this commit's changelog change.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1f101504-ade0-4179-8438-d33f19435708

📥 Commits

Reviewing files that changed from the base of the PR and between 39d9ff6 and b9435bc.

📒 Files selected for processing (2)
  • TESTING.md
  • tests/test_obligations.py

📝 Walkthrough

Walkthrough

Adds Phase A proof-obligations: reified ProofObligation records, VerifyResult.obligations, obligation recording at all discharge sites, a warm VerificationSession that reuses a single SmtContext/solver with safe reset logic, comprehensive tests validating warm vs cold equivalence and session invariants, plus version/docs updates.

Changes

Phase A Proof Obligations Infrastructure

Layer / File(s) Summary
Obligation data model and expression rendering
vera/obligations/core.py
Adds ObligationKind/ObligationStatus, ProofObligation dataclass (identity fields, outcome fields, content_key() SHA-256) and expr_text_for() rendering for contract nodes.
Package initialization and lazy exports
vera/obligations/__init__.py
Creates package initializer re-exporting core types and implements PEP 562 __getattr__ to lazily expose SessionVerifyResult/VerificationSession to avoid circular imports.
Warm verification session and result wrapper
vera/obligations/session.py
Adds VerificationSession that lazily acquires a shared SmtContext, clears per-program ADT registries, runs parse→transform→typecheck→verify, and returns SessionVerifyResult with separated check/verify diagnostics, summary and obligation stream.
SMT context warm-reuse reset
vera/smt.py
Stores Z3 timeout in _timeout_ms and expands reset() to clear solver state, reapply timeout, reinitialize per-function translation state and purge inter-function Z3 metadata while preserving ADT registries for per-program clearing.
Obligation recording and verifier changes
vera/verifier.py
Adds VerifyResult.obligations, ContractVerifier.shared_smt (warm reuse), _record_obligation() and _contract_kind(). Records obligations at generic/requires/ensures/call-site/decreases/nat-sub discharge sites, including status, error codes (E501–E525, E502), and Z3 counterexamples where applicable.
Comprehensive differential and unit test suite
tests/test_obligations.py
Adds corpus-based differential oracle and unit tests (≈250 tests) checking warm vs cold equivalence, determinism, obligation identity/stability and error-code semantics, plus session invariants: solver reuse, last_program caching, ADT registry resync and type-error short-circuit.
Version bump and documentation updates
pyproject.toml, vera/__init__.py, CHANGELOG.md, HISTORY.md, README.md, ROADMAP.md, TESTING.md
Bumps version to 0.0.161, updates CHANGELOG with Phase A notes and SmtContext.reset() fix, HISTORY stage entry and day counts, README/ROADMAP/TESTING metrics and test-file table entries.

Sequence Diagrams

sequenceDiagram
  participant Client
  participant VerificationSession
  participant SmtContext
  participant ContractVerifier
  participant Z3Solver
  Client->>VerificationSession: verify_source(source_1)
  VerificationSession->>SmtContext: _acquire_smt() [lazy init]
  SmtContext->>Z3Solver: create()
  SmtContext->>SmtContext: reset() [clear caches, reapply timeout]
  VerificationSession->>VerificationSession: parse, transform, typecheck
  VerificationSession->>ContractVerifier: __init__(shared_smt=SmtContext)
  ContractVerifier->>Z3Solver: verify_program() [reuse solver]
  Note over ContractVerifier: record obligations: verified / violated / tier3 / timeout
  VerificationSession->>Client: SessionVerifyResult(ok, diagnostics, obligations)
  Client->>VerificationSession: verify_source(source_2)
  Note over SmtContext: reset() [clear per-function caches, preserve ADT registries]
  VerificationSession->>ContractVerifier: __init__(shared_smt=SmtContext)
  ContractVerifier->>Z3Solver: verify_program() [reuse same solver]
  VerificationSession->>Client: SessionVerifyResult(ok, diagnostics, obligations)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • aallan/vera#554: Related prior work on @Nat - @Nat`` underflow obligations and E502 error handling; this PR refactors/centralises obligation emission into the reified ProofObligation and VerifyResult.obligations architecture.

Suggested labels

compiler, tests, docs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically identifies the main changes: reification of proof obligations and introduction of a warm Z3 session for Phase A of issue #222, matching the core objectives.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/222-phase-a-obligation-core

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 98.49624% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.92%. Comparing base (d9fb8e0) to head (b9435bc).

Files with missing lines Patch % Lines
vera/obligations/session.py 98.03% 1 Missing ⚠️
vera/verifier.py 97.29% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #716      +/-   ##
==========================================
+ Coverage   90.84%   90.92%   +0.08%     
==========================================
  Files          60       63       +3     
  Lines       24086    24216     +130     
  Branches      292      292              
==========================================
+ Hits        21880    22019     +139     
+ Misses       2199     2190       -9     
  Partials        7        7              
Flag Coverage Δ
javascript 61.40% <ø> (ø)
python 94.67% <98.49%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
vera/verifier.py (1)

772-784: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Preserve the actual decreases result in the recorded obligation.

_verify_decreases() currently returns only bool, so every non-verified decreases check is recorded here as tier3/E525. If check_valid() found a concrete non-decreasing recursive call, that violated outcome and its counterexample are lost before the obligation is emitted. The new API then misreports the discharge result even though the solver already knew more.

Suggested direction
-    def _verify_decreases(...) -> bool:
+    def _verify_decreases(
+        ...
+    ) -> tuple[ObligationStatus, dict[str, str] | None]:
@@
-            if result.status != "verified":
-                return False
+            if result.status != "verified":
+                return result.status, result.counterexample
@@
-        return True
+        return "verified", None

Then map verified / violated / timeout at the call site when recording the obligation. If you want to keep the existing warning-only UX for now, you can still do that separately; the obligation stream should not flatten the solver outcome.

Based on the Phase A contract in the PR summary, these records are meant to capture per-obligation discharge outcomes, not just “verified vs everything else”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vera/verifier.py` around lines 772 - 784, The call site flattens every
non-verified result from _verify_decreases into a tier3/E525 record, losing
concrete violated/timeout outcomes and counterexamples; change _verify_decreases
to return a richer outcome (e.g., enum/string "verified"|"violated"|"timeout"
plus optional counterexample/trace) and then update this call site to switch on
that outcome: increment summary.tier1_verified and record "verified" when
outcome == "verified"; increment the appropriate counter (e.g.,
summary.tier3_runtime or a new summary.tier3_violated) and call
_record_obligation with "violated" and include the counterexample when outcome
== "violated"; map solver timeout to "timeout" with its own record and
error_code if needed; leave UX warnings separate but ensure _record_obligation
receives the actual outcome and any counterexample from _verify_decreases (use
symbols _verify_decreases, _record_obligation, and any check_valid-produced
counterexample object).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@vera/verifier.py`:
- Around line 659-662: The recorded obligation is using v.precondition (the
callee contract text/span) which collapses distinct call-site obligations;
change the _record_obligation invocation in verifier.py to key the obligation by
the call-site expression/span (use the field on v that represents the call site,
e.g. v.call_expr or v.call_span) instead of v.precondition so
ProofObligation.content_key() sees the call-site span and disambiguates calls,
and also include the original violation code emitted by _report_call_violation
(E501 or v.code) when calling _record_obligation so the E501 code is not
dropped.

---

Outside diff comments:
In `@vera/verifier.py`:
- Around line 772-784: The call site flattens every non-verified result from
_verify_decreases into a tier3/E525 record, losing concrete violated/timeout
outcomes and counterexamples; change _verify_decreases to return a richer
outcome (e.g., enum/string "verified"|"violated"|"timeout" plus optional
counterexample/trace) and then update this call site to switch on that outcome:
increment summary.tier1_verified and record "verified" when outcome ==
"verified"; increment the appropriate counter (e.g., summary.tier3_runtime or a
new summary.tier3_violated) and call _record_obligation with "violated" and
include the counterexample when outcome == "violated"; map solver timeout to
"timeout" with its own record and error_code if needed; leave UX warnings
separate but ensure _record_obligation receives the actual outcome and any
counterexample from _verify_decreases (use symbols _verify_decreases,
_record_obligation, and any check_valid-produced counterexample object).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f02a9a3f-0798-4103-b4b6-158a1740b1aa

📥 Commits

Reviewing files that changed from the base of the PR and between d9fb8e0 and a3f580e.

⛔ Files ignored due to path filters (6)
  • docs/index.html is excluded by !docs/**
  • docs/index.md is excluded by !docs/**
  • docs/llms-full.txt is excluded by !docs/**
  • docs/llms.txt is excluded by !docs/**
  • docs/sitemap.xml is excluded by !docs/**
  • uv.lock is excluded by !**/*.lock, !uv.lock
📒 Files selected for processing (13)
  • CHANGELOG.md
  • HISTORY.md
  • README.md
  • ROADMAP.md
  • TESTING.md
  • pyproject.toml
  • tests/test_obligations.py
  • vera/__init__.py
  • vera/obligations/__init__.py
  • vera/obligations/core.py
  • vera/obligations/session.py
  • vera/smt.py
  • vera/verifier.py

Comment thread vera/verifier.py
Two findings, both valid:

- ruff S110 (security lint, CI lint job): expr_text_for wrapped
  format_expr in try/except-pass as a defensive guard. format_expr is
  total (it ends in a '<expr>' fallback), so the guard was dead code —
  removed it rather than noqa'ing it. The class-name fallback stays for
  the non-Expr contract shape (Invariant, #686, which never reaches the
  function-contract path today).

- CodeRabbit (Major): call_pre obligations were keyed on the callee's
  contract node, so two call sites violating the same precondition
  inherited the contract's span and collapsed to one content_key —
  breaking Phase B cache identity — and dropped the E501 code that
  _report_call_violation emits. _record_obligation gains a span_node
  override; call_pre now records the CALL SITE span, the callee's
  contract text, and error_code=E501. New test pins the scenario: two
  violating calls -> two obligations, same expr_text, distinct
  content_keys, both E501, spans at the caller not the contract.

Validation: ruff S clean, mypy clean (62 files), 251 obligation tests +
237 hard-rail verifier tests green, doc counts updated (4,212).

Co-Authored-By: Claude <noreply@anthropic.invalid>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_obligations.py`:
- Around line 320-325: The current assertions only compare two results and
exclude line 2 but don't pin the actual callee text or the precise call-site
line; update the test to assert the concrete expected callee contract text and
the exact caller call-site line to prevent false positives: add an assertion
that call_pres[0].expr_text (and by extension call_pres[1].expr_text) equals the
known expected callee contract string (use a local variable like
expected_callee_text) and assert that every o in call_pres has o.line ==
expected_call_site_line (use a local expected_call_site_line integer), keeping
the existing content_key() inequality check.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5f570690-849c-442f-bef5-312c419d1133

📥 Commits

Reviewing files that changed from the base of the PR and between a3f580e and 39d9ff6.

📒 Files selected for processing (6)
  • README.md
  • ROADMAP.md
  • TESTING.md
  • tests/test_obligations.py
  • vera/obligations/core.py
  • vera/verifier.py

Comment thread tests/test_obligations.py Outdated
CodeRabbit round 2 (valid): the call-site keying test asserted only
relational properties (texts equal each other, line != 2), which an
incorrect mapping could still satisfy. Now pins the exact rendered
contract text ('@Int.0 > 0' — format_expr renders bare, without the
backticks CodeRabbit's suggested diff guessed), the exact caller line
(14) for both obligations, and distinct columns for the two call sites.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@aallan aallan merged commit becb3d9 into main Jun 10, 2026
27 checks passed
@aallan aallan deleted the feat/222-phase-a-obligation-core branch June 10, 2026 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant