Docs sweep: address compiler-review findings 2026-05-18 (5 issues filed)#684
Conversation
A strategic-mode review on 2026-05-18 surfaced 6 findings. All 6 independently verified against HEAD (v0.0.155, commit 08c1197). This PR addresses the docs half — code fixes land separately under the filed issues. ## What changed **README.md** — softened the "Contracts the compiler proves" section. The previous wording ("Division by zero is not a runtime error — it is a type error. The compiler checks every call site to prove the divisor is non-zero.") overclaimed: the verifier checks contracts the programmer wrote, not auto-synthesised obligations on primitive operations. New wording correctly describes what's static (declared preconditions discharged at every call site) and what's runtime (unguarded primitives trap with Vera-native diagnostic), with a forward reference to #680 which tracks the implementation work to make the original claim true. **FAQ.md** — new Q&A "Does the compiler prove division-by-zero, out-of-bounds indexing, etc. can't happen?". Explicitly walks through the static/runtime split, references the #520 Nat- subtraction precedent (the one auto-synthesised obligation that exists today), and points at #680 + #427 as the long-term path. **spec/06-contracts.md** — new §6.4.3 "Primitive Operation Safety" between Call Site Verification and SMT Solver Integration. States explicitly that obligations on `a / b`, `a % b`, `arr[i]`, `string_at(s, i)`, etc. are not auto-synthesised; the exception is `@Nat` subtraction (#520). Cross-references #680 and #427. Subsequent sections re-numbered (§6.4.4 SMT Solver Integration, §6.4.5 Counterexample Reporting — was §6.4.3 and §6.4.4). **KNOWN_ISSUES.md** — restored the Bugs section with #681 (`vera test` reports refuted functions as VERIFIED). This was confirmed by the review with a clean reproducer: `vera verify` correctly emits E500 with counterexample, but `vera test` reports the buggy function as `VERIFIED (Tier 1)` with exit code 0. The classifier at `vera/tester.py:393` only harvests `severity == "warning"` Tier 3 codes; `severity == "error"` codes flow past untouched. Workaround documented: run `vera verify` before trusting `vera test` results. Fix is half a day's work, landing separately. **ROADMAP.md** — three new tracking items added under CI tooling (#682 diagnostic-tagging discipline, #683 grammar drift), one under verification depth (#680 auto-inject primitive obligations), one under testing gaps (#679 Chapter 8 conformance programs). **Auxiliary**: - `scripts/check_faq_examples.py` and `scripts/check_spec_examples.py` allowlist line numbers updated (auto-fixed via `python scripts/fix_allowlists.py --fix`) — the Abilities snippet in FAQ shifted by ~11 lines from the new Q&A insertion; the spec example in §6.4.5 shifted from line 323 to 338 from the new §6.4.3. - Site assets regenerated (`docs/llms-full.txt`, `docs/sitemap.xml`). ## Findings deferred from this PR | Finding | Status | |---|---| | **P0-1** (`vera test` mis-classification) | #681 filed; documented in KNOWN_ISSUES.md; fix is a separate ~half-day PR | | **P1-1** (diagnostic tagging) | #682 filed; ROADMAP entry added; multi-day mechanical work scoped for a follow-up PR | | **P1-2** (Ch 8 conformance) | #679 filed; ROADMAP entry added; ~day of work | | **P2-1** (grammar drift) | #683 filed; scoped to nominal renames only (expression-precedence ladders are deliberate notational difference) | | **P2-2** (Tier 2 / `invariant()` strategic call) | Tier 2 (#427) genuinely long-running; `invariant()` doc-state already resolved by PR #650. Strategic call deferred — flagged to Alasdair for direction. | ## Validation - [x] `pytest tests/` — 3,875 passed, 14 skipped, 16 stress-deselected - [x] `python scripts/check_conformance.py` — 86/86 - [x] `python scripts/check_examples.py` — 34/34 - [x] `python scripts/check_doc_counts.py` — consistent - [x] `python scripts/check_limitations_sync.py` — consistent (30 in KNOWN_ISSUES, 18 in vera/README, 4 across spec) - [x] `python scripts/check_readme_examples.py` — pass - [x] `python scripts/check_faq_examples.py` — pass (after allowlist update) - [x] `python scripts/check_spec_examples.py` — pass (after allowlist update) - [x] `python scripts/check_skill_examples.py` — pass - [x] `python scripts/check_site_assets.py` — up-to-date Co-Authored-By: Claude <noreply@anthropic.invalid>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR clarifies verifier behaviour for primitive operations: the verifier does not auto-synthesise well‑definedness obligations (division/modulo/indexing); programmers must supply ChangesPrimitive Operation Safety Clarification
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #684 +/- ##
=======================================
Coverage 91.02% 91.02%
=======================================
Files 60 60
Lines 23413 23413
Branches 259 259
=======================================
Hits 21312 21312
Misses 2094 2094
Partials 7 7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
External contributor @rzyns filed two careful reports on 2026-05-15 (#674 and #675). This commit addresses #675 inline with the existing docs PR (#684) and reassigns the KNOWN_ISSUES.md entry from #681 to #674 (my dupe of #674 from today's compiler-review sweep — closed as duplicate). ## #675 — E500 fix text names all three repair classes Pre-fix `fix=` text at `vera/verifier.py:1306-1308` named only two repair classes: strengthen `requires(...)` or weaken `ensures(...)`. It missed the third (often most common) class: fix the implementation to satisfy the existing contract. When E500 catches a typo in the function body (e.g. `@Int.0 + @Int.0 + 1` when the contract says `@Int.0 + @Int.0`), the user's intended repair is almost always "fix the typo" — the previous wording implicitly biased them toward weakening the contract instead. New wording lists all three repair classes neutrally: > Resolve the mismatch between the function body and its > contract: fix the implementation so it satisfies this > ensures() clause, strengthen requires(...) if the > counterexample is outside the intended input domain, or > weaken/change ensures(...) if the postcondition overstates > the intended guarantee. Implementation-repair comes first because it's the most common case in practice; the verifier is otherwise neutral about which artifact is "the source of truth". Tightened `tests/test_verifier.py::TestCounterexamples:: test_violation_has_fix_suggestion` to assert all three classes appear in the lowercased fix text. The pre-existing assertion (`"precondition" or "postcondition" in fix.lower()`) survives the rewrite but would not have caught a regression dropping "implementation"; the new assertion does. Closes #675. ## #674 / #681 reassignment #674 was filed by @rzyns on 2026-05-15 reporting that `vera test --json` reports `category: "verified"` for functions the verifier has refuted with E500. I filed the same bug as #681 today (2026-05-18) from a strategic compiler-review session without realising #674 existed. #681 is closed as a duplicate; the `KNOWN_ISSUES.md` Bugs entry now references #674 with a polished reproducer (@rzyns's `@Int + @int + 1` form is cleaner than my `@Nat * 2 + 1`) and explicit mention of the CI-consumes-JSON failure mode. The underlying bug remains open; the fix is a separate code PR (half a day's work — extend `vera/tester.py::_classify_ functions` to harvest `severity == "error"` codes too, add a `"failed"` category, set non-zero exit code). ## Validation - [x] `pytest tests/test_verifier.py::TestCounterexamples::test_violation_has_fix_suggestion -v` passes - [x] `pytest tests/` — 3,875 passed, 14 skipped, 16 stress-deselected - [x] `mypy vera/` clean - [x] `ruff check --select S vera/` clean - [x] `check_doc_counts.py`, `check_limitations_sync.py`, `check_readme_examples.py`, `check_faq_examples.py`, `check_spec_examples.py`, `check_site_assets.py` all consistent - [x] TESTING.md test_verifier.py line count: 2,094 → 2,124 Co-Authored-By: Claude <noreply@anthropic.invalid>
CI's check_changelog_updated.py correctly flagged that PR #684 touches vera/verifier.py and spec/06-contracts.md without a matching CHANGELOG.md entry. Adds an [Unreleased] section covering: - Fixed: #675 E500 fix-text rewording (external report from @rzyns) - Documentation: the 2026-05-18 compiler-review docs sweep (README softening, FAQ Q&A, spec section 6.4.3, KNOWN_ISSUES #674 reassignment, ROADMAP additions for #679/#680/#682/#683) Co-Authored-By: Claude <noreply@anthropic.invalid>
CodeRabbit posted three findings on rzyns's PR plus there's a CI gate (doc-counts) and a project convention (empty Bugs table) to address. Two additional minor refactors (DRY constant, public helper rename) included since they touch the same files. **CR Finding 1: _bad_vera missing encoding="utf-8"** (test portability). Pre-existing helper used 30+ times in test_cli.py since long before aallan#685, but CR caught it now because rzyns's new tests added more callers. Windows runs would fall back to cp1252 on the default text-mode encoding and corrupt UTF-8 sequences in Vera fixtures. CLAUDE.md "Cross-platform pitfalls" already flags this as a convention. One-line fix on the helper benefits every caller. **CR Finding 2: needs_pos.category != "failed" too weak** (test_tester.py). In `test_call_precondition_error_fails_caller_ not_callee`, the negative assertion would still pass if needs_pos were classified as "skipped" or any other non-failed bucket. Strengthened to `== "verified"` so the test actually pins that the callee is correctly proven, not just that it avoided the failed bucket. **CR Finding 3: failed_fns dict overwrite is order-dependent** (vera/tester.py). When a single function attracts multiple verifier errors (e.g. `ensures(false)` produces E500 AND `@Nat - @Nat` produces E502 on the same body — the `test_displayed_failed_function_does_not_double_count_verifier_ errors` test exercises exactly this shape), the loop overwrites the dict entry and the displayed `reason` flips with diagnostic iteration order. Switched to first-hit (`if name not in failed_fns`) — `setdefault`-style semantics, no judgement call on which code "should" win. **Doc counts: 7 stale numbers** (CI gate would fail). `check_doc_counts.py` flagged TESTING.md total, test_cli.py + test_tester.py + test_tester_coverage.py rows, and ROADMAP.md test count. Updated all 7 plus README.md test count for consistency: 3,905 → 3,916 total. All counts now consistent. **KNOWN_ISSUES.md empty table** (project convention). PR removed the aallan#674 row but left the table header `| Bug | Issue |` with no body. Per the convention established at aallan#673's release ("retain the Bugs section, and just have some text in it to say 'No known bugs'"), replaced the empty table with the convention line. **Nit 1: DRY** — `{"E500", "E501", "E502"}` was duplicated between vera/tester.py and vera/cli.py. Extracted as `VERIFICATION_ERROR_CODES = frozenset(...)` module-level in vera/tester.py; cli.py imports it. `frozenset` over plain set so the constant is immutable (matches its semantic role as a classifier-set constant). **Nit 2: private name reaching across modules** — `cli.py` was importing `_failed_function_name` from vera/tester.py. The single-underscore prefix signals "private by convention" but cli.py's reliance on it was a leak. Renamed `_failed_function_name` → `failed_function_name` (public); docstring updated to mention the cli.py call site explicitly. Validation: - pytest 3,886 / 3,886 (no test count change — the renames don't affect test behaviour, just the names) - mypy clean - ruff S clean - conformance 86/86, examples 34/34 - check_doc_counts.py consistent - limitations sync consistent - check_changelog_updated.py passes (existing [Unreleased] entry from rzyns's commit + the additional carry-over from PR aallan#684) - check_site_assets.py up-to-date Co-Authored-By: Claude <noreply@anthropic.invalid>
Summary
Docs sweep addressing findings from a strategic-mode compiler review on 2026-05-18, plus a small inline fix for #675 (E500 fix-text wording, reported externally by @rzyns on 2026-05-15) that came up while writing the PR. All 6 review findings independently verified against HEAD (v0.0.155, commit
08c1197) before acting.Findings handled
vera testmis-classifies refuted functionsvera/tester.py:393only harvestsseverity == "warning"Tier 3 codesKNOWN_ISSUES.mdBugs section. Reassigned from my dup #681 to @rzyns's earlier #674 (their report came 3 days first and is better-written). Fix is a separate ~half-day PR.@Int.1 / @Int.0verifies cleanlyspec/06.4.3paragraph; filed #680 for the implementation workFix:invariant()strategic callinvariant()already doc-resolved via PR #650Bonus inline fix: #675 (E500 fix text)
@rzyns also reported #675 — the E500
fix=text named only two repair classes (strengthenrequires(...)/ weakenensures(...)) and missed the most common third class: fix the implementation to satisfy the existing contract. When E500 catches a typo in the function body, the user's intended repair is almost always "fix the typo" — the previous wording implicitly biased them toward weakening the contract instead.New wording at
vera/verifier.py:1305-1322:Implementation-repair comes first because it's the most common case in practice; the verifier is otherwise neutral about which artifact is the source of truth. Tightened
test_violation_has_fix_suggestionto pin all three repair classes (the pre-existing assertion"precondition" or "postcondition" in fix.lower()survives the rewrite but would not have caught a regression dropping "implementation"; the new assertion does).What changed
fix=text rewritten (3-class enumeration per E500 postcondition diagnostic fix text should mention implementation repair #675).test_violation_has_fix_suggestiontightened to pin all three repair classes.Strategic call deferred
P2-2 (Tier 2 verification /
invariant()clauses) needs a direction:invariant()— substantial work (2–4 weeks), closes spec-vs-implementation gapThe review pushed for option 2 as the cheapest move; option 1 as a stretch goal. No action in this PR pending direction.
Validation
pytest tests/— 3,875 passed, 14 skipped, 16 stress-deselectedmypy vera/— cleanruff check --select S vera/— cleanpython scripts/check_conformance.py— 86/86python scripts/check_examples.py— 34/34python scripts/check_doc_counts.py— consistent (3,905 tests)python scripts/check_limitations_sync.py— consistentcheck_readme_examples.py,check_faq_examples.py,check_spec_examples.py,check_skill_examples.py— passpython scripts/check_site_assets.py— up-to-dateTest plan
Closes #675.
🤖 Generated with Claude Code
Summary by CodeRabbit