Skip to content

Docs sweep: address compiler-review findings 2026-05-18 (5 issues filed)#684

Merged
aallan merged 3 commits into
mainfrom
docs-2026-05-18-compiler-review-followups
May 18, 2026
Merged

Docs sweep: address compiler-review findings 2026-05-18 (5 issues filed)#684
aallan merged 3 commits into
mainfrom
docs-2026-05-18-compiler-review-followups

Conversation

@aallan

@aallan aallan commented May 18, 2026

Copy link
Copy Markdown
Owner

Summary

Docs sweep addressing findings from a strategic-mode compiler review on 2026-05-18, plus a small inline fix for #675 (E500 fix-text wording, reported externally by @rzyns on 2026-05-15) that came up while writing the PR. All 6 review findings independently verified against HEAD (v0.0.155, commit 08c1197) before acting.

Findings handled

Finding Verified? Action
P0-1vera test mis-classifies refuted functions ✅ Reproduced exactly; classifier at vera/tester.py:393 only harvests severity == "warning" Tier 3 codes Restored to KNOWN_ISSUES.md Bugs section. Reassigned from my dup #681 to @rzyns's earlier #674 (their report came 3 days first and is better-written). Fix is a separate ~half-day PR.
P0-2 — README overclaims primitive-safety guarantees ✅ Unguarded @Int.1 / @Int.0 verifies cleanly This PR: softened README, added FAQ Q&A, new spec/06.4.3 paragraph; filed #680 for the implementation work
P1-1 — diagnostic-tagging discipline gap ✅ 22/98 sites fully tagged; E140 confirmed missing Fix: Filed #682; added to ROADMAP CI tooling
P1-2 — Chapter 8 conformance gap ✅ All chapters 1–7, 9–10 present; ch08 absent Filed #679; added to ROADMAP testing gaps
P2-1 — spec EBNF / Lark drift ⚠ Partially confirmed — nominal renames real; expression-precedence "drift" is deliberate notational choice Filed #683 scoped to nominal renames
P2-2 — Tier 2 / invariant() strategic call ✅ Tier 2 (#427) genuinely long-running; invariant() already doc-resolved via PR #650 Strategic call deferred — flagged below for direction

Bonus inline fix: #675 (E500 fix text)

@rzyns also reported #675 — the E500 fix= text named only two repair classes (strengthen requires(...) / weaken ensures(...)) and missed the most common third class: fix the implementation to satisfy the existing contract. When E500 catches a typo in the function body, the user's intended repair is almost always "fix the typo" — the previous wording implicitly biased them toward weakening the contract instead.

New wording at vera/verifier.py:1305-1322:

Resolve the mismatch between the function body and its contract: fix the implementation so it satisfies this ensures() clause, strengthen requires(...) if the counterexample is outside the intended input domain, or weaken/change ensures(...) if the postcondition overstates the intended guarantee.

Implementation-repair comes first because it's the most common case in practice; the verifier is otherwise neutral about which artifact is the source of truth. Tightened test_violation_has_fix_suggestion to pin all three repair classes (the pre-existing assertion "precondition" or "postcondition" in fix.lower() survives the rewrite but would not have caught a regression dropping "implementation"; the new assertion does).

What changed

  • README.md — "Contracts the compiler proves" section reworded. Previous text overclaimed; new text correctly describes static vs runtime split with forward reference to #680.
  • FAQ.md — new Q&A "Does the compiler prove division-by-zero, out-of-bounds indexing, etc. can't happen?" walks through the static/runtime split, references #520 as the existing precedent.
  • spec/06-contracts.md — new §6.4.3 "Primitive Operation Safety". Subsequent sections renumbered.
  • KNOWN_ISSUES.md — Bugs section restored with #674 (using @rzyns's cleaner reproducer + CI-failure-mode explanation).
  • ROADMAP.md — added #679 (Ch 8 conformance), #680 (auto-inject obligations), #682 (diagnostic-tagging), #683 (grammar drift) under appropriate sections.
  • vera/verifier.py — E500 fix= text rewritten (3-class enumeration per E500 postcondition diagnostic fix text should mention implementation repair #675).
  • tests/test_verifier.pytest_violation_has_fix_suggestion tightened to pin all three repair classes.
  • Auxiliary — allowlist line-numbers updated for shifted FAQ/spec code blocks; site assets regenerated; TESTING.md test_verifier.py line count refreshed.

Strategic call deferred

P2-2 (Tier 2 verification / invariant() clauses) needs a direction:

  1. Ship Tier 2 + invariant() — substantial work (2–4 weeks), closes spec-vs-implementation gap
  2. Demote both to a "Future Features" appendix — one-day mechanical move; spec describes what the compiler does today
  3. Leave as-is — status quo

The review pushed for option 2 as the cheapest move; option 1 as a stretch goal. No action in this PR pending direction.

Validation

  • pytest tests/3,875 passed, 14 skipped, 16 stress-deselected
  • mypy vera/ — clean
  • ruff check --select S vera/ — clean
  • python scripts/check_conformance.py — 86/86
  • python scripts/check_examples.py — 34/34
  • python scripts/check_doc_counts.py — consistent (3,905 tests)
  • python scripts/check_limitations_sync.py — consistent
  • check_readme_examples.py, check_faq_examples.py, check_spec_examples.py, check_skill_examples.py — pass
  • python scripts/check_site_assets.py — up-to-date
  • All 18 pre-commit hooks green

Test plan

Closes #675.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Added FAQ and updated README/spec/roadmap to clarify how preconditions affect division-by-zero, indexing and runtime traps, describe trap diagnostics with “Fix:” guidance, and note planned auto-injection of primitive-operation obligations.
  • Known Issues
    • Documented misclassification where test can report functions as verified despite verifier counterexamples; added workaround.
  • Tests
    • Strengthened regression test to require comprehensive repair-suggestion text in diagnostics.
  • Changelog
    • Recorded documentation and diagnostic-fix entries.

Review Change Stack

A strategic-mode review on 2026-05-18 surfaced 6 findings.  All 6
independently verified against HEAD (v0.0.155, commit 08c1197).
This PR addresses the docs half — code fixes land separately
under the filed issues.

## What changed

**README.md** — softened the "Contracts the compiler proves"
section.  The previous wording ("Division by zero is not a runtime
error — it is a type error.  The compiler checks every call site
to prove the divisor is non-zero.") overclaimed: the verifier
checks contracts the programmer wrote, not auto-synthesised
obligations on primitive operations.  New wording correctly
describes what's static (declared preconditions discharged at
every call site) and what's runtime (unguarded primitives trap
with Vera-native diagnostic), with a forward reference to #680
which tracks the implementation work to make the original claim
true.

**FAQ.md** — new Q&A "Does the compiler prove division-by-zero,
out-of-bounds indexing, etc. can't happen?".  Explicitly walks
through the static/runtime split, references the #520 Nat-
subtraction precedent (the one auto-synthesised obligation that
exists today), and points at #680 + #427 as the long-term path.

**spec/06-contracts.md** — new §6.4.3 "Primitive Operation Safety"
between Call Site Verification and SMT Solver Integration.  States
explicitly that obligations on `a / b`, `a % b`, `arr[i]`,
`string_at(s, i)`, etc. are not auto-synthesised; the exception
is `@Nat` subtraction (#520).  Cross-references #680 and #427.
Subsequent sections re-numbered (§6.4.4 SMT Solver Integration,
§6.4.5 Counterexample Reporting — was §6.4.3 and §6.4.4).

**KNOWN_ISSUES.md** — restored the Bugs section with #681 (`vera
test` reports refuted functions as VERIFIED).  This was confirmed
by the review with a clean reproducer:

  `vera verify` correctly emits E500 with counterexample, but
  `vera test` reports the buggy function as `VERIFIED (Tier 1)`
  with exit code 0.

The classifier at `vera/tester.py:393` only harvests
`severity == "warning"` Tier 3 codes; `severity == "error"`
codes flow past untouched.  Workaround documented: run
`vera verify` before trusting `vera test` results.  Fix is half
a day's work, landing separately.

**ROADMAP.md** — three new tracking items added under CI tooling
(#682 diagnostic-tagging discipline, #683 grammar drift), one
under verification depth (#680 auto-inject primitive obligations),
one under testing gaps (#679 Chapter 8 conformance programs).

**Auxiliary**:
- `scripts/check_faq_examples.py` and `scripts/check_spec_examples.py`
  allowlist line numbers updated (auto-fixed via
  `python scripts/fix_allowlists.py --fix`) — the Abilities snippet
  in FAQ shifted by ~11 lines from the new Q&A insertion; the spec
  example in §6.4.5 shifted from line 323 to 338 from the new §6.4.3.
- Site assets regenerated (`docs/llms-full.txt`, `docs/sitemap.xml`).

## Findings deferred from this PR

| Finding | Status |
|---|---|
| **P0-1** (`vera test` mis-classification) | #681 filed; documented in KNOWN_ISSUES.md; fix is a separate ~half-day PR |
| **P1-1** (diagnostic tagging) | #682 filed; ROADMAP entry added; multi-day mechanical work scoped for a follow-up PR |
| **P1-2** (Ch 8 conformance) | #679 filed; ROADMAP entry added; ~day of work |
| **P2-1** (grammar drift) | #683 filed; scoped to nominal renames only (expression-precedence ladders are deliberate notational difference) |
| **P2-2** (Tier 2 / `invariant()` strategic call) | Tier 2 (#427) genuinely long-running; `invariant()` doc-state already resolved by PR #650.  Strategic call deferred — flagged to Alasdair for direction. |

## Validation

- [x] `pytest tests/` — 3,875 passed, 14 skipped, 16 stress-deselected
- [x] `python scripts/check_conformance.py` — 86/86
- [x] `python scripts/check_examples.py` — 34/34
- [x] `python scripts/check_doc_counts.py` — consistent
- [x] `python scripts/check_limitations_sync.py` — consistent (30 in KNOWN_ISSUES, 18 in vera/README, 4 across spec)
- [x] `python scripts/check_readme_examples.py` — pass
- [x] `python scripts/check_faq_examples.py` — pass (after allowlist update)
- [x] `python scripts/check_spec_examples.py` — pass (after allowlist update)
- [x] `python scripts/check_skill_examples.py` — pass
- [x] `python scripts/check_site_assets.py` — up-to-date

Co-Authored-By: Claude <noreply@anthropic.invalid>
@coderabbitai

coderabbitai Bot commented May 18, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: dec58810-2b44-41ef-b122-789eb7909316

📥 Commits

Reviewing files that changed from the base of the PR and between 77445b6 and 9417e32.

📒 Files selected for processing (1)
  • CHANGELOG.md

📝 Walkthrough

Walkthrough

This PR clarifies verifier behaviour for primitive operations: the verifier does not auto-synthesise well‑definedness obligations (division/modulo/indexing); programmers must supply requires or refinements. Spec, FAQ, README, roadmap, scripts, KNOWN_ISSUES, tests, and E500 diagnostic text were updated; runtime traps include a labelled "Fix".

Changes

Primitive Operation Safety Clarification

Layer / File(s) Summary
Specification of primitive operation safety
spec/06-contracts.md
New "Primitive Operation Safety" subsection defines which well-definedness obligations are auto-synthesised (none for general primitives; special-case @Nat subtraction). SMT solver integration and counterexample reporting headings renumbered.
User-facing documentation (FAQ and README)
FAQ.md, README.md
FAQ answers whether the compiler guarantees non‑trapping (depends on requires/refinements). README rephrases contract verification as per-call-site SMT checks and documents runtime trapping with a Vera diagnostic and reference to issue #680.
Roadmap and known issues tracking
ROADMAP.md, KNOWN_ISSUES.md
ROADMAP adds CI/verification-depth work items (diagnostic-tagging hook, grammar alignment, Tier 2 Z3 proving, auto-injection of primitive obligations) and a Chapter 8 conformance gap. KNOWN_ISSUES adds an vera test E500 misclassification entry (issue #674).
Script allowlist updates
scripts/check_faq_examples.py, scripts/check_spec_examples.py
ALLOWLIST entries adjusted to new code-fence start lines after content shifts.
Tests and verifier diagnostic text
tests/test_verifier.py, vera/verifier.py, TESTING.md
E500 _report_violation fix message updated to enumerate three repair classes and tests strengthened to assert presence of "implementation", "requires", and "ensures"/"postcondition" in the suggested fix; TESTING.md line count updated.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

  • #680: docs and ROADMAP explicitly reference planned auto-injection of primitive-safety obligations.
  • #674: KNOWN_ISSUES documents vera test misclassifying E500 counterexamples (reproducer and workaround added).

Suggested labels

compiler, tests, spec, ci, docs

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarises the main purpose of the PR: a comprehensive documentation update addressing compiler-review findings from 2026-05-18, with five new tracking issues filed.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch docs-2026-05-18-compiler-review-followups

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented May 18, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.02%. Comparing base (08c1197) to head (9417e32).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #684   +/-   ##
=======================================
  Coverage   91.02%   91.02%           
=======================================
  Files          60       60           
  Lines       23413    23413           
  Branches      259      259           
=======================================
  Hits        21312    21312           
  Misses       2094     2094           
  Partials        7        7           
Flag Coverage Δ
javascript 57.36% <ø> (ø)
python 94.77% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

External contributor @rzyns filed two careful reports on 2026-05-15
(#674 and #675).  This commit addresses #675 inline with the
existing docs PR (#684) and reassigns the KNOWN_ISSUES.md entry
from #681 to #674 (my dupe of #674 from today's compiler-review
sweep — closed as duplicate).

## #675 — E500 fix text names all three repair classes

Pre-fix `fix=` text at `vera/verifier.py:1306-1308` named only
two repair classes: strengthen `requires(...)` or weaken
`ensures(...)`.  It missed the third (often most common) class:
fix the implementation to satisfy the existing contract.  When
E500 catches a typo in the function body (e.g. `@Int.0 + @Int.0
+ 1` when the contract says `@Int.0 + @Int.0`), the user's
intended repair is almost always "fix the typo" — the previous
wording implicitly biased them toward weakening the contract
instead.

New wording lists all three repair classes neutrally:

> Resolve the mismatch between the function body and its
> contract: fix the implementation so it satisfies this
> ensures() clause, strengthen requires(...) if the
> counterexample is outside the intended input domain, or
> weaken/change ensures(...) if the postcondition overstates
> the intended guarantee.

Implementation-repair comes first because it's the most
common case in practice; the verifier is otherwise neutral
about which artifact is "the source of truth".

Tightened `tests/test_verifier.py::TestCounterexamples::
test_violation_has_fix_suggestion` to assert all three
classes appear in the lowercased fix text.  The pre-existing
assertion (`"precondition" or "postcondition" in fix.lower()`)
survives the rewrite but would not have caught a regression
dropping "implementation"; the new assertion does.

Closes #675.

## #674 / #681 reassignment

#674 was filed by @rzyns on 2026-05-15 reporting that `vera
test --json` reports `category: "verified"` for functions the
verifier has refuted with E500.  I filed the same bug as #681
today (2026-05-18) from a strategic compiler-review session
without realising #674 existed.

#681 is closed as a duplicate; the `KNOWN_ISSUES.md` Bugs entry
now references #674 with a polished reproducer (@rzyns's
`@Int + @int + 1` form is cleaner than my `@Nat * 2 + 1`) and
explicit mention of the CI-consumes-JSON failure mode.

The underlying bug remains open; the fix is a separate code
PR (half a day's work — extend `vera/tester.py::_classify_
functions` to harvest `severity == "error"` codes too, add
a `"failed"` category, set non-zero exit code).

## Validation

- [x] `pytest tests/test_verifier.py::TestCounterexamples::test_violation_has_fix_suggestion -v` passes
- [x] `pytest tests/` — 3,875 passed, 14 skipped, 16 stress-deselected
- [x] `mypy vera/` clean
- [x] `ruff check --select S vera/` clean
- [x] `check_doc_counts.py`, `check_limitations_sync.py`, `check_readme_examples.py`, `check_faq_examples.py`, `check_spec_examples.py`, `check_site_assets.py` all consistent
- [x] TESTING.md test_verifier.py line count: 2,094 → 2,124

Co-Authored-By: Claude <noreply@anthropic.invalid>
CI's check_changelog_updated.py correctly flagged that PR #684
touches vera/verifier.py and spec/06-contracts.md without a
matching CHANGELOG.md entry.

Adds an [Unreleased] section covering:

- Fixed: #675 E500 fix-text rewording (external report from @rzyns)
- Documentation: the 2026-05-18 compiler-review docs sweep
  (README softening, FAQ Q&A, spec section 6.4.3, KNOWN_ISSUES
  #674 reassignment, ROADMAP additions for #679/#680/#682/#683)

Co-Authored-By: Claude <noreply@anthropic.invalid>
@aallan aallan merged commit 3651993 into main May 18, 2026
22 checks passed
@aallan aallan deleted the docs-2026-05-18-compiler-review-followups branch May 18, 2026 14:56
aallan added a commit to rzyns/vera that referenced this pull request May 19, 2026
CodeRabbit posted three findings on rzyns's PR plus there's a CI
gate (doc-counts) and a project convention (empty Bugs table) to
address.  Two additional minor refactors (DRY constant, public
helper rename) included since they touch the same files.

**CR Finding 1: _bad_vera missing encoding="utf-8"** (test
portability).  Pre-existing helper used 30+ times in test_cli.py
since long before aallan#685, but CR caught it now because rzyns's new
tests added more callers.  Windows runs would fall back to cp1252
on the default text-mode encoding and corrupt UTF-8 sequences in
Vera fixtures.  CLAUDE.md "Cross-platform pitfalls" already flags
this as a convention.  One-line fix on the helper benefits every
caller.

**CR Finding 2: needs_pos.category != "failed" too weak**
(test_tester.py).  In `test_call_precondition_error_fails_caller_
not_callee`, the negative assertion would still pass if needs_pos
were classified as "skipped" or any other non-failed bucket.
Strengthened to `== "verified"` so the test actually pins that the
callee is correctly proven, not just that it avoided the failed
bucket.

**CR Finding 3: failed_fns dict overwrite is order-dependent**
(vera/tester.py).  When a single function attracts multiple
verifier errors (e.g. `ensures(false)` produces E500 AND
`@Nat - @Nat` produces E502 on the same body — the
`test_displayed_failed_function_does_not_double_count_verifier_
errors` test exercises exactly this shape), the loop overwrites
the dict entry and the displayed `reason` flips with diagnostic
iteration order.  Switched to first-hit (`if name not in
failed_fns`) — `setdefault`-style semantics, no judgement call on
which code "should" win.

**Doc counts: 7 stale numbers** (CI gate would fail).
`check_doc_counts.py` flagged TESTING.md total, test_cli.py +
test_tester.py + test_tester_coverage.py rows, and ROADMAP.md
test count.  Updated all 7 plus README.md test count for
consistency: 3,905 → 3,916 total.  All counts now consistent.

**KNOWN_ISSUES.md empty table** (project convention).  PR
removed the aallan#674 row but left the table header `| Bug | Issue |`
with no body.  Per the convention established at aallan#673's release
("retain the Bugs section, and just have some text in it to say
'No known bugs'"), replaced the empty table with the convention
line.

**Nit 1: DRY** — `{"E500", "E501", "E502"}` was duplicated
between vera/tester.py and vera/cli.py.  Extracted as
`VERIFICATION_ERROR_CODES = frozenset(...)` module-level in
vera/tester.py; cli.py imports it.  `frozenset` over plain set so
the constant is immutable (matches its semantic role as a
classifier-set constant).

**Nit 2: private name reaching across modules** — `cli.py` was
importing `_failed_function_name` from vera/tester.py.  The
single-underscore prefix signals "private by convention" but
cli.py's reliance on it was a leak.  Renamed
`_failed_function_name` → `failed_function_name` (public);
docstring updated to mention the cli.py call site explicitly.

Validation:
- pytest 3,886 / 3,886 (no test count change — the renames don't
  affect test behaviour, just the names)
- mypy clean
- ruff S clean
- conformance 86/86, examples 34/34
- check_doc_counts.py consistent
- limitations sync consistent
- check_changelog_updated.py passes (existing [Unreleased] entry
  from rzyns's commit + the additional carry-over from PR aallan#684)
- check_site_assets.py up-to-date

Co-Authored-By: Claude <noreply@anthropic.invalid>
@aallan aallan mentioned this pull request May 19, 2026
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

E500 postcondition diagnostic fix text should mention implementation repair

1 participant