Fix vera test verifier error handling (#674)#685
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (6)
📝 WalkthroughWalkthroughVerifier diagnostics E500/E501/E502 are treated as errors: the test engine marks implicated functions "failed", preserves diagnostics, JSON "ok" and process exit codes fail, CLI prints a Diagnostics section, and summaries separate static verifier failures from Tier 3 runtime failures. ChangesVerifier error classification and CLI wiring
Sequence DiagramsequenceDiagram
participant Verifier
participant _TestEngine
participant CLI
Verifier-->>_TestEngine: emit diagnostics (E500/E501/E502)
_TestEngine->>_TestEngine: filter severity=="error", extract function names, mark impacted functions "failed"
_TestEngine-->>CLI: result (results, summary, diagnostics)
CLI->>CLI: compute has_errors, s.failed, include unlisted_errors, determine exit code
CLI-->>User: JSON/human output with Diagnostics block and non-zero exit where applicable
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_cli.py`:
- Around line 1625-1643: In test_dispatch_test_json_verifier_error_nonzero
update the call to prog.write_text(...) to explicitly pass encoding="utf-8" so
the test file is written with UTF-8 encoding; locate the prog.write_text
invocation in that test (the block creating "wrong_double.vera") and add the
encoding parameter to the write_text call.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9f3cc335-936f-4a9c-b6db-70dd6b95dcc6
📒 Files selected for processing (6)
CHANGELOG.mdtests/test_cli.pytests/test_tester.pytests/test_tester_coverage.pyvera/cli.pyvera/tester.py
Fail closed when `vera test` sees E500/E501/E502 verifier diagnostics instead of reporting verifier-refuted contracts as successful Tier 1 results. Preserve verifier diagnostics in JSON, return ok=false with a nonzero exit, and show human diagnostics so private, unselected, or otherwise non-displayed verifier failures cannot disappear. Attribute E501 call-site precondition failures to the caller while preserving E500/E502 attribution, keep --fn rows filtered while whole-file verifier errors still fail closed, and distinguish static verifier failures from Tier 3 E700 runtime failures in human summaries. Add regression coverage for JSON and human output, private helper failures, --fn fail-closed diagnostics, E501 caller attribution, E502 underflow attribution, mixed static/Tier 3 summaries, E700 runtime failures, multiple diagnostics on one displayed failed function, and CLI dispatch return codes.
ce5214f to
c7e41bf
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_cli.py`:
- Around line 1383-1399: The test creates .vera fixtures via the helper
_bad_vera which ultimately calls Path.write_text() without an explicit encoding,
making Windows runs brittle; update the fixture creation to use UTF-8 by either
modifying the helper _bad_vera to call write_text(..., encoding="utf-8") or
change the new test invocation to write the fixture with encoding="utf-8" when
creating the Path, ensuring the use of _bad_vera and write_text is the targeted
change.
In `@tests/test_tester.py`:
- Around line 90-93: The test currently asserts that needs_pos.category !=
"failed", which is too weak; change the assertion to require the callee remains
explicitly "verified" by replacing the line using _fn_result(result,
"needs_pos") to assert needs_pos.category == "verified" (keep the existing
assertion that caller.category == "failed"); update the assertion near the uses
of needs_pos, caller and _fn_result in tests/test_tester.py so the test proves
the regression only affects the caller.
In `@vera/tester.py`:
- Around line 428-431: The current assignment in the diagnostics loop overwrites
prior verifier codes for the same function (variable failed_fns) making the
visible reason order-dependent; change the logic in the block that uses
_failed_function_name(diag) so it does not clobber an existing entry for the
same name — either preserve the first hit (only set failed_fns[name] if name not
already present) or apply a deterministic priority (compare diag.error_code
against the existing failed_fns[name] using a defined priority/order such as
numeric lowest or an explicit list like ["E500","E501","E502"] and keep the
higher-priority code); update the assignment path that references diag.severity,
diag.error_code, and verification_error_codes so JSON and human outputs remain
stable and deterministic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f47f2226-5c59-4a56-a8bb-1a4d77f67d6f
📒 Files selected for processing (7)
CHANGELOG.mdKNOWN_ISSUES.mdtests/test_cli.pytests/test_tester.pytests/test_tester_coverage.pyvera/cli.pyvera/tester.py
💤 Files with no reviewable changes (1)
- KNOWN_ISSUES.md
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #685 +/- ##
==========================================
+ Coverage 91.02% 91.07% +0.04%
==========================================
Files 60 60
Lines 23413 23458 +45
Branches 259 259
==========================================
+ Hits 21312 21364 +52
+ Misses 2094 2087 -7
Partials 7 7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
CodeRabbit posted three findings on rzyns's PR plus there's a CI gate (doc-counts) and a project convention (empty Bugs table) to address. Two additional minor refactors (DRY constant, public helper rename) included since they touch the same files. **CR Finding 1: _bad_vera missing encoding="utf-8"** (test portability). Pre-existing helper used 30+ times in test_cli.py since long before aallan#685, but CR caught it now because rzyns's new tests added more callers. Windows runs would fall back to cp1252 on the default text-mode encoding and corrupt UTF-8 sequences in Vera fixtures. CLAUDE.md "Cross-platform pitfalls" already flags this as a convention. One-line fix on the helper benefits every caller. **CR Finding 2: needs_pos.category != "failed" too weak** (test_tester.py). In `test_call_precondition_error_fails_caller_ not_callee`, the negative assertion would still pass if needs_pos were classified as "skipped" or any other non-failed bucket. Strengthened to `== "verified"` so the test actually pins that the callee is correctly proven, not just that it avoided the failed bucket. **CR Finding 3: failed_fns dict overwrite is order-dependent** (vera/tester.py). When a single function attracts multiple verifier errors (e.g. `ensures(false)` produces E500 AND `@Nat - @Nat` produces E502 on the same body — the `test_displayed_failed_function_does_not_double_count_verifier_ errors` test exercises exactly this shape), the loop overwrites the dict entry and the displayed `reason` flips with diagnostic iteration order. Switched to first-hit (`if name not in failed_fns`) — `setdefault`-style semantics, no judgement call on which code "should" win. **Doc counts: 7 stale numbers** (CI gate would fail). `check_doc_counts.py` flagged TESTING.md total, test_cli.py + test_tester.py + test_tester_coverage.py rows, and ROADMAP.md test count. Updated all 7 plus README.md test count for consistency: 3,905 → 3,916 total. All counts now consistent. **KNOWN_ISSUES.md empty table** (project convention). PR removed the aallan#674 row but left the table header `| Bug | Issue |` with no body. Per the convention established at aallan#673's release ("retain the Bugs section, and just have some text in it to say 'No known bugs'"), replaced the empty table with the convention line. **Nit 1: DRY** — `{"E500", "E501", "E502"}` was duplicated between vera/tester.py and vera/cli.py. Extracted as `VERIFICATION_ERROR_CODES = frozenset(...)` module-level in vera/tester.py; cli.py imports it. `frozenset` over plain set so the constant is immutable (matches its semantic role as a classifier-set constant). **Nit 2: private name reaching across modules** — `cli.py` was importing `_failed_function_name` from vera/tester.py. The single-underscore prefix signals "private by convention" but cli.py's reliance on it was a leak. Renamed `_failed_function_name` → `failed_function_name` (public); docstring updated to mention the cli.py call site explicitly. Validation: - pytest 3,886 / 3,886 (no test count change — the renames don't affect test behaviour, just the names) - mypy clean - ruff S clean - conformance 86/86, examples 34/34 - check_doc_counts.py consistent - limitations sync consistent - check_changelog_updated.py passes (existing [Unreleased] entry from rzyns's commit + the additional carry-over from PR aallan#684) - check_site_assets.py up-to-date Co-Authored-By: Claude <noreply@anthropic.invalid>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_cli.py`:
- Around line 1479-1508: The test
test_fn_filter_surfaces_unselected_verifier_error currently only exercises the
JSON output path; add a sibling assertion that exercises the human/text path by
calling cmd_test(str(path), fn_name="identity") (without as_json) and assert it
returns exit code 1, then use capsys.readouterr().out to assert the verifier
error is present in the textual diagnostics, that only the selected function
"identity" is listed, and that its category is reported as "verified" (mirror
the JSON checks but against the text output) to ensure the fail-closed behavior
on the text branch is covered.
In `@vera/tester.py`:
- Around line 90-95: VERIFICATION_ERROR_CODES and failed_function_name() are
being exported for CLI-only bookkeeping; make them private (rename to
_VERIFICATION_ERROR_CODES and _failed_function_name) or move their logic into a
single shared helper used only by the CLI, and instead expose structured
attribution/deduplication data on the TestResult/TestSummary returned by test()
(e.g., attach a list of verifier error codes and failing-function attribution to
TestResult/TestSummary) so the CLI parses structured fields rather than relying
on regex helpers; update test(), TestResult/TestSummary, and any callers
(including the CLI consumer) to consume the new structured fields and remove the
public exports.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2136ebdf-5816-4f18-b1ae-62b3d7b2fec2
📒 Files selected for processing (8)
KNOWN_ISSUES.mdREADME.mdROADMAP.mdTESTING.mdtests/test_cli.pytests/test_tester.pyvera/cli.pyvera/tester.py
aallan
left a comment
There was a problem hiding this comment.
Thanks @rzyns — this is a fine fix. Approving once CI clears.
A few notes on what's here, for anyone looking at the PR:
Correctness. The classifier change in vera/tester.py is right: verifier-refuted contracts now take precedence over the verified/tier3/skipped buckets, so a function the verifier can refute statically can't fall through to a green-test result. The E501 attribution to the caller (not the callee) is the right call — the missing-precondition obligation lives at the caller, and that's where the user needs to add the requires(...) to make the verifier accept the call. The --fn filter logic is also nicely handled: filtered output stays scoped to the selected function, but whole-file verifier errors still fail closed (otherwise vera test --fn safe_fn buggy_file.vera would false-green).
Tests. Eleven new regression tests across test_tester.py and test_cli.py, each pinning a distinct property — JSON output, human output, private function visibility, --fn filter behaviour, summary split, Tier-3-vs-static distinction, double-count avoidance, dispatcher contract. The double-count test (function attracting both E500 and E502 on the same body via ensures(false) + @Nat - @Nat) was an edge case I'd have missed.
Cleanup commit (8d12298). I pushed a follow-up handling the three CR comments that arrived later — _bad_vera encoding (now encoding="utf-8" — benefits 30+ callers, not just the new tests), the needs_pos.category != "failed" → == "verified" assertion strengthening (the negative form would still pass under "skipped"), and the failed_fns dict-clobber fix (first-hit semantics, so multi-diagnostic functions get a deterministic reason instead of one that flips with iteration order). Also rolled in a couple of nits: extracted VERIFICATION_ERROR_CODES as a module-level frozenset constant (was duplicated between tester.py and cli.py), and dropped the underscore from failed_function_name since cli.py imports it across modules. Plus the doc-counts CI gate (7 stale numbers in TESTING.md / ROADMAP.md / README.md after the test count went 3,905 → 3,916) and the | Bug | Issue | empty-table convention in KNOWN_ISSUES.md.
Thanks for the well-shaped contributions across #674, #675, and this one.
CR's two new comments after my cleanup commit: **Finding 1: missing text-mode sibling for the --fn filter test** (test_cli.py). `test_fn_filter_surfaces_unselected_verifier_ error` covered the JSON path but not the human/text path — both codepaths can drift independently (text rendering has its own diagnostics section, summary line, unlisted-error counter), so the JSON-only coverage left a real gap. Added `test_fn_filter_surfaces_unselected_verifier_error_text` mirroring the JSON test for human output: asserts the displayed function list stays scoped to `identity` (per `--fn`), `broken`'s verifier error still surfaces in the Diagnostics section, the summary line names the unlisted verifier error, and the exit code is 1. **Finding 2: classifier internals exported for CLI-only use** (vera/tester.py). Last round I dropped the underscore from `_failed_function_name` so cli.py could import it across the module boundary. CR pushed back: those exports (along with `VERIFICATION_ERROR_CODES`) are CLI-only bookkeeping and don't belong on `vera.tester`'s public surface. Three options from CR: - (a) Privatise the exports — cli.py keeps reaching across the underscore boundary - (b) Move the logic into a CLI-only shared helper - (c) Expose the computed value as structured data on TestResult/TestSummary so cli.py reads it as a field Went with (c) — it's the cleanest because the only thing cli.py needed the exports for was a single count (verifier errors whose target function isn't in the displayed list). Adding `unlisted_errors: int = 0` to TestSummary, computing it in the engine after the main loop, and reading it as `s.unlisted_errors` in cli.py makes vera/cli.py purely presentational — no logic, no regex, no classifier internals. The engine returns the canonical answer and the formatter just renders it. Specifically: - `vera/tester.py::TestSummary` gains `unlisted_errors: int = 0` - `vera/tester.py::_TestEngine.run()` computes `summary.unlisted_errors` before returning by counting verifier-error diagnostics whose attributable function isn't in the `displayed_failed` set derived from `results` - `VERIFICATION_ERROR_CODES` → `_VERIFICATION_ERROR_CODES` (private; only the engine uses it now) - `failed_function_name` → `_failed_function_name` (private; only the engine uses it now) - `vera/cli.py` drops the imports + the local `displayed_failed` set + the `sum(...)` comprehension; just reads `s.unlisted_errors` directly - JSON output gains `"unlisted_errors": s.unlisted_errors` in the summary dict for downstream CI consumers - `test_fn_filter_surfaces_unselected_verifier_error` (JSON) now also asserts `data["summary"]["unlisted_errors"] == 1` to pin the new structured field This is option (c) — the principled long-term shape. Option (a) (which I did last round) was the minimum-touch fix but left `vera.tester`'s public API leaking implementation details the CLI happened to need. CR is right that (c) is the better contract. Validation: pytest 3,887 / 3,887 (one new test for Finding 1), mypy clean, ruff S clean, conformance 86/86, examples 34/34, doc-counts consistent (3,916 → 3,917 total; test_cli.py 225 → 226 tests, 3,250 → 3,300 lines). Co-Authored-By: Claude <noreply@anthropic.invalid>
|
Cleanup commit. ( |
Summary
Fixes #674.
vera testnow fails closed when the verifier reports E500/E501/E502 diagnostics instead of letting verifier-refuted functions fall through as"verified" / "Tier 1 (proved)".This makes
vera test --jsonsafe as a green contract gate again for the verifier-error cases discussed in #674.Behavior changes
TestResult.diagnostics."failed"function category for verifier-refuted public functions.TestSummary.failedin JSON and human output.vera test/vera test --jsonwhen result diagnostics contain errors."ok": falsewhen verifier errors are present.in function '...'@Natsubtraction underflow to the function named inin '...'--fnrows filtered to the selected function while still failing closed on whole-file verifier errors.Tests added / updated
Regression coverage now includes:
vera test --jsonreturns nonzero andok: falsefor E500 verifier errors.FAILED.--fnkeeps function rows filtered but still fails closed on unselected whole-file verifier errors.@Natunderflow is classified as a failed verifier error.vera test --jsonreturns nonzero for verifier errors.Verification
Ran locally:
Also checked:
and a staged added-line secret scan; both were clean.
Notes
This follows the intended direction from the maintainer comment on #674:
"failed",TestSummary.failed,I kept the displayed
--fnfunction list scoped to the selected function, but made diagnostics/exit status reflect whole-file verifier errors so--fncannot false-green a file with verifier-refuted contracts.Summary by CodeRabbit
Bug Fixes
New Features
Tests
Documentation