Repo housekeeping: hero image, verify fix, codecov, pre-commit, templates, DESIGN/ROADMAP#16
Conversation
- Add social preview image to assets/ and README header - Fix image extension in README (.jpg -> .png) - Fix verify@1 showing 0% instead of - for Python results in tier breakdown: only count problems as verify-eligible when verify_pass is explicitly True/False, not None (Python has no verifier) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughRefactors verification eligibility logic in Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #16 +/- ##
==========================================
+ Coverage 65.41% 65.56% +0.15%
==========================================
Files 10 10
Lines 905 909 +4
==========================================
+ Hits 592 596 +4
Misses 313 313
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
CI & configuration: - codecov.yml: configure project/patch status checks with python flag - .pre-commit-config.yaml: trailing whitespace, YAML/JSON validation, ruff lint+format, problem validation on .json/.vera changes, pytest - pyproject.toml: add pre-commit to dev dependencies Documentation: - DESIGN.md: permanent design rationale extracted from BRIEFING.md (prior art, tier definitions, metrics, key decisions) - ROADMAP.md: forward-looking milestones and open issues - CONTRIBUTING.md: add pre-commit install to dev setup - CLAUDE.md: reference DESIGN.md and ROADMAP.md Issue templates: - Bug report template (with problem ID field) - New problem suggestion template (with tier and signature) - config.yml: allow blank issues Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@codecov.yml`:
- Around line 9-14: Patch coverage is currently set to informational (patch ->
python -> informational: true) which disables enforcement for new code; change
informational to false and add a reasonable minimum threshold (e.g., under the
same patch.python section set minimum: 50 or 70) so patch coverage is enforced,
or if you intend to keep it informational, add a comment/documentation entry
explaining and justifying why patch.python.informational remains true. Ensure
you update the patch.python keys (informational and minimum) so CI will enforce
the desired patch coverage policy.
- Around line 1-8: Update the Codecov project threshold to match the CI/local
requirement by replacing the current threshold value referenced as "threshold:
1.0%" in codecov.yml with a value consistent with the CI flag
"--cov-fail-under=35" (set to 35% or higher), or alternatively raise it to
reflect the ROADMAP target (>90%); if you intentionally keep a minimal gate, add
a short rationale comment in the repo docs explaining why codecov.yml's
threshold differs from CI's "--cov-fail-under=35" and the ROADMAP goal to avoid
mismatch confusion.
In `@DESIGN.md`:
- Around line 50-51: The doc claims verify@1 uses the first attempt but the
implementation computes verification on the best passing attempt (checks attempt
2 first, else attempt 1); choose one of two fixes: (A) Update the DESIGN.md text
for verify@1 to describe "best-attempt verify" semantics (mentioning that
verify@1 returns true if any attempt passes check, preferring later attempts),
or (B) Change the metrics code path that computes verify@1 to strictly use the
first attempt only (i.e., evaluate check on attempt1 and ignore subsequent
attempts when computing verify@1); update or add tests to cover the chosen
behavior so metrics and DESIGN.md remain consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: eb99c0a5-13a6-4ae4-9689-8a91a68c329e
📒 Files selected for processing (10)
.github/ISSUE_TEMPLATE/bug_report.md.github/ISSUE_TEMPLATE/config.yml.github/ISSUE_TEMPLATE/new_problem.md.pre-commit-config.yamlCLAUDE.mdCONTRIBUTING.mdDESIGN.mdROADMAP.mdcodecov.ymlpyproject.toml
- codecov.yml: add comments explaining threshold is a delta (not absolute floor) and why patch coverage is informational - DESIGN.md: verify@1 and run_correct use best passing attempt, not strictly first attempt — match the actual metrics implementation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DESIGN.md`:
- Around line 22-29: Update the VerifyThisBench row in the table so the Language
column lists the seven supported verification frameworks (Dafny, Frama-C,
VerCors, VeriFast, Why3, Verus, CBMC) instead of "Dafny/Lean/Coq"; locate the
row beginning with "**VerifyThisBench**" in DESIGN.md and replace the language
cell content accordingly while keeping the rest of the row (Problems, Task, Key
metric) unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9872ad9f-2c24-4ebf-a9fd-c5c45514504f
📒 Files selected for processing (2)
DESIGN.mdcodecov.yml
Seven verification frameworks, not Dafny/Lean/Coq. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DESIGN.md`:
- Line 50: Update the verify@1 definition to remove or clarify "Tier 1
contracts": change the phrase to explicitly state that verify@1 checks "Does the
best passing attempt pass `vera verify` (SMT-based contract verification)?" or
simply drop the tier qualifier so it reads "Does the best passing attempt pass
`vera verify`?" and ensure any mention of `verify_pass` or `vera verify`
elsewhere aligns with this SMT-based verification meaning.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 581ce3c2-8d97-457c-90d2-bc87bf2e2936
📒 Files selected for processing (1)
DESIGN.md
vera verify does SMT-based verification across all tiers. The Tier 1 expectation is a per-problem flag, not part of the metric definition. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Housekeeping PR bundling several improvements from the CI audit.
Bug fix
0%instead of-for Python results in tier breakdown (metrics only counted problems as verify-eligible whenverify_passwas non-None)CI & configuration
codecov.yml: project/patch status checks with python flag and documented thresholds.pre-commit-config.yaml: trailing whitespace, YAML/JSON/TOML validation, ruff lint+format, problem validation on .json/.vera changes, pytestpyproject.toml: add pre-commit to dev dependenciesDocumentation
DESIGN.md: permanent design rationale extracted from BRIEFING.md (prior art, tier definitions, metrics semantics, key decisions)ROADMAP.md: forward-looking milestones and open issuesCONTRIBUTING.md: add pre-commit install to dev setupCLAUDE.md: reference DESIGN.md and ROADMAP.mdIssue templates
config.yml: allow blank issuesREADME
assets/vera-bench-social-preview.png)Manual steps after merge
codecov/project/pythonto required checks once it firesTest plan
Generated with Claude Code
Summary by CodeRabbit
Release Notes
Documentation
New Features
Bug Fixes
Chores