Repo housekeeping: hero image, verify fix, codecov, pre-commit, templates, DESIGN/ROADMAP by aallan · Pull Request #16 · aallan/vera-bench

aallan · 2026-03-30T10:34:20Z

Summary

Housekeeping PR bundling several improvements from the CI audit.

Bug fix

Fix verify@1 showing 0% instead of - for Python results in tier breakdown (metrics only counted problems as verify-eligible when verify_pass was non-None)

CI & configuration

codecov.yml: project/patch status checks with python flag and documented thresholds
.pre-commit-config.yaml: trailing whitespace, YAML/JSON/TOML validation, ruff lint+format, problem validation on .json/.vera changes, pytest
pyproject.toml: add pre-commit to dev dependencies

Documentation

DESIGN.md: permanent design rationale extracted from BRIEFING.md (prior art, tier definitions, metrics semantics, key decisions)
ROADMAP.md: forward-looking milestones and open issues
CONTRIBUTING.md: add pre-commit install to dev setup
CLAUDE.md: reference DESIGN.md and ROADMAP.md

Issue templates

Bug report (with problem ID field)
New problem suggestion (with tier and signature)
config.yml: allow blank issues

README

Add hero image (assets/vera-bench-social-preview.png)
Fix image extension (.jpg -> .png)

Manual steps after merge

Settings -> Branches: add codecov/project/python to required checks once it fires
Settings -> Branches: add CodeRabbit status check name once visible

Test plan

308 tests pass
Ruff clean

Generated with Claude Code

Summary by CodeRabbit

Release Notes

Documentation
- Added design documentation and project roadmap.
- Added pre-commit hooks guide for contributors.
New Features
- Added social preview image to README.
- Added GitHub issue templates for bug reports and problem suggestions.
- Configured pre-commit hooks for code quality automation.
Bug Fixes
- Fixed verification metric calculations to exclude attempts without results.
Chores
- Added pre-commit and code coverage configuration.
- Updated development dependencies.

- Add social preview image to assets/ and README header - Fix image extension in README (.jpg -> .png) - Fix verify@1 showing 0% instead of - for Python results in tier breakdown: only count problems as verify-eligible when verify_pass is explicitly True/False, not None (Python has no verifier) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-03-30T10:34:37Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 38cc2482-e6d6-4aae-98bf-07803af8cd8f

📥 Commits

Reviewing files that changed from the base of the PR and between 3d42139 and 6dc343b.

📒 Files selected for processing (1)

DESIGN.md

📝 Walkthrough

Walkthrough

Refactors verification eligibility logic in vera_bench/metrics.py to exclude attempts lacking a verify_pass result, and adds documentation, issue templates, pre-commit and Codecov configs plus a README social-preview badge.

Changes

Cohort / File(s)	Summary
Metrics logic `vera_bench/metrics.py`	Change: store `vp = best.get("verify_pass")`; increment `verify_eligible` only when `vp is not None` and count verify successes only when `vp` is truthy so attempts without verifier results are excluded from verify-eligible failures.
Documentation & overview `README.md`, `CLAUDE.md`, `DESIGN.md`, `ROADMAP.md`, `CONTRIBUTING.md`	Added README social-preview badge; split/design/roadmap docs added/updated; CONTRIBUTING now documents pre-commit usage.
Issue templates `.github/ISSUE_TEMPLATE/bug_report.md`, `.github/ISSUE_TEMPLATE/new_problem.md`, `.github/ISSUE_TEMPLATE/config.yml`	Added bug-report and new-problem issue templates and enabled blank issues via config.
Pre-commit & CI config `.pre-commit-config.yaml`, `pyproject.toml`, `codecov.yml`	Added pre-commit configuration with hooks for linting/formatting/validation/tests; added `pre-commit>=3.0` to dev extras; introduced Codecov rules for `python` flag and patch behaviour.
Repository hygiene `CLAUDE.md`	Replaced single design pointer with separate DESIGN and ROADMAP references; clarified BRIEFING.md as historical.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Implement LLM runner harness (Phase 2) #3 — modifies vera_bench/metrics.py and targets the same verify-eligibility logic.

Suggested labels

harness, ci, docs

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the primary changes: housekeeping across multiple areas (hero image, verify bug fix, codecov, pre-commit, templates, and new documentation files).
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/housekeeping

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov-commenter · 2026-03-30T10:35:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 65.56%. Comparing base (9801346) to head (6dc343b).

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #16      +/-   ##
==========================================
+ Coverage   65.41%   65.56%   +0.15%     
==========================================
  Files          10       10              
  Lines         905      909       +4     
==========================================
+ Hits          592      596       +4     
  Misses        313      313

Flag	Coverage Δ
python	`65.56% <100.00%> (+0.15%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

CI & configuration: - codecov.yml: configure project/patch status checks with python flag - .pre-commit-config.yaml: trailing whitespace, YAML/JSON validation, ruff lint+format, problem validation on .json/.vera changes, pytest - pyproject.toml: add pre-commit to dev dependencies Documentation: - DESIGN.md: permanent design rationale extracted from BRIEFING.md (prior art, tier definitions, metrics, key decisions) - ROADMAP.md: forward-looking milestones and open issues - CONTRIBUTING.md: add pre-commit install to dev setup - CLAUDE.md: reference DESIGN.md and ROADMAP.md Issue templates: - Bug report template (with problem ID field) - New problem suggestion template (with tier and signature) - config.yml: allow blank issues Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@codecov.yml`:
- Around line 9-14: Patch coverage is currently set to informational (patch ->
python -> informational: true) which disables enforcement for new code; change
informational to false and add a reasonable minimum threshold (e.g., under the
same patch.python section set minimum: 50 or 70) so patch coverage is enforced,
or if you intend to keep it informational, add a comment/documentation entry
explaining and justifying why patch.python.informational remains true. Ensure
you update the patch.python keys (informational and minimum) so CI will enforce
the desired patch coverage policy.
- Around line 1-8: Update the Codecov project threshold to match the CI/local
requirement by replacing the current threshold value referenced as "threshold:
1.0%" in codecov.yml with a value consistent with the CI flag
"--cov-fail-under=35" (set to 35% or higher), or alternatively raise it to
reflect the ROADMAP target (>90%); if you intentionally keep a minimal gate, add
a short rationale comment in the repo docs explaining why codecov.yml's
threshold differs from CI's "--cov-fail-under=35" and the ROADMAP goal to avoid
mismatch confusion.

In `@DESIGN.md`:
- Around line 50-51: The doc claims verify@1 uses the first attempt but the
implementation computes verification on the best passing attempt (checks attempt
2 first, else attempt 1); choose one of two fixes: (A) Update the DESIGN.md text
for verify@1 to describe "best-attempt verify" semantics (mentioning that
verify@1 returns true if any attempt passes check, preferring later attempts),
or (B) Change the metrics code path that computes verify@1 to strictly use the
first attempt only (i.e., evaluate check on attempt1 and ignore subsequent
attempts when computing verify@1); update or add tests to cover the chosen
behavior so metrics and DESIGN.md remain consistent.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: eb99c0a5-13a6-4ae4-9689-8a91a68c329e

📥 Commits

Reviewing files that changed from the base of the PR and between 6b80275 and e43c6d7.

📒 Files selected for processing (10)

.github/ISSUE_TEMPLATE/bug_report.md
.github/ISSUE_TEMPLATE/config.yml
.github/ISSUE_TEMPLATE/new_problem.md
.pre-commit-config.yaml
CLAUDE.md
CONTRIBUTING.md
DESIGN.md
ROADMAP.md
codecov.yml
pyproject.toml

- codecov.yml: add comments explaining threshold is a delta (not absolute floor) and why patch coverage is informational - DESIGN.md: verify@1 and run_correct use best passing attempt, not strictly first attempt — match the actual metrics implementation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN.md`:
- Around line 22-29: Update the VerifyThisBench row in the table so the Language
column lists the seven supported verification frameworks (Dafny, Frama-C,
VerCors, VeriFast, Why3, Verus, CBMC) instead of "Dafny/Lean/Coq"; locate the
row beginning with "**VerifyThisBench**" in DESIGN.md and replace the language
cell content accordingly while keeping the rest of the row (Problems, Task, Key
metric) unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9872ad9f-2c24-4ebf-a9fd-c5c45514504f

📥 Commits

Reviewing files that changed from the base of the PR and between e43c6d7 and 00a8846.

📒 Files selected for processing (2)

DESIGN.md
codecov.yml

Seven verification frameworks, not Dafny/Lean/Coq. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN.md`:
- Line 50: Update the verify@1 definition to remove or clarify "Tier 1
contracts": change the phrase to explicitly state that verify@1 checks "Does the
best passing attempt pass `vera verify` (SMT-based contract verification)?" or
simply drop the tier qualifier so it reads "Does the best passing attempt pass
`vera verify`?" and ensure any mention of `verify_pass` or `vera verify`
elsewhere aligns with this SMT-based verification meaning.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 581ce3c2-8d97-457c-90d2-bc87bf2e2936

📥 Commits

Reviewing files that changed from the base of the PR and between 00a8846 and 3d42139.

📒 Files selected for processing (1)

DESIGN.md

vera verify does SMT-based verification across all tiers. The Tier 1 expectation is a per-problem flag, not part of the metric definition. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed Mar 30, 2026

View reviewed changes

Comment thread codecov.yml

Comment thread codecov.yml

Comment thread DESIGN.md Outdated

coderabbitai Bot reviewed Mar 30, 2026

View reviewed changes

Comment thread DESIGN.md

Fix VerifyThisBench languages in DESIGN.md

3d42139

Seven verification frameworks, not Dafny/Lean/Coq. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

aallan changed the title ~~Add hero image, fix verify display for Python results~~ Repo housekeeping: hero image, verify fix, codecov, pre-commit, templates, DESIGN/ROADMAP Mar 30, 2026

coderabbitai Bot reviewed Mar 30, 2026

View reviewed changes

Comment thread DESIGN.md Outdated

Simplify verify@1 definition: drop tier qualifier

6dc343b

vera verify does SMT-based verification across all tiers. The Tier 1 expectation is a per-problem flag, not part of the metric definition. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

aallan merged commit 908b435 into main Mar 30, 2026
8 checks passed

aallan deleted the fix/housekeeping branch March 30, 2026 15:51

coderabbitai Bot mentioned this pull request Mar 30, 2026

Strengthen problem descriptions and postconditions (v0.0.5) #32

Merged

4 tasks

This was referenced Apr 13, 2026

Document Aver support and add comparison language guide #52

Merged

Remove BRIEFING.md bootstrap document #54

Merged

Report T1-T4 aggregate separately for cross-language comparison #56

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repo housekeeping: hero image, verify fix, codecov, pre-commit, templates, DESIGN/ROADMAP#16

Repo housekeeping: hero image, verify fix, codecov, pre-commit, templates, DESIGN/ROADMAP#16
aallan merged 5 commits into
mainfrom
fix/housekeeping

aallan commented Mar 30, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 30, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Uh oh!

codecov-commenter commented Mar 30, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

aallan commented Mar 30, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Bug fix

CI & configuration

Documentation

Issue templates

README

Manual steps after merge

Test plan

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Uh oh!

codecov-commenter commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

aallan commented Mar 30, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 30, 2026 •

edited

Loading

codecov-commenter commented Mar 30, 2026 •

edited

Loading