Skip to content

Repo housekeeping: hero image, verify fix, codecov, pre-commit, templates, DESIGN/ROADMAP#16

Merged
aallan merged 5 commits into
mainfrom
fix/housekeeping
Mar 30, 2026
Merged

Repo housekeeping: hero image, verify fix, codecov, pre-commit, templates, DESIGN/ROADMAP#16
aallan merged 5 commits into
mainfrom
fix/housekeeping

Conversation

@aallan

@aallan aallan commented Mar 30, 2026

Copy link
Copy Markdown
Owner

Summary

Housekeeping PR bundling several improvements from the CI audit.

Bug fix

  • Fix verify@1 showing 0% instead of - for Python results in tier breakdown (metrics only counted problems as verify-eligible when verify_pass was non-None)

CI & configuration

  • codecov.yml: project/patch status checks with python flag and documented thresholds
  • .pre-commit-config.yaml: trailing whitespace, YAML/JSON/TOML validation, ruff lint+format, problem validation on .json/.vera changes, pytest
  • pyproject.toml: add pre-commit to dev dependencies

Documentation

  • DESIGN.md: permanent design rationale extracted from BRIEFING.md (prior art, tier definitions, metrics semantics, key decisions)
  • ROADMAP.md: forward-looking milestones and open issues
  • CONTRIBUTING.md: add pre-commit install to dev setup
  • CLAUDE.md: reference DESIGN.md and ROADMAP.md

Issue templates

  • Bug report (with problem ID field)
  • New problem suggestion (with tier and signature)
  • config.yml: allow blank issues

README

  • Add hero image (assets/vera-bench-social-preview.png)
  • Fix image extension (.jpg -> .png)

Manual steps after merge

  • Settings -> Branches: add codecov/project/python to required checks once it fires
  • Settings -> Branches: add CodeRabbit status check name once visible

Test plan

  • 308 tests pass
  • Ruff clean

Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • Documentation

    • Added design documentation and project roadmap.
    • Added pre-commit hooks guide for contributors.
  • New Features

    • Added social preview image to README.
    • Added GitHub issue templates for bug reports and problem suggestions.
    • Configured pre-commit hooks for code quality automation.
  • Bug Fixes

    • Fixed verification metric calculations to exclude attempts without results.
  • Chores

    • Added pre-commit and code coverage configuration.
    • Updated development dependencies.

- Add social preview image to assets/ and README header
- Fix image extension in README (.jpg -> .png)
- Fix verify@1 showing 0% instead of - for Python results in tier
  breakdown: only count problems as verify-eligible when verify_pass
  is explicitly True/False, not None (Python has no verifier)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Mar 30, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 38cc2482-e6d6-4aae-98bf-07803af8cd8f

📥 Commits

Reviewing files that changed from the base of the PR and between 3d42139 and 6dc343b.

📒 Files selected for processing (1)
  • DESIGN.md

📝 Walkthrough

Walkthrough

Refactors verification eligibility logic in vera_bench/metrics.py to exclude attempts lacking a verify_pass result, and adds documentation, issue templates, pre-commit and Codecov configs plus a README social-preview badge.

Changes

Cohort / File(s) Summary
Metrics logic
vera_bench/metrics.py
Change: store vp = best.get("verify_pass"); increment verify_eligible only when vp is not None and count verify successes only when vp is truthy so attempts without verifier results are excluded from verify-eligible failures.
Documentation & overview
README.md, CLAUDE.md, DESIGN.md, ROADMAP.md, CONTRIBUTING.md
Added README social-preview badge; split/design/roadmap docs added/updated; CONTRIBUTING now documents pre-commit usage.
Issue templates
.github/ISSUE_TEMPLATE/bug_report.md, .github/ISSUE_TEMPLATE/new_problem.md, .github/ISSUE_TEMPLATE/config.yml
Added bug-report and new-problem issue templates and enabled blank issues via config.
Pre-commit & CI config
.pre-commit-config.yaml, pyproject.toml, codecov.yml
Added pre-commit configuration with hooks for linting/formatting/validation/tests; added pre-commit>=3.0 to dev extras; introduced Codecov rules for python flag and patch behaviour.
Repository hygiene
CLAUDE.md
Replaced single design pointer with separate DESIGN and ROADMAP references; clarified BRIEFING.md as historical.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

harness, ci, docs

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the primary changes: housekeeping across multiple areas (hero image, verify bug fix, codecov, pre-commit, templates, and new documentation files).
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/housekeeping

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter

codecov-commenter commented Mar 30, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 65.56%. Comparing base (9801346) to head (6dc343b).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #16      +/-   ##
==========================================
+ Coverage   65.41%   65.56%   +0.15%     
==========================================
  Files          10       10              
  Lines         905      909       +4     
==========================================
+ Hits          592      596       +4     
  Misses        313      313              
Flag Coverage Δ
python 65.56% <100.00%> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

CI & configuration:
- codecov.yml: configure project/patch status checks with python flag
- .pre-commit-config.yaml: trailing whitespace, YAML/JSON validation,
  ruff lint+format, problem validation on .json/.vera changes, pytest
- pyproject.toml: add pre-commit to dev dependencies

Documentation:
- DESIGN.md: permanent design rationale extracted from BRIEFING.md
  (prior art, tier definitions, metrics, key decisions)
- ROADMAP.md: forward-looking milestones and open issues
- CONTRIBUTING.md: add pre-commit install to dev setup
- CLAUDE.md: reference DESIGN.md and ROADMAP.md

Issue templates:
- Bug report template (with problem ID field)
- New problem suggestion template (with tier and signature)
- config.yml: allow blank issues

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@codecov.yml`:
- Around line 9-14: Patch coverage is currently set to informational (patch ->
python -> informational: true) which disables enforcement for new code; change
informational to false and add a reasonable minimum threshold (e.g., under the
same patch.python section set minimum: 50 or 70) so patch coverage is enforced,
or if you intend to keep it informational, add a comment/documentation entry
explaining and justifying why patch.python.informational remains true. Ensure
you update the patch.python keys (informational and minimum) so CI will enforce
the desired patch coverage policy.
- Around line 1-8: Update the Codecov project threshold to match the CI/local
requirement by replacing the current threshold value referenced as "threshold:
1.0%" in codecov.yml with a value consistent with the CI flag
"--cov-fail-under=35" (set to 35% or higher), or alternatively raise it to
reflect the ROADMAP target (>90%); if you intentionally keep a minimal gate, add
a short rationale comment in the repo docs explaining why codecov.yml's
threshold differs from CI's "--cov-fail-under=35" and the ROADMAP goal to avoid
mismatch confusion.

In `@DESIGN.md`:
- Around line 50-51: The doc claims verify@1 uses the first attempt but the
implementation computes verification on the best passing attempt (checks attempt
2 first, else attempt 1); choose one of two fixes: (A) Update the DESIGN.md text
for verify@1 to describe "best-attempt verify" semantics (mentioning that
verify@1 returns true if any attempt passes check, preferring later attempts),
or (B) Change the metrics code path that computes verify@1 to strictly use the
first attempt only (i.e., evaluate check on attempt1 and ignore subsequent
attempts when computing verify@1); update or add tests to cover the chosen
behavior so metrics and DESIGN.md remain consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: eb99c0a5-13a6-4ae4-9689-8a91a68c329e

📥 Commits

Reviewing files that changed from the base of the PR and between 6b80275 and e43c6d7.

📒 Files selected for processing (10)
  • .github/ISSUE_TEMPLATE/bug_report.md
  • .github/ISSUE_TEMPLATE/config.yml
  • .github/ISSUE_TEMPLATE/new_problem.md
  • .pre-commit-config.yaml
  • CLAUDE.md
  • CONTRIBUTING.md
  • DESIGN.md
  • ROADMAP.md
  • codecov.yml
  • pyproject.toml

Comment thread codecov.yml
Comment thread codecov.yml
Comment thread DESIGN.md Outdated
- codecov.yml: add comments explaining threshold is a delta (not absolute
  floor) and why patch coverage is informational
- DESIGN.md: verify@1 and run_correct use best passing attempt, not
  strictly first attempt — match the actual metrics implementation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN.md`:
- Around line 22-29: Update the VerifyThisBench row in the table so the Language
column lists the seven supported verification frameworks (Dafny, Frama-C,
VerCors, VeriFast, Why3, Verus, CBMC) instead of "Dafny/Lean/Coq"; locate the
row beginning with "**VerifyThisBench**" in DESIGN.md and replace the language
cell content accordingly while keeping the rest of the row (Problems, Task, Key
metric) unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9872ad9f-2c24-4ebf-a9fd-c5c45514504f

📥 Commits

Reviewing files that changed from the base of the PR and between e43c6d7 and 00a8846.

📒 Files selected for processing (2)
  • DESIGN.md
  • codecov.yml

Comment thread DESIGN.md
Seven verification frameworks, not Dafny/Lean/Coq.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@aallan aallan changed the title Add hero image, fix verify display for Python results Repo housekeeping: hero image, verify fix, codecov, pre-commit, templates, DESIGN/ROADMAP Mar 30, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN.md`:
- Line 50: Update the verify@1 definition to remove or clarify "Tier 1
contracts": change the phrase to explicitly state that verify@1 checks "Does the
best passing attempt pass `vera verify` (SMT-based contract verification)?" or
simply drop the tier qualifier so it reads "Does the best passing attempt pass
`vera verify`?" and ensure any mention of `verify_pass` or `vera verify`
elsewhere aligns with this SMT-based verification meaning.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 581ce3c2-8d97-457c-90d2-bc87bf2e2936

📥 Commits

Reviewing files that changed from the base of the PR and between 00a8846 and 3d42139.

📒 Files selected for processing (1)
  • DESIGN.md

Comment thread DESIGN.md Outdated
vera verify does SMT-based verification across all tiers. The Tier 1
expectation is a per-problem flag, not part of the metric definition.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@aallan aallan merged commit 908b435 into main Mar 30, 2026
8 checks passed
@aallan aallan deleted the fix/housekeeping branch March 30, 2026 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants