Fix run_correct reporting: show '-' when no tests exist by aallan · Pull Request #4 · aallan/vera-bench

aallan · 2026-03-29T20:44:04Z

Summary

- Tier 2/3 problems have empty `test_cases` (`vera run` can't pass string/ADT args via CLI). Previously `_rate(0, 0)` returned `0.0`, making these tiers show 0% `run_correct`, which is misleading since no tests were run.
- Now `_rate()` returns `None` when the denominator is 0, rendered as `-` in reports and CLI.
- Also adds `results/*.jsonl` and `results/summary.md` to `.gitignore` (generated artifacts).

Test plan

- 285 tests pass
- Ruff clean
- Verified with actual benchmark results from first Sonnet run

Summary by CodeRabbit

Bug Fixes
- Rates now use a null sentinel and display a dash (`-`) when no data is available, avoiding misleading zero values.
Tests
- Updated test expectations to reflect null-rate behaviour for empty inputs.
Chores
- Added ignore patterns to exclude generated results files from version control.

Tier 2/3 problems have empty test_cases (vera run cannot pass string/ADT args via CLI). Previously _rate(0, 0) returned 0.0, making these tiers show 0% run_correct in reports, which is misleading since no tests were run. Now _rate() returns None when the denominator is 0, and the report/CLI render None as -. This correctly distinguishes "no tests exist" from "all tests failed".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-03-29T20:44:16Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 75a808f4-87fd-4f40-b2a9-f003235df8b9

📥 Commits

Reviewing files that changed from the base of the PR and between f63ef12 and 6cc244d.

📒 Files selected for processing (2)

vera_bench/cli.py
vera_bench/metrics.py

Walkthrough

This change makes metric rate fields nullable and propagates that sentinel (None) through computation, CLI and report formatting; formatting functions now show "-" for null rates. Also updates tests and .gitignore to exclude results files.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Gitignore configuration `.gitignore` | Added ignore patterns to exclude `results/*.jsonl` and `results/summary.md`. |
| Metrics types & computation `vera_bench/metrics.py` | Changed rate fields from `float` to `float \| None`; `_rate` returns `None` when denominator is zero; aggregate and per-tier computations propagate `None`. |
| Reporting and CLI formatting `vera_bench/report.py`, `vera_bench/cli.py` | Updated `_pct` (`report`) and added `_fmt_rate` (`cli`) to accept/format `float \| None`, rendering `"-"` for null rates; CLI uses new formatter in metric tables. |
| Tests `tests/test_runner.py` | Updated assertion: `compute_metrics([])` now expects `check_rate` to be `None` instead of `0.0`. |
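
A minimal sketch of the rendering side, assuming a formatter along the lines of `_pct` / `_fmt_rate`. The name `fmt_rate`, the one-decimal percentage format, and the sample tier data are all hypothetical; the walkthrough only states that null rates render as `"-"`.

```python
from __future__ import annotations


def fmt_rate(rate: float | None) -> str:
    """Format a rate as a percentage, or '-' when the rate is undefined."""
    if rate is None:
        return "-"
    return f"{rate:.1%}"


# Hypothetical per-tier run_correct rates: tiers 2 and 3 have no test cases.
tier_rates: dict[int, float | None] = {1: 0.8, 2: None, 3: None}

rows = [f"Tier {tier}: {fmt_rate(rate)}" for tier, rate in tier_rates.items()]
for row in rows:
    print(row)
```

Handling `None` inside the formatter keeps the guard in one place, so callers in the CLI and the report do not each need their own check.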

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

harness

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main change: updating null/None handling for rate metrics to display '-' when no tests exist, specifically addressing run_correct reporting. |


codecov-commenter · 2026-03-29T20:44:54Z

Codecov Report

❌ Patch coverage is 66.66667% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.88%. Comparing base (36265db) to head (6cc244d).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vera_bench/cli.py | 12.50% | 7 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main       #4      +/-   ##
==========================================
- Coverage   59.97%   59.88%   -0.09%     
==========================================
  Files           9        9              
  Lines         707      713       +6     
==========================================
+ Hits          424      427       +3     
- Misses        283      286       +3

| Flag | Coverage Δ | |
|---|---|---|
| python | `59.88% <66.66%> (-0.09%)` | ⬇️ |


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@vera_bench/metrics.py`:
- Line 23: The check_rate field is annotated as float but can be None at runtime
(see assignments and _rate() returning float | None), so update its type
annotation to allow None (use float | None) where declared (the class/struct
field and any parameter/attribute declarations currently annotated as
check_rate: float). Also adjust the signatures/usages of _fmt_rate and _pct (or
add local guards in those functions) to accept Optional[float] (or handle None
early) to avoid calling methods on None; ensure _rate(), _fmt_rate(), and _pct()
types are consistent with check_rate being float | None.
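
The annotation change requested here can be sketched as follows. The class name `MetricsSketch`, its fields other than `check_rate`, and the function `compute_metrics_sketch` are hypothetical stand-ins; only the `float | None` annotation on `check_rate` and the `None` result for empty input come from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MetricsSketch:
    """Hypothetical stand-in for the metrics container in vera_bench/metrics.py."""

    total: int
    check_rate: float | None  # None means there were no results to rate


def compute_metrics_sketch(check_passed: list[bool]) -> MetricsSketch:
    """Compute a check rate, leaving it as None when the input is empty."""
    total = len(check_passed)
    rate = sum(check_passed) / total if total else None
    return MetricsSketch(total=total, check_rate=rate)


empty = compute_metrics_sketch([])
mixed = compute_metrics_sketch([True, False])
```

With the field annotated as `float | None`, a type checker can flag any caller that formats or does arithmetic on `check_rate` without first handling `None`.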


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 06e236c1-37a3-4bf4-b042-9c9e56a06ad8

📥 Commits

Reviewing files that changed from the base of the PR and between 36265db and f63ef12.

📒 Files selected for processing (5)

.gitignore
tests/test_runner.py
vera_bench/cli.py
vera_bench/metrics.py
vera_bench/report.py

CodeRabbit correctly identified that check_rate was annotated as float but _rate() can return None (when denominator is 0). Also fix the per-tier display in _print_metrics to use _fmt_rate() for None safety. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

aallan and others added 2 commits March 29, 2026 21:43

Ignore generated results files (JSONL, summary.md)

f63ef12

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed Mar 29, 2026


Comment thread vera_bench/metrics.py Outdated

aallan merged commit 52c921b into main Mar 29, 2026
8 checks passed

This was referenced Mar 29, 2026

Implement baseline runner (Phase 3) #8

Merged

Repo housekeeping: hero image, verify fix, codecov, pre-commit, templates, DESIGN/ROADMAP #16

Merged

aallan deleted the fix/run-correct-reporting branch March 30, 2026 15:51

coderabbitai Bot mentioned this pull request Apr 14, 2026

Report T1-T4 aggregate separately for cross-language comparison #56

Merged

4 tasks

aallan merged 3 commits into main from fix/run-correct-reporting

2 participants
