Include bench and vera versions in filenames and JSONL records (#20) #35

Merged: aallan merged 2 commits into main from feature/version-in-filenames on Mar 31, 2026
Conversation

@aallan aallan commented Mar 31, 2026

Owner

Filenames now include both VeraBench and vera compiler versions:

model-bench-0-0-5-vera-0-0-105.jsonl         # vera full-spec
model-spec-from-nl-bench-0-0-5-vera-0-0-105.jsonl  # vera spec-from-nl
model-python-bench-0-0-5.jsonl                # python (no vera version)

Each JSONL record also carries bench_version and vera_version fields for cross-version analysis.
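
A minimal sketch of how these names could be assembled, assuming the pattern is model, optional mode suffix, bench slug, optional vera slug. The helper names `version_slug` and `build_output_name` are hypothetical and are not taken from `vera_bench/cli.py`:

```python
def version_slug(version: str) -> str:
    """Turn a dotted version such as '0.0.105' into '0-0-105'."""
    return version.replace(".", "-")


def build_output_name(
    model: str,
    bench_version: str,
    vera_version: str = "",
    suffix: str = "",
) -> str:
    """Assemble an output filename (hypothetical helper, for illustration only).

    `suffix` stands in for the mode or language segment, e.g. 'spec-from-nl'
    or 'python'. An empty `vera_version` omits the vera segment entirely.
    """
    parts = [model]
    if suffix:
        parts.append(suffix)
    parts.append(f"bench-{version_slug(bench_version)}")
    if vera_version:
        parts.append(f"vera-{version_slug(vera_version)}")
    return "-".join(parts) + ".jsonl"


full_spec = build_output_name("model", "0.0.5", vera_version="0.0.105")
spec_from_nl = build_output_name("model", "0.0.5", "0.0.105", suffix="spec-from-nl")
python_run = build_output_name("model", "0.0.5", suffix="python")
```

With the versions from the examples above, the three calls reproduce the three filenames listed in this description.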

Closes #20.

Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Benchmark now records both bench and Vera compiler versions, embeds them in output files and includes them in console output; filenames include version identifiers for traceability.
  • Tests

    • Added tests ensuring version strings are retrieved from the compiler and correctly serialized into generated output.

- VeraRunner.version(): queries vera compiler version via subprocess
- ProblemResult: new bench_version and vera_version fields
- CLI: versions threaded through run_benchmark to every result record
- Filenames include both versions with dots-to-hyphens conversion:
  model-bench-0-0-5-vera-0-0-105.jsonl (vera runs)
  model-python-bench-0-0-5.jsonl (non-vera runs)
- Console output prints both version strings

Closes #20.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
coderabbitai Bot commented Mar 31, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f5e04f5c-2c64-4bd9-9149-f02f071cdfcf

📥 Commits

Reviewing files that changed from the base of the PR and between f98cfae and af1aee1.

📒 Files selected for processing (1)
  • vera_bench/vera_runner.py

📝 Walkthrough

Adds benchmark and compiler version tracking: VeraRunner.version() is introduced; ProblemResult gains bench_version and vera_version; runner and CLI propagate and display these versions and include them in output filenames; tests updated to assert version fields and VeraRunner.version() behaviour.
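
As a rough illustration of the record shape described in the walkthrough, here is a sketch of a result dataclass that carries both version fields into its JSONL line. Only `bench_version`, `vera_version`, and the empty-string defaults come from this PR; `problem_id`, `passed`, and the `to_jsonl` body are assumptions made for the example and may not match `vera_bench/runner.py`:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ProblemResult:
    # problem_id and passed are hypothetical placeholders for the real fields.
    problem_id: str
    passed: bool
    # Version fields described in this PR; empty string when not supplied.
    bench_version: str = ""
    vera_version: str = ""

    def to_jsonl(self) -> str:
        """Serialize the record as one JSON line (no trailing newline)."""
        return json.dumps(asdict(self))


record = ProblemResult("p001", True, bench_version="0.0.5", vera_version="0.0.105")
line = record.to_jsonl()
```

Because every record carries its own versions, results from different runs can be concatenated and still grouped by `bench_version` or `vera_version` afterwards.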

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Tests: tests/test_runner.py, tests/test_validate.py | Added test_to_jsonl_includes_versions to assert bench_version/vera_version are serialized; added TestVeraRunner.test_version to assert VeraRunner.version() returns a non-unknown, dot-containing string; removed a strict error_message assertion from an existing test. |
| Runner data & APIs: vera_bench/runner.py | Added bench_version: str and vera_version: str fields to ProblemResult; updated run_single_problem() and run_benchmark() signatures to accept and forward these parameters (default to empty strings). |
| Vera CLI integration: vera_bench/vera_runner.py | Added VeraRunner.version() which runs the Vera CLI with a 5s timeout, parses stdout for the compiler version, and returns "unknown" on failure. |
| CLI filename & output: vera_bench/cli.py | Now reads vera_bench.__version__, computes a bench-version slug, conditionally queries Vera version for language == "vera", prints Bench/Vera versions, appends version segments to output filename, and passes versions into run_benchmark(). |
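
The version query summarized in the table can be sketched roughly as follows. This is not the code from `vera_bench/vera_runner.py`: the function name `query_version`, the regular expression, and the idea of passing the command in as a list are assumptions, and the actual Vera CLI arguments are not shown in this PR, so a stand-in command is used below:

```python
import re
import subprocess
import sys


def query_version(command: list[str], timeout: float = 5.0) -> str:
    """Run `command`, pull a dotted version out of stdout, or return 'unknown'.

    Hypothetical helper: the real method may build the command and parse
    the output differently.
    """
    try:
        proc = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, permission problem, or the 5s budget was exceeded.
        return "unknown"
    if proc.returncode != 0:
        return "unknown"
    # Assumption: the version looks like digits separated by dots, e.g. 0.0.105.
    match = re.search(r"\d+(?:\.\d+)+", proc.stdout)
    return match.group(0) if match else "unknown"


# Stand-in for the compiler: a Python one-liner that prints a version banner.
stand_in = [sys.executable, "-c", "print('vera 0.0.105')"]
print(query_version(stand_in))
```

Returning the sentinel "unknown" instead of raising keeps a benchmark run alive when the compiler is missing, at the cost of a less informative filename.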

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested labels

harness

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarises the main change: including bench and vera versions in both output filenames and JSONL records, matching the PR's core objectives. |
| Linked Issues check | ✅ Passed | All objectives from issue #20 are met: vera version is obtained via VeraRunner.version(), appended to filenames for vera runs only, recorded in JSONL records, and python runs remain unaffected. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with issue #20 requirements; no unrelated modifications or feature creep detected. |

codecov Bot commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 33.33333% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.68%. Comparing base (6a48aef) to head (af1aee1).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| vera_bench/cli.py | 0.00% | 13 Missing ⚠️ |
| vera_bench/vera_runner.py | 66.66% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #35      +/-   ##
==========================================
- Coverage   66.29%   65.68%   -0.61%     
==========================================
  Files          10       10              
  Lines        1068     1090      +22     
==========================================
+ Hits          708      716       +8     
- Misses        360      374      +14     
| Flag | Coverage Δ |
| --- | --- |
| python | 65.68% <33.33%> (-0.61%) ⬇️ |
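
The percentages in the report follow from the hit and line counts in the diff above. The sketch below recomputes them, assuming percentages are truncated to two decimals rather than rounded (this matches the reported 65.68%, but it is an inference from the numbers, not something stated in the report). The patch figure of 8 covered lines out of 24 is likewise derived from "33.33333% with 16 lines missing":

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Coverage percentage truncated to two decimals (assumed, not confirmed)."""
    return math.floor(hits / lines * 10_000) / 100


base = coverage_pct(708, 1068)   # main before this PR
head = coverage_pct(716, 1090)   # after this PR
delta = round(head - base, 2)    # project coverage change

# Patch coverage: 16 missing lines at one-third coverage implies 24 patch lines.
patch_lines = 8 + 16
patch = coverage_pct(8, patch_lines)

print(base, head, delta, patch)
```

The project figure drops even though hits went up by 8, because the 22 added lines are covered at a lower rate than the existing code.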


@aallan aallan merged commit e723cb7 into main Mar 31, 2026
10 checks passed
@aallan aallan deleted the feature/version-in-filenames branch March 31, 2026 12:55

Development

Successfully merging this pull request may close these issues.

Include vera compiler version in output JSONL filename

1 participant