Include mode in output filename to avoid overwriting results by aallan · Pull Request #17 · aallan/vera-bench

aallan · 2026-03-30T11:06:13Z

Runs in full-spec and spec-from-nl mode were writing to the same JSONL file, so the later run overwrote the earlier one. The output filename now reflects the mode:

vera-bench run --model X → X.jsonl
vera-bench run --model X --mode spec-from-nl → X-spec-from-nl.jsonl
vera-bench run --model X --language python → X-python.jsonl
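To make the collision concrete, here is a minimal sketch of a language-only naming scheme. `old_output_name` is a hypothetical helper written for illustration, not the code that was in `vera_bench/cli.py`, and the model name `X` is a placeholder; the only thing taken from this PR is that the mode did not previously affect the filename.

```python
def old_output_name(model: str, language: str = "vera", mode: str = "full-spec") -> str:
    """Hypothetical pre-fix naming: only the language affects the filename."""
    suffix = f"-{language}" if language != "vera" else ""
    # The mode argument is accepted but never consulted, which is the bug.
    return f"{model}{suffix}.jsonl"


full_spec = old_output_name("X", mode="full-spec")
from_nl = old_output_name("X", mode="spec-from-nl")
print(full_spec, from_nl, full_spec == from_nl)  # X.jsonl X.jsonl True
```

Under this assumed scheme both modes resolve to the same file, so whichever run finishes last determines its contents.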

Summary by CodeRabbit

Bug Fixes
- Refined output filename generation to include mode information in results filenames for non-default configurations, improving file organisation and discoverability.

full-spec (default) produces {model}.jsonl, spec-from-nl produces {model}-spec-from-nl.jsonl. Language suffix also included when not vera.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov-commenter · 2026-03-30T11:06:49Z

Codecov Report

❌ Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.27%. Comparing base (908b435) to head (60f60a0).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vera_bench/cli.py | 0.00% | 6 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #17      +/-   ##
==========================================
- Coverage   65.56%   65.27%   -0.29%     
==========================================
  Files          10       10              
  Lines         909      913       +4     
==========================================
  Hits          596      596              
- Misses        313      317       +4

| Flag | Coverage Δ | |
|---|---|---|
| python | `65.27% <0.00%> (-0.29%)` | ⬇️ |
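The percentages in the coverage diff can be reproduced from the hit and line counts. The sketch below is only a worked check of that arithmetic; `coverage_pct` is a hypothetical helper, and truncating to two decimals (rather than rounding) is an inference from the reported figures, not something the report states.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    # Integer arithmetic avoids floating-point surprises in the truncation.
    return (hits * 10000 // lines) / 100


base = coverage_pct(596, 909)  # main: 596 hits out of 909 lines
head = coverage_pct(596, 913)  # PR head: same hits, 4 more lines
delta = round(head - base, 2)

print(base, head, delta)  # 65.56 65.27 -0.29
```

The hit count is unchanged while the line count grows by 4, which is why project coverage drops even though no existing test was removed.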

coderabbitai · 2026-03-30T11:11:23Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 43e8b075-9b4b-4356-8e9c-3085139282e5

📥 Commits

Reviewing files that changed from the base of the PR and between c0f642f and 60f60a0.

📒 Files selected for processing (1)

vera_bench/cli.py

📝 Walkthrough

Walkthrough

Modified the output filename generation logic in the run command to conditionally append both language and mode parameters to the model name, rather than only appending language. The logic now builds a hyphen-joined list of parts and produces filenames like model-language-mode.jsonl depending on parameter values.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Filename Generation Logic `vera_bench/cli.py` | Changed output filename construction from a conditional language suffix to a conditional language and mode suffix. Builds a parts list starting with the model, appends the language (when not "vera") and the mode (when not "full-spec"), then joins the parts with hyphens. |
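The parts-list approach described in the walkthrough can be sketched as follows. This is an illustration under stated assumptions, not the code from `vera_bench/cli.py`: the function name `build_output_path` and its signature are hypothetical, while the defaults `"vera"` and `"full-spec"` and the names `parts` and `output_dir` come from the summary and review text.

```python
from pathlib import Path


def build_output_path(output_dir: str, model: str,
                      language: str = "vera", mode: str = "full-spec") -> Path:
    """Hypothetical helper: join model, language, and mode with hyphens."""
    parts = [model]
    if language != "vera":
        parts.append(language)  # non-default language gets a suffix
    if mode != "full-spec":
        parts.append(mode)      # non-default mode gets a suffix
    return Path(output_dir) / ("-".join(parts) + ".jsonl")


print(build_output_path("results", "X").name)                       # X.jsonl
print(build_output_path("results", "X", mode="spec-from-nl").name)  # X-spec-from-nl.jsonl
print(build_output_path("results", "X", language="python").name)    # X-python.jsonl
```

Keeping the defaults out of the filename means existing `{model}.jsonl` results keep their names, and only non-default runs pick up a suffix.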

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested labels

harness

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and clearly describes the primary change: including mode in the output filename to prevent overwriting results. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@vera_bench/cli.py`:
- Around line 143-145: The filename builder currently appends the CLI variable
mode into parts and thus into output_path even when mode is ignored for Python;
update the logic around parts/output_path so that when language == "python" (or
when the code path that warns about mode being ignored) you do not append mode
to parts — i.e., only append mode when it is actually honored (keep references
to the variables mode, parts, output_path, and output_dir to locate the change)
so filenames no longer reflect an ignored mode.
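One way to satisfy this comment is to append the mode only when the run is a Vera run. The sketch below assumes that Vera is the only language for which the mode is honored, as the follow-up commit message states; `output_stem` is a hypothetical name, not a function in `vera_bench/cli.py`.

```python
def output_stem(model: str, language: str = "vera", mode: str = "full-spec") -> str:
    """Hypothetical helper: filename stem that ignores mode for non-Vera runs."""
    parts = [model]
    if language != "vera":
        parts.append(language)
    # Assumption: mode is honored only for Vera runs, so it is only
    # reflected in the name when language is "vera".
    if language == "vera" and mode != "full-spec":
        parts.append(mode)
    return "-".join(parts)


print(output_stem("X", language="python", mode="spec-from-nl") + ".jsonl")  # X-python.jsonl
print(output_stem("X", mode="spec-from-nl") + ".jsonl")                     # X-spec-from-nl.jsonl
```

With this guard, a Python run started with `--mode spec-from-nl` writes to the same file as any other Python run for that model, which matches the fact that the mode has no effect there.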

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7278d546-3f5a-4ea3-8586-c2d8f826e63e

📥 Commits

Reviewing files that changed from the base of the PR and between 908b435 and c0f642f.

📒 Files selected for processing (1)

vera_bench/cli.py

Include mode in output filename to avoid overwriting results

c0f642f

full-spec (default) produces {model}.jsonl, spec-from-nl produces {model}-spec-from-nl.jsonl. Language suffix also included when not vera.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed Mar 30, 2026

Comment thread vera_bench/cli.py Outdated

Only include mode in filename for Vera runs

60f60a0

Mode is ignored for Python, so it shouldn't appear in the filename.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

aallan merged commit 6f03419 into main Mar 30, 2026
8 checks passed

aallan deleted the fix/output-filename-mode branch March 30, 2026 15:51

coderabbitai Bot mentioned this pull request Mar 31, 2026

Include bench and vera versions in filenames and JSONL records (#20) #35

Merged

coderabbitai Bot mentioned this pull request Apr 8, 2026

Fix FileNotFoundError for slash-prefixed model names #40

Merged

aallan merged 2 commits into main from fix/output-filename-mode
