Increase test coverage to 83%, version in filenames (v0.0.6) #36

Merged
aallan merged 3 commits into main from feature/test-coverage
Mar 31, 2026

Conversation

@aallan

@aallan aallan commented Mar 31, 2026

Owner

Summary

Two issues bundled: #20 (version in filenames) and #5 (test coverage).

Coverage: 66% → 83%

| File | Before | After |
|---|---|---|
| validate.py | 12% | 82% |
| cli.py | 48% | 71% |
| prompts.py | 79% | 100% |
| vera_runner.py | 65% | 91% |
| runner.py | 68% | 77% |
| baseline_runner.py | 91% | 92% |

4 new test files, 52 new tests, 376 total (was 324).
CI coverage threshold raised from 35% to 80%.

Version tracking (#20)

Filenames now include bench + vera versions:

model-bench-0-0-6-vera-0-0-105.jsonl

Each JSONL record carries bench_version and vera_version fields.
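
As a rough illustration of the naming scheme, the sketch below builds such a filename and a matching record. The helper names `versioned_filename` and `make_record` are hypothetical, as are the assumptions that dots in each version are simply replaced with dashes and that the leading `model` segment stands for the model name. Only the filename pattern and the `bench_version` / `vera_version` field names come from this PR.

```python
import json


def versioned_filename(model: str, bench_version: str, vera_version: str) -> str:
    """Build a results filename carrying both versions (hypothetical helper)."""
    # Assumption: dots become dashes so the versions are filename-safe.
    bench = bench_version.replace(".", "-")
    vera = vera_version.replace(".", "-")
    return f"{model}-bench-{bench}-vera-{vera}.jsonl"


def make_record(problem_id: str, bench_version: str, vera_version: str) -> str:
    """Serialize one JSONL line that carries both version fields."""
    record = {
        "problem_id": problem_id,  # illustrative field, not taken from the PR
        "bench_version": bench_version,
        "vera_version": vera_version,
    }
    return json.dumps(record)


print(versioned_filename("model", "0.0.6", "0.0.105"))
print(make_record("example-problem", "0.0.6", "0.0.105"))
```

Keeping the versions in both the filename and each record means a result line can still be traced to its bench and vera versions if it is copied out of its original file.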

Closes #20. Progress on #5.

Generated with Claude Code

Summary by CodeRabbit

  • Tests

    • Added extensive unit and integration tests across CLI, models, runner and validation, raising coverage to ~83% (52 new tests).
  • New Features

    • Added a public version-reporting method for the runner API.
  • Chores

    • Bumped project version to 0.0.6.
    • Raised CI coverage gate from 35% to 80%.
    • Updated changelog and roadmap to reflect the new release.

New test files:
- test_vera_runner_integration.py: real vera subprocess tests (check,
  verify, run_fn, version, _vera_bin edge cases)
- test_validate_integration.py: real validation pipeline tests
  (find_vera_file, normalize_output, validate_problem, run_validation)
- test_cli.py: Click CliRunner tests for all commands
- test_models.py: LLM client creation, missing API keys, mock complete()

Expanded existing tests:
- test_runner.py: Python eval error paths (syntax, runtime, wrong output),
  run_benchmark JSONL writing, version fields in ProblemResult

Coverage improvements:
- validate.py: 12% → 82%
- cli.py: 48% → 71%
- prompts.py: 79% → 100%
- vera_runner.py: 65% → 91%
- runner.py: 68% → 77%

CI coverage threshold raised from 35% to 80%.
376 tests passing (was 324).

Closes #20. Progress on #5.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Mar 31, 2026

Walkthrough

Bumps package to v0.0.6, raises CI Python 3.12 coverage gate from 35% to 80%, and adds extensive test coverage (multiple new unit and integration test files) along with changelog and roadmap updates; no production API or exported symbols were changed.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Version & Release Metadata**<br>`pyproject.toml`, `CHANGELOG.md`, `ROADMAP.md` | Package version updated to 0.0.6; changelog and roadmap updated to document release, benchmark/vera versioning metadata, new `VeraRunner.version()` mention, and adjusted coverage target/metrics. |
| **CI Configuration**<br>`.github/workflows/ci.yml` | Increased coverage-fail threshold for Python 3.12 tests from `--cov-fail-under=35` to `--cov-fail-under=80`. |
| **CLI Tests**<br>`tests/test_cli.py` | New Click integration tests for `validate`, `run`, `baselines`, and `report` subcommands, asserting exit codes, output strings, warnings, and JSONL baseline generation. |
| **Model Client Tests**<br>`tests/test_models.py` | New tests for `create_client`, `AnthropicClient`, `OpenAIClient` and `LLMResponse`, covering missing API keys, unknown models, and mocked SDK completions. |
| **Runner & Validator Tests**<br>`tests/test_runner.py`, `tests/test_validate.py`, `tests/test_validate_integration.py` | New unit and integration tests for Python evaluation errors, JSONL output writing, skill markdown loading, `validate_problem` behaviour, normalization and error categories; some tests gated on external `vera` availability. |
| **VeraRunner Integration Tests**<br>`tests/test_vera_runner_integration.py` | Integration tests validating vera binary discovery, `VeraRunner.version()`, `check()`/`verify()` outcomes and exported-function execution against the system `vera` binary (module skipped if `vera` absent). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Suggested labels

ci

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Linked Issues check | ⚠️ Warning | Issue #20 requires vera compiler version in output JSONL filenames and records, but the raw_summary provides no evidence of implementation in cli.py, runner.py, or JSONL output logic. | Verify that cli.py appends vera version to filenames when language=='vera', and that JSONL records include bench_version and vera_version fields as required by #20. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.47% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarises the main changes: increased test coverage to 83% and version bump to 0.0.6 reflected in filenames and documentation. |
| Out of Scope Changes check | ✅ Passed | All changes align with PR objectives: test coverage expansion (4 new test files, 52 tests) and version documentation updates (pyproject.toml, CHANGELOG, ROADMAP) directly support issues #20 and #5. |


@codecov

codecov Bot commented Mar 31, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.93%. Comparing base (e723cb7) to head (4bdb888).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main      #36       +/-   ##
===========================================
+ Coverage   65.68%   82.93%   +17.24%     
===========================================
  Files          10       10               
  Lines        1090     1090               
===========================================
+ Hits          716      904      +188     
+ Misses        374      186      -188     
Flag Coverage Δ
python 82.93% <ø> (+17.24%) ⬆️
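
The percentages in the diff can be reproduced from the hit and line counts. The sketch below is only a worked check of the arithmetic: the figures shown match truncation to two decimals rather than rounding, which is an inference from these numbers and not a statement about how Codecov computes them. The helper name `truncated_percent` is hypothetical.

```python
import math


def truncated_percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage truncated to two decimals."""
    return math.floor(numerator / denominator * 10_000) / 100


lines = 1090
base_hits, head_hits = 716, 904  # hits on main and on this PR, from the diff

base = truncated_percent(base_hits, lines)
head = truncated_percent(head_hits, lines)
delta = truncated_percent(head_hits - base_hits, lines)

print(base, head, delta)  # 65.68 82.93 17.24
```

Note that subtracting the two displayed percentages gives 17.25, while the reported delta is 17.24; computing the delta from the 188 additional hits directly accounts for the difference.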


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Around line 16-17: Update the release note sentence that currently reads "52
new tests across 3 new test files (test_cli.py, test_models.py,
test_vera_runner_integration.py)" to reflect the correct number "4" and ensure
the parenthetical lists all four new test filenames (add the missing test file
name or mirror the PR summary); update the numeric count ("3" → "4") and adjust
the parenthetical to include the fourth file so the changelog is factually
accurate.

In `@ROADMAP.md`:
- Line 19: Update the checklist item text that currently reads "[x] Increase
test coverage to >83% (issue `#5`, ongoing)" in ROADMAP.md to accurately reflect
the achieved coverage by replacing ">83%" with either "83%" or ">=83%"; locate
the exact string in the file and edit it to "83%" (or ">=83%" if you prefer to
express a lower bound) so the roadmap is factually correct.

In `@tests/test_cli.py`:
- Around line 79-92: The test test_typescript_baselines currently always asserts
exit_code == 0 but may require the external "tsx" runtime; modify
test_typescript_baselines to detect presence of the TypeScript runtime before
invoking CliRunner (e.g., use shutil.which("tsx") or similar) and call
pytest.skip("tsx not found") when missing so the test is skipped rather than
failing; update the test in tests/test_cli.py (inside test_typescript_baselines,
before invoking main/CliRunner) to perform this check and skip behavior.
- Around line 11-15: The test_runs_successfully test hardcodes "50/50" which
will break when the corpus size changes; update the assertion on result.output
(from the CliRunner(...).invoke(main, ["validate"]) call) to check for a dynamic
passed/total pattern instead (e.g., use a regex like \d+/\d+ via re.search) or
parse the output to extract numeric counts and assert the format and that
exit_code == 0 remains true so the test is resilient to changes in problem
count.

In `@tests/test_models.py`:
- Around line 84-89: The current test patches vera_bench.models.anthropic/openai
but local imports inside AnthropicClient.__init__ and OpenAIClient.__init__
still import the real SDKs; instead patch the constructors themselves: replace
patch("vera_bench.models.anthropic") with
patch.object(vera_bench.models.AnthropicClient, "__init__", return_value=None)
and similarly patch.object(vera_bench.models.OpenAIClient, "__init__",
return_value=None); after patching the __init__s, set up MagicMock instances for
the client behaviors you need and, if your code references SDK exceptions like
anthropic.APITimeoutError or openai.error.Timeout, inject mock exception
attributes onto the mocked objects or modules used by the clients so tests use
the mocked exceptions rather than real SDK classes.

In `@tests/test_validate_integration.py`:
- Line 1: Add a module-level skip when the external "vera" binary is not on
PATH: import pytest and shutil, set pytestmark =
pytest.mark.skipif(shutil.which("vera") is None, reason="vera not available") at
the top of tests/test_validate_integration.py so all tests in this module are
skipped if shutil.which("vera") returns None; reference the pytestmark symbol
and the use of shutil.which("vera") to locate where to add the guard.

In `@tests/test_validate.py`:
- Around line 177-201: These tests rely on real network calls to veralang.dev
(tests test_load_from_url, test_load_default, test_bad_url) which makes them
flaky; update them to mock the URL fetch used by load_skill_md (or replace
SKILL_MD_URL) so network I/O is deterministic: patch the HTTP client/function
load_skill_md uses (e.g., requests.get or your internal fetcher) to return a
fixed response body for the success cases and a controlled error/HTTP status for
the failure case, and assert against that mocked content (keep
test_load_from_file unchanged); reference load_skill_md, SKILL_MD_URL, and the
tests test_load_from_url/test_load_default/test_bad_url when implementing the
mocks.
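
For the `tests/test_cli.py` finding about the hardcoded "50/50", a minimal sketch of a count-agnostic check could look like the following. The helper `extract_counts` and the sample output line are hypothetical; the real wording of the `validate` output is not shown in this PR, so the pattern only assumes a `passed/total` pair appears somewhere in it.

```python
import re

# Matches a "passed/total" pair such as "50/50" anywhere in the output.
COUNT_PATTERN = re.compile(r"(\d+)/(\d+)")


def extract_counts(output: str) -> tuple[int, int]:
    """Return (passed, total) parsed from CLI output, or fail if absent."""
    match = COUNT_PATTERN.search(output)
    if match is None:
        raise AssertionError(f"no passed/total pattern in: {output!r}")
    return int(match.group(1)), int(match.group(2))


# Hypothetical output line standing in for result.output from CliRunner.
sample_output = "Validated 50/50 problems"
passed, total = extract_counts(sample_output)
assert passed <= total
print(passed, total)
```

Parsing the two numbers, rather than only matching the pattern, lets a test assert relationships such as `passed <= total` without pinning the corpus size.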

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2dd86e15-3402-4b28-af7b-c6aa0a913582

📥 Commits

Reviewing files that changed from the base of the PR and between e723cb7 and de4ef1a.

📒 Files selected for processing (10)
  • .github/workflows/ci.yml
  • CHANGELOG.md
  • ROADMAP.md
  • pyproject.toml
  • tests/test_cli.py
  • tests/test_models.py
  • tests/test_runner.py
  • tests/test_validate.py
  • tests/test_validate_integration.py
  • tests/test_vera_runner_integration.py

Comment thread CHANGELOG.md Outdated
Comment thread ROADMAP.md Outdated
Comment thread tests/test_cli.py
Comment thread tests/test_cli.py
Comment thread tests/test_models.py
Comment thread tests/test_validate_integration.py
Comment thread tests/test_validate.py
- CHANGELOG: fix test file count (3 -> 4)
- ROADMAP: fix coverage percentage (>83% -> 83%)
- test_cli.py: skip TS baselines when tsx missing, use regex for
  problem count assertion
- test_validate_integration.py: skip all when vera not on PATH
- test_vera_runner_integration.py: skip all when vera not on PATH
- test_validate.py: mock URL fetch in load_skill_md tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_cli.py`:
- Around line 83-99: The inline comment "# May skip if tsx not available" in the
test_typescript_baselines test is stale because the pytest.mark.skipif decorator
(pytest.mark.skipif(...)) already handles skipping when tsx/npx are missing;
remove that comment to avoid misleading phrasing and keep the test doc accurate,
leaving the decorator and the assert result.exit_code == 0 unchanged.

In `@tests/test_validate.py`:
- Around line 212-216: test_bad_url currently performs a real HTTP request;
update the test_bad_url test to patch urllib.request.urlopen so it raises
urllib.error.URLError (e.g., via unittest.mock.patch or pytest monkeypatch) when
load_skill_md is called, preserving the pytest.raises(RuntimeError,
match="Failed to fetch") assertion and referencing the load_skill_md function
and urllib.request.urlopen to locate the change.
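
The `test_bad_url` suggestion above can be sketched without pytest as follows. The `load_skill_md` defined here is a stand-in written for this example, not the project's implementation; it only assumes, as the review comment implies, that the real function calls `urllib.request.urlopen` and wraps failures in a `RuntimeError` whose message starts with "Failed to fetch". The URL is a placeholder.

```python
import urllib.error
import urllib.request
from unittest.mock import patch


def load_skill_md(url: str) -> str:
    """Stand-in loader: fetch text and wrap network failures in RuntimeError."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read().decode("utf-8")
    except urllib.error.URLError as exc:
        raise RuntimeError(f"Failed to fetch {url}: {exc}") from exc


def check_bad_url_is_reported() -> str:
    """Simulate a network failure with no real I/O and return the error text."""
    # Patching the attribute on urllib.request means the loader sees the mock
    # when it looks up urlopen at call time.
    with patch("urllib.request.urlopen", side_effect=urllib.error.URLError("down")):
        try:
            load_skill_md("https://example.invalid/SKILL.md")
        except RuntimeError as exc:
            return str(exc)
    raise AssertionError("expected RuntimeError")


print(check_bad_url_is_reported())
```

In a pytest test the same patch would sit inside a `pytest.raises(RuntimeError, match="Failed to fetch")` block, as the comment describes.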

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5e71ab2d-be2d-4557-bc12-018f49ff875d

📥 Commits

Reviewing files that changed from the base of the PR and between de4ef1a and 2879585.

📒 Files selected for processing (6)
  • CHANGELOG.md
  • ROADMAP.md
  • tests/test_cli.py
  • tests/test_validate.py
  • tests/test_validate_integration.py
  • tests/test_vera_runner_integration.py

Comment thread tests/test_cli.py
Comment thread tests/test_validate.py
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@aallan aallan merged commit e0c542c into main Mar 31, 2026
9 checks passed
@aallan aallan deleted the feature/test-coverage branch March 31, 2026 15:16
Development

Successfully merging this pull request may close these issues.

Include vera compiler version in output JSONL filename

1 participant