Increase test coverage to 83%, version in filenames (v0.0.6)#36
Conversation
New test files: - test_vera_runner_integration.py: real vera subprocess tests (check, verify, run_fn, version, _vera_bin edge cases) - test_validate_integration.py: real validation pipeline tests (find_vera_file, normalize_output, validate_problem, run_validation) - test_cli.py: Click CliRunner tests for all commands - test_models.py: LLM client creation, missing API keys, mock complete() Expanded existing tests: - test_runner.py: Python eval error paths (syntax, runtime, wrong output), run_benchmark JSONL writing, version fields in ProblemResult Coverage improvements: - validate.py: 12% → 82% - cli.py: 48% → 71% - prompts.py: 79% → 100% - vera_runner.py: 65% → 91% - runner.py: 68% → 77% CI coverage threshold raised from 35% to 80%. 376 tests passing (was 324). Closes #20. Progress on #5. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughBumps package to v0.0.6, raises CI Python 3.12 coverage gate from 35% to 80%, and adds extensive test coverage (multiple new unit and integration test files) along with changelog and roadmap updates; no production API or exported symbols were changed. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~35 minutes Possibly related issues
Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #36 +/- ##
===========================================
+ Coverage 65.68% 82.93% +17.24%
===========================================
Files 10 10
Lines 1090 1090
===========================================
+ Hits 716 904 +188
+ Misses 374 186 -188
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Around line 16-17: Update the release note sentence that currently reads "52
new tests across 3 new test files (test_cli.py, test_models.py,
test_vera_runner_integration.py)" to reflect the correct number "4" and ensure
the parenthetical lists all four new test filenames (add the missing test file
name or mirror the PR summary); update the numeric count ("3" → "4") and adjust
the parenthetical to include the fourth file so the changelog is factually
accurate.
In `@ROADMAP.md`:
- Line 19: Update the checklist item text that currently reads "[x] Increase
test coverage to >83% (issue `#5`, ongoing)" in ROADMAP.md to accurately reflect
the achieved coverage by replacing ">83%" with either "83%" or ">=83%"; locate
the exact string in the file and edit it to "83%" (or ">=83%" if you prefer to
express a lower bound) so the roadmap is factually correct.
In `@tests/test_cli.py`:
- Around line 79-92: The test test_typescript_baselines currently always asserts
exit_code == 0 but may require the external "tsx" runtime; modify
test_typescript_baselines to detect presence of the TypeScript runtime before
invoking CliRunner (e.g., use shutil.which("tsx") or similar) and call
pytest.skip("tsx not found") when missing so the test is skipped rather than
failing; update the test in tests/test_cli.py (inside test_typescript_baselines,
before invoking main/CliRunner) to perform this check and skip behavior.
- Around line 11-15: The test_runs_successfully test hardcodes "50/50" which
will break when the corpus size changes; update the assertion on result.output
(from the CliRunner(...).invoke(main, ["validate"]) call) to check for a dynamic
passed/total pattern instead (e.g., use a regex like \d+/\d+ via re.search) or
parse the output to extract numeric counts and assert the format and that
exit_code == 0 remains true so the test is resilient to changes in problem
count.
In `@tests/test_models.py`:
- Around line 84-89: The current test patches vera_bench.models.anthropic/openai
but local imports inside AnthropicClient.__init__ and OpenAIClient.__init__
still import the real SDKs; instead patch the constructors themselves: replace
patch("vera_bench.models.anthropic") with
patch.object(vera_bench.models.AnthropicClient, "__init__", return_value=None)
and similarly patch.object(vera_bench.models.OpenAIClient, "__init__",
return_value=None); after patching the __init__s, set up MagicMock instances for
the client behaviors you need and, if your code references SDK exceptions like
anthropic.APITimeoutError or openai.error.Timeout, inject mock exception
attributes onto the mocked objects or modules used by the clients so tests use
the mocked exceptions rather than real SDK classes.
In `@tests/test_validate_integration.py`:
- Line 1: Add a module-level skip when the external "vera" binary is not on
PATH: import pytest and shutil, set pytestmark =
pytest.mark.skipif(shutil.which("vera") is None, reason="vera not available") at
the top of tests/test_validate_integration.py so all tests in this module are
skipped if shutil.which("vera") returns None; reference the pytestmark symbol
and the use of shutil.which("vera") to locate where to add the guard.
In `@tests/test_validate.py`:
- Around line 177-201: These tests rely on real network calls to veralang.dev
(tests test_load_from_url, test_load_default, test_bad_url) which makes them
flaky; update them to mock the URL fetch used by load_skill_md (or replace
SKILL_MD_URL) so network I/O is deterministic: patch the HTTP client/function
load_skill_md uses (e.g., requests.get or your internal fetcher) to return a
fixed response body for the success cases and a controlled error/HTTP status for
the failure case, and assert against that mocked content (keep
test_load_from_file unchanged); reference load_skill_md, SKILL_MD_URL, and the
tests test_load_from_url/test_load_default/test_bad_url when implementing the
mocks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2dd86e15-3402-4b28-af7b-c6aa0a913582
📒 Files selected for processing (10)
.github/workflows/ci.ymlCHANGELOG.mdROADMAP.mdpyproject.tomltests/test_cli.pytests/test_models.pytests/test_runner.pytests/test_validate.pytests/test_validate_integration.pytests/test_vera_runner_integration.py
- CHANGELOG: fix test file count (3 -> 4) - ROADMAP: fix coverage percentage (>83% -> 83%) - test_cli.py: skip TS baselines when tsx missing, use regex for problem count assertion - test_validate_integration.py: skip all when vera not on PATH - test_vera_runner_integration.py: skip all when vera not on PATH - test_validate.py: mock URL fetch in load_skill_md tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_cli.py`:
- Around line 83-99: The inline comment "# May skip if tsx not available" in the
test_typescript_baselines test is stale because the pytest.mark.skipif decorator
(pytest.mark.skipif(...)) already handles skipping when tsx/npx are missing;
remove that comment to avoid misleading phrasing and keep the test doc accurate,
leaving the decorator and the assert result.exit_code == 0 unchanged.
In `@tests/test_validate.py`:
- Around line 212-216: test_bad_url currently performs a real HTTP request;
update the test_bad_url test to patch urllib.request.urlopen so it raises
urllib.error.URLError (e.g., via unittest.mock.patch or pytest monkeypatch) when
load_skill_md is called, preserving the pytest.raises(RuntimeError,
match="Failed to fetch") assertion and referencing the load_skill_md function
and urllib.request.urlopen to locate the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5e71ab2d-be2d-4557-bc12-018f49ff875d
📒 Files selected for processing (6)
CHANGELOG.mdROADMAP.mdtests/test_cli.pytests/test_validate.pytests/test_validate_integration.pytests/test_vera_runner_integration.py
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Two issues bundled: #20 (version in filenames) and #5 (test coverage).
Coverage: 66% → 83%
4 new test files, 52 new tests, 376 total (was 324).
CI coverage threshold raised from 35% to 80%.
Version tracking (#20)
Filenames now include bench + vera versions:
Each JSONL record carries
bench_versionandvera_versionfields.Closes #20. Progress on #5.
Generated with Claude Code
Summary by CodeRabbit
Tests
New Features
Chores