Implement LLM runner harness (Phase 2) by aallan · Pull Request #3 · aallan/vera-bench

aallan · 2026-03-29T19:58:32Z

Summary

Implements the complete benchmark evaluation pipeline — vera-bench run --model claude-sonnet-4-20250514 now works end-to-end.

models.py: Anthropic + OpenAI API abstraction with lazy imports, provider detection from model ID prefix, SDK built-in retry for rate limits
runner.py: generate → check → verify → run → fix pipeline with code extraction from markdown fences, JSONL output (incremental, crash-safe), temp file management
metrics.py: check_rate, verify_rate, fix_rate, run_correct_rate computation with per-tier breakdowns
report.py: markdown report generation (summary table, tier breakdown, per-problem detail)
cli.py: run and report commands fully wired up
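
As an illustration of "provider detection from model ID prefix", here is a minimal sketch. It uses the prefixes listed in the commit message (claude-*, gpt-*, o1-*, o3-*); the function name `detect_provider` and the returned provider strings are assumptions, not the actual contents of models.py.

```python
def detect_provider(model_id: str) -> str:
    """Map a model ID to a provider name based on its prefix (hypothetical helper)."""
    # Prefix table assumed from the commit message: claude-*, gpt-*, o1-*, o3-*.
    prefixes = {
        "claude-": "anthropic",
        "gpt-": "openai",
        "o1-": "openai",
        "o3-": "openai",
    }
    for prefix, provider in prefixes.items():
        if model_id.startswith(prefix):
            return provider
    # The test suite mentions a ValueError for unknown models.
    raise ValueError(f"Unsupported model: {model_id!r}")


print(detect_provider("claude-sonnet-4-20250514"))  # anthropic
```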

New CLI usage

# Run full benchmark
vera-bench run --model claude-sonnet-4-20250514

# Run single tier
vera-bench run --model claude-sonnet-4-20250514 --tier 1

# Run single problem
vera-bench run --model claude-sonnet-4-20250514 --problem VB-T1-001

# Spec-from-NL mode (agent writes contracts)
vera-bench run --model claude-sonnet-4-20250514 --mode spec-from-nl

# Generate report
vera-bench report results/

Key design decisions

No streaming — batch responses only, token counts come automatically
SDK retry — no custom rate limit handling, both SDKs retry 429s
Incremental JSONL — each result written immediately (survives crashes)
Code extraction — regex for markdown fences, longest block wins, falls back to raw text
One fix attempt — on check failure, feeds error back to model for one retry
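
The "incremental JSONL" decision can be sketched roughly as follows: each result is appended and flushed as soon as it exists, so a crash loses at most the record in flight. The helper names `append_result` and `read_jsonl` and the record fields are hypothetical and only illustrate the idea.

```python
import json
import tempfile
from pathlib import Path


def append_result(path: Path, record: dict) -> None:
    """Append one record as a single JSON line and flush it (hypothetical helper)."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
        fh.flush()


def read_jsonl(path: Path) -> list:
    """Read back every non-empty line as a JSON object."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "results.jsonl"
    # Field names below are illustrative, not the real record schema.
    append_result(out, {"problem_id": "VB-T1-001", "check_pass": True})
    append_result(out, {"problem_id": "VB-T1-002", "check_pass": False})
    print(len(read_jsonl(out)))  # 2
```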

Test plan

285 tests pass (259 existing + 26 new)
ruff check . && ruff format --check . clean
ruff check --select S vera_bench/ security lint clean
vera-bench run --model claude-sonnet-4-20250514 --problem VB-T1-001 (requires API key)

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Fully functional CLI run command with benchmark execution, per-model JSONL results and Rich summary table
- New run options: --max-tokens (default 4096), --keep-temps, improved --tier help and --skill-md support
- Benchmark runner: end-to-end generation, evaluation, optional fix attempts and incremental result output
- Markdown-only report generation that writes/prints summary.md and per-model reports
- Automated metrics computation with tiered breakdowns and JSONL load support
- Integrated LLM provider support with clearer errors for missing keys/unsupported models
Tests
- Comprehensive test suite covering parsing, serialization, provider detection, metrics, reporting and retry behaviour
Documentation
- README updated with prerequisites, installation, Vera setup and revised quick-start commands

Add the complete benchmark evaluation pipeline:

models.py: LLM API abstraction
- AnthropicClient and OpenAIClient with lazy imports
- Unified LLMResponse dataclass (text, tokens, wall_time)
- Provider detection from model ID prefix (claude-*, gpt-*, o1-*, o3-*)
- API keys from environment, SDK built-in retry for rate limits

runner.py: Pipeline orchestration
- extract_vera_code(): regex-based code extraction from markdown fences
- run_single_problem(): generate -> check -> verify -> run -> fix pipeline
- run_benchmark(): iterate problems with rich progress, JSONL output
- ProblemResult dataclass matching BRIEFING.md JSONL format
- Retry-with-error-feedback (one fix attempt on check failure)
- Temp file management with optional --keep-temps

metrics.py: Result aggregation
- load_results(): parse JSONL files
- compute_metrics(): check_rate, verify_rate, fix_rate, run_correct_rate
- Per-tier breakdowns via problem ID parsing
- Handles multi-attempt results (best-attempt for verify/run, fix_rate)

report.py: Markdown report generation
- Summary table (model x metrics)
- Tier breakdown matrix
- Per-problem detail listing
- Writes summary.md to results directory

cli.py: Wired up run and report commands
- vera-bench run --model MODEL [--tier N] [--problem ID] [--mode MODE]
- vera-bench report RESULTS_DIR
- Problem filtering, SKILL.md loading, output directory management
- Metrics summary printed on completion

tests/test_runner.py: 26 new tests
- Code extraction (plain, fenced, multi-fence, no-fence)
- ProblemResult JSONL serialization
- Provider detection (claude/gpt/unknown)
- Metrics computation with hand-crafted fixtures
- Report generation
- Full pipeline with mock LLMClient and VeraRunner

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
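
To make "per-tier breakdowns via problem ID parsing" concrete, here is a small sketch. It assumes problem IDs follow the shape of the one shown in the PR (VB-T1-001); the regex, the helper names and the choice to return 0.0 for an empty denominator are assumptions rather than the actual metrics.py code.

```python
import re
from typing import Optional

# Assumed ID shape, generalised from the single example "VB-T1-001".
_TIER_RE = re.compile(r"^VB-T(\d+)-\d+$")


def tier_from_id(problem_id: str) -> Optional[int]:
    """Return the tier number embedded in a problem ID, or None if it does not match."""
    match = _TIER_RE.match(problem_id)
    return int(match.group(1)) if match else None


def rate(numerator: int, denominator: int) -> float:
    """Compute a ratio, returning 0.0 instead of dividing by zero."""
    return numerator / denominator if denominator else 0.0


print(tier_from_id("VB-T1-001"))  # 1
print(rate(3, 4))                 # 0.75
print(rate(0, 0))                 # 0.0
```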

coderabbitai · 2026-03-29T19:58:49Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d77e22f1-4725-467c-a451-86497c6ff8ea

📥 Commits

Reviewing files that changed from the base of the PR and between 8874109 and f281579.

📒 Files selected for processing (5)

README.md
tests/test_runner.py
vera_bench/cli.py
vera_bench/models.py
vera_bench/runner.py

📝 Walkthrough

Adds a benchmark harness: LLM client abstractions, a runner that generates and evaluates Vera programs with optional fix attempts, metric computation and Markdown reporting, CLI wiring for run/report, and tests covering parsing, serialization, provider selection, metrics, reporting and retry behaviour.

Changes

Cohort / File(s)	Summary
LLM client & models `vera_bench/models.py`	Added `LLMResponse` dataclass and `LLMClient` Protocol; implemented `create_client()` factory, `AnthropicClient` and `OpenAIClient` with lazy SDK imports, API-key checks, timeout→`TimeoutError` handling, and safe extraction/defaulting of tokens, model and wall-time.
Runner / evaluation `vera_bench/runner.py`	Added `extract_vera_code()` parser, `ProblemResult` with `to_jsonl()` that omits `None`, `_evaluate_code()` to write/run/check/verify/execute tests, `run_single_problem()` with fix-retry logic and `run_benchmark()` with Rich progress, temp-dir management, optional keep-temps and `_now()` timestamp helper.
Metrics computation `vera_bench/metrics.py`	Added `TierMetrics` and `BenchmarkMetrics` dataclasses, `load_results()` and `compute_metrics()` plus helpers (`_compute_by_tier`, `_tier_from_id`, `_rate`) to group attempts by problem/tier and compute check/verify/fix/run_correct rates with zero-division protection and empty-input handling.
Reporting (Markdown) `vera_bench/report.py`	Replaced multi-format reporting with Markdown-only `generate_report(results_dir: Path) -> str`; scans `*.jsonl`, uses `load_results` and `compute_metrics`, writes `summary.md`, builds summary, tier breakdown and per-problem sections, and returns/report messages when no results found.
CLI integration `vera_bench/cli.py`	Implemented `run` flow: discover problems, load SKILL.md, create client, instantiate VeraRunner and call `run_benchmark`; added `--max-tokens` and `--keep-temps` options, `_repo_root()` helper, improved `--tier` help, and updated `report` to call `generate_report` and print `summary.md` path.
Tests `tests/test_runner.py`	New tests covering `extract_vera_code()`, `ProblemResult.to_jsonl()`, provider selection/errors for `create_client()`, `compute_metrics()` and `load_results()` behaviours, `generate_report()` outputs, CLI command presence, and mocked `run_single_problem()` retry semantics (including retry suppression when `max_fix_attempts=0`).
Docs / README `README.md`	Rewrote Quick start and installation: explicit Python/Git prerequisites, virtualenv workflow, `.[llm]` extra, instructions to install Vera compiler and minimum Vera version, updated examples using `vera-bench` and `results/` layout.
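
The table mentions a `ProblemResult` whose `to_jsonl()` omits `None` values. One way this could look is sketched below; apart from `error_message`, the field names are hypothetical, and whether the real method appends a trailing newline is not stated in the PR.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ProblemResult:
    # Illustrative fields only; the real dataclass follows BRIEFING.md.
    problem_id: str
    model: str
    check_pass: bool
    verify_pass: Optional[bool] = None
    error_message: Optional[str] = None

    def to_jsonl(self) -> str:
        """Serialise to one JSON line, dropping fields whose value is None."""
        data = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(data)


result = ProblemResult("VB-T1-001", "claude-sonnet-4-20250514", check_pass=True)
print(result.to_jsonl())
```

Dropping `None` keeps each record compact and lets a reader distinguish "stage not reached" (key absent) from "stage failed" (key present and false).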

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels

harness,docs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 24.14% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Implement LLM runner harness (Phase 2)' directly and specifically describes the main change—completing the LLM evaluation pipeline with models, runner, metrics, reporting, and CLI integration.

codecov-commenter · 2026-03-29T19:59:51Z

Codecov Report

❌ Patch coverage is 70.11236% with 133 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.97%. Comparing base (a6749ee) to head (f281579).

Files with missing lines	Patch %	Lines
vera_bench/cli.py	10.60%	59 Missing ⚠️
vera_bench/runner.py	71.85%	38 Missing ⚠️
vera_bench/models.py	52.38%	30 Missing ⚠️
vera_bench/metrics.py	96.49%	4 Missing ⚠️
vera_bench/report.py	97.01%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##             main       #3       +/-   ##
===========================================
+ Coverage   40.14%   59.97%   +19.82%     
===========================================
  Files           5        9        +4     
  Lines         269      707      +438     
===========================================
+ Hits          108      424      +316     
- Misses        161      283      +122

Flag	Coverage Δ
python	`59.97% <70.11%> (+19.82%)`	⬆️

coderabbitai

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_runner.py`:
- Around line 109-124: Tests in TestCreateClient depend on external environment
state; make them deterministic by using pytest's monkeypatch to clear provider
API env vars before calling create_client. Update test_anthropic_prefix,
test_openai_prefix, and test_o1_prefix to accept a monkeypatch fixture and call
monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False) /
monkeypatch.delenv("OPENAI_API_KEY", raising=False) /
monkeypatch.delenv("O1_API_KEY", raising=False) respectively (or the actual env
var names used by create_client), then run the existing with
pytest.raises((ImportError, EnvironmentError)): create_client(...) assertion so
the test no longer relies on CI secrets; keep the
test_unknown_raises_value_error unchanged.
- Around line 206-217: The test_jsonl_round_trip creates a temporary file with
tempfile.NamedTemporaryFile and calls path.unlink() after assertions, but that
cleanup won't run if an assertion fails; update the test to use pytest's
tmp_path fixture or a try/finally so the temp file is always removed.
Specifically, replace the tempfile.NamedTemporaryFile usage in
test_jsonl_round_trip (and the path variable) with tmp_path.joinpath / tmp_path
/ tmp_path fixture APIs to create/write the .jsonl file, or wrap the current
creation/assertion in try/finally to call path.unlink() in the finally block so
cleanup always occurs.
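
A rough sketch of the `tmp_path` variant suggested for the round-trip test follows. The record contents are made up and the body reads the file directly instead of calling the project's `load_results()`, so treat it as the shape of the fix rather than the actual test.

```python
import json
from pathlib import Path


def test_jsonl_round_trip(tmp_path: Path) -> None:
    """Write records to a JSONL file under tmp_path and read them back."""
    # pytest supplies tmp_path and removes it later, even when an assertion fails.
    path = tmp_path / "results.jsonl"
    records = [{"problem_id": "VB-T1-001", "check_pass": True}]  # illustrative data
    path.write_text(
        "".join(json.dumps(r) + "\n" for r in records), encoding="utf-8"
    )
    loaded = [
        json.loads(line)
        for line in path.read_text(encoding="utf-8").splitlines()
    ]
    assert loaded == records
```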

In `@vera_bench/cli.py`:
- Around line 143-144: The code is re-serialising ProblemResult objects via
to_jsonl() and json.loads(), which is wasteful; update compute_metrics (and its
callers like where compute_metrics is invoked before _print_metrics) to accept a
list of ProblemResult objects directly (or alternatively convert each
ProblemResult to a dict with dataclasses.asdict(result) or a dedicated to_dict()
method) and pass results returned from run_benchmark straight into
compute_metrics (replace json.loads(r.to_jsonl()) with either r or asdict(r));
adjust compute_metrics parameter type and internal handling to read fields from
ProblemResult instead of expecting pre-parsed dicts.
- Around line 62-67: The CLI accepts --max-tokens but it isn't forwarded to the
LLM call; update the call chain to thread max_tokens from the click handler into
run(), then into run_benchmark(), then into run_single_problem(), and finally
pass it to client.complete() (or the client's request payload) so the runtime
uses the user-specified value; update the function signatures for run(),
run_benchmark(), and run_single_problem() to accept a max_tokens:int (with
existing defaults preserved) and propagate that parameter when invoking
client.complete().
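
The `--max-tokens` finding amounts to passing one parameter down each layer. A minimal sketch with a stand-in client is shown below; the function names come from the review comment, but the signatures, the prompt text and the `RecordingClient` class are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingClient:
    """Stand-in for an LLM client that records the max_tokens it receives."""
    seen: list = field(default_factory=list)

    def complete(self, prompt: str, max_tokens: int = 4096) -> str:
        self.seen.append(max_tokens)
        return "stub response"


def run_single_problem(client, problem_id: str, max_tokens: int = 4096) -> str:
    # Forward the caller's value instead of relying on the client default.
    return client.complete(f"Solve {problem_id}", max_tokens=max_tokens)


def run_benchmark(client, problem_ids: list, max_tokens: int = 4096) -> list:
    return [
        run_single_problem(client, pid, max_tokens=max_tokens)
        for pid in problem_ids
    ]


client = RecordingClient()
run_benchmark(client, ["VB-T1-001"], max_tokens=1024)
print(client.seen)  # [1024]
```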

In `@vera_bench/metrics.py`:
- Around line 68-108: The logic that tallies check/verify/fix/run counts is
duplicated between compute_metrics and _compute_by_tier; extract it into a new
helper _compute_counts(by_problem: dict[str, list[dict]]) that returns the tuple
(check_pass_count, verify_pass_count, verify_eligible, fix_success,
fix_eligible, run_correct_count, run_eligible, total) using the exact selection
logic (attempt_1, attempt_2, best) shown in the diff, then replace the local
counting blocks in compute_metrics and _compute_by_tier to call _compute_counts
and map the returned values into their BenchmarkMetrics constructions (update
the arguments to _rate calls accordingly) so both functions reuse the single
implementation and remain consistent.

In `@vera_bench/models.py`:
- Around line 138-146: The code assumes choice.message is non-null when
computing text (choice.message.content), which can raise AttributeError; modify
the extraction to defensively check that choice and choice.message exist before
accessing .content (e.g., set text = choice.message.content if choice and
choice.message and choice.message.content else ""), update the logic around
response.choices and the LLMResponse construction (references: response.choices,
choice, choice.message, LLMResponse) so text falls back to an empty string when
message is None while preserving the existing usage and model fields.
- Around line 114-147: The complete method is passing timeout=timeout into
self._client.chat.completions.create which the OpenAI SDK 1.x does not accept;
remove the timeout kwarg from that call and instead either instantiate the
client with a timeout or call
self._client.with_options(timeout=timeout).chat.completions.create(...); update
the call site in complete (and any similar calls) to use
client.with_options(timeout=timeout).chat.completions.create(...) or ensure the
client was created with OpenAI(timeout=...) so you avoid the TypeError at
runtime.
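
The defensive null check on `choice.message` could be sketched as below. The helper name `extract_text` is hypothetical, and `SimpleNamespace` objects stand in for SDK response objects so the example runs without the OpenAI package installed.

```python
from types import SimpleNamespace


def extract_text(response) -> str:
    """Return the first choice's message content, or "" if any link is missing."""
    choices = getattr(response, "choices", None) or []
    choice = choices[0] if choices else None
    message = getattr(choice, "message", None)
    content = getattr(message, "content", None)
    return content or ""


ok = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="fn main() {}"))]
)
no_message = SimpleNamespace(choices=[SimpleNamespace(message=None)])
no_choices = SimpleNamespace(choices=[])

print(repr(extract_text(ok)))          # 'fn main() {}'
print(repr(extract_text(no_message)))  # ''
print(repr(extract_text(no_choices)))  # ''
```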

In `@vera_bench/runner.py`:
- Around line 27-41: The regex _FENCE_RE used by extract_vera_code requires a
newline before the closing backticks so blocks like ```vera\ncode``` are missed;
update _FENCE_RE to allow an optional newline before the closing backticks (e.g.
make the pattern use \n? before ```), keep re.DOTALL, then ensure
extract_vera_code continues to pick the longest match and returns the stripped
code plus a terminating newline.
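
A sketch of the fence-regex fix is given below. The names `_FENCE_RE` and `extract_vera_code` come from the review comment, but the exact pattern in runner.py is not shown in this PR, so the regex here is an assumption that merely implements the described behaviour (optional newline before the closing fence, longest block wins, raw-text fallback).

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Opening fence with optional info string, body, optional newline, closing fence.
_FENCE_RE = re.compile(FENCE + r"[^\n`]*\n(.*?)\n?" + FENCE, re.DOTALL)


def extract_vera_code(text: str) -> str:
    """Return the longest fenced block, or the raw text when no fence is found."""
    blocks = _FENCE_RE.findall(text)
    code = max(blocks, key=len) if blocks else text
    return code.strip() + "\n"


# A closing fence directly after the code (no newline) is now matched.
print(repr(extract_vera_code(FENCE + "vera\ncode" + FENCE)))  # 'code\n'
print(repr(extract_vera_code("no fences here")))              # 'no fences here\n'
```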

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d75ec1e4-9724-4227-bafb-f731073ea039

📥 Commits

Reviewing files that changed from the base of the PR and between a6749ee and f847788.

📒 Files selected for processing (6)

tests/test_runner.py
vera_bench/cli.py
vera_bench/metrics.py
vera_bench/models.py
vera_bench/report.py
vera_bench/runner.py

Add prerequisites, step-by-step clone/venv/install, separate Vera compiler installation, and expanded CLI usage examples. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Around line 53-74: Add a new "Results summary table" section to README.md that
explains that running `vera-bench report results/` produces `results/summary.md`
and show the expected per-model columns (Model, check_rate, verify_rate,
fix_rate, run_correct_rate, wall_time_s) with a small example row; place this
section near the usage/CLI examples so it satisfies the README requirement to
document the results summary and reference `summary.md` as the source of the
table.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d0dd6b9b-5e98-4088-9105-3123a83bed0b

📥 Commits

Reviewing files that changed from the base of the PR and between f847788 and 3c4647c.

📒 Files selected for processing (1)

README.md

The harness finds vera via shutil.which(), so it can be installed from any location. Show both local clone and direct-from-GitHub options. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
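
For context, a PATH lookup with `shutil.which()` might look like the sketch below. The helper name `find_executable` and the error raised when the binary is missing are assumptions; the PR only states that the harness locates `vera` this way.

```python
import shutil


def find_executable(name: str = "vera") -> str:
    """Resolve an executable on PATH, independent of where it was installed."""
    path = shutil.which(name)
    if path is None:
        # Assumed failure mode; the harness may report this differently.
        raise FileNotFoundError(f"{name} not found on PATH")
    return path


try:
    print(find_executable("vera"))
except FileNotFoundError as exc:
    print(exc)
```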

coderabbitai

♻️ Duplicate comments (1)

README.md (1)
57-78: ⚠️ Potential issue | 🟠 Major

Add an explicit results summary table section (still missing).

The README now covers installation and CLI usage well, but it still does not document a concrete results summary table (columns/example) or clearly tie it to results/summary.md output from vera-bench report. This is a required README element.
Suggested patch
 ## Quick start
@@
 # Generate a report from results
 vera-bench report results/
+## Results summary table
+
+Running:
+
+```bash
+vera-bench report results/
+```
+
+writes results/summary.md, including a per-model summary table. Typical columns:
+
+| Model | check_rate | verify_rate | fix_rate | run_correct_rate | wall_time_s |
+|------|------------:|------------:|---------:|-----------------:|------------:|
+| claude-sonnet-4-20250514 | ... | ... | ... | ... | ... |
</details>

As per coding guidelines, `README.md` must document installation, CLI usage, problem structure, metric definitions, results summary table, and citation information.

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.

In @README.md around lines 57 - 78, The README is missing a "Results summary
table" section; add a short subsection explaining that running the CLI command
vera-bench report results/ writes results/summary.md and include a concrete
example table (per-model summary) with the typical columns used by the reporter
(e.g., Model, check_rate, verify_rate, fix_rate, run_correct_rate, wall_time_s)
and an example row (e.g., claude-sonnet-4-20250514 | ... | ... | ... | ... |
...), and mention the file name results/summary.md so readers can correlate the
CLI output to the documented table.
</details>

</blockquote></details>

</blockquote></details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: ASSERTIVE

**Plan**: Pro

**Run ID**: `97734e42-572f-482d-913a-243cb3b85b00`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 3c4647cc9ac7293745ab6c48581a8d7538795dac and 8874109e4508291b398ee2f7b625e8b3206585a7.

</details>

<details>
<summary>📒 Files selected for processing (1)</summary>

* `README.md`

</details>

</details>

Bugs fixed:
- Thread --max-tokens through CLI -> run_benchmark -> run_single_problem -> client.complete() (was accepted but silently ignored)
- OpenAI: use client.with_options(timeout=) instead of passing timeout kwarg to create() (not supported in SDK 1.x)
- OpenAI: defensive null check on choice.message before accessing .content
- Fence regex: allow optional trailing newline before closing backticks

Tests hardened:
- monkeypatch env vars in create_client tests for determinism
- Use tmp_path fixture for JSONL round-trip (cleanup on assertion failure)

README:
- Add Results section documenting summary.md output format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Two of CR's three outside-diff findings on the latest review:

1. `_ailang_literal(value) -> str` was missing the parameter type hint on `value`. One-character fix matching the project's "type hints everywhere" rule from CLAUDE.md. The sibling `_aver_literal` has the same gap and predates this PR; that's a "do next time we touch the Aver path" mental note rather than scope-creep here.
2. Per-test subprocess failures in `_evaluate_aver_code` and `_evaluate_ailang_code` silently `continue` without capturing stderr, unlike the Python/TypeScript evaluators which record stderr into `ProblemResult.error_message`. Filed as aallan#72 with a shared-helper refactor proposal that fixes Aver and AILANG consistently. Roadmap'd under Milestone 1; not blocking this PR.

The third outside-diff finding (`AILANG_RESULTS.md:74` version pin inconsistency) becomes moot once the file is removed per ask aallan#3 in the consolidated review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Improve README with full installation instructions

3c4647c

Clarify Vera compiler install — PATH-based, not path-relative

8874109

aallan merged commit 36265db into main Mar 29, 2026
8 checks passed

aallan mentioned this pull request Mar 29, 2026

Benchmark suite for LLM code generation aallan/vera#225

Open

aallan deleted the feature/llm-runner branch March 30, 2026 15:51

This was referenced Mar 31, 2026

Include bench and vera versions in filenames and JSONL records (#20) #35

Merged

Increase test coverage to 83%, version in filenames (v0.0.6) #36

Merged

This was referenced Apr 7, 2026

Moonshot provider support + full benchmark script (v0.0.7) #38

Merged

Add Aver language support + language-neutral problem descriptions #48

Merged

Report T1-T4 aggregate separately for cross-language comparison #56

Merged

coderabbitai Bot mentioned this pull request Apr 17, 2026

docs: document all scripts in scripts/README.md; make plot_results data-driven #59

Merged

5 tasks

coderabbitai Bot mentioned this pull request May 21, 2026

Add AILANG as a baseline target language #70

Merged

coderabbitai Bot mentioned this pull request May 25, 2026

Bump version to 0.0.12 + documentation consistency pass #74

Merged

5 tasks
