Bump version to 0.0.11 for the Aver 0.16 Console=String migration
The Aver test-wrapper harness and 56 canonical baselines now emit
string interpolation (`Console.print("{x}")`) instead of bare
expressions — required for Aver 0.16's typed `Console.print`,
backward-compatible to Aver 0.10. Plus three restored T2 baselines
and 9 coverage-gap fixes (see #65).
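The change in emitted call shape can be sketched as follows. This is only an illustration, not the harness's actual code: the helper name `wrap_print` and its `interpolate` flag are hypothetical, and only the `Console.print("{x}")` form is taken from the message above.

```python
def wrap_print(expr: str, interpolate: bool = True) -> str:
    """Build an Aver print statement for one test expression.

    Hypothetical helper for illustration; the real harness may differ.
    """
    if interpolate:
        # New shape: the expression is interpolated into a string literal,
        # so the argument passed to Console.print is a String.
        return 'Console.print("{' + expr + '}")'
    # Old shape: the bare expression is passed directly.
    return "Console.print(" + expr + ")"


if __name__ == "__main__":
    print(wrap_print("x"))                     # interpolated form
    print(wrap_print("x", interpolate=False))  # bare-expression form
```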
This is a methodology change for Aver scoring with two distinct
flavours of compatibility impact:
- On Aver 0.16+: required. Without this, every `aver run` on an injected
test main fails at typecheck, so `run_correct` is 0%.
- On Aver 0.10–0.15: scoring may differ slightly between v0.0.10 and
v0.0.11 result files, because the 9 coverage-gap fixes mean
`run_correct` is now measured against the full test_cases set
rather than the partial set the baseline `main()` happened to
print, and the 3 restored T2 baselines now contribute to the Aver
baseline `run_correct` denominator.
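To see why the coverage-gap fixes can shift scores, here is a toy sketch of the denominator effect. The function `run_correct` and its `full_set` flag are hypothetical stand-ins, and the sample data is made up for illustration; the harness's real scoring code is not shown in this commit.

```python
def run_correct(expected: list[str], printed: list[str], full_set: bool = True) -> float:
    """Fraction of test cases whose printed output matches the expected one.

    Hypothetical sketch: full_set=True divides by every test case,
    full_set=False divides only by the cases that main() actually printed.
    """
    n = len(expected) if full_set else min(len(expected), len(printed))
    if n == 0:
        return 0.0
    # zip stops at the shorter list, so unprinted cases never count as hits.
    hits = sum(1 for want, got in zip(expected, printed) if want == got)
    return hits / n


if __name__ == "__main__":
    expected = ["1", "2", "3", "4", "5"]  # made-up test_cases outputs
    printed = ["1", "2", "3"]             # main() printed only a subset
    print(run_correct(expected, printed, full_set=False))  # partial denominator
    print(run_correct(expected, printed, full_set=True))   # full denominator
```

Under these assumptions, a `main()` that prints a correct subset scores perfectly against the partial denominator but lower against the full set, which is the kind of shift the note above describes.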
Aver baseline rises to 100% check@1 / 100% run_correct against
Aver 0.15.2 (previously 95%/73% on the same compiler). Vera, Vera
spec-from-NL, Python, and TypeScript scoring is unaffected.
Files touched:
- pyproject.toml: 0.0.10 -> 0.0.11
- CITATION.cff: version + date-released bumped together
- CHANGELOG.md: new [0.0.11] section with Compatibility note
documenting both flavours of impact; link references updated
- ROADMAP.md: prepended a v0.0.11 line above the v0.0.10 summary
Verified: `pip install -e .` followed by
`python -c "from importlib.metadata import version; print(version('vera-bench'))"`
reports 0.0.11, full test suite green at 494 cases, Aver baselines
100%/100% locally on Aver 0.15.2.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ROADMAP.md (2 additions, 0 deletions)
@@ -2,6 +2,8 @@
## Where we are
+
**v0.0.11** — Aver test-wrapper and 56 canonical baselines migrated to string interpolation (`Console.print("{x}")`) for compatibility with Aver 0.16's typed `Console.print`. Three previously-removed Aver baselines (T2-011/012/013) restored using Aver 0.15+ stdlib. Coverage-gap fix in 9 baselines whose `main()` printed only a subset of test cases. Methodology change documented in CHANGELOG.
+
**v0.0.10** — Aver evaluation harness strips module-header `effects [...]` declarations before injecting the test main, so canonical and LLM-generated solutions continue to compile under Aver 0.13's enforced effects boundary. No-op on Aver 0.12 and earlier; methodology change documented in CHANGELOG.
**v0.0.9** — 60 problems across 5 tiers (10 new T2/T3 problems with testable signatures). T1–T4 `run_correct` pool expanded from 18 to 30 testable problems. New T3 problems use Int-only signatures with internal ADT construction for CLI testability.