Strengthen problem descriptions and postconditions (v0.0.5) by aallan · Pull Request #32 · aallan/vera-bench

aallan · 2026-03-30T20:42:58Z

Summary

Addresses the #1 failure mode in the benchmark: De Bruijn slot ordering confusion and weak postconditions that don't catch logic bugs.

Issue #13: Slot ordering in descriptions

The description field (which the LLM sees in the prompt) now explicitly states which @Type.N maps to which parameter:

VB-T4-002 (GCD): @Nat.0 is b (second/rightmost), @Nat.1 is a (first)
VB-T4-004 (power): @Nat.0 is the exponent (rightmost), @Nat.1 is the base
VB-T5-003 (safe_div): Slot mapping for both safe_div and checked_div

Issue #14: Stronger postconditions

Problem	Old ensures	New ensures	Tier
GCD	`>= 0`	+ `result <= a \|\| b > 0`	T1
sum_to_n	`>= 0`	+ `result >= n`	T1
multiply	`>= 0`	+ `result == a * b`	T1
div_natural	`>= 0`	+ `result * b <= a`	T1
counter	`true`	`result == 3`	T3
state_double	`true`	`result == input * 2`	T3
state_max	`true`	`result == n`	T3

Tier 4 postconditions are all Z3-proved (Tier 1). Tier 5 fall to runtime (Tier 3) due to effect handlers — expected.

Test plan

vera verify passes on all 7 modified .vera files
50/50 validation passing
322 tests pass
Ruff clean

Closes #13. Closes #14.

Generated with Claude Code

Summary by CodeRabbit

Documentation
- Problem descriptions clarified with explicit parameter mappings and strengthened postconditions to make expected results and termination behaviour explicit
- Runtime documentation source now fetched from veralang.dev
- Compare links and changelog updated for the new release
Chores
- Project version bumped to 0.0.5
- Roadmap milestone items marked complete

Issue #13: Explicit De Bruijn slot ordering in descriptions - VB-T4-002 (GCD): description now states @Nat.0=b, @Nat.1=a - VB-T4-004 (power): description now states @Nat.0=exp, @Nat.1=base - VB-T5-003 (safe_div): description now states slot mapping for both safe_div and checked_div Issue #14: Stronger postconditions that catch logic bugs - VB-T4-002 (GCD): added @Nat.result <= @Nat.1 || @Nat.0 > 0 - VB-T4-005 (sum_to_n): added @Nat.result >= @Nat.0 - VB-T4-008 (multiply): added @Nat.result == @Nat.1 * @Nat.0 - VB-T4-010 (div_natural): added @Nat.result * @Nat.0 <= @Nat.1 - VB-T5-001 (counter): ensures(true) -> ensures(@Int.result == 3) - VB-T5-006 (state_double): ensures(true) -> @Int.result == @Int.0 * 2 - VB-T5-009 (state_max): ensures(true) -> @Int.result == @Nat.0 Tier 4 postconditions all verify at Tier 1 (Z3-proved). Tier 5 postconditions fall to Tier 3 (runtime) due to effect handlers. All 50/50 problems pass validation. 322 tests passing. Closes #13. Closes #14. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-03-30T20:43:14Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bac7c3bb-20bb-4a0e-9fca-6437b90a8a84

📥 Commits

Reviewing files that changed from the base of the PR and between e401609 and 1cce34b.

⛔ Files ignored due to path filters (2)

solutions/vera/VB-T4-005_sum_to_n.vera is excluded by !**/*.vera
solutions/vera/VB-T4-010_div_natural.vera is excluded by !**/*.vera

📒 Files selected for processing (3)

problems/tier4/VB_T4_005_sum_to_n.json
problems/tier4/VB_T4_010_div_natural.json
problems/tier5/VB_T5_003_safe_division_exceptions.json

📝 Walkthrough

Walkthrough

Version 0.0.5: release metadata and docs updated; several problem JSON descriptions clarified to explicitly map parameters to De Bruijn slots; multiple problem contracts strengthened from vacuous/non‑negative postconditions to specific functional equalities/inequalities; runtime SKILL.md fetching noted.

Changes

Cohort / File(s)	Summary
Version & docs `CHANGELOG.md`, `ROADMAP.md`, `pyproject.toml`	Bumped release to 0.0.5; updated changelog and roadmap entries; added/updated GitHub compare links; `pyproject.toml` project version incremented.
De Bruijn slot mapping descriptions `problems/tier4/VB_T4_002_greatest_common_divisor.json`, `problems/tier4/VB_T4_004_power.json`, `problems/tier5/VB_T5_003_safe_division_exceptions.json`	Problem `description` strings clarified with explicit parameter→slot mappings (e.g., which `@Nat`/`@Int` index corresponds to each named parameter) and termination/argument-threading notes. No signatures changed.
Strengthened arithmetic postconditions `problems/tier4/VB_T4_005_sum_to_n.json`, `problems/tier4/VB_T4_008_multiply.json`, `problems/tier4/VB_T4_010_div_natural.json`	`contracts.ensures` tightened to express concrete relations (closed‑form sum, multiplication equality, quotient bounds/strictness). Some JSON formatting expanded; test values unchanged.
Strengthened state/exception postconditions & formatting `problems/tier5/VB_T5_001_counter.json`, `problems/tier5/VB_T5_006_state_double.json`, `problems/tier5/VB_T5_009_state_max.json`	Vacuous `"true"` ensures replaced with explicit equalities (e.g., `@Int.result == 3`, `@Int.result ==` @Int.0 `* 2`, `@Int.result ==` @Nat.0``). JSON formatting and verification flags adjusted; semantics/signatures unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

problems

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and accurately summarises the main change: strengthening problem descriptions and postconditions across multiple problems, with version bump to v0.0.5.
Linked Issues check	✅ Passed	The PR fully addresses both linked issues: Issue `#13` requirements are met (descriptions for VB-T4-002, VB-T4-004, VB-T5-003 now explicitly state De Bruijn slot mappings); Issue `#14` requirements are met (postconditions strengthened across 7 problems with concrete constraints to catch logic bugs).
Out of Scope Changes check	✅ Passed	All changes are in scope. Modifications are limited to problem JSON metadata (descriptions, postconditions), supporting documentation (CHANGELOG, ROADMAP), and version bumps. No unrelated code or scope-creep changes present.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/strengthen-problems

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-03-30T20:43:50Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.29%. Comparing base (81e2f12) to head (1cce34b).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main      #32   +/-   ##
=======================================
  Coverage   66.29%   66.29%           
=======================================
  Files          10       10           
  Lines        1068     1068           
=======================================
  Hits          708      708           
  Misses        360      360

Flag	Coverage Δ
python	`66.29% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@problems/tier4/VB_T4_002_greatest_common_divisor.json`:
- Around line 12-13: The postcondition is too weak because "@Nat.result <=
`@Nat.1` || `@Nat.0` > 0" is vacuous when `@Nat.0` > 0; replace it with a proper gcd
specification: require that `@Nat.result` divides both inputs and that it is the
greatest such divisor. Concretely, change the second clause to assert
"(`@Nat.result` | `@Nat.0`) && (`@Nat.result` | `@Nat.1`) && forall k. ((k | `@Nat.0`) &&
(k | `@Nat.1`) -> k <= `@Nat.result`)", keeping the existing non-negativity
"@Nat.result >= 0". Use the same `@Nat.result`, `@Nat.0`, `@Nat.1` symbols so the
verifier checks both divisibility and maximality.

In `@problems/tier4/VB_T4_005_sum_to_n.json`:
- Line 9: The postcondition is too weak—replace the current ensures (which only
asserts "@Nat.result >= 0" and "@Nat.result >= `@Nat.0`") with a precise
specification that the result equals the sum 0..n (e.g., assert "@Nat.result ==
(n * (n + 1)) / 2" or an equivalent quantified/formal sum over the range) so
verification will reject trivial returns like "@Nat.0"; update the ensures
clause (referencing "@Nat.result" and the input Nat parameter) to express exact
equality to the mathematical sum rather than just non-negativity or a lower
bound.

In `@problems/tier4/VB_T4_010_div_natural.json`:
- Around line 12-14: The current ensures allow returning 0 because only the
lower bound (`@Nat.result` * `@Nat.0` <= `@Nat.1`) is asserted; add the complementary
upper (floor-division) bound so the quotient is pinned: add an ensures clause
asserting ((`@Nat.result` + 1) * `@Nat.0` > `@Nat.1`) referencing the same symbols
(`@Nat.result`, `@Nat.0`, `@Nat.1`) so the result is the floor(dividend/divisor).

In `@problems/tier5/VB_T5_003_safe_division_exceptions.json`:
- Line 5: Update the problem description so the exception value is explicit:
replace the ambiguous “throws 0-1” with “throws -1” so it matches the expected
thrown value used by the Exn<Int> handler; specifically edit the description
mentioning Exn<E>, throw, checked_div and safe_div (and the `@Int.0/`@Int.1
parameter notes) to state that checked_div throws -1 when the divisor is zero.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e25b3ba7-9e3c-4b58-89c0-66165b8510ed

📥 Commits

Reviewing files that changed from the base of the PR and between 81e2f12 and e401609.

⛔ Files ignored due to path filters (7)

solutions/vera/VB-T4-002_gcd.vera is excluded by !**/*.vera
solutions/vera/VB-T4-005_sum_to_n.vera is excluded by !**/*.vera
solutions/vera/VB-T4-008_multiply.vera is excluded by !**/*.vera
solutions/vera/VB-T4-010_div_natural.vera is excluded by !**/*.vera
solutions/vera/VB-T5-001_counter.vera is excluded by !**/*.vera
solutions/vera/VB-T5-006_state_double.vera is excluded by !**/*.vera
solutions/vera/VB-T5-009_state_max.vera is excluded by !**/*.vera

📒 Files selected for processing (12)

CHANGELOG.md
ROADMAP.md
problems/tier4/VB_T4_002_greatest_common_divisor.json
problems/tier4/VB_T4_004_power.json
problems/tier4/VB_T4_005_sum_to_n.json
problems/tier4/VB_T4_008_multiply.json
problems/tier4/VB_T4_010_div_natural.json
problems/tier5/VB_T5_001_counter.json
problems/tier5/VB_T5_003_safe_division_exceptions.json
problems/tier5/VB_T5_006_state_double.json
problems/tier5/VB_T5_009_state_max.json
pyproject.toml

- sum_to_n: exact formula @Nat.result == @Nat.0 * (@Nat.0 + 1) / 2 (Z3 proves at Tier 1 — stronger than the >= @Nat.0 lower bound) - div_natural: add upper bound (@Nat.result + 1) * @Nat.0 > @Nat.1 (pins result to exact floor division, Z3 proves at Tier 1) - safe_div description: 'throws 0-1' -> 'throws -1 (written as 0 - 1 in Vera, which has no unary minus)' for clarity GCD postcondition kept as-is: Z3 cannot prove divisibility properties for recursive Euclidean algorithm at Tier 1 (tested and confirmed). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed Mar 30, 2026

View reviewed changes

Comment thread problems/tier4/VB_T4_002_greatest_common_divisor.json

Comment thread problems/tier4/VB_T4_005_sum_to_n.json Outdated

Comment thread problems/tier4/VB_T4_010_div_natural.json

Comment thread problems/tier5/VB_T5_003_safe_division_exceptions.json Outdated

aallan merged commit d01004f into main Mar 30, 2026
10 checks passed

aallan deleted the fix/strengthen-problems branch March 30, 2026 21:19

coderabbitai Bot mentioned this pull request Apr 11, 2026

Add Aver language support + language-neutral problem descriptions #48

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strengthen problem descriptions and postconditions (v0.0.5)#32

Strengthen problem descriptions and postconditions (v0.0.5)#32
aallan merged 2 commits into
mainfrom
fix/strengthen-problems

aallan commented Mar 30, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 30, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested labels

Uh oh!

codecov Bot commented Mar 30, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aallan commented Mar 30, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Issue #13: Slot ordering in descriptions

Issue #14: Stronger postconditions

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Uh oh!

codecov Bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aallan commented Mar 30, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 30, 2026 •

edited

Loading

codecov Bot commented Mar 30, 2026 •

edited

Loading