Strengthen problem descriptions and postconditions (v0.0.5)#32
Conversation
Issue #13: Explicit De Bruijn slot ordering in descriptions - VB-T4-002 (GCD): description now states @Nat.0=b, @Nat.1=a - VB-T4-004 (power): description now states @Nat.0=exp, @Nat.1=base - VB-T5-003 (safe_div): description now states slot mapping for both safe_div and checked_div Issue #14: Stronger postconditions that catch logic bugs - VB-T4-002 (GCD): added @Nat.result <= @Nat.1 || @Nat.0 > 0 - VB-T4-005 (sum_to_n): added @Nat.result >= @Nat.0 - VB-T4-008 (multiply): added @Nat.result == @Nat.1 * @Nat.0 - VB-T4-010 (div_natural): added @Nat.result * @Nat.0 <= @Nat.1 - VB-T5-001 (counter): ensures(true) -> ensures(@Int.result == 3) - VB-T5-006 (state_double): ensures(true) -> @Int.result == @Int.0 * 2 - VB-T5-009 (state_max): ensures(true) -> @Int.result == @Nat.0 Tier 4 postconditions all verify at Tier 1 (Z3-proved). Tier 5 postconditions fall to Tier 3 (runtime) due to effect handlers. All 50/50 problems pass validation. 322 tests passing. Closes #13. Closes #14. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: ⛔ Files ignored due to path filters (2)
📒 Files selected for processing (3)
📝 WalkthroughWalkthroughVersion 0.0.5: release metadata and docs updated; several problem JSON descriptions clarified to explicitly map parameters to De Bruijn slots; multiple problem contracts strengthened from vacuous/non‑negative postconditions to specific functional equalities/inequalities; runtime SKILL.md fetching noted. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #32 +/- ##
=======================================
Coverage 66.29% 66.29%
=======================================
Files 10 10
Lines 1068 1068
=======================================
Hits 708 708
Misses 360 360
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@problems/tier4/VB_T4_002_greatest_common_divisor.json`:
- Around line 12-13: The postcondition is too weak because "@Nat.result <=
`@Nat.1` || `@Nat.0` > 0" is vacuous when `@Nat.0` > 0; replace it with a proper gcd
specification: require that `@Nat.result` divides both inputs and that it is the
greatest such divisor. Concretely, change the second clause to assert
"(`@Nat.result` | `@Nat.0`) && (`@Nat.result` | `@Nat.1`) && forall k. ((k | `@Nat.0`) &&
(k | `@Nat.1`) -> k <= `@Nat.result`)", keeping the existing non-negativity
"@Nat.result >= 0". Use the same `@Nat.result`, `@Nat.0`, `@Nat.1` symbols so the
verifier checks both divisibility and maximality.
In `@problems/tier4/VB_T4_005_sum_to_n.json`:
- Line 9: The postcondition is too weak—replace the current ensures (which only
asserts "@Nat.result >= 0" and "@Nat.result >= `@Nat.0`") with a precise
specification that the result equals the sum 0..n (e.g., assert "@Nat.result ==
(n * (n + 1)) / 2" or an equivalent quantified/formal sum over the range) so
verification will reject trivial returns like "@Nat.0"; update the ensures
clause (referencing "@Nat.result" and the input Nat parameter) to express exact
equality to the mathematical sum rather than just non-negativity or a lower
bound.
In `@problems/tier4/VB_T4_010_div_natural.json`:
- Around line 12-14: The current ensures allow returning 0 because only the
lower bound (`@Nat.result` * `@Nat.0` <= `@Nat.1`) is asserted; add the complementary
upper (floor-division) bound so the quotient is pinned: add an ensures clause
asserting ((`@Nat.result` + 1) * `@Nat.0` > `@Nat.1`) referencing the same symbols
(`@Nat.result`, `@Nat.0`, `@Nat.1`) so the result is the floor(dividend/divisor).
In `@problems/tier5/VB_T5_003_safe_division_exceptions.json`:
- Line 5: Update the problem description so the exception value is explicit:
replace the ambiguous “throws 0-1” with “throws -1” so it matches the expected
thrown value used by the Exn<Int> handler; specifically edit the description
mentioning Exn<E>, throw, checked_div and safe_div (and the `@Int.0/`@Int.1
parameter notes) to state that checked_div throws -1 when the divisor is zero.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: e25b3ba7-9e3c-4b58-89c0-66165b8510ed
⛔ Files ignored due to path filters (7)
solutions/vera/VB-T4-002_gcd.verais excluded by!**/*.verasolutions/vera/VB-T4-005_sum_to_n.verais excluded by!**/*.verasolutions/vera/VB-T4-008_multiply.verais excluded by!**/*.verasolutions/vera/VB-T4-010_div_natural.verais excluded by!**/*.verasolutions/vera/VB-T5-001_counter.verais excluded by!**/*.verasolutions/vera/VB-T5-006_state_double.verais excluded by!**/*.verasolutions/vera/VB-T5-009_state_max.verais excluded by!**/*.vera
📒 Files selected for processing (12)
CHANGELOG.mdROADMAP.mdproblems/tier4/VB_T4_002_greatest_common_divisor.jsonproblems/tier4/VB_T4_004_power.jsonproblems/tier4/VB_T4_005_sum_to_n.jsonproblems/tier4/VB_T4_008_multiply.jsonproblems/tier4/VB_T4_010_div_natural.jsonproblems/tier5/VB_T5_001_counter.jsonproblems/tier5/VB_T5_003_safe_division_exceptions.jsonproblems/tier5/VB_T5_006_state_double.jsonproblems/tier5/VB_T5_009_state_max.jsonpyproject.toml
- sum_to_n: exact formula @Nat.result == @Nat.0 * (@Nat.0 + 1) / 2 (Z3 proves at Tier 1 — stronger than the >= @Nat.0 lower bound) - div_natural: add upper bound (@Nat.result + 1) * @Nat.0 > @Nat.1 (pins result to exact floor division, Z3 proves at Tier 1) - safe_div description: 'throws 0-1' -> 'throws -1 (written as 0 - 1 in Vera, which has no unary minus)' for clarity GCD postcondition kept as-is: Z3 cannot prove divisibility properties for recursive Euclidean algorithm at Tier 1 (tested and confirmed). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Addresses the #1 failure mode in the benchmark: De Bruijn slot ordering confusion and weak postconditions that don't catch logic bugs.
Issue #13: Slot ordering in descriptions
The
descriptionfield (which the LLM sees in the prompt) now explicitly states which@Type.Nmaps to which parameter:@Nat.0 is b (second/rightmost), @Nat.1 is a (first)@Nat.0 is the exponent (rightmost), @Nat.1 is the basesafe_divandchecked_divIssue #14: Stronger postconditions
>= 0result <= a || b > 0>= 0result >= n>= 0result == a * b>= 0result * b <= atrueresult == 3trueresult == input * 2trueresult == nTier 4 postconditions are all Z3-proved (Tier 1). Tier 5 fall to runtime (Tier 3) due to effect handlers — expected.
Test plan
vera verifypasses on all 7 modified .vera filesCloses #13. Closes #14.
Generated with Claude Code
Summary by CodeRabbit
Documentation
Chores