Skip to content

Strengthen problem descriptions and postconditions (v0.0.5)#32

Merged
aallan merged 2 commits into
mainfrom
fix/strengthen-problems
Mar 30, 2026
Merged

Strengthen problem descriptions and postconditions (v0.0.5)#32
aallan merged 2 commits into
mainfrom
fix/strengthen-problems

Conversation

@aallan

@aallan aallan commented Mar 30, 2026

Copy link
Copy Markdown
Owner

Summary

Addresses the #1 failure mode in the benchmark: De Bruijn slot ordering confusion and weak postconditions that don't catch logic bugs.

Issue #13: Slot ordering in descriptions

The description field (which the LLM sees in the prompt) now explicitly states which @Type.N maps to which parameter:

  • VB-T4-002 (GCD): @Nat.0 is b (second/rightmost), @Nat.1 is a (first)
  • VB-T4-004 (power): @Nat.0 is the exponent (rightmost), @Nat.1 is the base
  • VB-T5-003 (safe_div): Slot mapping for both safe_div and checked_div

Issue #14: Stronger postconditions

Problem Old ensures New ensures Tier
GCD >= 0 + result <= a || b > 0 T1
sum_to_n >= 0 + result >= n T1
multiply >= 0 + result == a * b T1
div_natural >= 0 + result * b <= a T1
counter true result == 3 T3
state_double true result == input * 2 T3
state_max true result == n T3

Tier 4 postconditions are all Z3-proved (Tier 1). Tier 5 fall to runtime (Tier 3) due to effect handlers — expected.

Test plan

  • vera verify passes on all 7 modified .vera files
  • 50/50 validation passing
  • 322 tests pass
  • Ruff clean

Closes #13. Closes #14.

Generated with Claude Code

Summary by CodeRabbit

  • Documentation

    • Problem descriptions clarified with explicit parameter mappings and strengthened postconditions to make expected results and termination behaviour explicit
    • Runtime documentation source now fetched from veralang.dev
    • Compare links and changelog updated for the new release
  • Chores

    • Project version bumped to 0.0.5
    • Roadmap milestone items marked complete

Issue #13: Explicit De Bruijn slot ordering in descriptions
- VB-T4-002 (GCD): description now states @Nat.0=b, @Nat.1=a
- VB-T4-004 (power): description now states @Nat.0=exp, @Nat.1=base
- VB-T5-003 (safe_div): description now states slot mapping for both
  safe_div and checked_div

Issue #14: Stronger postconditions that catch logic bugs
- VB-T4-002 (GCD): added @Nat.result <= @Nat.1 || @Nat.0 > 0
- VB-T4-005 (sum_to_n): added @Nat.result >= @Nat.0
- VB-T4-008 (multiply): added @Nat.result == @Nat.1 * @Nat.0
- VB-T4-010 (div_natural): added @Nat.result * @Nat.0 <= @Nat.1
- VB-T5-001 (counter): ensures(true) -> ensures(@Int.result == 3)
- VB-T5-006 (state_double): ensures(true) -> @Int.result == @Int.0 * 2
- VB-T5-009 (state_max): ensures(true) -> @Int.result == @Nat.0

Tier 4 postconditions all verify at Tier 1 (Z3-proved).
Tier 5 postconditions fall to Tier 3 (runtime) due to effect handlers.
All 50/50 problems pass validation. 322 tests passing.

Closes #13. Closes #14.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Mar 30, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bac7c3bb-20bb-4a0e-9fca-6437b90a8a84

📥 Commits

Reviewing files that changed from the base of the PR and between e401609 and 1cce34b.

⛔ Files ignored due to path filters (2)
  • solutions/vera/VB-T4-005_sum_to_n.vera is excluded by !**/*.vera
  • solutions/vera/VB-T4-010_div_natural.vera is excluded by !**/*.vera
📒 Files selected for processing (3)
  • problems/tier4/VB_T4_005_sum_to_n.json
  • problems/tier4/VB_T4_010_div_natural.json
  • problems/tier5/VB_T5_003_safe_division_exceptions.json

📝 Walkthrough

Walkthrough

Version 0.0.5: release metadata and docs updated; several problem JSON descriptions clarified to explicitly map parameters to De Bruijn slots; multiple problem contracts strengthened from vacuous/non‑negative postconditions to specific functional equalities/inequalities; runtime SKILL.md fetching noted.

Changes

Cohort / File(s) Summary
Version & docs
CHANGELOG.md, ROADMAP.md, pyproject.toml
Bumped release to 0.0.5; updated changelog and roadmap entries; added/updated GitHub compare links; pyproject.toml project version incremented.
De Bruijn slot mapping descriptions
problems/tier4/VB_T4_002_greatest_common_divisor.json, problems/tier4/VB_T4_004_power.json, problems/tier5/VB_T5_003_safe_division_exceptions.json
Problem description strings clarified with explicit parameter→slot mappings (e.g., which @Nat/@Int index corresponds to each named parameter) and termination/argument-threading notes. No signatures changed.
Strengthened arithmetic postconditions
problems/tier4/VB_T4_005_sum_to_n.json, problems/tier4/VB_T4_008_multiply.json, problems/tier4/VB_T4_010_div_natural.json
contracts.ensures tightened to express concrete relations (closed‑form sum, multiplication equality, quotient bounds/strictness). Some JSON formatting expanded; test values unchanged.
Strengthened state/exception postconditions & formatting
problems/tier5/VB_T5_001_counter.json, problems/tier5/VB_T5_006_state_double.json, problems/tier5/VB_T5_009_state_max.json
Vacuous "true" ensures replaced with explicit equalities (e.g., @Int.result == 3, @Int.result == @Int.0 * 2, @Int.result == @Nat.0``). JSON formatting and verification flags adjusted; semantics/signatures unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

problems

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately summarises the main change: strengthening problem descriptions and postconditions across multiple problems, with version bump to v0.0.5.
Linked Issues check ✅ Passed The PR fully addresses both linked issues: Issue #13 requirements are met (descriptions for VB-T4-002, VB-T4-004, VB-T5-003 now explicitly state De Bruijn slot mappings); Issue #14 requirements are met (postconditions strengthened across 7 problems with concrete constraints to catch logic bugs).
Out of Scope Changes check ✅ Passed All changes are in scope. Modifications are limited to problem JSON metadata (descriptions, postconditions), supporting documentation (CHANGELOG, ROADMAP), and version bumps. No unrelated code or scope-creep changes present.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/strengthen-problems

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Mar 30, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.29%. Comparing base (81e2f12) to head (1cce34b).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #32   +/-   ##
=======================================
  Coverage   66.29%   66.29%           
=======================================
  Files          10       10           
  Lines        1068     1068           
=======================================
  Hits          708      708           
  Misses        360      360           
Flag Coverage Δ
python 66.29% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@problems/tier4/VB_T4_002_greatest_common_divisor.json`:
- Around line 12-13: The postcondition is too weak because "@Nat.result <=
`@Nat.1` || `@Nat.0` > 0" is vacuous when `@Nat.0` > 0; replace it with a proper gcd
specification: require that `@Nat.result` divides both inputs and that it is the
greatest such divisor. Concretely, change the second clause to assert
"(`@Nat.result` | `@Nat.0`) && (`@Nat.result` | `@Nat.1`) && forall k. ((k | `@Nat.0`) &&
(k | `@Nat.1`) -> k <= `@Nat.result`)", keeping the existing non-negativity
"@Nat.result >= 0". Use the same `@Nat.result`, `@Nat.0`, `@Nat.1` symbols so the
verifier checks both divisibility and maximality.

In `@problems/tier4/VB_T4_005_sum_to_n.json`:
- Line 9: The postcondition is too weak—replace the current ensures (which only
asserts "@Nat.result >= 0" and "@Nat.result >= `@Nat.0`") with a precise
specification that the result equals the sum 0..n (e.g., assert "@Nat.result ==
(n * (n + 1)) / 2" or an equivalent quantified/formal sum over the range) so
verification will reject trivial returns like "@Nat.0"; update the ensures
clause (referencing "@Nat.result" and the input Nat parameter) to express exact
equality to the mathematical sum rather than just non-negativity or a lower
bound.

In `@problems/tier4/VB_T4_010_div_natural.json`:
- Around line 12-14: The current ensures allow returning 0 because only the
lower bound (`@Nat.result` * `@Nat.0` <= `@Nat.1`) is asserted; add the complementary
upper (floor-division) bound so the quotient is pinned: add an ensures clause
asserting ((`@Nat.result` + 1) * `@Nat.0` > `@Nat.1`) referencing the same symbols
(`@Nat.result`, `@Nat.0`, `@Nat.1`) so the result is the floor(dividend/divisor).

In `@problems/tier5/VB_T5_003_safe_division_exceptions.json`:
- Line 5: Update the problem description so the exception value is explicit:
replace the ambiguous “throws 0-1” with “throws -1” so it matches the expected
thrown value used by the Exn<Int> handler; specifically edit the description
mentioning Exn<E>, throw, checked_div and safe_div (and the `@Int.0/`@Int.1
parameter notes) to state that checked_div throws -1 when the divisor is zero.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e25b3ba7-9e3c-4b58-89c0-66165b8510ed

📥 Commits

Reviewing files that changed from the base of the PR and between 81e2f12 and e401609.

⛔ Files ignored due to path filters (7)
  • solutions/vera/VB-T4-002_gcd.vera is excluded by !**/*.vera
  • solutions/vera/VB-T4-005_sum_to_n.vera is excluded by !**/*.vera
  • solutions/vera/VB-T4-008_multiply.vera is excluded by !**/*.vera
  • solutions/vera/VB-T4-010_div_natural.vera is excluded by !**/*.vera
  • solutions/vera/VB-T5-001_counter.vera is excluded by !**/*.vera
  • solutions/vera/VB-T5-006_state_double.vera is excluded by !**/*.vera
  • solutions/vera/VB-T5-009_state_max.vera is excluded by !**/*.vera
📒 Files selected for processing (12)
  • CHANGELOG.md
  • ROADMAP.md
  • problems/tier4/VB_T4_002_greatest_common_divisor.json
  • problems/tier4/VB_T4_004_power.json
  • problems/tier4/VB_T4_005_sum_to_n.json
  • problems/tier4/VB_T4_008_multiply.json
  • problems/tier4/VB_T4_010_div_natural.json
  • problems/tier5/VB_T5_001_counter.json
  • problems/tier5/VB_T5_003_safe_division_exceptions.json
  • problems/tier5/VB_T5_006_state_double.json
  • problems/tier5/VB_T5_009_state_max.json
  • pyproject.toml

Comment thread problems/tier4/VB_T4_002_greatest_common_divisor.json
Comment thread problems/tier4/VB_T4_005_sum_to_n.json Outdated
Comment thread problems/tier4/VB_T4_010_div_natural.json
Comment thread problems/tier5/VB_T5_003_safe_division_exceptions.json Outdated
- sum_to_n: exact formula @Nat.result == @Nat.0 * (@Nat.0 + 1) / 2
  (Z3 proves at Tier 1 — stronger than the >= @Nat.0 lower bound)
- div_natural: add upper bound (@Nat.result + 1) * @Nat.0 > @Nat.1
  (pins result to exact floor division, Z3 proves at Tier 1)
- safe_div description: 'throws 0-1' -> 'throws -1 (written as 0 - 1
  in Vera, which has no unary minus)' for clarity

GCD postcondition kept as-is: Z3 cannot prove divisibility properties
for recursive Euclidean algorithm at Tier 1 (tested and confirmed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@aallan aallan merged commit d01004f into main Mar 30, 2026
10 checks passed
@aallan aallan deleted the fix/strengthen-problems branch March 30, 2026 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Strengthen postconditions to catch slot-swap bugs via vera verify Strengthen problem descriptions for De Bruijn slot ordering

1 participant