Skip to content

Add 10 new T2/T3 problems with testable signatures#57

Merged
aallan merged 5 commits into
mainfrom
feature/expand-test-cases
Apr 16, 2026
Merged

Add 10 new T2/T3 problems with testable signatures#57
aallan merged 5 commits into
mainfrom
feature/expand-test-cases

Conversation

@aallan

@aallan aallan commented Apr 16, 2026

Copy link
Copy Markdown
Owner

Summary

  • Add 5 new Tier 2 problems (VB-T2-011 through 015) testing string builtin discovery
  • Add 5 new Tier 3 problems (VB-T3-011 through 015) testing ADT definition + pattern matching
  • Add test_cases to existing VB-T2-004 and VB-T2-005 (previously testable but empty)
  • Canonical solutions in all 4 languages (Vera, Python, TypeScript, Aver)

Context

When #56 splits the report into T1-T5 and T1-T4 sections, the T1-T4 run_correct denominator was only 18 problems (T1: 10, T4: 8) because T2 and T3 had zero test cases -- vera run --fn could not handle their Array/ADT/String-return signatures.

This PR increases the T1-T4 testable pool to 30 problems by:

  1. Adding test_cases to existing problems where the signature was already testable (T2-004 @String -> @Bool, T2-005 @String, @String -> @Bool)
  2. Designing new T2 problems with testable signatures (String inputs, Bool/Nat outputs): starts_with_prefix, ends_with_suffix, get_char_code, combined_length, is_longer_than
  3. Designing new T3 problems with Int-only signatures that require internal ADT construction + match: safe_divide, pair_sum, classify_sign, color_code, either_select

The T3 design is the key insight: functions take @Int args and return @Int, but the LLM must define private data types and write match expressions internally. This preserves the tier's intent (testing ADT + pattern matching) while making the problems testable via vera run --fn.

Tier Before (testable/total) After (testable/total)
T1 10/10 10/10
T2 0/10 7/15
T3 0/10 5/15
T4 8/10 8/10
T1-T4 18/40 (45%) 30/50 (60%)

Partial progress on #25. Intended to be merged alongside #56.

Test plan

  • All 60 problem JSONs pass schema validation (required fields present)
  • Python baselines: 10/10 new solutions pass assertions
  • TypeScript baselines: 10/10 new solutions pass via npx tsx
  • Vera solutions: correct De Bruijn indices, mandatory braces, proper data definitions
  • ruff check clean
  • 445 existing tests pass (2 preexisting failures from broken vera install)
  • Vera canonical solutions pass vera check + vera verify (blocked by local vera install)

Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added 10 new benchmark problems (5 Tier 2, 5 Tier 3), expanding the suite to 60 challenges.
  • Tests

    • Added test cases for the 10 new problems and populated tests for two existing string exercises (is_empty_string, contains_substring).
  • Documentation

    • Updated quick-start and contribution docs to reflect validating “all 60 problems and canonical solutions”.
  • Chores

    • Bumped project and citation version to v0.0.9 and updated changelog/roadmap entries.

@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 455dedce-a91a-49fd-9cf2-e2640b7d670e

📥 Commits

Reviewing files that changed from the base of the PR and between 8928723 and c80c27a.

⛔ Files ignored due to path filters (5)
  • solutions/aver/VB_T3_011_safe_divide.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T3_012_pair_sum.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T3_013_classify_sign.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T3_014_color_code.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T3_015_either_select.av is excluded by !**/*.av, !solutions/aver/**
📒 Files selected for processing (4)
  • problems/tier2/VB_T2_011_starts_with_prefix.json
  • problems/tier2/VB_T2_012_ends_with_suffix.json
  • problems/tier2/VB_T2_013_get_char_code.json
  • problems/tier2/VB_T2_014_combined_length.json

📝 Walkthrough

Walkthrough

This PR expands the benchmark from 50 → 60 problems by adding five Tier‑2 string problems and five Tier‑3 ADT pattern‑matching problems, fills test cases for two existing Tier‑2 problems, and updates docs/metadata (version, changelog, roadmap, citation).

Changes

Cohort / File(s) Summary
Documentation & Metadata
CHANGELOG.md, CONTRIBUTING.md, README.md, CITATION.cff, ROADMAP.md, pyproject.toml
Bumped package/version to 0.0.9, updated release date and changelog compare links, adjusted docs to reference 60 problems, and added roadmap notes about the new problems and T3 Int-only ADT style.
Tier 2 Test Case Population
problems/tier2/VB_T2_004_is_empty_string.json, problems/tier2/VB_T2_005_contains_substring.json
Populated test_cases with concrete inputs/expected outputs for is_empty_string and contains_substring.
New Tier 2 String Problems
problems/tier2/VB_T2_011_starts_with_prefix.json, problems/tier2/VB_T2_012_ends_with_suffix.json, problems/tier2/VB_T2_013_get_char_code.json, problems/tier2/VB_T2_014_combined_length.json, problems/tier2/VB_T2_015_is_longer_than.json
Added five Tier‑2 JSON specs with public entry points, pure contract stubs, tags/notes referencing string built‑ins, and five test cases each.
New Tier 3 ADT Pattern‑Matching Problems
problems/tier3/VB_T3_011_safe_divide.json, problems/tier3/VB_T3_012_pair_sum.json, problems/tier3/VB_T3_013_classify_sign.json, problems/tier3/VB_T3_014_color_code.json, problems/tier3/VB_T3_015_either_select.json
Added five Tier‑3 specs requiring internal ADT construction and pattern matching; external signatures use Int-only params, include constructor/De Bruijn notes, verification flags, and test suites.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

problems, docs

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarises the primary change: adding 10 new Tier 2 and Tier 3 problems with testable function signatures, which is the core objective of this PR.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/expand-test-cases

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@problems/tier2/VB_T2_012_ends_with_suffix.json`:
- Line 23: The vera_verify_tier1 flag is incorrectly set to true for a problem
that uses string contracts; update the JSON so the "vera_verify_tier1" property
is false for this problem (VB_T2_012_ends_with_suffix) to reflect that string
contracts fall to Tier 3 runtime verification.
- Around line 15-21: The test harness entries under the "test_cases" array must
be removed because this problem's function accepts String parameters which vera
run --fn cannot parse; update the JSON by clearing the "test_cases" array (make
it empty) so there are no runtime tests; ensure the "test_cases" key remains but
its value is an empty list to rely on vera check/verify for validation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 04218f3b-6234-423c-8ff8-111fc382c7bc

📥 Commits

Reviewing files that changed from the base of the PR and between acf1693 and b18c22d.

⛔ Files ignored due to path filters (40)
  • solutions/aver/VB_T2_011_starts_with_prefix.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T2_012_ends_with_suffix.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T2_013_get_char_code.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T2_014_combined_length.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T2_015_is_longer_than.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T3_011_safe_divide.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T3_012_pair_sum.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T3_013_classify_sign.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T3_014_color_code.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T3_015_either_select.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/python/VB_T2_011_starts_with_prefix.py is excluded by !solutions/python/**
  • solutions/python/VB_T2_012_ends_with_suffix.py is excluded by !solutions/python/**
  • solutions/python/VB_T2_013_get_char_code.py is excluded by !solutions/python/**
  • solutions/python/VB_T2_014_combined_length.py is excluded by !solutions/python/**
  • solutions/python/VB_T2_015_is_longer_than.py is excluded by !solutions/python/**
  • solutions/python/VB_T3_011_safe_divide.py is excluded by !solutions/python/**
  • solutions/python/VB_T3_012_pair_sum.py is excluded by !solutions/python/**
  • solutions/python/VB_T3_013_classify_sign.py is excluded by !solutions/python/**
  • solutions/python/VB_T3_014_color_code.py is excluded by !solutions/python/**
  • solutions/python/VB_T3_015_either_select.py is excluded by !solutions/python/**
  • solutions/typescript/VB_T2_011_starts_with_prefix.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T2_012_ends_with_suffix.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T2_013_get_char_code.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T2_014_combined_length.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T2_015_is_longer_than.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T3_011_safe_divide.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T3_012_pair_sum.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T3_013_classify_sign.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T3_014_color_code.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T3_015_either_select.ts is excluded by !solutions/typescript/**
  • solutions/vera/VB-T2-011_starts_with_prefix.vera is excluded by !**/*.vera
  • solutions/vera/VB-T2-012_ends_with_suffix.vera is excluded by !**/*.vera
  • solutions/vera/VB-T2-013_get_char_code.vera is excluded by !**/*.vera
  • solutions/vera/VB-T2-014_combined_length.vera is excluded by !**/*.vera
  • solutions/vera/VB-T2-015_is_longer_than.vera is excluded by !**/*.vera
  • solutions/vera/VB-T3-011_safe_divide.vera is excluded by !**/*.vera
  • solutions/vera/VB-T3-012_pair_sum.vera is excluded by !**/*.vera
  • solutions/vera/VB-T3-013_classify_sign.vera is excluded by !**/*.vera
  • solutions/vera/VB-T3-014_color_code.vera is excluded by !**/*.vera
  • solutions/vera/VB-T3-015_either_select.vera is excluded by !**/*.vera
📒 Files selected for processing (15)
  • CHANGELOG.md
  • CONTRIBUTING.md
  • README.md
  • problems/tier2/VB_T2_004_is_empty_string.json
  • problems/tier2/VB_T2_005_contains_substring.json
  • problems/tier2/VB_T2_011_starts_with_prefix.json
  • problems/tier2/VB_T2_012_ends_with_suffix.json
  • problems/tier2/VB_T2_013_get_char_code.json
  • problems/tier2/VB_T2_014_combined_length.json
  • problems/tier2/VB_T2_015_is_longer_than.json
  • problems/tier3/VB_T3_011_safe_divide.json
  • problems/tier3/VB_T3_012_pair_sum.json
  • problems/tier3/VB_T3_013_classify_sign.json
  • problems/tier3/VB_T3_014_color_code.json
  • problems/tier3/VB_T3_015_either_select.json

Comment thread problems/tier2/VB_T2_012_ends_with_suffix.json
Comment thread problems/tier2/VB_T2_012_ends_with_suffix.json
@aallan

aallan commented Apr 16, 2026

Copy link
Copy Markdown
Owner Author

Re: CodeRabbit suggestion to remove test_cases from VB-T2-012 — this is incorrect. vera run --fn has supported String arguments since v0.0.102 (see #25 comment thread, specifically the vera#441 reply). Verified locally:

$ vera run solutions/vera/VB-T2-012_ends_with_suffix.vera --fn ends_with_suffix -- "hello" "llo"
1
$ vera run solutions/vera/VB-T2-012_ends_with_suffix.vera --fn ends_with_suffix -- "hello" "world"
0

The test_cases are correct and should stay. What vera run --fn cannot handle is Array/ADT arguments and String return values (prints WASM pointer). String inputs with Bool/Int/Nat returns work fine — that is the design basis for all the new T2 problems.

@codecov

codecov Bot commented Apr 16, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.30%. Comparing base (b59fdc6) to head (c80c27a).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #57   +/-   ##
=======================================
  Coverage   83.30%   83.30%           
=======================================
  Files          10       10           
  Lines        1360     1360           
=======================================
  Hits         1133     1133           
  Misses        227      227           
Flag Coverage Δ
python 83.30% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

aallan and others added 3 commits April 16, 2026 09:13
Expand the testable problem pool so that T1-T4 run_correct has a
meaningful denominator when T5 is excluded for cross-language comparison.

Before: 18/40 testable T1-T4 problems (T2: 0, T3: 0)
After:  30/50 testable T1-T4 problems (T2: 7, T3: 5)

New T2 problems (VB-T2-011 through 015) test string builtin discovery
with testable signatures (String/Int inputs, Bool/Nat outputs):
starts_with, ends_with, char_code, combined_length, is_longer_than.

New T3 problems (VB-T3-011 through 015) test ADT definition and pattern
matching with Int-only signatures — the function takes primitive args but
must construct ADTs and match internally: safe_divide, pair_sum,
classify_sign, color_code, either_select.

Also adds test_cases to VB-T2-004 (is_empty_string) and VB-T2-005
(contains_substring), which were testable but had empty test_cases.

All canonical solutions provided in Vera, Python, TypeScript, and Aver.
Python and TypeScript baselines verified passing.

Partial progress on #25.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Vera let syntax is `let @type = expr;`, not `let name: Type = expr;`.
The parser expects AT or UPPER_IDENT after let, not a lowercase
identifier. Fixed all 5 new T3 canonical solutions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adding problems is a version break for the benchmark — results from
v0.0.9 (60 problems) must not be conflated with v0.0.8 (50 problems).
The bench_version field in JSONL output comes from pyproject.toml via
importlib.metadata, so bumping here ensures result files are tagged
correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@aallan aallan force-pushed the feature/expand-test-cases branch from ff07af5 to 8928723 Compare April 16, 2026 08:15
@jasisz

jasisz commented Apr 16, 2026

Copy link
Copy Markdown
Contributor

Heads up — the new Aver solutions have a couple of issues:

T2: String.starts_with, String.ends_with, Char.to_int, String.char_at don't exist in Aver's stdlib. Only String.len, String.contains, String.concat, String.toUpper, String.toLower, String.trim, String.join are available (see existing T2-004/T2-005 for reference).

T3: ADT syntax uses Haskell-style type X = A | B(Int), but Aver needs indented constructors + qualified names in match:

type DivResult
    Failure
    Success(Int)

match r
    DivResult.Failure -> -1
    DivResult.Success(v) -> v

(See solutions/aver/VB_T3_001_list_length.av for the pattern.)

T2-014 and T2-015 (String.len) look fine.

@aallan

aallan commented Apr 16, 2026

Copy link
Copy Markdown
Owner Author

Good catches, thanks @jasisz! Fixed in 7ad8d7e:

T2: Removed VB-T2-011/012/013 Aver solutions entirely — starts_with, ends_with, and char_at simply aren't available in the stdlib, so there's no canonical solution to ship. T2-014/015 (String.len) were already fine.

T3: All 5 solutions now use indented constructor syntax and qualified names (DivResult.Failure, Pair.MkPair, etc.) — matched the pattern from your VB_T3_001_list_length.av.

I don't have Aver installed locally to verify — would you mind giving these a quick sanity check?

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@problems/tier2/VB_T2_011_starts_with_prefix.json`:
- Line 24: The notes for starts_with_prefix are missing the De Bruijn slot
ordering which causes param inversion; update the "notes" field for
starts_with_prefix(`@String`, `@String` -> `@Bool`) to explicitly state slot ordering:
`@String.0` = prefix (the first argument), `@String.1` = haystack (the
second/rightmost argument), and that `@Bool` is the result, so future readers
won't confuse the parameter order. Reference the function name
starts_with_prefix and the param slots `@String.0` and `@String.1` when adding the
sentence.

In `@problems/tier2/VB_T2_012_ends_with_suffix.json`:
- Line 24: Update the "notes" field in the JSON entry to explicitly call out the
parameter slot ordering for the two-String problem: state that `@String.0` refers
to the rightmost parameter and warn that string_ends_with(haystack, suffix) can
be easily inverted (suffix vs haystack); mention which slot is expected for
"suffix" and which for "haystack" to prevent argument inversion when using
string_ends_with(...).

In `@problems/tier2/VB_T2_013_get_char_code.json`:
- Around line 23-24: Update the JSON metadata to enable Tier 1 verification by
changing the boolean field "vera_verify_tier1" from false to true; this file
uses trivial contracts (requires(true) and ensures(true)) so set
"vera_verify_tier1": true to reflect that Tier 1 verification applies.

In `@problems/tier2/VB_T2_014_combined_length.json`:
- Around line 23-24: Update the JSON to enable Tier 1 verification by setting
"vera_verify_tier1" to true and revise the "notes" field to state that the
contracts (requires(true), ensures(true)) are trivially true and Z3-verifiable
at Tier 1; modify the "vera_verify_tier1" and "notes" entries accordingly so
they reflect that string_length in the function body does not force Tier 3
fallback.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: cf815003-10bc-40f7-89a2-415ac6215345

📥 Commits

Reviewing files that changed from the base of the PR and between ff07af5 and 8928723.

⛔ Files ignored due to path filters (40)
  • solutions/aver/VB_T2_011_starts_with_prefix.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T2_012_ends_with_suffix.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T2_013_get_char_code.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T2_014_combined_length.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T2_015_is_longer_than.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T3_011_safe_divide.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T3_012_pair_sum.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T3_013_classify_sign.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T3_014_color_code.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/aver/VB_T3_015_either_select.av is excluded by !**/*.av, !solutions/aver/**
  • solutions/python/VB_T2_011_starts_with_prefix.py is excluded by !solutions/python/**
  • solutions/python/VB_T2_012_ends_with_suffix.py is excluded by !solutions/python/**
  • solutions/python/VB_T2_013_get_char_code.py is excluded by !solutions/python/**
  • solutions/python/VB_T2_014_combined_length.py is excluded by !solutions/python/**
  • solutions/python/VB_T2_015_is_longer_than.py is excluded by !solutions/python/**
  • solutions/python/VB_T3_011_safe_divide.py is excluded by !solutions/python/**
  • solutions/python/VB_T3_012_pair_sum.py is excluded by !solutions/python/**
  • solutions/python/VB_T3_013_classify_sign.py is excluded by !solutions/python/**
  • solutions/python/VB_T3_014_color_code.py is excluded by !solutions/python/**
  • solutions/python/VB_T3_015_either_select.py is excluded by !solutions/python/**
  • solutions/typescript/VB_T2_011_starts_with_prefix.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T2_012_ends_with_suffix.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T2_013_get_char_code.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T2_014_combined_length.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T2_015_is_longer_than.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T3_011_safe_divide.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T3_012_pair_sum.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T3_013_classify_sign.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T3_014_color_code.ts is excluded by !solutions/typescript/**
  • solutions/typescript/VB_T3_015_either_select.ts is excluded by !solutions/typescript/**
  • solutions/vera/VB-T2-011_starts_with_prefix.vera is excluded by !**/*.vera
  • solutions/vera/VB-T2-012_ends_with_suffix.vera is excluded by !**/*.vera
  • solutions/vera/VB-T2-013_get_char_code.vera is excluded by !**/*.vera
  • solutions/vera/VB-T2-014_combined_length.vera is excluded by !**/*.vera
  • solutions/vera/VB-T2-015_is_longer_than.vera is excluded by !**/*.vera
  • solutions/vera/VB-T3-011_safe_divide.vera is excluded by !**/*.vera
  • solutions/vera/VB-T3-012_pair_sum.vera is excluded by !**/*.vera
  • solutions/vera/VB-T3-013_classify_sign.vera is excluded by !**/*.vera
  • solutions/vera/VB-T3-014_color_code.vera is excluded by !**/*.vera
  • solutions/vera/VB-T3-015_either_select.vera is excluded by !**/*.vera
📒 Files selected for processing (18)
  • CHANGELOG.md
  • CITATION.cff
  • CONTRIBUTING.md
  • README.md
  • ROADMAP.md
  • problems/tier2/VB_T2_004_is_empty_string.json
  • problems/tier2/VB_T2_005_contains_substring.json
  • problems/tier2/VB_T2_011_starts_with_prefix.json
  • problems/tier2/VB_T2_012_ends_with_suffix.json
  • problems/tier2/VB_T2_013_get_char_code.json
  • problems/tier2/VB_T2_014_combined_length.json
  • problems/tier2/VB_T2_015_is_longer_than.json
  • problems/tier3/VB_T3_011_safe_divide.json
  • problems/tier3/VB_T3_012_pair_sum.json
  • problems/tier3/VB_T3_013_classify_sign.json
  • problems/tier3/VB_T3_014_color_code.json
  • problems/tier3/VB_T3_015_either_select.json
  • pyproject.toml

Comment thread problems/tier2/VB_T2_011_starts_with_prefix.json Outdated
Comment thread problems/tier2/VB_T2_012_ends_with_suffix.json Outdated
Comment thread problems/tier2/VB_T2_013_get_char_code.json Outdated
Comment thread problems/tier2/VB_T2_014_combined_length.json Outdated
@jasisz

jasisz commented Apr 16, 2026

Copy link
Copy Markdown
Contributor

The T3 Aver solutions still fail — Aver has no let or if/then/else:

$ aver run VB_T3_011_safe_divide.av
error[9:0]: Unknown identifier 'let'
error[9:0]: Unknown identifier 'if'
error[9:0]: Unknown identifier 'then'
error[9:0]: Unknown identifier 'else'

All branching must go through match. Here are working solutions (all pass aver check + aver run):

VB_T3_011_safe_divide.av
module SafeDivide
    intent = "Integer division returning -1 for division by zero, using ADT and pattern matching."
    exposes [safe_divide]

type DivResult
    Failure
    Success(Int)

fn toResult(a: Int, b: Int) -> DivResult
    ? "Wraps division in a DivResult."
    match b == 0
        true -> DivResult.Failure
        false -> DivResult.Success(a / b)

fn safe_divide(a: Int, b: Int) -> Int
    ? "Returns a / b, or -1 if b is zero."
    match toResult(a, b)
        DivResult.Failure -> -1
        DivResult.Success(v) -> v

verify toResult
    toResult(10, 2) => DivResult.Success(5)
    toResult(10, 0) => DivResult.Failure

verify safe_divide
    safe_divide(10, 2) => 5
    safe_divide(7, 3) => 2
    safe_divide(10, 0) => -1
    safe_divide(0, 5) => 0
    safe_divide(-6, 3) => -2

fn main() -> Unit
    ! [Console.print]
    Console.print(safe_divide(10, 2))
    Console.print(safe_divide(10, 0))
    Console.print(safe_divide(-6, 3))
VB_T3_012_pair_sum.av
module PairSum
    intent = "Construct a pair from two integers and sum them using pattern matching."
    exposes [pair_sum]

type Pair
    MkPair(Int, Int)

fn pair_sum(a: Int, b: Int) -> Int
    ? "Constructs a pair and returns the sum of both components."
    match Pair.MkPair(a, b)
        Pair.MkPair(x, y) -> x + y

verify pair_sum
    pair_sum(3, 4) => 7
    pair_sum(0, 0) => 0
    pair_sum(-5, 5) => 0
    pair_sum(10, -3) => 7
    pair_sum(100, 200) => 300

fn main() -> Unit
    ! [Console.print]
    Console.print(pair_sum(3, 4))
    Console.print(pair_sum(0, 0))
    Console.print(pair_sum(-5, 5))
VB_T3_013_classify_sign.av
module ClassifySign
    intent = "Classify an integer's sign using an enum ADT and pattern matching."
    exposes [classifySign]

type Sign
    Negative
    Zero
    Positive

fn toSign(n: Int) -> Sign
    ? "Maps an integer to its Sign variant."
    match n < 0
        true -> Sign.Negative
        false -> match n == 0
            true -> Sign.Zero
            false -> Sign.Positive

fn classifySign(n: Int) -> Int
    ? "Returns -1 for negative, 0 for zero, 1 for positive."
    match toSign(n)
        Sign.Negative -> -1
        Sign.Zero -> 0
        Sign.Positive -> 1

verify toSign
    toSign(-5) => Sign.Negative
    toSign(0) => Sign.Zero
    toSign(5) => Sign.Positive

verify classifySign
    classifySign(42) => 1
    classifySign(-7) => -1
    classifySign(0) => 0
    classifySign(1) => 1
    classifySign(-100) => -1

fn main() -> Unit
    ! [Console.print]
    Console.print(classifySign(42))
    Console.print(classifySign(-7))
    Console.print(classifySign(0))
VB_T3_014_color_code.av
module ColorCode
    intent = "Map an integer to a Color ADT variant and return the RGB value via pattern matching."
    exposes [colorCode]

type Color
    Red
    Green
    Blue
    Black

fn toColor(code: Int) -> Color
    ? "Maps an integer code to a Color variant."
    match code
        0 -> Color.Red
        1 -> Color.Green
        2 -> Color.Blue
        _ -> Color.Black

fn colorCode(code: Int) -> Int
    ? "Returns the RGB integer for the given color code."
    match toColor(code)
        Color.Red -> 16711680
        Color.Green -> 65280
        Color.Blue -> 255
        Color.Black -> 0

verify toColor
    toColor(0) => Color.Red
    toColor(1) => Color.Green
    toColor(2) => Color.Blue
    toColor(99) => Color.Black

verify colorCode
    colorCode(0) => 16711680
    colorCode(1) => 65280
    colorCode(2) => 255
    colorCode(3) => 0
    colorCode(-1) => 0

fn main() -> Unit
    ! [Console.print]
    Console.print(colorCode(0))
    Console.print(colorCode(1))
    Console.print(colorCode(2))
VB_T3_015_either_select.av
module EitherSelect
    intent = "Conditionally wrap a value in Left or Right, then extract it via pattern matching."
    exposes [eitherSelect]

type Choice
    Left(Int)
    Right(Int)

fn wrapChoice(a: Int, b: Int, selector: Int) -> Choice
    ? "Wraps a or b in a Choice based on selector."
    match selector > 0
        true -> Choice.Left(a)
        false -> Choice.Right(b)

fn eitherSelect(a: Int, b: Int, selector: Int) -> Int
    ? "Returns a if selector > 0, otherwise b, using an Either-like ADT."
    match wrapChoice(a, b, selector)
        Choice.Left(v) -> v
        Choice.Right(v) -> v

verify wrapChoice
    wrapChoice(10, 20, 1) => Choice.Left(10)
    wrapChoice(10, 20, 0) => Choice.Right(20)

verify eitherSelect
    eitherSelect(10, 20, 1) => 10
    eitherSelect(10, 20, 0) => 20
    eitherSelect(10, 20, -5) => 20
    eitherSelect(42, 99, 100) => 42
    eitherSelect(0, 0, 1) => 0

fn main() -> Unit
    ! [Console.print]
    Console.print(eitherSelect(10, 20, 1))
    Console.print(eitherSelect(10, 20, 0))
    Console.print(eitherSelect(42, 99, 100))

Key patterns:

  • name = expr instead of let name = expr
  • match bool_expr with true ->/false -> instead of if/then/else
  • Match arm body must start on same line as ->, so extract helpers for multi-step logic
  • Helper fns need their own verify blocks (aver check warns otherwise)
  • Aver uses camelCase for function names (not snake_case)

Aver has no `let` or `if/then/else` — conditionals use `match` on
booleans, bindings use `name = expr`. Rewrote all 5 T3 solutions
to use idiomatic Aver. All 25 verify cases pass (aver 0.9.5).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@aallan aallan force-pushed the feature/expand-test-cases branch from 7ad8d7e to 2bb0329 Compare April 16, 2026 08:27
@aallan

aallan commented Apr 16, 2026

Copy link
Copy Markdown
Owner Author

Thanks @jasisz — installed Aver 0.9.5 locally and fixed everything properly this time.

T2: Removed VB-T2-011/012/013 (no starts_with/ends_with/char_at in stdlib). T2-014/015 already fine.

T3: Turned out the qualified constructor fix alone wasn't enough — Aver also has no let or if/then/else, so I rewrote all 5 to use idiomatic match on booleans and name = expr bindings. All verified:

✓ safe_divide      5/5
✓ pair_sum         5/5
✓ classify_sign    5/5
✓ color_code       5/5
✓ either_select    5/5

Verified with aver check + aver verify against Aver 0.9.5.

- T2-011/012: Add De Bruijn slot ordering to notes (matches T2-005 pattern)
- T2-013/014: Change vera_verify_tier1 from false to true — contracts
  are trivial (ensures(true)) so Z3-verifiable at Tier 1. String ops
  are in the function body, not in the contracts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fn color_code(code: Int) -> Int
? "Returns the RGB integer for the given color code."
c: Color = match code

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just re-discovered this is valid Aver, love the pattern!

@aallan aallan merged commit 60187c4 into main Apr 16, 2026
10 checks passed
@aallan aallan deleted the feature/expand-test-cases branch April 16, 2026 08:40
sunholo-voight-kampff added a commit to sunholo-voight-kampff/vera-bench that referenced this pull request May 22, 2026
Items 2, 3, 4 from @aallan's consolidated review on PR aallan#70.
(Item 1 — extracting --parallel N into its own PR — addressed via
PR aallan#73.)

### Item 2: README headline section -> single sentence in §Overview

Removed the "AILANG: AI-designed language..." headline section
(13 lines: the heading, the description paragraphs, the per-mode
results table, the "full-circle finding" paragraph). The phrasing
included editorial claims about VeraBench's identity that should
be a project-owner call, and "added in this fork" wouldn't read
correctly post-merge.

Replaced the §Overview line about baselines with the form
@aallan suggested verbatim:

  The same problems are also run in Python, TypeScript,
  [Aver](https://github.com/jasisz/aver), and [AILANG](https://ailang.sunholo.com/)
  as baselines. AILANG and Aver are zero-training-data languages,
  providing additional data points alongside Vera for the
  language-design-vs-training-data thesis.

Matches the existing Aver pattern: light-touch mention without
results writeups in the README.

### Item 3: Delete AILANG_MAPPING.md and AILANG_RESULTS.md

Neither file is load-bearing — no code or tests reference them.
Aver landed across PRs aallan#57 / aallan#62 / aallan#65 without AVER_RESULTS.md or
AVER_MAPPING.md. Numbers and writeups go in PR descriptions and
external content; in-repo docs are reserved for things future
maintainers need.

### Item 4: .coderabbit.yaml path_filters

Added the two missing AILANG entries to mirror the existing
{python, typescript, aver} pattern:

    - "!**/*.ail"             (alongside !**/*.vera, !**/*.av)
    - "!solutions/ailang/**"  (alongside the other solutions/* entries)

This stops CodeRabbit from generating speculative findings on
.ail solution files in future review passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants