Expand to 75+ problems (15 per tier) #25

@aallan

Context

The benchmark currently has 50 problems (10 per tier). Expanding to 75+ (15 per tier) would improve statistical power, reduce the impact of non-deterministic results on per-tier rates, and cover more Vera language features.
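To make the statistical-power point concrete, here is a rough sketch of how per-tier granularity and sampling noise change between 10 and 15 problems. It assumes each problem is an independent pass/fail trial (a simple binomial model), which is an assumption for illustration and not something the benchmark itself states. The function names are hypothetical.

```python
import math


def single_problem_swing(n: int) -> float:
    """Change in a tier's pass rate when exactly one of n problems flips."""
    return 1.0 / n


def pass_rate_std_error(p: float, n: int) -> float:
    """Standard error of an observed pass rate p over n independent problems.

    Assumes a binomial model: every problem is an independent trial with the
    same underlying pass probability p.
    """
    return math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    for n in (10, 15):
        swing = single_problem_swing(n)
        se = pass_rate_std_error(0.5, n)
        print(f"n={n}: one flip moves the rate by {swing:.1%}, SE at p=0.5 is {se:.3f}")
```

Under this model, a single flaky result moves a tier's rate by 10.0% with 10 problems but by about 6.7% with 15, and the worst-case standard error (at p = 0.5) drops from roughly 0.158 to roughly 0.129.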

Guidelines for new problems

  • Don't duplicate anything in the existing Vera examples/ directory (it is in training data)
  • Each problem should test something specific and be tagged accordingly
  • Follow the existing problem structure (JSON + .vera + .py + .ts solutions)
  • All canonical solutions must pass vera check + vera verify
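A pre-submission check for the structure guideline could look something like the sketch below. The on-disk layout is an assumption: it supposes each problem's four files share one stem inside a single directory, and that the JSON file is an object with a non-empty `tags` list. Neither detail is confirmed by this issue, so adjust to the repository's real layout. It does not run `vera check` or `vera verify`; those still need to be run separately.

```python
import json
from pathlib import Path

# Assumed file set per problem: JSON definition plus Vera, Python and
# TypeScript solutions, all sharing one stem (hypothetical layout).
REQUIRED_SUFFIXES = (".json", ".vera", ".py", ".ts")


def missing_files(problem_dir: Path, stem: str) -> list[str]:
    """Return the names of required files that are absent for one problem."""
    return [
        stem + suffix
        for suffix in REQUIRED_SUFFIXES
        if not (problem_dir / (stem + suffix)).is_file()
    ]


def has_tags(json_path: Path) -> bool:
    """True if the problem JSON is an object with a non-empty 'tags' list.

    The 'tags' field name is an assumption made for this sketch.
    """
    data = json.loads(json_path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        return False
    tags = data.get("tags")
    return isinstance(tags, list) and len(tags) > 0
```

For example, `missing_files(Path("problems"), "t1_overflow")` would list whichever of the four files are still missing for a hypothetical problem named `t1_overflow`.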

Suggested additions per tier

Tier 1 (5 more): Pure arithmetic

  • Integer overflow handling
  • Multi-way branching (4+ conditions)
  • Nested arithmetic with complex postconditions

Tier 2 (5 more): Builtin discovery

  • string_split + string_join composition
  • array_range + array_map pipeline
  • json_parse + field extraction
  • regex_match (if available in SKILL.md)
  • Type conversion functions (int_to_string, float_to_int)

Tier 3 (5 more): ADTs + match

  • Mutually recursive ADTs (two types referencing each other)
  • Generic ADTs (List<T> with forall)
  • ADTs with three or more variants
  • Nested pattern matching (match inside match)
  • ADT with multiple type parameters

Tier 4 (5 more): Recursion + termination

  • Lexicographic decreases (two measures)
  • Accumulator pattern with non-trivial invariant
  • Tree traversal with path tracking
  • String recursion (if string indexing supports it)
  • Mutual recursion with where blocks
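As an illustration of what a "lexicographic decreases" problem might exercise, the Ackermann function is a classic case where no single argument shrinks on every call, but the pair (m, n) does shrink in lexicographic order. The sketch below is a Python reference solution for a hypothetical candidate problem, not an existing benchmark entry, and it does not attempt to show how the termination measure would be written in Vera.

```python
def ackermann(m: int, n: int) -> int:
    """Ackermann function on non-negative integers.

    Termination argument: every recursive call uses a pair (m', n') that is
    lexicographically smaller than (m, n). Either m decreases, or m stays
    the same and n decreases.
    """
    if m < 0 or n < 0:
        raise ValueError("arguments must be non-negative")
    if m == 0:
        return n + 1
    if n == 0:
        # m decreases; n is allowed to grow.
        return ackermann(m - 1, 1)
    # Inner call: m unchanged, n decreases. Outer call: m decreases.
    return ackermann(m - 1, ackermann(m, n - 1))
```

Only small inputs are practical here, since both the result and the recursion depth grow very quickly with m.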

Tier 5 (5 more): Effects

  • Nested effect handlers
  • Multiple effects in one function
  • Effect polymorphism
  • State + IO combination
  • Error recovery with retry logic

Labels: problems (Problem definitions and canonical solutions)
