Context
The benchmark currently has 50 problems (10 per tier). Expanding to 75+ (15 per tier) would improve statistical power, reduce the impact of non-deterministic results on per-tier rates, and cover more Vera language features.
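A rough way to quantify the gain from 10 to 15 problems per tier is sketched below. It treats each problem in a tier as an independent pass/fail trial with the same pass probability, which is a simplifying assumption and not something the benchmark guarantees. The function names are hypothetical.

```python
import math


def pass_rate_granularity(n_problems: int) -> float:
    """Change in a tier's pass rate when a single problem flips pass/fail."""
    return 1.0 / n_problems


def pass_rate_std_error(p: float, n_problems: int) -> float:
    """Binomial standard error of a pass rate estimated from n_problems.

    Assumes independent problems with a common pass probability p.
    """
    return math.sqrt(p * (1.0 - p) / n_problems)


for n in (10, 15):
    step = pass_rate_granularity(n)
    se = pass_rate_std_error(0.5, n)
    print(f"n={n}: one flip moves the rate by {step:.1%}, "
          f"std error at p=0.5 is {se:.1%}")
```

Under these assumptions, one non-deterministic flip moves a tier's rate by 10.0 points at 10 problems and about 6.7 points at 15, and the worst-case standard error drops from about 15.8% to about 12.9%.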
Guidelines for new problems
- Don't duplicate programs from the existing vera examples/ directory (they are in training data)
- Each problem should test something specific and be tagged accordingly
- Follow the existing problem structure (JSON + .vera + .py + .ts solutions)
- All canonical solutions must pass vera check + vera verify
Suggested additions per tier
Tier 1 (5 more): Pure arithmetic
- Integer overflow handling
- Multi-way branching (4+ conditions)
- Nested arithmetic with complex postconditions
Tier 2 (5 more): Builtin discovery
- string_split + string_join composition
- array_range + array_map pipeline
- json_parse + field extraction
- regex_match (if available in SKILL.md)
- Type conversion functions (int_to_string, float_to_int)
Tier 3 (5 more): ADTs + match
- Mutually recursive ADTs (two types referencing each other)
- Generic ADTs (List<T> with forall)
- Three+ variant ADTs
- Nested pattern matching (match inside match)
- ADT with multiple type parameters
Tier 4 (5 more): Recursion + termination
- Lexicographic decreases (two measures)
- Accumulator pattern with non-trivial invariant
- Tree traversal with path tracking
- String recursion (if string indexing supports it)
- Mutual recursion with where blocks
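As an illustration of what a "lexicographic decreases" problem could look like, here is a minimal sketch of a Python reference solution (the .py half of a problem). The choice of the Ackermann function is an assumption made for illustration, not a problem the issue prescribes, and the Vera-side termination annotation is not shown. The comments note which component of the pair (m, n) decreases at each recursive call.

```python
def ackermann(m: int, n: int) -> int:
    """Ackermann function for non-negative m and n.

    Termination measure: the pair (m, n), ordered lexicographically.
    """
    if m == 0:
        return n + 1
    if n == 0:
        # (m - 1, 1) < (m, 0): the first component decreases.
        return ackermann(m - 1, 1)
    # Inner call: (m, n - 1) < (m, n), the second component decreases.
    # Outer call: (m - 1, _) < (m, n), the first component decreases.
    return ackermann(m - 1, ackermann(m, n - 1))


print(ackermann(2, 3))  # 9
```

Neither m nor n alone decreases on every call, so a single-measure termination argument does not work here; that is what makes this shape a candidate for the two-measure case.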
Tier 5 (5 more): Effects
- Nested effect handlers
- Multiple effects in one function
- Effect polymorphism
- State + IO combination
- Error recovery with retry logic