Skip to content

Commit ae573d3

Browse files
aallanClaude
andcommitted
Address PR review findings on #671
Comprehensive PR review (code-reviewer + pr-test-analyzer + comment-analyzer) surfaced 5 actionable items. All fixed here: **Inaccurate comments** (comment-analyzer) 1. `tests/test_stress.py` — `test_tco_with_allocation_1m_iterations` docstring claimed "~12 bytes per leaked frame ≈ 1300 iters". `gc_shadow_push` writes one i32 = 4 bytes, not 12 (the 12-byte figure came from a different fn elsewhere in the file with 2 pointer params + 1 tmp root). The `loop` fn here only pushes 1 pointer per iteration. Corrected to "4 bytes per leaked pointer root ≈ 4,096 iterations" and added the inline note that the @int params aren't pointer-typed. Also dropped the "~200ms on a 2026 MacBook Pro" claim (will rot). 2. `vera/codegen/functions.py` — comment cited "operators.py" unqualified, but `vera/codegen/` and `vera/wasm/` are peer directories — should be `vera/wasm/operators.py`. This matters because #672 explicitly audits the inline-indentation surface; a mis-pathed reference would cost search time. Added a #672 cross-reference too. 3. `vera/codegen/functions.py` dispatch comment said "Two sources of post-body work" but the actual code is a three-branch dispatch (revert / patch / untouched). Reframed as "Three outcomes (precedence: 1 > 2 > 3)" with the third explicitly named — the common non-allocating tail-recursion case from SKILL.md's "Iteration" section. Also trimmed the `alloc_local` implementation-detail digression that didn't add value. 4. `tests/test_codegen.py` — `test_allocating_function_uses_gc_ aware_tco_549` docstring said "sacrificing TCO for shadow- stack correctness". Inverted framing — pre-#549 the GC epilogue ran fine, what was sacrificed was WASM call-stack depth (1M iters trapped with `call stack exhausted` because the function paid a WASM frame per recursion). Reframed. **Test coverage gap** (pr-test-analyzer rating 8) 5. Added `test_allocating_function_gc_aware_tco_patches_both_ branches`. The patch loop iterates every body instruction and patches every `return_call` site individually; a buggy implementation that bails after the first match (e.g. an accidental `break`) would still pass the single-site test above. New test uses a `match` with two ADT arms each ending in `return_call $build`, asserts BOTH sites have the `local.get N; global.set $gc_sp` preamble, and that N matches the same `$gc_sp_save` local captured by the prologue. **Suggestion** (pr-test-analyzer rating 6) 6. `test_shadow_stack_overflow_traps` — added inline calibration for the 2000-iteration count: "16K shadow stack / 12 bytes per frame (2 Array<Bool> params + 1 tmp root) ≈ 1,365 frames; 2000 chosen with ~1.5× safety margin". Previously the threshold was undocumented; a future per-frame size change could silently weaken the test without anyone noticing. **Not addressed**: - pr-test-analyzer's mutual-recursion test suggestion (rating 8) — the dispatch is per-function and each function's correctness is independently pinned. Cross-function GC composition is a deeper concern that doesn't sit in this PR's scope and would benefit from a separate issue. - pr-test-analyzer's zero-`return_call`-site test (rating 6) — edge case is fine; allocating leaf fns just have an unused `$gc_sp_save` local, no harm. - api.py inner-helper workaround mention in `stack_exhausted` message — too tangential for an end-user-facing trap message. **code-reviewer verdict**: no issues found at confidence ≥ 80. The fix is structurally correct, the dispatch precedence is explicitly pinned by tests, and CodeRabbit's prior concerns (S101 noqa, indent alignment, boundary-safe regex) are all addressed. Validation: pytest 3863/3863 ✓, mypy clean ✓, ruff S rules clean ✓, doc-counts consistent (test counts updated to 3,893). Co-Authored-By: Claude <noreply@anthropic.invalid>
1 parent 846ce05 commit ae573d3

6 files changed

Lines changed: 162 additions & 43 deletions

File tree

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -181,7 +181,7 @@ cp /path/to/vera/SKILL.md ~/.claude/skills/vera-language/SKILL.md
181181

182182
## Project status
183183

184-
Vera is in **active development** at v0.0.154 — 810+ commits, 154 releases, 3,892 tests, 96% code coverage, 86 conformance programs, 34 examples, and a 13-chapter specification. See **[HISTORY.md](HISTORY.md)** for how the compiler was built.
184+
Vera is in **active development** at v0.0.154 — 810+ commits, 154 releases, 3,893 tests, 96% code coverage, 86 conformance programs, 34 examples, and a 13-chapter specification. See **[HISTORY.md](HISTORY.md)** for how the compiler was built.
185185

186186
The reference compiler — parser, AST, type checker, contract verifier (Z3), WASM code generator, module system, browser runtime, and runtime contract insertion — is working. The language specification is in draft across [13 chapters](spec/).
187187

ROADMAP.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ Where the project is going. See [HISTORY.md](HISTORY.md) for what's been built
44

55
## Where we are
66

7-
3,892 tests, 86 conformance programs, 34 examples, 13 spec chapters.
7+
3,893 tests, 86 conformance programs, 34 examples, 13 spec chapters.
88

99
## What's next
1010

TESTING.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ This is the single source of truth for Vera's testing infrastructure, coverage d
66

77
| Metric | Value |
88
|--------|-------|
9-
| **Tests** | 3,892 across 32 files (~52,852 lines of test code; 3,862 passed + 16 stress, 14 skipped) |
9+
| **Tests** | 3,893 across 32 files (~52,968 lines of test code; 3,863 passed + 16 stress, 14 skipped) |
1010
| **Compiler code coverage** | 96% of 15,149 statements (CI minimum: 80%) |
1111
| **Conformance programs** | 86 programs across 9 spec chapters, validating every language feature |
1212
| **Example programs** | 34, all validated through `vera check` + `vera verify` |
@@ -58,15 +58,15 @@ python scripts/fix_allowlists.py --fix # auto-fix stale allowlists
5858
| `test_ast.py` | 127 | 1,130 | AST transformation, node structure, serialisation, string escape sequences, ability declarations |
5959
| `test_checker.py` | 521 | 5,939 | Type synthesis, slot resolution, effects, effect subtyping, contracts, exhaustiveness, cross-module typing, visibility, error codes, string built-ins, generic rejection, IO operation types, Markdown types, Regex types, abilities, Map collection, Set collection, Decimal type, Json type, Html type, Http effect, Inference effect, removed legacy name regression |
6060
| `test_verifier.py` | 145 | 2,094 | Z3 verification, counterexamples, tier classification, call-site preconditions, branch-aware preconditions, pipe operator, cross-module contracts, match/ADT verification, decreases verification, mutual recursion, refined Bool/String/Float64 param sorts, **@Nat subtraction underflow obligation** (#520 — Path-A obligation discharge via requires/path-conditions/path-aware Z3 refutation, pure-literal exclusion, Int-Int and Nat-Int → Int exemptions) |
61-
| `test_codegen.py` | 1,104 | 18,510 | WASM compilation, arithmetic, Float64, Byte, arrays (incl. compound element types), ADTs, match (incl. nested patterns), generics, State\<T\>, Exn\<E\> handlers, control flow, strings, string escape sequences, IO (read\_line, read\_file, write\_file, args, exit, get\_env, sleep, time, stderr), bounds checking, quantifiers, assert/assume, refinement type aliases, pipe operator, string built-ins, built-in shadowing, parse\_nat Result, GC, Markdown host bindings, Regex host bindings, Map collection, Set collection, Decimal type, Json type, Html type, Http effect, Inference effect, Random effect, example round-trips, GC shadow stack overflow, **WASM tail-call optimization** (#517 — `return_call` emission for tail-position calls, 50K- and 1M-iteration stress, structural assertions on `return_call`/plain `call` boundary, **GC-aware TCO for allocating fns (#549 — `$gc_sp` restore before each `return_call`)**, postcondition-fallback regression (still reverts to plain `call`), analyzer unit tests covering Block-trailing / IfExpr-both-branches / MatchExpr-arm-bodies / let-value-NOT-marked / call-args-NOT-marked / ExprStmt-statement-NOT-marked / IfExpr-condition-NOT-marked / MatchExpr-scrutinee-NOT-marked) |
61+
| `test_codegen.py` | 1,105 | 18,624 | WASM compilation, arithmetic, Float64, Byte, arrays (incl. compound element types), ADTs, match (incl. nested patterns), generics, State\<T\>, Exn\<E\> handlers, control flow, strings, string escape sequences, IO (read\_line, read\_file, write\_file, args, exit, get\_env, sleep, time, stderr), bounds checking, quantifiers, assert/assume, refinement type aliases, pipe operator, string built-ins, built-in shadowing, parse\_nat Result, GC, Markdown host bindings, Regex host bindings, Map collection, Set collection, Decimal type, Json type, Html type, Http effect, Inference effect, Random effect, example round-trips, GC shadow stack overflow, **WASM tail-call optimization** (#517 — `return_call` emission for tail-position calls, 50K- and 1M-iteration stress, structural assertions on `return_call`/plain `call` boundary, **GC-aware TCO for allocating fns (#549 — `$gc_sp` restore before each `return_call`)**, postcondition-fallback regression (still reverts to plain `call`), analyzer unit tests covering Block-trailing / IfExpr-both-branches / MatchExpr-arm-bodies / let-value-NOT-marked / call-args-NOT-marked / ExprStmt-statement-NOT-marked / IfExpr-condition-NOT-marked / MatchExpr-scrutinee-NOT-marked) |
6262
| `test_codegen_contracts.py` | 32 | 576 | Runtime pre/postconditions, contract fail messages, old/new state postconditions |
6363
| `test_codegen_monomorphize.py` | 71 | 1,326 | Generic instantiation, type inference, monomorphization edge cases, ability constraint satisfaction (Eq/Ord/Hash/Show), operation rewriting (eq/compare), show/hash dispatch, ADT auto-derivation, array operations (slice/map/filter/fold) |
6464
| `test_codegen_closures.py` | 50 | 1,624 | Closure lifting, captured variables, higher-order functions, iterative-builder shadow-stack regressions (#570), closure return-value shadow-push balance for both i32-pair and i32-ADT branches across array_map and array_mapi, plus VERA_EAGER_GC injection self-test (#593), IndexExpr-of-FnCall element-type inference (#614), non-contiguous capture and walker-order miscompiles (#615) |
6565
| `test_codegen_modules.py` | 23 | 836 | Cross-module guard rail, cross-module codegen, name collision detection (E608/E609/E610) |
6666
| `test_codegen_coverage.py` | 5 | 250 | Defensive error paths: E600, E601, E605, E606, unknown module calls |
6767
| `test_walker_defensive_branches_597.py` | 21 | 296 | Synthetic-AST tests for the 11 defensive `isinstance` branches added by #597 (`_scan_io_ops` / `_scan_expr_for_handlers` / `_infer_expr_wasm_type` / `_infer_vera_type`) plus the 5 pr-review fixes (#2/#3/#8 — ModuleCall/AnonFn/QualifiedCall return None; dead `is not None` guards on Block/HandleExpr removed) |
6868
| `test_check_walker_coverage_597.py` | 15 | 311 | Unit tests for `scripts/check_walker_coverage.py` parsing logic — Expr subclass extraction, isinstance flattening (incl. tuple form), checklist-block anchoring (incl. CR-3 regression test: `# Foo → bar` outside WALKER_COVERAGE block not counted), section-header tolerance, auto-discovery invariants, end-to-end main exit code |
69-
| `test_stress.py` | 16 | 551 | Scale-dependent regression tests (#596) — `@pytest.mark.stress`, skipped by default. 9 logical tests × eager-GC lane parametrisation = 16 test instances. 10K `array_map`, 5K nested-array `array_map`, 1K-deep tail recursion with allocating arg, 1M-deep tail recursion with allocating arg (#549 GC-aware TCO), 20×20 nested array-fold-of-array-fold, 100K `array_fold`, 10K String allocations, 1K `State<Int>` get/put cycles, 10K `IO.print` calls. Pins #570 / #515 / #593 / #549 / #487 / #348 / #573 regression coverage |
69+
| `test_stress.py` | 16 | 553 | Scale-dependent regression tests (#596) — `@pytest.mark.stress`, skipped by default. 9 logical tests × eager-GC lane parametrisation = 16 test instances. 10K `array_map`, 5K nested-array `array_map`, 1K-deep tail recursion with allocating arg, 1M-deep tail recursion with allocating arg (#549 GC-aware TCO), 20×20 nested array-fold-of-array-fold, 100K `array_fold`, 10K String allocations, 1K `State<Int>` get/put cycles, 10K `IO.print` calls. Pins #570 / #515 / #593 / #549 / #487 / #348 / #573 regression coverage |
7070
| `test_errors.py` | 52 | 525 | Error code registry, diagnostic formatting, serialisation, SourceLocation, error display sync (README/HTML/spec) |
7171
| `test_formatter.py` | 122 | 1,075 | Comment extraction, interior comment positioning, expression/declaration formatting, match arm block bodies, idempotency, parenthesization, spec rules, ability declarations |
7272
| `test_cli.py` | 217 | 3,021 | CLI commands (check, verify, compile, run, test, fmt, version, quiet), subprocess integration, JSON error paths, runtime traps, arg validation, multi-file resolution, IO exit codes, --explain-slots |

tests/test_codegen.py

Lines changed: 116 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2043,8 +2043,10 @@ def test_allocating_function_uses_gc_aware_tco_549(self) -> None:
20432043
boundary.
20442044

20452045
Pre-#549: the post-process reverted every `return_call` →
2046-
`call` when `ctx.needs_alloc` was True, sacrificing TCO
2047-
for shadow-stack correctness.
2046+
`call` when `ctx.needs_alloc` was True, sacrificing WASM
2047+
call-stack depth (tail recursion eventually trapped with
2048+
`call stack exhausted`) so the GC epilogue could run and
2049+
bound shadow-stack usage.
20482050

20492051
Post-#549: the post-process instead PREPENDS a `$gc_sp`
20502052
restore (`local.get $gc_sp_save; global.set $gc_sp`)
@@ -2161,6 +2163,110 @@ def test_allocating_function_uses_gc_aware_tco_549(self) -> None:
21612163
f"Body:\n{build_body}"
21622164
)
21632165

2166+
def test_allocating_function_gc_aware_tco_patches_both_branches(
2167+
self,
2168+
) -> None:
2169+
"""#549: every tail-position `return_call` gets the preamble.
2170+
2171+
The single-branch test above pins that the patch fires at
2172+
a single tail-recursive call site. This test pins that
2173+
the patch loop fires at MULTIPLE sites in the same
2174+
function — a buggy implementation that bails after the
2175+
first match (e.g. `break` inside the patch loop) or that
2176+
only handles top-level emissions but not if/else-nested
2177+
ones would still pass the single-branch test.
2178+
2179+
The function below uses a `match` with two ADT arms, each
2180+
ending in a tail-recursive `build` call. The analyzer
2181+
marks both arms as tail position (see
2182+
`test_analyzer_marks_match_arm_bodies`), so the codegen
2183+
emits two `return_call $build` sites with DIFFERENT
2184+
leading-whitespace prefixes (one for each match arm).
2185+
Both must have the `local.get N; global.set $gc_sp`
2186+
preamble; the local index N must be the same one captured
2187+
by the GC prologue.
2188+
"""
2189+
source = """\
2190+
private data Choice { Left, Right }
2191+
2192+
private fn build(@Int, @Choice -> @Array<Int>)
2193+
requires(@Int.0 >= 0) ensures(true) decreases(@Int.0) effects(pure)
2194+
{
2195+
if @Int.0 == 0 then { [0] }
2196+
else {
2197+
match @Choice.0 {
2198+
Left -> build(@Int.0 - 1, Left),
2199+
Right -> build(@Int.0 - 1, Right)
2200+
}
2201+
}
2202+
}
2203+
2204+
public fn f(-> @Int)
2205+
requires(true) ensures(true) effects(pure)
2206+
{
2207+
array_length(build(3, Left))
2208+
}
2209+
"""
2210+
result = _compile_ok(source)
2211+
build_match = re.search(r"\(func \$build\b", result.wat)
2212+
assert build_match is not None, (
2213+
f"Could not locate `(func $build` in WAT"
2214+
)
2215+
build_start = build_match.start()
2216+
next_fn = re.search(r"\(func \$", result.wat[build_start + 1:])
2217+
build_end = (
2218+
build_start + 1 + next_fn.start()
2219+
if next_fn is not None
2220+
else len(result.wat)
2221+
)
2222+
build_body = result.wat[build_start:build_end]
2223+
# Capture the exact $gc_sp_save local from the prologue.
2224+
lines = build_body.splitlines()
2225+
prologue_get_idx = next(
2226+
(i for i, ln in enumerate(lines)
2227+
if ln.strip() == "global.get $gc_sp"),
2228+
None,
2229+
)
2230+
assert prologue_get_idx is not None, (
2231+
f"no `global.get $gc_sp` prologue. Body:\n{build_body}"
2232+
)
2233+
prologue_set = lines[prologue_get_idx + 1].strip()
2234+
assert prologue_set.startswith("local.set ")
2235+
gc_sp_save_local = prologue_set[len("local.set "):]
2236+
expected_preamble_get = f"local.get {gc_sp_save_local}"
2237+
# Find every return_call site and require the same
2238+
# preamble at each. This catches a regression where the
2239+
# patch only fires on the first site (e.g. accidental
2240+
# `break`) or where only a subset of nested positions get
2241+
# the restore.
2242+
return_call_indices = [
2243+
i for i, line in enumerate(lines)
2244+
if re.search(r"return_call \$build\b", line)
2245+
]
2246+
assert len(return_call_indices) >= 2, (
2247+
f"Expected at least 2 return_call sites (one per "
2248+
f"match arm), got {len(return_call_indices)}. "
2249+
f"Body:\n{build_body}"
2250+
)
2251+
for idx in return_call_indices:
2252+
assert idx >= 2, (
2253+
f"return_call at line {idx} has no room for "
2254+
f"preamble. Body:\n{build_body}"
2255+
)
2256+
prev1 = lines[idx - 1].strip()
2257+
prev2 = lines[idx - 2].strip()
2258+
assert prev1 == "global.set $gc_sp", (
2259+
f"site {idx} missing `global.set $gc_sp`; got "
2260+
f"{prev1!r}. Body:\n{build_body}"
2261+
)
2262+
assert prev2 == expected_preamble_get, (
2263+
f"site {idx} preamble mismatch: expected "
2264+
f"{expected_preamble_get!r}, got {prev2!r}. "
2265+
f"Both return_call sites must reload the SAME "
2266+
f"local that the prologue saved. Body:\n"
2267+
f"{build_body}"
2268+
)
2269+
21642270
def test_allocating_function_with_postcondition_still_reverts(
21652271
self,
21662272
) -> None:
@@ -15208,6 +15314,14 @@ def test_shadow_stack_overflow_traps(self) -> None:
1520815314
trap class (e.g. heap exhaustion at a different scale).
1520915315
2. Behavioural — `_run_trap` confirms the program actually
1521015316
traps at the chosen 2000-iteration depth.
15317+
15318+
Iteration count calibration: the 16K shadow stack holds
15319+
~4096 pointer slots (4 bytes each). Each `overflow` frame
15320+
pushes 2 `@Array<Bool>` params (8 bytes) + 1 array_append
15321+
tmp root (4 bytes) = 12 bytes/frame, so the guard trips at
15322+
~1,365 frames. 2000 chosen with ~1.5× safety margin so
15323+
the trap fires reliably even if per-frame size shrinks by
15324+
a slot in a future optimisation.
1521115325
"""
1521215326
src = """
1521315327
private fn overflow(@Array<Bool>, @Array<Bool>, @Int -> @Array<Bool>)

tests/test_stress.py

Lines changed: 8 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -244,12 +244,15 @@ def test_tco_with_allocation_1m_iterations(
244244
The high-volume companion to
245245
`test_deep_tail_recursion_with_allocating_arg`. A single
246246
million-iteration run gives the GC-aware TCO patch a sharp
247-
pin: any shadow-stack leak per iteration would either trap on
248-
the overflow guard (16K shadow stack / ~12 bytes per leaked
249-
frame1300 iterations) or — under eager-GC — accumulate
247+
pin: any shadow-stack leak per iteration would trap on the
248+
overflow guard (16K shadow stack / 4 bytes per leaked pointer
249+
root4,096 iterations) or — under eager-GC — accumulate
250250
enough roots to slow mark/sweep to a crawl. Completing 1M
251251
iterations in seconds in both modes proves the shadow stack
252-
is bounded across the entire run.
252+
is bounded across the entire run. (The per-iteration push
253+
is a single i32 pointer for the `Array<Int>` literal; the
254+
two `@Int` params are not pointer-typed and don't push to
255+
the shadow stack.)
253256
254257
The pre-#549 fallback (revert `return_call` → plain `call`
255258
for allocating fns) cannot reach this depth — 1M plain calls
@@ -260,8 +263,7 @@ def test_tco_with_allocation_1m_iterations(
260263
Iteration count is 1M in both default-GC and eager-GC modes
261264
because the per-iteration cost is dominated by WASM execution,
262265
not GC traversal (the array literal is small and short-lived,
263-
so each eager-GC collection is O(1) live roots). Wall-clock
264-
on a 2026 MacBook Pro: ~200ms in both modes.
266+
so each eager-GC collection is O(1) live roots).
265267
"""
266268
src = """
267269
private fn loop(@Int, @Int -> @Int)

0 commit comments

Comments
 (0)