v0.0.126: WASM tail-call optimization (#517)#550
Conversation
Closes #517. Spawns #549 for the GC-aware-TCO follow-up. Pre-fix, every Vera `call` site emitted a plain WASM `call` regardless of tail-position status, so a tail-recursive function pushed one WASM frame per iteration and trapped with `call stack exhausted` at ~tens of thousands of frames. The documented "iteration is tail recursion" idiom from SKILL.md silently failed past ~5-10K iterations. Fix is a per-fn analyzer (vera/codegen/tail_position.py) that marks id(FnCall) AST nodes in syntactic tail position; _translate_call emits `return_call $foo` instead of `call $foo` when the call's id is in the marked set AND the callee's WASM return type matches the caller's (required for WASM return_call semantics — the signature must match). Tail-position analyzer rules: * Body trailing expression IS in tail position * If parent is tail and is IfExpr: both branches tail (cond NOT) * If parent is tail and is MatchExpr: every arm body tail (scrutinee NOT) * If parent is tail and is Block: trailing expr tail (statements NOT) * All other constructs: NOT tail-transparent Allocating-function fallback (#549 follow-up): * WASM return_call discards the current frame, which means the GC epilogue (restore $gc_sp, unwind shadow stack) never runs. For an allocating function, that leaks shadow-stack slots once per iteration and would eventually trap on the next $alloc once gc_sp passes the worklist boundary — strictly WORSE than the pre-fix "stack exhausted" trap. * Post-process at the end of _compile_fn reverts every return_call → call when ctx.needs_alloc is True. Allocating functions pay the WASM frame cost in exchange for correct shadow-stack management until #549 ships GC-aware tail calls. Verified: * 50K count_down (issue's canonical reproducer) returns 0 cleanly * 1M count_down also returns 0 cleanly (constant stack space) * Test fixtures in tests/test_runtime_traps.py updated to keep trap-bearing calls in NON-tail position via let-binding — otherwise TCO discards main's frame and the existing backtrace assertions wouldn't see main in the resolved chain. Tests: 9 new in TestTailCallOptimization517 (50K + 1M behavioural, return_call structural assertion, non-tail plain-call assertion, allocating-function fallback, 4 analyzer unit tests for each tail-transparent construct). Documentation: * stack_exhausted Fix paragraph rewritten to reflect post-fix reality ("Vera compiles tail-position calls to return_call ... if you're still hitting this trap the recursion isn't actually in tail position; restructure with an accumulator parameter"). * KNOWN_ISSUES: #517 row removed; new #549 row tracking the GC-aware TCO follow-up for allocating functions. * ROADMAP: #517 dropped from queue (intro updated, eight remain in campaign), priority rows renumbered. * HISTORY: terse v0.0.126 row per the hardened memory rules ("Tail-recursive iteration runs in constant stack space" ([#517])). 3,613 tests pass (was 3,604; 9 new) + 14 skipped. mypy clean. 81 conformance + 33 examples pass. uv.lock regenerated. Co-Authored-By: Claude <noreply@anthropic.invalid>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughImplements AST-based tail-position detection and conditional WASM Changes
Sequence DiagramsequenceDiagram
participant Compiler as Vera Compiler
participant Analyzer as Tail-Position Analyzer
participant Context as WASM Context
participant CallTx as Call Translator
participant WasmGen as WASM Generator
Compiler->>Analyzer: analyze_function(ast.FnDecl)
Analyzer->>Analyzer: compute_tail_call_sites()
Analyzer-->>Context: set_tail_call_context(sites, self_ret_wt)
Compiler->>CallTx: translate_function_body()
CallTx->>Context: query tail_call_sites & self_ret_wt
alt Tail position && type match
CallTx->>WasmGen: emit return_call $target
else
CallTx->>WasmGen: emit call $target
end
alt Function requires post-body work or allocs
WasmGen->>WasmGen: post-process revert return_call -> call
end
WasmGen-->>Compiler: emit final WASM
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #550 +/- ##
==========================================
+ Coverage 91.04% 91.06% +0.01%
==========================================
Files 58 59 +1
Lines 22139 22178 +39
Branches 259 259
==========================================
+ Hits 20157 20196 +39
Misses 1975 1975
Partials 7 7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@TESTING.md`:
- Line 9: The table row in TESTING.md shows inconsistent counts: update the
Tests row so the total and the breakdown match the canonical pytest collected
count (use the collected-count source you run to produce the numbers), e.g.,
recompute or fetch the collected total and ensure "X passed + Y skipped" equals
the reported total; edit the row that currently reads "| **Tests** | 3,613
across 29 files (~34,500 lines of test code; 3,590 passed, 14 skipped) |" to the
corrected numbers so the aggregate and breakdown are synchronized.
In `@tests/test_codegen.py`:
- Around line 1960-1984: The test test_analyzer_does_not_mark_call_args is too
weak because it only checks len(sites); instead, after computing sites =
compute_tail_call_sites(f_decl) locate the specific call IDs for the outer call
(inner(...)) and the inner argument call (arg_producer()) from f_decl (or the
parsed AST) and assert that the sites set/list contains the outer call ID and
does NOT contain the argument call ID; update the assertions to explicitly check
membership of those call identifiers so the test fails if the wrong call was
marked.
- Around line 1799-1848: The test currently only ensures `caller` does not use
`return_call $count_down` but doesn't assert that the recursive site in
`count_down` still emits `return_call`; update the test to also locate the
`(func $count_down` body (similar to how `caller_body` is extracted from
result.wat) and assert that `"return_call $count_down"` appears there, e.g. find
`f_body_start = result.wat.find("(func $count_down")`, extract until the next
`"(func "` (or end) and add `assert "return_call $count_down" in
count_down_body` so the optimized recursive call is positively checked while
keeping the existing `caller` assertions.
In `@vera/codegen/functions.py`:
- Around line 187-211: The current tail-call rewrite only runs when
ctx.needs_alloc, which leaves surviving "return_call" in body_instrs that can
exit before post_instrs/ensures() run; update the condition that triggers the
replacement so it also runs when the function has a non-empty postcondition
epilogue (i.e., when post_instrs is non-empty or a postcondition flag on ctx is
set). Concretely, change the check around the body_instrs rewrite (the block
that sets self._needs_alloc/self._needs_memory and replaces "return_call " with
"call ") to run when ctx.needs_alloc || post_instrs, or equivalent (or add
ctx.has_postconditions), so all "return_call" occurrences are
suppressed/replaced whenever post_instrs/ensures() must run.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c45cc4d7-b509-4c2c-867f-6447d9d95140
⛔ Files ignored due to path filters (5)
docs/index.htmlis excluded by!docs/**docs/index.mdis excluded by!docs/**docs/llms-full.txtis excluded by!docs/**docs/llms.txtis excluded by!docs/**uv.lockis excluded by!**/*.lock,!uv.lock
📒 Files selected for processing (15)
CHANGELOG.mdHISTORY.mdKNOWN_ISSUES.mdREADME.mdROADMAP.mdTESTING.mdpyproject.tomltests/test_codegen.pytests/test_runtime_traps.pyvera/__init__.pyvera/codegen/api.pyvera/codegen/functions.pyvera/codegen/tail_position.pyvera/wasm/calls.pyvera/wasm/context.py
…est strengthening
Four CodeRabbit findings actioned. The first is a real Major-
severity correctness bug that would have shipped as a soundness
hole; the other three are test/doc quality improvements.
* vera/codegen/functions.py: postcondition-aware tail-call
fallback. Pre-fix, body_instrs revert (return_call -> call) was
gated only on ctx.needs_alloc. But _compile_postconditions
also emits instructions AFTER body_instrs (local.set $ret,
condition check, trap on failure, local.get $ret to push back),
and WASM return_call discards the current frame — so any
function with a non-trivial ensures clause AND a tail-recursive
call would silently skip the postcondition check on every
iteration, violating the contract. Moved the revert to AFTER
_compile_postconditions runs and added the post_instrs check:
now reverts return_call when EITHER ctx.needs_alloc OR
post_instrs is non-empty.
This was a soundness hole — the v0.0.126 PR as it stood would
have shipped a runtime where ensures clauses on tail-recursive
functions silently passed regardless of the actual return
value. Caught by CodeRabbit before merge.
* tests/test_codegen.py: new
test_postcondition_function_falls_back_to_plain_call.
Constructs a tail-recursive function with a non-trivial
ensures clause (`@Nat.result >= 0`) and asserts that the
recursive call site emits plain `call`, not `return_call`.
Pins the new fallback condition so a future revert of the
postcondition check would fail this test loudly rather than
silently regressing the soundness.
* tests/test_codegen.py: strengthened
test_no_return_call_for_non_tail_position. Was asserting only
the negative (caller's let-bound call is NOT return_call).
Added the positive sibling assertion: count_down's recursive
call IS return_call. Without the positive check, a buggy
analyzer that marked NOTHING would silently pass the negative
assertion while regressing the actual TCO behaviour — both
branches need explicit pinning.
* tests/test_codegen.py: strengthened
test_analyzer_does_not_mark_call_args. Was asserting only
len(sites) == 1. Now locates the specific FnCall AST nodes
(outer `inner(...)` and inner-arg `arg_producer()`) by
walking the parsed AST, asserts id(outer) IN sites AND
id(arg) NOT IN sites, and pins the exact set
(`sites == {id(outer)}`). A buggy analyzer that marked the
WRONG call would have passed the old `len == 1` assertion
while regressing the analyzer's tail-position detection.
* TESTING.md: corrected the Tests row from 3,613 / 3,590
passed / 14 skipped to 3,614 / 3,600 passed / 14 skipped.
3,590 was the previous release's passed count; the v0.0.126
PR added 9 new tests (now 10 with the postcondition test
added in this round) but I forgot to update the breakdown in
the same commit. Aggregate and breakdown are now in sync
and pytest-collected total matches.
3,600 tests pass + 14 skipped (was 3,599 + 14, +1 for the new
postcondition test). mypy clean.
Co-Authored-By: Claude <noreply@anthropic.invalid>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@TESTING.md`:
- Around line 58-59: Update the `test_codegen.py` table row that documents
"**WASM tail-call optimization** (`#517` — `return_call` emission for
tail-position calls, 1M-iteration stress...)" to accurately reflect both stress
tests by changing the phrase to mention "50K- and 1M-iteration stress" (or
similar), so the cell for `test_codegen.py` lists both the 50K and 1M iteration
behavioural stress tests for the tail-call optimization entry.
In `@tests/test_codegen.py`:
- Around line 1984-2091: Add a test that exercises tail-position propagation
through MatchExpr: in tests/test_codegen.py create a small function whose body
is a MatchExpr with multiple arms where each arm's expression is a call (and one
arm contains a nested non-tail call as an argument or let-like binding), then
call vera.codegen.tail_position.compute_tail_call_sites on that function decl
and assert that only the top-level call expressions that are the arm bodies are
present in the returned sites set (use id(...) comparisons like other tests to
ensure the inner argument call or any pattern-binding calls are NOT marked).
This targets the MatchExpr handling in compute_tail_call_sites / visit_tail for
MatchExpr and mirrors the style of existing tests (use parse_to_ast, inspect
f_decl.body.expr to locate calls).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d35179fa-8472-4d48-a6dd-92125f5ff4e9
📒 Files selected for processing (6)
HISTORY.mdREADME.mdROADMAP.mdTESTING.mdtests/test_codegen.pyvera/codegen/functions.py
Both CodeRabbit findings actioned.
* TESTING.md test_codegen.py row description updated to mention
both stress tests: "50K- and 1M-iteration stress" instead of
just "1M-iteration stress". The 50K test is the issue's
canonical reproducer (count_down(50000)) and the 1M test pins
constant-stack-space behaviour on top. Both ship; the
description now reflects both. Also expanded the analyzer
unit-test list inline ("Block-trailing / IfExpr-both-branches
/ MatchExpr-arm-bodies / let-value-NOT-marked /
call-args-NOT-marked") so the test inventory matches the
shipped contract per CodeRabbit's pattern from previous PRs.
* tests/test_codegen.py: new test_analyzer_marks_match_arm_bodies.
Pre-fix, the analyzer (vera/codegen/tail_position.py) had
MatchExpr handling but no explicit test pinning the behaviour;
a regression that dropped or mis-handled the MatchExpr case
would have slipped past CI silently. New test constructs a
function whose body is a match with two arms — one arm wraps
its tail call around a non-tail argument call — and asserts:
- Both arm-body calls (the outer arm_handler(...) on each arm)
are in sites.
- The nested argument call (arg_producer(())) inside the None
arm is NOT in sites.
- sites is exactly {id(none_arm_call), id(some_arm_call)} —
exhaustive cardinality + identity check.
Same exhaustiveness pattern as the round-1 strengthening of
test_analyzer_does_not_mark_call_args (assert specific ids,
not just count).
3,601 tests pass (was 3,600; +1 for the MatchExpr analyzer test)
+ 14 skipped. mypy clean.
Co-Authored-By: Claude <noreply@anthropic.invalid>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_codegen.py`:
- Around line 1930-2111: Add negative unit tests that assert
compute_tail_call_sites does NOT mark calls in non-tail roots: (1) a Block with
a leading statement that is a FnCall (not the trailing expr), (2) an IfExpr
where the condition contains a FnCall, and (3) a MatchExpr whose scrutinee
contains a FnCall. For each test (e.g.
test_analyzer_does_not_mark_call_in_block_statement,
test_analyzer_does_not_mark_call_in_if_condition,
test_analyzer_does_not_mark_call_in_match_scrutinee) parse a small program with
parse_to_ast, call compute_tail_call_sites on the function decl, locate the
specific ast.FnCall node(s) and assert their id() is NOT in the returned sites
set; mirror existing test patterns and imports used in tests/test_codegen.py to
keep style consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7e9379e1-2147-4dc3-9dd1-3fc8ee454936
📒 Files selected for processing (5)
HISTORY.mdREADME.mdROADMAP.mdTESTING.mdtests/test_codegen.py
CodeRabbit's single finding actioned with three new tests covering the non-tail roots inside tail-transparent constructs. Each test mirrors the existing analyzer-test pattern (parse_to_ast, walk the AST to locate the specific FnCall, assert id() NOT in sites, plus an exhaustive sites == set() check). * test_analyzer_does_not_mark_call_in_block_statement — pins ExprStmt-position calls (the LetStmt.value case is already covered by test_analyzer_does_not_mark_let_value_calls). A Block with a leading ExprStmt(FnCall(...)) and a trailing IntLit must mark NOTHING — statements are not tail-transparent. * test_analyzer_does_not_mark_call_in_if_condition — pins IfExpr.condition position. An if-expression's condition is evaluated and consumed before either branch runs, so a call there is non-tail regardless of whether the if-expression itself is in tail position. * test_analyzer_does_not_mark_call_in_match_scrutinee — pins MatchExpr.scrutinee position. Same reasoning: the scrutinee is evaluated and consumed by the match-dispatch (constructor tag check + field binding) before any arm runs. Each test would catch a regression where the analyzer started visiting non-transparent positions with the parent's tail status — which would silently emit return_call in those positions, discard the current frame, and skip the work that should have consumed the call's result (the if-dispatch, the match-dispatch, or the trailing block expression). The TCO test inventory now covers every transparent / non-transparent position individually: * Transparent (tail propagates): Block trailing expr, IfExpr branches, MatchExpr arm bodies. * Non-transparent (tail does NOT propagate): LetStmt.value, ExprStmt.expr, IfExpr.condition, MatchExpr.scrutinee, FnCall.args. 3,604 tests pass + 14 skipped (was 3,601, +3 for these negative tests). mypy clean. Co-Authored-By: Claude <noreply@anthropic.invalid>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@TESTING.md`:
- Line 58: Update the TESTING.md entry describing test_codegen.py to explicitly
list the three missing negative analyzer unit tests for non-tail-transparent
positions by adding "ExprStmt.expr (calls inside a Block — NOT marked),
IfExpr.condition (NOT marked), MatchExpr.scrutinee (NOT marked)" to the existing
parenthetical list after the positive/negative examples (the clause referencing
Block-trailing / IfExpr-both-branches / MatchExpr-arm-bodies /
let-value-NOT-marked / call-args-NOT-marked), so the documentation exactly
matches the shipped test inventory for `#517`.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 318320cb-ce1f-40fd-ad87-3afa220a4c42
📒 Files selected for processing (5)
HISTORY.mdREADME.mdROADMAP.mdTESTING.mdtests/test_codegen.py
Single CodeRabbit finding actioned. TESTING.md test_codegen.py row description now lists the three round-3 negative analyzer tests explicitly: * ExprStmt-statement-NOT-marked * IfExpr-condition-NOT-marked * MatchExpr-scrutinee-NOT-marked Appended to the existing analyzer-unit-test inventory after the positive coverage (Block-trailing / IfExpr-both-branches / MatchExpr-arm-bodies) and the previously-listed negative cases (let-value-NOT-marked / call-args-NOT-marked). The inventory now exhaustively names every position the analyzer visits or skips — matches the shipped test contract: * Transparent (tail propagates inward, marked tests): Block-trailing, IfExpr-both-branches, MatchExpr-arm-bodies. * Non-transparent (tail does NOT propagate, NOT-marked tests): let-value, call-args, ExprStmt-statement, IfExpr-condition, MatchExpr-scrutinee. Co-Authored-By: Claude <noreply@anthropic.invalid>
Closes #517. Spawns #549 for the GC-aware-TCO follow-up.
Before / after
Before (v0.0.125):
After (v0.0.126):
1M iterations also runs to completion in <1 second.
Implementation
A per-fn analyzer (
vera/codegen/tail_position.py) marksid(FnCall)AST nodes in syntactic tail position._translate_callemitsreturn_call $fooinstead ofcall $foowhen the call's id is in the marked set AND the callee's WASM return type matches the caller's (required for WASMreturn_callsemantics — the signature must match).Tail-position analyzer rules (recursive on the function body):
IfExpr: both branch bodies tail; condition NOT.MatchExpr: every arm body tail; scrutinee NOT.Block: trailing expression tail; statements NOT.assert/assume,handlebodies,AnonFn, indexing) are NOT tail-transparent — calls inside them are not in tail position.GC-aware fallback for allocating functions
WASM
return_calldiscards the current frame, which means the GC epilogue (restore$gc_sp, unwind shadow-stack pointer slots) never runs. For an allocating function with tail calls, that leaks shadow-stack slots once per iteration and would eventually trap on the next$alloconcegc_sppasses the worklist boundary — strictly worse than the pre-fix "stack exhausted" trap (silent corruption rather than a clean trap).Until full GC-aware tail-call support lands (#549 tracks the follow-up), allocating functions revert
return_call→callin a post-process step at the end of_compile_fn. They pay the WASM frame cost in exchange for correct shadow-stack management. Non-allocating functions (the common iteration-style tail recursion case) keep the optimization.Tests
New
TestTailCallOptimization517(9 tests):return_call $count_downappears in WAT for the recursive callcall, notreturn_callcallExisting fixtures in
tests/test_runtime_traps.pyupdated for the TCO interaction: the_DIVIDE_BY_ZERO_USER_FN,_CONTRACT_VIOLATION_PROGRAM, and_DIVZERO_FOR_FIXtest programs originally hadmaincalling the trapping function in tail position, which #517 would now optimize away — discardingmain's frame and shortening the backtrace assertions expect to see. The fixtures now bind the call result withletand produce it via slot reference, keeping the call non-tail and preservingmain's frame on the WASM call stack. Comments document the intentional non-tail shape so a future contributor doesn't "simplify" them back into tail position.stack_exhaustedFix paragraph rewrittenPre-rewrite (v0.0.125):
Post-rewrite (this PR):
Documentation
Test plan
pytest tests/ -q— 3,599 passed, 14 skipped (was 3,590 + 14)mypy vera/— cleanpython scripts/check_conformance.py— all 81 passpython scripts/check_examples.py— all 33 passpython scripts/check_doc_counts.py— consistent (3,613 tests)python scripts/check_version_sync.py— version 0.0.126 consistentcount_downiterations both succeedCo-Authored-By: Claude noreply@anthropic.invalid
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation