Fix #549: GC-aware tail-call optimization for allocating functions by aallan · Pull Request #671 · aallan/vera

aallan · 2026-05-13T17:48:28Z

Summary

Replaces the "allocating fn ⇒ revert return_call → call" fallback with a GC-aware patch. The post-process in vera/codegen/functions.py::_compile_fn now PREPENDS local.get $gc_sp_save; global.set $gc_sp immediately before each return_call site in an allocating fn, restoring the shadow-stack pointer to the caller's entry baseline so the callee saves a clean new baseline. Per-iteration shadow-stack usage stays bounded at caller's entry + n_arg_roots regardless of iteration count.
Args transfer atomically. At the return_call site the args are already on the WASM operand stack; the two-instruction restore touches only the $gc_sp global, not the operand stack, so the patch is non-interfering — return_call still consumes args and atomically transfers control.
Postcondition revert preserved. Functions with a non-trivial runtime postcondition still revert return_call → plain call (the post-check needs to run after each call; return_call would skip it). Dispatch precedence: post_instrs revert > needs_alloc GC-aware patch > untouched.
Closes the last open bug in KNOWN_ISSUES.md bug-tracker section apart from Conservative GC scan can spuriously retain heap objects via wrapper handle field (#573 latent) #578 — the bug list will be empty once Conservative GC scan can spuriously retain heap objects via wrapper handle field (#573 latent) #578 lands.

Test plan

Behavioural impact

Pre-#549, an allocating tail-recursive function like:

private fn loop(@Int, @Int -> @Int)
  requires(@Int.1 >= 0) ensures(true) decreases(@Int.1) effects(pure)
{
  if @Int.1 == 0 then { @Int.0 }
  else {
    let @Array<Int> = [@Int.0, @Int.1];
    loop(@Int.1 - 1, @Int.0 + array_length(@Array<Int>.0))
  }
}

trapped with stack_exhausted at ~30K iterations (the WASM call-stack limit) because the post-process reverted return_call → plain call. Post-#549 the same function runs in constant stack space at 1M+ iterations — the per-iteration $gc_sp restore keeps the shadow stack flat across the entire run.

Test changes

tests/test_codegen.py::TestTailCallOptimization517 — renamed test_allocating_function_falls_back_to_plain_call to test_allocating_function_uses_gc_aware_tco_549 and inverted its assertions: it now verifies return_call $foo is emitted AND every such site is preceded by local.get <N>; global.set $gc_sp. Added a sibling test test_allocating_function_with_postcondition_still_reverts to pin the postcondition-revert precedence.
tests/test_codegen.py::TestGCShadowStackOverflow::test_shadow_stack_overflow_traps — rewritten to use a non-tail-recursive shape (recursive call wrapped in array_append) so the overflow guard still genuinely trips post-GC-aware tail-call optimization for allocating functions #549.
tests/test_stress.py::test_deep_tail_recursion_with_allocating_arg — body switched from string-pool literals (didn't actually set needs_alloc) to genuine array allocation, so the test now actually exercises GC-aware tail-call optimization for allocating functions #549's path.
tests/test_stress.py::test_tco_with_allocation_1m_iterations — new 1M-iteration companion parametrised over default-GC and eager-GC modes (~190ms wall-clock in both).

Doc sweep

KNOWN_ISSUES.md — removed the GC-aware tail-call optimization for allocating functions #549 bug row.
SKILL.md (+ regenerated docs/SKILL.md) — removed the "tail-call optimization disabled for allocating functions" row from the bug table; updated the narrative paragraph below it to describe both Tail-call optimization missing: deep recursion blows the WASM call stack #517 and GC-aware tail-call optimization for allocating functions #549 as live optimisations.
vera/wasm/calls.py — comment near _translate_call updated to describe both branches of the post-process (GC-aware patch vs postcondition revert).
vera/codegen/api.py — the stack_exhausted runtime trap-fix message updated: drops the "allocating functions are an exception" paragraph, adds a brief mention of the postcondition-revert as the remaining exception.
Version bumped to v0.0.154 across pyproject.toml, vera/__init__.py, uv.lock, docs/index.html, README.md.
CHANGELOG.md v0.0.154 entry added with the design rationale, plus HISTORY.md row.
TESTING.md, ROADMAP.md, README.md — test counts updated to 3,892 (was 3,889).

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Restored GC‑aware tail‑call optimisation for allocating functions so shadow‑stack/GC state is preserved across recursive tail calls while retaining correct postcondition behaviour.
Tests
- Added and expanded regression and stress suites (including a 1,000,000‑iteration stress case) to validate GC‑aware TCO, overflow/trap behaviour and updated stress parametrisation and expectations.
Documentation
- Bumped project version; updated changelogs, README, HISTORY, ROADMAP, TESTING and runtime guidance; removed the resolved known‑issue entry.

Pre-fix, WASM return_call reverted to plain call whenever a function had needs_alloc=True, because return_call discards the current frame and would skip the GC epilogue, leaking shadow-stack slots per iteration. This forced agents to restructure allocating tail- recursive code into array_fold/array_map shapes or to hoist allocations outside the recursion. The fix preserves TCO AND the shadow-stack invariant: instead of reverting, the post-process now PREPENDS a two-instruction $gc_sp restore (local.get $gc_sp_save; global.set $gc_sp) immediately before each return_call in an allocating function. Args are already on the WASM operand stack at the return_call site; the restore only touches the $gc_sp global, so args transfer atomically to the callee. The callee's prologue saves a clean new $gc_sp baseline, so per-iteration shadow-stack usage stays bounded at caller's entry + n_arg_roots regardless of iteration count. Postcondition-bearing functions still revert to plain call — the runtime postcondition check needs to run after each call, and return_call would skip it. Dispatch precedence: postcondition- revert > GC-aware-TCO-patch > untouched. Tests: - Updated TestTailCallOptimization517 — renamed the falls-back test to test_allocating_function_uses_gc_aware_tco_549 and inverted assertions; added test_allocating_function_with_ postcondition_still_reverts for precedence. - Updated TestGCShadowStackOverflow::test_shadow_stack_overflow_ traps to use a non-tail-recursive shape that still genuinely overflows post-#549. - tests/test_stress.py — switched the existing 1K-iter deep-tail test from string-pool literals (didn't actually set needs_alloc) to genuine array allocation; added a 1M-iter companion parametrised over default/eager-GC. Closes #549. Co-Authored-By: Claude <noreply@anthropic.invalid>

coderabbitai · 2026-05-13T17:49:33Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1002873d-90a3-4ce5-93df-425f921891f2

📥 Commits

Reviewing files that changed from the base of the PR and between 50f74ba and ae573d3.

⛔ Files ignored due to path filters (6)

docs/SKILL.md is excluded by !docs/**
docs/index.html is excluded by !docs/**
docs/index.md is excluded by !docs/**
docs/llms-full.txt is excluded by !docs/**
docs/llms.txt is excluded by !docs/**
uv.lock is excluded by !**/*.lock, !uv.lock

📒 Files selected for processing (14)

CHANGELOG.md
HISTORY.md
KNOWN_ISSUES.md
README.md
ROADMAP.md
SKILL.md
TESTING.md
pyproject.toml
tests/test_codegen.py
tests/test_stress.py
vera/__init__.py
vera/codegen/api.py
vera/codegen/functions.py
vera/wasm/calls.py

💤 Files with no reviewable changes (1)

KNOWN_ISSUES.md

📝 Walkthrough

Walkthrough

This PR implements issue #549: the compiler now preserves wasm return_call for allocating functions by injecting a $gc_sp restore before each tail jump; functions with runtime postconditions still revert to call. Tests, stress suites, docs, and package version updated to v0.0.154.

Changes

GC-aware allocating tail-call optimization

Layer / File(s)	Summary
Core compiler implementation `vera/codegen/functions.py`, `vera/wasm/calls.py`, `vera/codegen/api.py`, `vera/__init__.py`, `pyproject.toml`	`_compile_fn` now distinguishes postcondition vs allocation handling: when `post_instrs` is present, `return_call` is rewritten to `call`; when allocating with no postconditions, `gc_sp_save` local is preallocated and each `return_call` site is patched to restore `$gc_sp` via `local.get <gc_sp_save>; global.set $gc_sp` immediately before the tail transfer. Supporting comments and trap guidance updated; package/module version bumped to v0.0.154.
Allocating tail-call codegen test coverage `tests/test_codegen.py`	Replaced allocating-tail-call test with `test_allocating_function_uses_gc_aware_tco_549` that parses emitted WAT, extracts the saved-local index, and verifies every `return_call $build` is immediately preceded by `local.get <same_saved_local>; global.set $gc_sp`; added companion test asserting `ensures` causes `return_call`→`call` and absence of the immediate preamble; overflow regression adjusted to non-tail recursion and two-step WAT+runtime checks.
Stress tests for GC-aware TCO `tests/test_stress.py`	Updated `test_deep_tail_recursion_with_allocating_arg` to allocate `Array<Int>` per iteration and adjusted expected assertion; added `test_tco_with_allocation_1m_iterations` (gated by `EAGER_GC_PARAMS`) running a 1,000,000-iteration allocating-array tail loop to validate bounded shadow-stack behaviour.
Documentation and version updates `CHANGELOG.md`, `HISTORY.md`, `KNOWN_ISSUES.md`, `README.md`, `ROADMAP.md`, `SKILL.md`, `TESTING.md`	Top-level changelog/HISTORY entries added for v0.0.154; removed `#549` from KNOWN_ISSUES; updated README/ROADMAP/TESTING/SKILL to document GC-aware TCO, test inventory changes, trap diagnostic revisions, and updated compare link.

Sequence Diagram(s)

sequenceDiagram
  participant _compile_fn as _compile_fn
  participant GCPrologue as GC_Prologue
  participant body_instrs as body_instrs
  participant ReturnPatcher as ReturnPatcher
  _compile_fn->>GCPrologue: ensure gc_sp_save local allocated
  _compile_fn->>body_instrs: emit function body (may contain return_call)
  body_instrs->>ReturnPatcher: locate each return_call site
  ReturnPatcher->>body_instrs: inject "local.get gc_sp_save"
  ReturnPatcher->>body_instrs: inject "global.set $gc_sp"
  ReturnPatcher->>body_instrs: or rewrite to "call" when post_instrs present

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

GC-aware tail-call optimization for allocating functions #549: Core issue being fixed—enables GC-aware tail-call optimization for allocating functions by restoring $gc_sp before return_call instead of reverting to plain call.

Possibly related PRs

aallan/vera#669: Overlaps stress-harness changes for deep tail-recursion with allocation.
aallan/vera#550: Prior tail-call post-processing work related to return_call/call rewriting.
aallan/vera#473: Related GC shadow-stack overflow test changes and coverage.

Suggested labels

compiler, tests, docs

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly and accurately describes the main change: GC-aware tail-call optimization now applies to allocating functions, fixing issue `#549`. It is concise, specific, and reflects the primary technical objective.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix-549-tco-allocating-fns

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-05-13T17:50:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.02%. Comparing base (50f74ba) to head (ae573d3).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #671   +/-   ##
=======================================
  Coverage   91.02%   91.02%           
=======================================
  Files          60       60           
  Lines       23398    23409   +11     
  Branches      259      259           
=======================================
+ Hits        21297    21308   +11     
  Misses       2094     2094           
  Partials        7        7

Flag	Coverage Δ
javascript	`57.36% <ø> (ø)`
python	`94.77% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

The #549 post-process inserts `local.get $gc_sp_save; global.set $gc_sp` immediately before each `return_call` site in an allocating function. Pre-fix the two inserted lines were appended as raw strings with no leading whitespace, so the rendered WAT looked misaligned when the `return_call` was nested inside an `if/else` (operators.py prepends 2 spaces to every instruction inside a then/else arm; functions.py later prepends 4 more at the body-level stringify step — so a nested return_call carries 6 leading spaces while my inserted lines only got 4). Functionally inert — WAT parses by tokens, not whitespace — but visually misleading for anyone reading the WAT for debugging. Fix: copy the leading whitespace from the original return_call line and apply it to both inserted lines so the restore sequence visually nests at the same depth as the return_call it precedes. Co-Authored-By: Claude <noreply@anthropic.invalid>

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/test_codegen.py (1)
15089-15136: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don't let _run_trap() hide the wrong failure mode.

Line 15136 now proves only that something trapped. If this shape regressed into a plain WASM stack overflow or a different allocator trap, the test would still pass without ever exercising the shadow-stack overflow guard it documents. Please pair the runtime trap with a structural assertion against the emitted overflow-check sequence in overflow's WAT.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_codegen.py` around lines 15089 - 15136, The test currently only
calls _run_trap(src) which only verifies that "something" trapped; update the
test to also inspect the emitted WAT for the function overflow and assert the
specific overflow-check instruction sequence is present so we know the
shadow-stack overflow guard was generated. Concretely, after
generating/compiling src (the same code used by _run_trap), extract the WAT or
disassembly for the function named "overflow" and assert it contains the
expected overflow-check pattern (e.g. the shadow-stack load/compare/branch
sequence your backend emits), then keep the runtime _run_trap() call to ensure
the trap actually fires. Reference the test helper _run_trap and the function
symbol overflow when locating where to add the WAT/disassembly assertion.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@HISTORY.md`:
- Line 301: The HISTORY.md footer still reads "153 tagged releases" after adding
the v0.0.154 entry; update that roll-up from 153 to 154 and then update the
matching release count in README.md (the README release figure that displays the
total tagged releases) so both files remain synchronized with the new v0.0.154
entry.

In `@tests/test_codegen.py`:
- Around line 2169-2175: The test only checks that "return_call $build" is not
present but doesn't ensure the GC-restore preamble (e.g. injected local.get ...;
global.set $gc_sp) was rejected, so tighten the assertion by verifying the
preamble isn't present; update the test around the build function/body to also
assert that sequences like "local.get" followed by "global.set $gc_sp" (the
GC-restore preamble) do not appear in build_body, and keep the existing checks
for "call $build" and absence of "return_call $build" so the test ensures
postcondition-bearing functions cannot contain the GC-restore preamble or a
return_call.
- Around line 2097-2122: The test currently only checks that two lines before
each "return_call $build" is a "local.get ..." but doesn't verify it matches the
local saved by the prologue; update the assertion to capture the exact saved
local from the prologue (e.g. parse the prologue to find the "local.set" or
local index used to store the GC baseline, call it the $gc_sp_save local) and
require that the line two lines before each "return_call $build" equals that
exact "local.get <N>" string, while still asserting the immediate previous line
is "global.set $gc_sp"; use the existing variables build_body, prologue,
return_call $build, global.set $gc_sp and local.get to locate and compare the
exact saved local.

---

Outside diff comments:
In `@tests/test_codegen.py`:
- Around line 15089-15136: The test currently only calls _run_trap(src) which
only verifies that "something" trapped; update the test to also inspect the
emitted WAT for the function overflow and assert the specific overflow-check
instruction sequence is present so we know the shadow-stack overflow guard was
generated. Concretely, after generating/compiling src (the same code used by
_run_trap), extract the WAT or disassembly for the function named "overflow" and
assert it contains the expected overflow-check pattern (e.g. the shadow-stack
load/compare/branch sequence your backend emits), then keep the runtime
_run_trap() call to ensure the trap actually fires. Reference the test helper
_run_trap and the function symbol overflow when locating where to add the
WAT/disassembly assertion.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2f2e1fea-10db-427b-8b62-5e4d08d1a865

📥 Commits

Reviewing files that changed from the base of the PR and between 50f74ba and 1df1e8b.

⛔ Files ignored due to path filters (6)

docs/SKILL.md is excluded by !docs/**
docs/index.html is excluded by !docs/**
docs/index.md is excluded by !docs/**
docs/llms-full.txt is excluded by !docs/**
docs/llms.txt is excluded by !docs/**
uv.lock is excluded by !**/*.lock, !uv.lock

📒 Files selected for processing (14)

CHANGELOG.md
HISTORY.md
KNOWN_ISSUES.md
README.md
ROADMAP.md
SKILL.md
TESTING.md
pyproject.toml
tests/test_codegen.py
tests/test_stress.py
vera/__init__.py
vera/codegen/api.py
vera/codegen/functions.py
vera/wasm/calls.py

💤 Files with no reviewable changes (1)

KNOWN_ISSUES.md

Surfaced during #549 review when the user pointed out that the compiler currently stamps WAT indentation onto instruction strings inline at ~14 emission sites scattered across vera/wasm/ and vera/codegen/. This is implicit-by-convention: any new emission site that doesn't know about it produces visually-misaligned WAT (as #549's GC-aware TCO post-process did initially. Vera source has vera fmt because consistent presentation matters for a debuggable surface. Same principle applies to compiler- emitted WAT — the primary surface agents read when debugging codegen bugs. Logged as #672 for a single-pass formatter that walks s-expression and control-flow depth and applies canonical indentation. Co-Authored-By: Claude <noreply@anthropic.invalid> EOF )

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@vera/codegen/functions.py`:
- Line 406: The code currently uses an `assert gc_sp_save is not None` guard
which can be stripped under `-O`; replace this assertion with an explicit
invariant failure by checking `if gc_sp_save is None:` and calling the
module/class `_error()` method with a descriptive message (e.g. "internal
compiler error: gc_sp_save is None") so the failure is always raised; update the
code paths around the `gc_sp_save` reference (the `gc_sp_save` variable usage in
functions.py) to use this explicit check instead of `assert`.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 68cd9def-74d0-42d9-8968-146833da7393

📥 Commits

Reviewing files that changed from the base of the PR and between 1df1e8b and d78028e.

📒 Files selected for processing (2)

ROADMAP.md
vera/codegen/functions.py

1. **Ruff S101** (CI failure) — two `assert gc_sp_save is not None` in `vera/codegen/functions.py` flagged by the `Security lint (ruff S rules)` job. Both are pure mypy type-narrowing asserts (the variable is `int | None` and we're inside an `if ctx.needs_alloc:` block where it's guaranteed non-None). Added `# noqa: S101` to both sites, matching the existing project precedent in `vera/codegen/api.py` (five similar sites). 2. **Finding 1: HISTORY.md footer release count** (valid) — the table-footer total still read "153 tagged releases" after the v0.0.154 row landed. Bumped to 154 to match the README figure. 3. **Finding 2: postcondition test tightened** (valid) — the test only asserted `return_call $build` is absent; a future regression that mistakenly took BOTH the postcondition-revert and the GC-aware patch paths would slip through. Added a structural assertion that no `local.get N; global.set $gc_sp` pair precedes any plain `call $build` site, pinning the dispatch precedence (post_instrs revert and GC-aware patch are mutually exclusive, not additive). Cannot forbid the sequence outright because the GC epilogue at function-end legitimately contains it. 4. **Finding 3: GC-aware TCO test now matches exact prologue local** (valid) — previously the test only required `local.get <any>` two lines before `return_call`. Now we parse the GC prologue (`global.get $gc_sp; local.set <N>`) at function entry to capture the exact `$gc_sp_save` local index, then require the preamble at each return_call site uses *that same* local. A typo or off-by-one in the patch code that picked up an unrelated local would now fail. 5. **Finding 4: shadow_stack_overflow_traps WAT structural check** (partially valid, added) — `_run_trap` alone proves "something trapped" but not "the shadow-stack-overflow guard fired specifically." Added a structural assertion that the distinctive `global.get $gc_stack_limit` token appears in the `$overflow` function body, catching a hypothetical regression where the guard is silently dropped but `_run_trap` still passes via an unrelated trap class. TESTING.md test_codegen.py line count updated to reflect the new test additions (107 lines). Co-Authored-By: Claude <noreply@anthropic.invalid>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@TESTING.md`:
- Line 69: The TESTING.md stress-test counts are inconsistent: the header for
`test_stress.py` says "9 logical tests / 16 instances" but the stress section
still reads "8 initial test programs" and "14 parametrised instances"; update
the stress section text so all three mentions align with the header (i.e.,
change "8 initial test programs" to "9 initial test programs" and "14
parametrised instances" to "16 parametrised instances") and ensure the
descriptive phrase matches the header (e.g., "9 logical tests × eager-GC lane
parametrisation = 16 test instances" referencing `test_stress.py`).

In `@tests/test_codegen.py`:
- Around line 2086-2131: The WAT substring checks can false-match
similarly-prefixed symbols; replace plain string searches in the build-body
extraction and in return/call-site checks with boundary-safe regex searches (use
re.search with word-boundary patterns like r'\b\(func \$build\b',
r'\breturn_call \$build\b', and r'\bcall \$build\b') where
build_start/build_end, build_body, and return_call_indices are computed, and
apply the same replacement to the analogous checks later in the file (the
call-site assertions around the other return_call/call checks).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fd8f8adc-bf2d-4044-acb2-e5ea5c482a8f

📥 Commits

Reviewing files that changed from the base of the PR and between d78028e and ae25d84.

📒 Files selected for processing (4)

HISTORY.md
TESTING.md
tests/test_codegen.py
vera/codegen/functions.py

@array

**Finding 1: TESTING.md stress count inconsistency** (valid) — header table said `test_stress.py` has 16 instances but the detailed section still read "8 initial test programs" and "14 parametrised instances". Fixed cohesively: - `### The 8 initial test programs` → `### The 9 initial test programs` so the section claim matches the header. - Added entry **9. `test_tco_with_allocation_1m_iterations`** so the section actually enumerates what it claims to. - Updated test #3's stale narrative — this PR switched its body from `let @string = "stress"` (which doesn't trigger needs_alloc because string-pool literals don't allocate) to `let @array<Int> = [_, _]` (genuine heap alloc). Old text asserted == 6,000 from `string_length("stress")`; new text asserts == 2,000 from `array_length([_, _])`. Without this update the doc described a test that no longer exists. - "Six of the eight" → "Seven of the nine" in the eager-GC lane paragraph (math now adds up to 16 with the 9th test). - CLI examples and Budget line updated from 14/8 to 16/9. **Finding 2: boundary-safe regex in WAT searches** (valid) — the three #549 tests used plain `.find("(func $build")` and `"return_call $build" in body` substring matches. Today's test source defines only `$build` and `$f` so there's no false-match risk, but a future symbol like `$build_helper` would match falsely. Switched all three tests to `re.search` with boundary-safe patterns: - `re.search(r"\(func \$build\b", ...)` for function-start. - `re.search(r"return_call \$build\b", ...)` for return_call sites (trailing `\b` excludes `$build_anything`). - `re.search(r"\bcall \$build\b", ...)` for plain call sites (leading `\b` rules out `return_call $build`; trailing `\b` rules out `$build_anything`). - Same boundary-safe extraction applied to `$overflow` in `test_shadow_stack_overflow_traps`. Note on CR's literal suggested pattern `r'\b\(func \$build\b'`: the leading `\b\(` is incorrect — `\b` requires a word-char on one side, but `(` is preceded by whitespace or newline (both non-word) at WAT indentation depths, so that anchor wouldn't match. Dropped the leading anchor; kept the substantive trailing `\b`. Scope decline: did NOT update the pre-existing tests in the file that use the same substring-search pattern (the #517 tests on lines 1899/1934/1949/1968/2016/2027 etc.). Those existed before this PR and changing them is out of scope for #549. CR's relayed wording said "apply the same replacement to the analogous checks later in the file" — interpreted as the analogous checks in MY new/modified tests, not the entire pre- existing test suite. Validation: pytest passes (3862/3862), mypy clean, ruff S rules clean, doc-counts consistent (test_codegen.py grew by +39 lines from the regex switchover; TESTING.md updated to match). Co-Authored-By: Claude <noreply@anthropic.invalid>

@int

Comprehensive PR review (code-reviewer + pr-test-analyzer + comment-analyzer) surfaced 5 actionable items. All fixed here: **Inaccurate comments** (comment-analyzer) 1. `tests/test_stress.py` — `test_tco_with_allocation_1m_iterations` docstring claimed "~12 bytes per leaked frame ≈ 1300 iters". `gc_shadow_push` writes one i32 = 4 bytes, not 12 (the 12-byte figure came from a different fn elsewhere in the file with 2 pointer params + 1 tmp root). The `loop` fn here only pushes 1 pointer per iteration. Corrected to "4 bytes per leaked pointer root ≈ 4,096 iterations" and added the inline note that the @int params aren't pointer-typed. Also dropped the "~200ms on a 2026 MacBook Pro" claim (will rot). 2. `vera/codegen/functions.py` — comment cited "operators.py" unqualified, but `vera/codegen/` and `vera/wasm/` are peer directories — should be `vera/wasm/operators.py`. This matters because #672 explicitly audits the inline-indentation surface; a mis-pathed reference would cost search time. Added a #672 cross-reference too. 3. `vera/codegen/functions.py` dispatch comment said "Two sources of post-body work" but the actual code is a three-branch dispatch (revert / patch / untouched). Reframed as "Three outcomes (precedence: 1 > 2 > 3)" with the third explicitly named — the common non-allocating tail-recursion case from SKILL.md's "Iteration" section. Also trimmed the `alloc_local` implementation-detail digression that didn't add value. 4. `tests/test_codegen.py` — `test_allocating_function_uses_gc_ aware_tco_549` docstring said "sacrificing TCO for shadow- stack correctness". Inverted framing — pre-#549 the GC epilogue ran fine, what was sacrificed was WASM call-stack depth (1M iters trapped with `call stack exhausted` because the function paid a WASM frame per recursion). Reframed. **Test coverage gap** (pr-test-analyzer rating 8) 5. Added `test_allocating_function_gc_aware_tco_patches_both_ branches`. The patch loop iterates every body instruction and patches every `return_call` site individually; a buggy implementation that bails after the first match (e.g. an accidental `break`) would still pass the single-site test above. New test uses a `match` with two ADT arms each ending in `return_call $build`, asserts BOTH sites have the `local.get N; global.set $gc_sp` preamble, and that N matches the same `$gc_sp_save` local captured by the prologue. **Suggestion** (pr-test-analyzer rating 6) 6. `test_shadow_stack_overflow_traps` — added inline calibration for the 2000-iteration count: "16K shadow stack / 12 bytes per frame (2 Array<Bool> params + 1 tmp root) ≈ 1,365 frames; 2000 chosen with ~1.5× safety margin". Previously the threshold was undocumented; a future per-frame size change could silently weaken the test without anyone noticing. **Not addressed**: - pr-test-analyzer's mutual-recursion test suggestion (rating 8) — the dispatch is per-function and each function's correctness is independently pinned. Cross-function GC composition is a deeper concern that doesn't sit in this PR's scope and would benefit from a separate issue. - pr-test-analyzer's zero-`return_call`-site test (rating 6) — edge case is fine; allocating leaf fns just have an unused `$gc_sp_save` local, no harm. - api.py inner-helper workaround mention in `stack_exhausted` message — too tangential for an end-user-facing trap message. **code-reviewer verdict**: no issues found at confidence ≥ 80. The fix is structurally correct, the dispatch precedence is explicitly pinned by tests, and CodeRabbit's prior concerns (S101 noqa, indent alignment, boundary-safe regex) are all addressed. Validation: pytest 3863/3863 ✓, mypy clean ✓, ruff S rules clean ✓, doc-counts consistent (test counts updated to 3,893). Co-Authored-By: Claude <noreply@anthropic.invalid>

aallan · 2026-05-13T19:01:59Z

PR self-review summary

Ran the pr-review-toolkit:review-pr workflow against this PR — three specialised agents in parallel: code-reviewer, pr-test-analyzer, comment-analyzer. Skipped type-design-analyzer (no new types) and silent-failure-hunter (no error-handling paths changed; the only new control flow is dispatch logic).

code-reviewer — clean

"No issues found at confidence ≥ 80. The PR is clean. The core fix is correct, well-commented, and well-tested. The dispatch precedence is explicitly pinned by a dedicated test. The prefix-preservation logic is defensible and the prefix-extraction idiom is correct."

Verified independently:

Dispatch precedence (post_instrs revert > GC-aware patch > untouched) — correct.
gc_sp_save pre-allocation at functions.py:380-382 is index-safe; no conflict with later gc_ret_ptr / gc_ret_len / gc_ret locals.
Prefix-preservation idiom instr[: len(instr) - len(instr.lstrip())] correct for WAT (spaces only, no tabs).
Substring safety on lstrip().startswith("return_call ") confirmed — return_call only appears as instruction prefix in emission, never as comment or literal.
Edge cases (needs_alloc with zero return_call sites; post_instrs without needs_alloc) handled correctly.

pr-test-analyzer — one important gap, one suggestion

Important — no multi-branch return_call test (rating 8). The patch loop iterates every line and patches every site, but the single-branch fixture wouldn't catch a buggy implementation that bails after the first match (e.g. accidental break).

Fixed in ae573d3 — added test_allocating_function_gc_aware_tco_patches_both_branches. Uses a match with two ADT arms each ending in return_call $build, asserts BOTH sites have the local.get N; global.set $gc_sp preamble AND that N matches the $gc_sp_save local captured by the prologue. A loop-early-exit bug would now fail loudly.

Suggestion — undocumented 2000-iteration figure in test_shadow_stack_overflow_traps. Fixed: inline comment documents the calibration (16K shadow stack / 12 bytes per frame ≈ 1,365 frames; 2000 = ~1.5× safety margin).

Declined:

Mutual-recursion test with asymmetric needs_alloc (rating 8) — dispatch is per-function and each is independently pinned; cross-function GC composition is a deeper concern that doesn't sit in GC-aware tail-call optimization for allocating functions #549's scope. Worth a follow-up issue if the property matters.
Zero-return_call-site test for allocating leaf fns (rating 6) — the edge case is benign (gc_sp_save is allocated but unused; no harm).

comment-analyzer — four factual fixes

All four flagged in the comment-analyzer report and fixed in ae573d3:

#	File	Fix
1	`tests/test_stress.py:248`	"~12 bytes per leaked frame ≈ 1300 iters" was wrong. `gc_shadow_push` writes one i32 = 4 bytes. The 12-byte figure came from a different fn elsewhere in the file. The `loop` here pushes 1 pointer per iter = 4 bytes; overflow threshold ≈ 4,096 iters. Also dropped the wall-clock "~200ms on 2026 MacBook Pro" claim (will rot).
2	`vera/codegen/functions.py:415`	Comment cited `operators.py` unqualified, but `vera/codegen/` and `vera/wasm/` are peer directories. Fully qualified to `vera/wasm/operators.py`. Added a #672 cross-reference (the broader fixup for inline-indent stamping).
3	`vera/codegen/functions.py:350`	Comment header said "Two sources of post-body work" but the actual dispatch is three-branch (revert / patch / untouched). Reframed as "Three outcomes (precedence: 1 > 2 > 3)" with the third explicitly named — the common non-allocating tail-recursion case.
4	`tests/test_codegen.py:2034`	Docstring said "sacrificing TCO for shadow-stack correctness". Inverted framing — pre-#549 the GC epilogue ran fine; what was sacrificed was WASM call-stack depth (1M iters trapped `call stack exhausted`). Reframed correctly.

Declined:

Adding "split into inner helper without postcondition + outer wrapper" workaround to api.py's stack_exhausted trap message — too tangential for end-user-facing text.

Validation

After ae573d3:

pytest 3,863 passed, 14 skipped, 16 stress-deselected (test count went 3,892 → 3,893 from the new multi-branch test)
mypy clean (59 source files)
ruff S rules clean
doc-counts script consistent (3,893 tests across 32 files)
All 18 pre-commit hooks green (conformance 86/86, examples 34/34, walker coverage, license, site assets, browser parity, etc.)

Commit trail on this PR

Commit	Purpose
`1df1e8b`	Initial #549 fix + tests + doc sweep + v0.0.154 release prep
`d7e54e4`	Preserve `return_call` leading whitespace when patching `$gc_sp` restore
`d78028e`	Add #672 (canonicalize WAT formatting) to ROADMAP CI tooling
`ae25d84`	Address CodeRabbit round-1 + fix lint failure
`846ce05`	Address CodeRabbit round-2 findings (TESTING.md sync + boundary-safe regex)
`ae573d3`	Address self-review findings (this comment)

aallan · 2026-05-13T19:10:03Z

@coderabbitai full review

coderabbitai · 2026-05-13T19:10:08Z

✅ Actions performed

Full review triggered.

coderabbitai Bot reviewed May 13, 2026

View reviewed changes

Comment thread HISTORY.md

Comment thread tests/test_codegen.py Outdated

Comment thread tests/test_codegen.py

aallan mentioned this pull request May 13, 2026

Canonicalize WAT output formatting via single-pass formatter (drop inline indent-stamping) #672

Open

coderabbitai Bot reviewed May 13, 2026

View reviewed changes

Comment thread vera/codegen/functions.py Outdated

coderabbitai Bot reviewed May 13, 2026

View reviewed changes

Comment thread TESTING.md Outdated

Comment thread tests/test_codegen.py Outdated

aallan and others added 2 commits May 13, 2026 19:36

aallan merged commit 27594f9 into main May 13, 2026
24 checks passed

aallan deleted the fix-549-tco-allocating-fns branch May 13, 2026 19:37

Conversation

aallan commented May 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Behavioural impact

Test changes

Doc sweep

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Uh oh!

codecov Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

aallan commented May 13, 2026

PR self-review summary

code-reviewer — clean

pr-test-analyzer — one important gap, one suggestion

comment-analyzer — four factual fixes

Validation

Commit trail on this PR

Uh oh!

aallan commented May 13, 2026

Uh oh!

coderabbitai Bot commented May 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aallan commented May 13, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 13, 2026 •

edited

Loading

codecov Bot commented May 13, 2026 •

edited

Loading