Skip to content

Fix #549: GC-aware tail-call optimization for allocating functions#671

Merged
aallan merged 6 commits into
mainfrom
fix-549-tco-allocating-fns
May 13, 2026
Merged

Fix #549: GC-aware tail-call optimization for allocating functions#671
aallan merged 6 commits into
mainfrom
fix-549-tco-allocating-fns

Conversation

@aallan

@aallan aallan commented May 13, 2026

Copy link
Copy Markdown
Owner

Summary

  • Replaces the "allocating fn ⇒ revert return_callcall" fallback with a GC-aware patch. The post-process in vera/codegen/functions.py::_compile_fn now PREPENDS local.get $gc_sp_save; global.set $gc_sp immediately before each return_call site in an allocating fn, restoring the shadow-stack pointer to the caller's entry baseline so the callee saves a clean new baseline. Per-iteration shadow-stack usage stays bounded at caller's entry + n_arg_roots regardless of iteration count.
  • Args transfer atomically. At the return_call site the args are already on the WASM operand stack; the two-instruction restore touches only the $gc_sp global, not the operand stack, so the patch is non-interfering — return_call still consumes args and atomically transfers control.
  • Postcondition revert preserved. Functions with a non-trivial runtime postcondition still revert return_call → plain call (the post-check needs to run after each call; return_call would skip it). Dispatch precedence: post_instrs revert > needs_alloc GC-aware patch > untouched.
  • Closes the last open bug in KNOWN_ISSUES.md bug-tracker section apart from Conservative GC scan can spuriously retain heap objects via wrapper handle field (#573 latent) #578 — the bug list will be empty once Conservative GC scan can spuriously retain heap objects via wrapper handle field (#573 latent) #578 lands.

Test plan

  • pytest tests/ — 3,862 passed, 14 skipped, 16 deselected
  • mypy vera/ — clean
  • python scripts/check_conformance.py — 86/86
  • python scripts/check_examples.py — 34/34
  • python scripts/check_doc_counts.py — consistent
  • python scripts/check_version_sync.py — 0.0.154 across 6 files
  • python scripts/check_site_assets.py — up-to-date
  • python scripts/check_limitations_sync.py — consistent
  • Pre-commit hooks — all 18 passed including conformance suite + example validation
  • Stress test sample (pytest tests/test_stress.py::test_tco_with_allocation_1m_iterations -m stress) — 1M iterations under both default and eager GC in ~190ms each

Behavioural impact

Pre-#549, an allocating tail-recursive function like:

private fn loop(@Int, @Int -> @Int)
  requires(@Int.1 >= 0) ensures(true) decreases(@Int.1) effects(pure)
{
  if @Int.1 == 0 then { @Int.0 }
  else {
    let @Array<Int> = [@Int.0, @Int.1];
    loop(@Int.1 - 1, @Int.0 + array_length(@Array<Int>.0))
  }
}

trapped with stack_exhausted at ~30K iterations (the WASM call-stack limit) because the post-process reverted return_call → plain call. Post-#549 the same function runs in constant stack space at 1M+ iterations — the per-iteration $gc_sp restore keeps the shadow stack flat across the entire run.

Test changes

  • tests/test_codegen.py::TestTailCallOptimization517 — renamed test_allocating_function_falls_back_to_plain_call to test_allocating_function_uses_gc_aware_tco_549 and inverted its assertions: it now verifies return_call $foo is emitted AND every such site is preceded by local.get <N>; global.set $gc_sp. Added a sibling test test_allocating_function_with_postcondition_still_reverts to pin the postcondition-revert precedence.
  • tests/test_codegen.py::TestGCShadowStackOverflow::test_shadow_stack_overflow_traps — rewritten to use a non-tail-recursive shape (recursive call wrapped in array_append) so the overflow guard still genuinely trips post-GC-aware tail-call optimization for allocating functions #549.
  • tests/test_stress.py::test_deep_tail_recursion_with_allocating_arg — body switched from string-pool literals (didn't actually set needs_alloc) to genuine array allocation, so the test now actually exercises GC-aware tail-call optimization for allocating functions #549's path.
  • tests/test_stress.py::test_tco_with_allocation_1m_iterations — new 1M-iteration companion parametrised over default-GC and eager-GC modes (~190ms wall-clock in both).

Doc sweep

  • KNOWN_ISSUES.md — removed the GC-aware tail-call optimization for allocating functions #549 bug row.
  • SKILL.md (+ regenerated docs/SKILL.md) — removed the "tail-call optimization disabled for allocating functions" row from the bug table; updated the narrative paragraph below it to describe both Tail-call optimization missing: deep recursion blows the WASM call stack #517 and GC-aware tail-call optimization for allocating functions #549 as live optimisations.
  • vera/wasm/calls.py — comment near _translate_call updated to describe both branches of the post-process (GC-aware patch vs postcondition revert).
  • vera/codegen/api.py — the stack_exhausted runtime trap-fix message updated: drops the "allocating functions are an exception" paragraph, adds a brief mention of the postcondition-revert as the remaining exception.
  • Version bumped to v0.0.154 across pyproject.toml, vera/__init__.py, uv.lock, docs/index.html, README.md.
  • CHANGELOG.md v0.0.154 entry added with the design rationale, plus HISTORY.md row.
  • TESTING.md, ROADMAP.md, README.md — test counts updated to 3,892 (was 3,889).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Restored GC‑aware tail‑call optimisation for allocating functions so shadow‑stack/GC state is preserved across recursive tail calls while retaining correct postcondition behaviour.
  • Tests

    • Added and expanded regression and stress suites (including a 1,000,000‑iteration stress case) to validate GC‑aware TCO, overflow/trap behaviour and updated stress parametrisation and expectations.
  • Documentation

    • Bumped project version; updated changelogs, README, HISTORY, ROADMAP, TESTING and runtime guidance; removed the resolved known‑issue entry.

Review Change Stack

Pre-fix, WASM return_call reverted to plain call whenever a function
had needs_alloc=True, because return_call discards the current frame
and would skip the GC epilogue, leaking shadow-stack slots per
iteration.  This forced agents to restructure allocating tail-
recursive code into array_fold/array_map shapes or to hoist
allocations outside the recursion.

The fix preserves TCO AND the shadow-stack invariant: instead of
reverting, the post-process now PREPENDS a two-instruction $gc_sp
restore (local.get $gc_sp_save; global.set $gc_sp) immediately
before each return_call in an allocating function.  Args are
already on the WASM operand stack at the return_call site; the
restore only touches the $gc_sp global, so args transfer atomically
to the callee.  The callee's prologue saves a clean new $gc_sp
baseline, so per-iteration shadow-stack usage stays bounded at
caller's entry + n_arg_roots regardless of iteration count.

Postcondition-bearing functions still revert to plain call — the
runtime postcondition check needs to run after each call, and
return_call would skip it.  Dispatch precedence: postcondition-
revert > GC-aware-TCO-patch > untouched.

Tests:
- Updated TestTailCallOptimization517 — renamed the falls-back
  test to test_allocating_function_uses_gc_aware_tco_549 and
  inverted assertions; added test_allocating_function_with_
  postcondition_still_reverts for precedence.
- Updated TestGCShadowStackOverflow::test_shadow_stack_overflow_
  traps to use a non-tail-recursive shape that still genuinely
  overflows post-#549.
- tests/test_stress.py — switched the existing 1K-iter deep-tail
  test from string-pool literals (didn't actually set needs_alloc)
  to genuine array allocation; added a 1M-iter companion
  parametrised over default/eager-GC.

Closes #549.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@coderabbitai

coderabbitai Bot commented May 13, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1002873d-90a3-4ce5-93df-425f921891f2

📥 Commits

Reviewing files that changed from the base of the PR and between 50f74ba and ae573d3.

⛔ Files ignored due to path filters (6)
  • docs/SKILL.md is excluded by !docs/**
  • docs/index.html is excluded by !docs/**
  • docs/index.md is excluded by !docs/**
  • docs/llms-full.txt is excluded by !docs/**
  • docs/llms.txt is excluded by !docs/**
  • uv.lock is excluded by !**/*.lock, !uv.lock
📒 Files selected for processing (14)
  • CHANGELOG.md
  • HISTORY.md
  • KNOWN_ISSUES.md
  • README.md
  • ROADMAP.md
  • SKILL.md
  • TESTING.md
  • pyproject.toml
  • tests/test_codegen.py
  • tests/test_stress.py
  • vera/__init__.py
  • vera/codegen/api.py
  • vera/codegen/functions.py
  • vera/wasm/calls.py
💤 Files with no reviewable changes (1)
  • KNOWN_ISSUES.md

📝 Walkthrough

Walkthrough

This PR implements issue #549: the compiler now preserves wasm return_call for allocating functions by injecting a $gc_sp restore before each tail jump; functions with runtime postconditions still revert to call. Tests, stress suites, docs, and package version updated to v0.0.154.

Changes

GC-aware allocating tail-call optimization

Layer / File(s) Summary
Core compiler implementation
vera/codegen/functions.py, vera/wasm/calls.py, vera/codegen/api.py, vera/__init__.py, pyproject.toml
_compile_fn now distinguishes postcondition vs allocation handling: when post_instrs is present, return_call is rewritten to call; when allocating with no postconditions, gc_sp_save local is preallocated and each return_call site is patched to restore $gc_sp via local.get <gc_sp_save>; global.set $gc_sp immediately before the tail transfer. Supporting comments and trap guidance updated; package/module version bumped to v0.0.154.
Allocating tail-call codegen test coverage
tests/test_codegen.py
Replaced allocating-tail-call test with test_allocating_function_uses_gc_aware_tco_549 that parses emitted WAT, extracts the saved-local index, and verifies every return_call $build is immediately preceded by local.get <same_saved_local>; global.set $gc_sp; added companion test asserting ensures causes return_callcall and absence of the immediate preamble; overflow regression adjusted to non-tail recursion and two-step WAT+runtime checks.
Stress tests for GC-aware TCO
tests/test_stress.py
Updated test_deep_tail_recursion_with_allocating_arg to allocate Array<Int> per iteration and adjusted expected assertion; added test_tco_with_allocation_1m_iterations (gated by EAGER_GC_PARAMS) running a 1,000,000-iteration allocating-array tail loop to validate bounded shadow-stack behaviour.
Documentation and version updates
CHANGELOG.md, HISTORY.md, KNOWN_ISSUES.md, README.md, ROADMAP.md, SKILL.md, TESTING.md
Top-level changelog/HISTORY entries added for v0.0.154; removed #549 from KNOWN_ISSUES; updated README/ROADMAP/TESTING/SKILL to document GC-aware TCO, test inventory changes, trap diagnostic revisions, and updated compare link.

Sequence Diagram(s)

sequenceDiagram
  participant _compile_fn as _compile_fn
  participant GCPrologue as GC_Prologue
  participant body_instrs as body_instrs
  participant ReturnPatcher as ReturnPatcher
  _compile_fn->>GCPrologue: ensure gc_sp_save local allocated
  _compile_fn->>body_instrs: emit function body (may contain return_call)
  body_instrs->>ReturnPatcher: locate each return_call site
  ReturnPatcher->>body_instrs: inject "local.get gc_sp_save"
  ReturnPatcher->>body_instrs: inject "global.set $gc_sp"
  ReturnPatcher->>body_instrs: or rewrite to "call" when post_instrs present
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

  • aallan/vera#669: Overlaps stress-harness changes for deep tail-recursion with allocation.
  • aallan/vera#550: Prior tail-call post-processing work related to return_call/call rewriting.
  • aallan/vera#473: Related GC shadow-stack overflow test changes and coverage.

Suggested labels

compiler, tests, docs

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately describes the main change: GC-aware tail-call optimization now applies to allocating functions, fixing issue #549. It is concise, specific, and reflects the primary technical objective.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-549-tco-allocating-fns

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented May 13, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.02%. Comparing base (50f74ba) to head (ae573d3).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #671   +/-   ##
=======================================
  Coverage   91.02%   91.02%           
=======================================
  Files          60       60           
  Lines       23398    23409   +11     
  Branches      259      259           
=======================================
+ Hits        21297    21308   +11     
  Misses       2094     2094           
  Partials        7        7           
Flag Coverage Δ
javascript 57.36% <ø> (ø)
python 94.77% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

The #549 post-process inserts `local.get $gc_sp_save; global.set
$gc_sp` immediately before each `return_call` site in an allocating
function.  Pre-fix the two inserted lines were appended as raw
strings with no leading whitespace, so the rendered WAT looked
misaligned when the `return_call` was nested inside an `if/else`
(operators.py prepends 2 spaces to every instruction inside a
then/else arm; functions.py later prepends 4 more at the body-level
stringify step — so a nested return_call carries 6 leading spaces
while my inserted lines only got 4).

Functionally inert — WAT parses by tokens, not whitespace — but
visually misleading for anyone reading the WAT for debugging.

Fix: copy the leading whitespace from the original return_call
line and apply it to both inserted lines so the restore sequence
visually nests at the same depth as the return_call it precedes.

Co-Authored-By: Claude <noreply@anthropic.invalid>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_codegen.py (1)

15089-15136: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don't let _run_trap() hide the wrong failure mode.

Line 15136 now proves only that something trapped. If this shape regressed into a plain WASM stack overflow or a different allocator trap, the test would still pass without ever exercising the shadow-stack overflow guard it documents. Please pair the runtime trap with a structural assertion against the emitted overflow-check sequence in overflow's WAT.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_codegen.py` around lines 15089 - 15136, The test currently only
calls _run_trap(src) which only verifies that "something" trapped; update the
test to also inspect the emitted WAT for the function overflow and assert the
specific overflow-check instruction sequence is present so we know the
shadow-stack overflow guard was generated. Concretely, after
generating/compiling src (the same code used by _run_trap), extract the WAT or
disassembly for the function named "overflow" and assert it contains the
expected overflow-check pattern (e.g. the shadow-stack load/compare/branch
sequence your backend emits), then keep the runtime _run_trap() call to ensure
the trap actually fires. Reference the test helper _run_trap and the function
symbol overflow when locating where to add the WAT/disassembly assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@HISTORY.md`:
- Line 301: The HISTORY.md footer still reads "153 tagged releases" after adding
the v0.0.154 entry; update that roll-up from 153 to 154 and then update the
matching release count in README.md (the README release figure that displays the
total tagged releases) so both files remain synchronized with the new v0.0.154
entry.

In `@tests/test_codegen.py`:
- Around line 2169-2175: The test only checks that "return_call $build" is not
present but doesn't ensure the GC-restore preamble (e.g. injected local.get ...;
global.set $gc_sp) was rejected, so tighten the assertion by verifying the
preamble isn't present; update the test around the build function/body to also
assert that sequences like "local.get" followed by "global.set $gc_sp" (the
GC-restore preamble) do not appear in build_body, and keep the existing checks
for "call $build" and absence of "return_call $build" so the test ensures
postcondition-bearing functions cannot contain the GC-restore preamble or a
return_call.
- Around line 2097-2122: The test currently only checks that two lines before
each "return_call $build" is a "local.get ..." but doesn't verify it matches the
local saved by the prologue; update the assertion to capture the exact saved
local from the prologue (e.g. parse the prologue to find the "local.set" or
local index used to store the GC baseline, call it the $gc_sp_save local) and
require that the line two lines before each "return_call $build" equals that
exact "local.get <N>" string, while still asserting the immediate previous line
is "global.set $gc_sp"; use the existing variables build_body, prologue,
return_call $build, global.set $gc_sp and local.get to locate and compare the
exact saved local.

---

Outside diff comments:
In `@tests/test_codegen.py`:
- Around line 15089-15136: The test currently only calls _run_trap(src) which
only verifies that "something" trapped; update the test to also inspect the
emitted WAT for the function overflow and assert the specific overflow-check
instruction sequence is present so we know the shadow-stack overflow guard was
generated. Concretely, after generating/compiling src (the same code used by
_run_trap), extract the WAT or disassembly for the function named "overflow" and
assert it contains the expected overflow-check pattern (e.g. the shadow-stack
load/compare/branch sequence your backend emits), then keep the runtime
_run_trap() call to ensure the trap actually fires. Reference the test helper
_run_trap and the function symbol overflow when locating where to add the
WAT/disassembly assertion.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2f2e1fea-10db-427b-8b62-5e4d08d1a865

📥 Commits

Reviewing files that changed from the base of the PR and between 50f74ba and 1df1e8b.

⛔ Files ignored due to path filters (6)
  • docs/SKILL.md is excluded by !docs/**
  • docs/index.html is excluded by !docs/**
  • docs/index.md is excluded by !docs/**
  • docs/llms-full.txt is excluded by !docs/**
  • docs/llms.txt is excluded by !docs/**
  • uv.lock is excluded by !**/*.lock, !uv.lock
📒 Files selected for processing (14)
  • CHANGELOG.md
  • HISTORY.md
  • KNOWN_ISSUES.md
  • README.md
  • ROADMAP.md
  • SKILL.md
  • TESTING.md
  • pyproject.toml
  • tests/test_codegen.py
  • tests/test_stress.py
  • vera/__init__.py
  • vera/codegen/api.py
  • vera/codegen/functions.py
  • vera/wasm/calls.py
💤 Files with no reviewable changes (1)
  • KNOWN_ISSUES.md

Comment thread HISTORY.md
Comment thread tests/test_codegen.py Outdated
Comment thread tests/test_codegen.py
Surfaced during #549 review when the user pointed out that the
compiler currently stamps WAT indentation onto instruction strings
inline at ~14 emission sites scattered across vera/wasm/ and
vera/codegen/.  This is implicit-by-convention: any new emission
site that doesn't know about it produces visually-misaligned WAT
(as #549's GC-aware TCO post-process did initially.

Vera source has vera fmt because consistent presentation matters
for a debuggable surface.  Same principle applies to compiler-
emitted WAT — the primary surface agents read when debugging
codegen bugs.  Logged as #672 for a single-pass formatter that
walks s-expression and control-flow depth and applies canonical
indentation.

Co-Authored-By: Claude <noreply@anthropic.invalid>
EOF
)

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@vera/codegen/functions.py`:
- Line 406: The code currently uses an `assert gc_sp_save is not None` guard
which can be stripped under `-O`; replace this assertion with an explicit
invariant failure by checking `if gc_sp_save is None:` and calling the
module/class `_error()` method with a descriptive message (e.g. "internal
compiler error: gc_sp_save is None") so the failure is always raised; update the
code paths around the `gc_sp_save` reference (the `gc_sp_save` variable usage in
functions.py) to use this explicit check instead of `assert`.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 68cd9def-74d0-42d9-8968-146833da7393

📥 Commits

Reviewing files that changed from the base of the PR and between 1df1e8b and d78028e.

📒 Files selected for processing (2)
  • ROADMAP.md
  • vera/codegen/functions.py

Comment thread vera/codegen/functions.py Outdated
1. **Ruff S101** (CI failure) — two `assert gc_sp_save is not None`
   in `vera/codegen/functions.py` flagged by the `Security lint
   (ruff S rules)` job.  Both are pure mypy type-narrowing asserts
   (the variable is `int | None` and we're inside an
   `if ctx.needs_alloc:` block where it's guaranteed non-None).
   Added `# noqa: S101` to both sites, matching the existing
   project precedent in `vera/codegen/api.py` (five similar sites).

2. **Finding 1: HISTORY.md footer release count** (valid) — the
   table-footer total still read "153 tagged releases" after the
   v0.0.154 row landed.  Bumped to 154 to match the README figure.

3. **Finding 2: postcondition test tightened** (valid) — the
   test only asserted `return_call $build` is absent; a future
   regression that mistakenly took BOTH the postcondition-revert
   and the GC-aware patch paths would slip through.  Added a
   structural assertion that no `local.get N; global.set $gc_sp`
   pair precedes any plain `call $build` site, pinning the
   dispatch precedence (post_instrs revert and GC-aware patch are
   mutually exclusive, not additive).  Cannot forbid the sequence
   outright because the GC epilogue at function-end legitimately
   contains it.

4. **Finding 3: GC-aware TCO test now matches exact prologue local**
   (valid) — previously the test only required `local.get <any>`
   two lines before `return_call`.  Now we parse the GC prologue
   (`global.get $gc_sp; local.set <N>`) at function entry to
   capture the exact `$gc_sp_save` local index, then require the
   preamble at each return_call site uses *that same* local.  A
   typo or off-by-one in the patch code that picked up an
   unrelated local would now fail.

5. **Finding 4: shadow_stack_overflow_traps WAT structural check**
   (partially valid, added) — `_run_trap` alone proves "something
   trapped" but not "the shadow-stack-overflow guard fired
   specifically."  Added a structural assertion that the
   distinctive `global.get $gc_stack_limit` token appears in the
   `$overflow` function body, catching a hypothetical regression
   where the guard is silently dropped but `_run_trap` still
   passes via an unrelated trap class.

TESTING.md test_codegen.py line count updated to reflect the new
test additions (107 lines).

Co-Authored-By: Claude <noreply@anthropic.invalid>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@TESTING.md`:
- Line 69: The TESTING.md stress-test counts are inconsistent: the header for
`test_stress.py` says "9 logical tests / 16 instances" but the stress section
still reads "8 initial test programs" and "14 parametrised instances"; update
the stress section text so all three mentions align with the header (i.e.,
change "8 initial test programs" to "9 initial test programs" and "14
parametrised instances" to "16 parametrised instances") and ensure the
descriptive phrase matches the header (e.g., "9 logical tests × eager-GC lane
parametrisation = 16 test instances" referencing `test_stress.py`).

In `@tests/test_codegen.py`:
- Around line 2086-2131: The WAT substring checks can false-match
similarly-prefixed symbols; replace plain string searches in the build-body
extraction and in return/call-site checks with boundary-safe regex searches (use
re.search with word-boundary patterns like r'\b\(func \$build\b',
r'\breturn_call \$build\b', and r'\bcall \$build\b') where
build_start/build_end, build_body, and return_call_indices are computed, and
apply the same replacement to the analogous checks later in the file (the
call-site assertions around the other return_call/call checks).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fd8f8adc-bf2d-4044-acb2-e5ea5c482a8f

📥 Commits

Reviewing files that changed from the base of the PR and between d78028e and ae25d84.

📒 Files selected for processing (4)
  • HISTORY.md
  • TESTING.md
  • tests/test_codegen.py
  • vera/codegen/functions.py

Comment thread TESTING.md Outdated
Comment thread tests/test_codegen.py Outdated
aallan and others added 2 commits May 13, 2026 19:36
**Finding 1: TESTING.md stress count inconsistency** (valid) —
header table said `test_stress.py` has 16 instances but the
detailed section still read "8 initial test programs" and
"14 parametrised instances".  Fixed cohesively:
- `### The 8 initial test programs` → `### The 9 initial test
  programs` so the section claim matches the header.
- Added entry **9. `test_tco_with_allocation_1m_iterations`** so
  the section actually enumerates what it claims to.
- Updated test #3's stale narrative — this PR switched its body
  from `let @string = "stress"` (which doesn't trigger
  needs_alloc because string-pool literals don't allocate) to
  `let @array<Int> = [_, _]` (genuine heap alloc).  Old text
  asserted == 6,000 from `string_length("stress")`; new text
  asserts == 2,000 from `array_length([_, _])`.  Without this
  update the doc described a test that no longer exists.
- "Six of the eight" → "Seven of the nine" in the eager-GC lane
  paragraph (math now adds up to 16 with the 9th test).
- CLI examples and Budget line updated from 14/8 to 16/9.

**Finding 2: boundary-safe regex in WAT searches** (valid) — the
three #549 tests used plain `.find("(func $build")` and
`"return_call $build" in body` substring matches.  Today's test
source defines only `$build` and `$f` so there's no false-match
risk, but a future symbol like `$build_helper` would match
falsely.  Switched all three tests to `re.search` with
boundary-safe patterns:
- `re.search(r"\(func \$build\b", ...)` for function-start.
- `re.search(r"return_call \$build\b", ...)` for return_call
  sites (trailing `\b` excludes `$build_anything`).
- `re.search(r"\bcall \$build\b", ...)` for plain call sites
  (leading `\b` rules out `return_call $build`; trailing `\b`
  rules out `$build_anything`).
- Same boundary-safe extraction applied to `$overflow` in
  `test_shadow_stack_overflow_traps`.

Note on CR's literal suggested pattern `r'\b\(func \$build\b'`:
the leading `\b\(` is incorrect — `\b` requires a word-char on
one side, but `(` is preceded by whitespace or newline (both
non-word) at WAT indentation depths, so that anchor wouldn't
match.  Dropped the leading anchor; kept the substantive
trailing `\b`.

Scope decline: did NOT update the pre-existing tests in the
file that use the same substring-search pattern (the #517 tests
on lines 1899/1934/1949/1968/2016/2027 etc.).  Those existed
before this PR and changing them is out of scope for #549.
CR's relayed wording said "apply the same replacement to the
analogous checks later in the file" — interpreted as the
analogous checks in MY new/modified tests, not the entire pre-
existing test suite.

Validation: pytest passes (3862/3862), mypy clean, ruff S
rules clean, doc-counts consistent (test_codegen.py grew by
+39 lines from the regex switchover; TESTING.md updated to
match).

Co-Authored-By: Claude <noreply@anthropic.invalid>
Comprehensive PR review (code-reviewer + pr-test-analyzer +
comment-analyzer) surfaced 5 actionable items.  All fixed here:

**Inaccurate comments** (comment-analyzer)

1. `tests/test_stress.py` — `test_tco_with_allocation_1m_iterations`
   docstring claimed "~12 bytes per leaked frame ≈ 1300 iters".
   `gc_shadow_push` writes one i32 = 4 bytes, not 12 (the 12-byte
   figure came from a different fn elsewhere in the file with 2
   pointer params + 1 tmp root).  The `loop` fn here only pushes
   1 pointer per iteration.  Corrected to "4 bytes per leaked
   pointer root ≈ 4,096 iterations" and added the inline note
   that the @int params aren't pointer-typed.  Also dropped the
   "~200ms on a 2026 MacBook Pro" claim (will rot).

2. `vera/codegen/functions.py` — comment cited "operators.py"
   unqualified, but `vera/codegen/` and `vera/wasm/` are peer
   directories — should be `vera/wasm/operators.py`.  This
   matters because #672 explicitly audits the inline-indentation
   surface; a mis-pathed reference would cost search time.
   Added a #672 cross-reference too.

3. `vera/codegen/functions.py` dispatch comment said "Two sources
   of post-body work" but the actual code is a three-branch
   dispatch (revert / patch / untouched).  Reframed as "Three
   outcomes (precedence: 1 > 2 > 3)" with the third explicitly
   named — the common non-allocating tail-recursion case from
   SKILL.md's "Iteration" section.  Also trimmed the
   `alloc_local` implementation-detail digression that didn't
   add value.

4. `tests/test_codegen.py` — `test_allocating_function_uses_gc_
   aware_tco_549` docstring said "sacrificing TCO for shadow-
   stack correctness".  Inverted framing — pre-#549 the GC
   epilogue ran fine, what was sacrificed was WASM call-stack
   depth (1M iters trapped with `call stack exhausted` because
   the function paid a WASM frame per recursion).  Reframed.

**Test coverage gap** (pr-test-analyzer rating 8)

5. Added `test_allocating_function_gc_aware_tco_patches_both_
   branches`.  The patch loop iterates every body instruction
   and patches every `return_call` site individually; a buggy
   implementation that bails after the first match (e.g. an
   accidental `break`) would still pass the single-site test
   above.  New test uses a `match` with two ADT arms each ending
   in `return_call $build`, asserts BOTH sites have the
   `local.get N; global.set $gc_sp` preamble, and that N matches
   the same `$gc_sp_save` local captured by the prologue.

**Suggestion** (pr-test-analyzer rating 6)

6. `test_shadow_stack_overflow_traps` — added inline calibration
   for the 2000-iteration count: "16K shadow stack / 12 bytes
   per frame (2 Array<Bool> params + 1 tmp root) ≈ 1,365 frames;
   2000 chosen with ~1.5× safety margin".  Previously the
   threshold was undocumented; a future per-frame size change
   could silently weaken the test without anyone noticing.

**Not addressed**:
- pr-test-analyzer's mutual-recursion test suggestion (rating
  8) — the dispatch is per-function and each function's
  correctness is independently pinned.  Cross-function GC
  composition is a deeper concern that doesn't sit in this
  PR's scope and would benefit from a separate issue.
- pr-test-analyzer's zero-`return_call`-site test (rating 6) —
  edge case is fine; allocating leaf fns just have an unused
  `$gc_sp_save` local, no harm.
- api.py inner-helper workaround mention in `stack_exhausted`
  message — too tangential for an end-user-facing trap message.

**code-reviewer verdict**: no issues found at confidence ≥ 80.
The fix is structurally correct, the dispatch precedence is
explicitly pinned by tests, and CodeRabbit's prior concerns
(S101 noqa, indent alignment, boundary-safe regex) are all
addressed.

Validation: pytest 3863/3863 ✓, mypy clean ✓, ruff S rules
clean ✓, doc-counts consistent (test counts updated to 3,893).

Co-Authored-By: Claude <noreply@anthropic.invalid>
@aallan

aallan commented May 13, 2026

Copy link
Copy Markdown
Owner Author

PR self-review summary

Ran the pr-review-toolkit:review-pr workflow against this PR — three specialised agents in parallel: code-reviewer, pr-test-analyzer, comment-analyzer. Skipped type-design-analyzer (no new types) and silent-failure-hunter (no error-handling paths changed; the only new control flow is dispatch logic).

code-reviewer — clean

"No issues found at confidence ≥ 80. The PR is clean. The core fix is correct, well-commented, and well-tested. The dispatch precedence is explicitly pinned by a dedicated test. The prefix-preservation logic is defensible and the prefix-extraction idiom is correct."

Verified independently:

  • Dispatch precedence (post_instrs revert > GC-aware patch > untouched) — correct.
  • gc_sp_save pre-allocation at functions.py:380-382 is index-safe; no conflict with later gc_ret_ptr / gc_ret_len / gc_ret locals.
  • Prefix-preservation idiom instr[: len(instr) - len(instr.lstrip())] correct for WAT (spaces only, no tabs).
  • Substring safety on lstrip().startswith("return_call ") confirmed — return_call only appears as instruction prefix in emission, never as comment or literal.
  • Edge cases (needs_alloc with zero return_call sites; post_instrs without needs_alloc) handled correctly.

pr-test-analyzer — one important gap, one suggestion

Important — no multi-branch return_call test (rating 8). The patch loop iterates every line and patches every site, but the single-branch fixture wouldn't catch a buggy implementation that bails after the first match (e.g. accidental break).

Fixed in ae573d3 — added test_allocating_function_gc_aware_tco_patches_both_branches. Uses a match with two ADT arms each ending in return_call $build, asserts BOTH sites have the local.get N; global.set $gc_sp preamble AND that N matches the $gc_sp_save local captured by the prologue. A loop-early-exit bug would now fail loudly.

Suggestion — undocumented 2000-iteration figure in test_shadow_stack_overflow_traps. Fixed: inline comment documents the calibration (16K shadow stack / 12 bytes per frame ≈ 1,365 frames; 2000 = ~1.5× safety margin).

Declined:

  • Mutual-recursion test with asymmetric needs_alloc (rating 8) — dispatch is per-function and each is independently pinned; cross-function GC composition is a deeper concern that doesn't sit in GC-aware tail-call optimization for allocating functions #549's scope. Worth a follow-up issue if the property matters.
  • Zero-return_call-site test for allocating leaf fns (rating 6) — the edge case is benign (gc_sp_save is allocated but unused; no harm).

comment-analyzer — four factual fixes

All four flagged in the comment-analyzer report and fixed in ae573d3:

# File Fix
1 tests/test_stress.py:248 "~12 bytes per leaked frame ≈ 1300 iters" was wrong. gc_shadow_push writes one i32 = 4 bytes. The 12-byte figure came from a different fn elsewhere in the file. The loop here pushes 1 pointer per iter = 4 bytes; overflow threshold ≈ 4,096 iters. Also dropped the wall-clock "~200ms on 2026 MacBook Pro" claim (will rot).
2 vera/codegen/functions.py:415 Comment cited operators.py unqualified, but vera/codegen/ and vera/wasm/ are peer directories. Fully qualified to vera/wasm/operators.py. Added a #672 cross-reference (the broader fixup for inline-indent stamping).
3 vera/codegen/functions.py:350 Comment header said "Two sources of post-body work" but the actual dispatch is three-branch (revert / patch / untouched). Reframed as "Three outcomes (precedence: 1 > 2 > 3)" with the third explicitly named — the common non-allocating tail-recursion case.
4 tests/test_codegen.py:2034 Docstring said "sacrificing TCO for shadow-stack correctness". Inverted framing — pre-#549 the GC epilogue ran fine; what was sacrificed was WASM call-stack depth (1M iters trapped call stack exhausted). Reframed correctly.

Declined:

  • Adding "split into inner helper without postcondition + outer wrapper" workaround to api.py's stack_exhausted trap message — too tangential for end-user-facing text.

Validation

After ae573d3:

  • pytest 3,863 passed, 14 skipped, 16 stress-deselected (test count went 3,892 → 3,893 from the new multi-branch test)
  • mypy clean (59 source files)
  • ruff S rules clean
  • doc-counts script consistent (3,893 tests across 32 files)
  • All 18 pre-commit hooks green (conformance 86/86, examples 34/34, walker coverage, license, site assets, browser parity, etc.)

Commit trail on this PR

Commit Purpose
1df1e8b Initial #549 fix + tests + doc sweep + v0.0.154 release prep
d7e54e4 Preserve return_call leading whitespace when patching $gc_sp restore
d78028e Add #672 (canonicalize WAT formatting) to ROADMAP CI tooling
ae25d84 Address CodeRabbit round-1 + fix lint failure
846ce05 Address CodeRabbit round-2 findings (TESTING.md sync + boundary-safe regex)
ae573d3 Address self-review findings (this comment)

@aallan

aallan commented May 13, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 13, 2026

Copy link
Copy Markdown
✅ Actions performed

Full review triggered.

@aallan aallan merged commit 27594f9 into main May 13, 2026
24 checks passed
@aallan aallan deleted the fix-549-tco-allocating-fns branch May 13, 2026 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant