Skip to content

Fix GC shadow stack overflow causing silent array corruption#473

Merged
aallan merged 6 commits into
mainfrom
fix-gc-shadow-stack-overflow
Apr 16, 2026
Merged

Fix GC shadow stack overflow causing silent array corruption#473
aallan merged 6 commits into
mainfrom
fix-gc-shadow-stack-overflow

Conversation

@aallan

@aallan aallan commented Apr 16, 2026

Copy link
Copy Markdown
Owner

Summary

  • Fix Type-checked pure program produces incorrect runtime output that isolated probes cannot reproduce #464: recursive functions with multiple Array parameters overflowed the 4K GC shadow stack into the adjacent worklist region, causing silent data corruption in array elements 0-3
  • Increase shadow stack from 4K to 16K (handles ~1,365 frames at 12 bytes each)
  • Increase GC worklist from 4K to 16K (proportional)
  • Add overflow guard in gc_shadow_push — traps with unreachable instead of silently corrupting memory
  • Add 2 regression tests for deep array accumulation (3,250 → 3,252 tests)

Root cause

The GC shadow stack and worklist are adjacent regions in linear memory. Each recursive frame of a function with 2 Array<Bool> parameters pushes 12 bytes (2 pointer params + 1 array_append destination). At 450 recursion depth (e.g. Conway's Game of Life on a 30x15 grid), the shadow stack overflowed at frame ~341 into the worklist region.

When GC triggered during the deep recursion, Phase 2's root scan and worklist push operated on the same memory — a concurrent read-write corruption. Intermediate arrays from deep frames weren't marked as roots, were swept (freed), and their first bytes were overwritten with free-list pointers.

Files changed

  • vera/codegen/assembly.py — shadow stack size, $gc_stack_limit global, worklist offset
  • vera/wasm/helpers.py — overflow guard in gc_shadow_push
  • tests/test_codegen.py — 2 regression tests + updated heap offset assertions
  • TESTING.md, ROADMAP.md — doc count updates

Test plan

  • Full Game of Life reproduction program now produces correct output
  • Glider-only variant also correct
  • 2 new regression tests pass
  • Full test suite: 3,241 passed, 11 skipped
  • mypy clean
  • All 23 pre-commit hooks pass

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Prevented silent corruption by adding a guard that traps on GC shadow‑stack overflow and adjusted GC memory layout to avoid overlap.
  • Tests

    • Added regression tests for deep recursion and shadow‑stack overflow.
    • Updated total test count to 3,253 (passed tests 3,242).
  • Documentation

    • Bumped project to v0.0.112 and updated ROADMAP, TESTING, CHANGELOG, HISTORY and README metrics and GC memory‑layout docs.

The GC shadow stack was 4K (4096 bytes). Recursive functions with
multiple Array parameters push ~12 bytes per frame, overflowing
at ~341 frames into the adjacent GC worklist region. When GC
triggered during the deep recursion, the corrupted worklist caused
intermediate arrays to be incorrectly freed, overwriting their
first bytes with free-list pointers.

Fix:
- Increase shadow stack from 4K to 16K (handles ~1,365 frames)
- Increase worklist from 4K to 16K (proportional)
- Add overflow guard in gc_shadow_push: trap instead of silent
  corruption
- Add regression tests for deep array accumulation
- Update doc counts (3,250 -> 3,252)

Closes #464

Co-Authored-By: Claude <noreply@anthropic.invalid>
@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1933cc56-bfdf-4a35-be7c-45abe62ffb77

📥 Commits

Reviewing files that changed from the base of the PR and between f0bd53f and 1ff9263.

📒 Files selected for processing (1)
  • KNOWN_ISSUES.md
💤 Files with no reviewable changes (1)
  • KNOWN_ISSUES.md

📝 Walkthrough

Walkthrough

GC shadow‑stack overflow detection added: assembly now exposes gc_stack_limit and computes gc_stack_size = 16384 and gc_worklist_size = 16384, adjusting gc_heap_start. gc_shadow_push emits an unsigned compare against gc_stack_limit and traps (unreachable) on overflow. Tests and docs updated; total tests increased to 3,253 and a TestGCShadowStackOverflow suite was added.

Changes

Cohort / File(s) Summary
Documentation
ROADMAP.md, TESTING.md, CHANGELOG.md, HISTORY.md, README.md, KNOWN_ISSUES.md
Version bump to 0.0.112; test counts and metrics updated (total tests → 3,253, passed tests adjusted in TESTING.md); changelog/history entries describing GC shadow‑stack fix; two bug entries removed from KNOWN_ISSUES.md.
Project metadata
pyproject.toml, vera/__init__.py
Project/package version advanced from 0.0.1110.0.112.
GC Implementation
vera/codegen/assembly.py
GC layout refactor: replace fixed 8KiB region with gc_stack_size = 16384 and gc_worklist_size = 16384; compute gc_heap_start = data_end + gc_stack_size + gc_worklist_size; add exported global gc_stack_limit; set worklist bounds to gc_stack_limit .. gc_heap_start in $gc_collect.
Shadow stack guard
vera/wasm/helpers.py
gc_shadow_push now emits an unsigned i32.ge_u compare of $gc_sp vs $gc_stack_limit and invokes unreachable on overflow before performing the store and advancing $gc_sp.
Tests
tests/test_codegen.py
Adjusted heap‑pointer expectations from 8192/819732768/32773 in TestHeapAllocation; added TestGCShadowStackOverflow with three regression tests (deep Array accumulators and an overflow trap check).

Sequence Diagram(s)

sequenceDiagram
    participant Compiler
    participant WasmModule
    participant Runtime
    participant Memory
    Compiler->>WasmModule: emit globals (`gc_stack_base`, `gc_stack_limit`, `gc_heap_start`)
    Compiler->>WasmModule: generate `gc_shadow_push` with check (`$gc_sp >= $gc_stack_limit`) -> `unreachable` on true
    Runtime->>WasmModule: call function that pushes root via `gc_shadow_push`
    alt shadow stack overflow
        WasmModule->>Runtime: trap (unreachable)
    else within bounds
        WasmModule->>Memory: store root, increment `$gc_sp`
    end
    Runtime->>WasmModule: trigger `$gc_collect`
    WasmModule->>Memory: use worklist bounds `wl_base = gc_stack_limit`, `wl_end = gc_heap_start` for marking
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

compiler, tests, docs

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix GC shadow stack overflow causing silent array corruption' directly reflects the core change: addressing overflow in the GC shadow stack that caused array corruption, with specific technical detail.
Linked Issues check ✅ Passed The PR fully addresses #464 requirements: reproduces and fixes the overflow bug, increases shadow stack from 4K to 16K, adds overflow guard in gc_shadow_push, adds regression tests, and eliminates silent corruption.
Out of Scope Changes check ✅ Passed All changes are narrowly scoped to the GC shadow stack overflow fix: codegen assembly layout, overflow guard in helpers, test assertions, and version/documentation updates are all directly related to the issue.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-gc-shadow-stack-overflow

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Apr 16, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.35%. Comparing base (b3696e0) to head (1ff9263).
⚠️ Report is 7 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #473   +/-   ##
=======================================
  Coverage   90.35%   90.35%           
=======================================
  Files          50       50           
  Lines       19544    19548    +4     
  Branches      225      225           
=======================================
+ Hits        17659    17663    +4     
  Misses       1881     1881           
  Partials        4        4           
Flag Coverage Δ
javascript 50.67% <ø> (ø)
python 95.28% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_codegen.py`:
- Around line 10468-10536: Add a regression test method to
TestGCShadowStackOverflow that deliberately exceeds the shadow-stack guard and
asserts the runtime traps by calling _run_trap(...) on a program invoking
build_acc (or equivalent recursive main) with a very large depth (e.g., far
above the guarded limit); name it something like
test_deep_array_accumulation_trap and have it invoke the same build_acc/main
pattern used in existing tests but with a depth large enough to trigger the
overflow guard so the test fails with a trap instead of returning.

In `@vera/codegen/assembly.py`:
- Around line 508-514: In _emit_gc_collect where the Phase 2 worklist pushes use
wl_base/wl_end and currently skip enqueuing on overflow, replace the silent-skip
behavior with a fail-fast check: before incrementing or storing via wl_ptr,
compare wl_ptr and wl_end (or compute next_ptr) and if the worklist is full emit
a trap/abort or call the runtime panic path so the collector fails immediately
instead of dropping marks; update both push sites that reference wl_ptr/wl_end
to perform this check and call the same failure routine (or wasm unreachable) to
make overflow deterministic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5471e747-5ebf-4e52-a6ae-1a1243a5e1e7

📥 Commits

Reviewing files that changed from the base of the PR and between b3696e0 and 95e7f49.

📒 Files selected for processing (5)
  • ROADMAP.md
  • TESTING.md
  • tests/test_codegen.py
  • vera/codegen/assembly.py
  • vera/wasm/helpers.py

Comment thread tests/test_codegen.py
Comment thread vera/codegen/assembly.py
aallan and others added 2 commits April 16, 2026 11:31
- Bump version to 0.0.112
- Add CHANGELOG entry for shadow stack overflow fix
- Add HISTORY.md Stage 11 (real-world programming)
- Update by-the-numbers table (remove v0.0.103 column, add v0.0.111)
- Add shadow stack overflow trap test (CodeRabbit feedback)
- Update doc counts (3,253 tests)
- Update docs/index.html, README.md, site assets

Co-Authored-By: Claude <noreply@anthropic.invalid>
Shadow stack and worklist sizes updated to reflect the v0.0.112
fix. Document the overflow guard in gc_shadow_push.

Co-Authored-By: Claude <noreply@anthropic.invalid>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ROADMAP.md`:
- Line 229: Update the completed-phases summary in ROADMAP.md where it currently
reads "**630+ commits, 111 tagged releases, 3,253 tests, 96% coverage, 73
conformance programs, 30 examples, 13 spec chapters.**" to reflect the new
release count by changing "111 tagged releases" to "112 tagged releases"
(matching the 0.0.112 PR); ensure the single-line summary containing that exact
phrase is edited so other docs remain consistent.

In `@tests/test_codegen.py`:
- Around line 1657-1658: The assertions are too loose and can match the same
numeric constant used for `$gc_heap_start`; update the checks to bind the
constant to the `$heap_ptr` global specifically by asserting the initializer of
`global $heap_ptr` contains `i32.const 32773` (e.g. use a regex or parse
`result.wat` to find the `global $heap_ptr` line and verify it includes
`i32.const 32773`), and apply the same tighter check for the other test case
that currently checks `i32.const 32768` so both constants are validated only as
`$heap_ptr` initializers.
- Around line 10542-10563: The test
test_deep_array_accumulation_preserves_content only asserts
array_length(build_acc([], 500)) == 500 and therefore doesn't validate element
values; update the test to actually check contents produced by build_acc (or
rename it to indicate it's length-only). Specifically, in the test referencing
build_acc and main/array_length, either (preferred) iterate over build_acc([],
500) and assert every element is false (or compare to an expected array of 500
falses) to ensure values weren't corrupted, or change the test name to
test_deep_array_accumulation_preserves_length and leave only the length
assertion.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4165b488-f0fe-4c12-a06f-f4805346f423

📥 Commits

Reviewing files that changed from the base of the PR and between 95e7f49 and 5d3cfd3.

⛔ Files ignored due to path filters (5)
  • docs/index.html is excluded by !docs/**
  • docs/index.md is excluded by !docs/**
  • docs/llms-full.txt is excluded by !docs/**
  • docs/llms.txt is excluded by !docs/**
  • docs/sitemap.xml is excluded by !docs/**
📒 Files selected for processing (8)
  • CHANGELOG.md
  • HISTORY.md
  • README.md
  • ROADMAP.md
  • TESTING.md
  • pyproject.toml
  • tests/test_codegen.py
  • vera/__init__.py

Comment thread ROADMAP.md Outdated
Comment thread tests/test_codegen.py
Comment thread tests/test_codegen.py Outdated
aallan and others added 2 commits April 16, 2026 11:39
- ROADMAP.md: 630+ commits → 800+, 111 releases → 112
- Rename test_deep_array_accumulation_preserves_content →
  test_deep_array_accumulation_preserves_length (name now
  matches what it actually checks)

Co-Authored-By: Claude <noreply@anthropic.invalid>
Co-Authored-By: Claude <noreply@anthropic.invalid>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/test_codegen.py (1)

10480-10513: ⚠️ Potential issue | 🟡 Minor

Assert the corrupted prefix window, not only element 0.

Issue #464 manifested in array elements 0–3, but this regression only inspects index 0. A future bug that scribbles indices 1–3 would still pass here, so please tighten the check to cover at least that prefix.

Suggested tightening
-private fn first_bool(`@Array`<Bool> -> `@Int`)
-  requires(array_length(`@Array`<Bool>.0) > 0)
+private fn first_four_true_count(`@Array`<Bool> -> `@Int`)
+  requires(array_length(`@Array`<Bool>.0) >= 4)
   ensures(true)
   effects(pure)
 {
-  if `@Array`<Bool>.0[0] then { 1 } else { 0 }
+  (if `@Array`<Bool>.0[0] then { 1 } else { 0 }) +
+  (if `@Array`<Bool>.0[1] then { 1 } else { 0 }) +
+  (if `@Array`<Bool>.0[2] then { 1 } else { 0 }) +
+  (if `@Array`<Bool>.0[3] then { 1 } else { 0 })
 }
@@
-  first_bool(build_acc([false], [false, false, false], 450))
+  first_four_true_count(build_acc([false], [false, false, false], 450))
 }
 """
-        assert _run(src) == 0  # first element must be false
+        assert _run(src) == 0  # prefix 0..3 must all remain false

As per coding guidelines, tests/test_codegen.py: Add codegen/runtime tests in tests/test_codegen.py — cover normal cases, edge cases (empty inputs, zero values), and composition with other built-ins.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_codegen.py` around lines 10480 - 10513, The test only checks index
0 via first_bool(main), so tighten it to verify a prefix (e.g., indices 0..3) to
catch corruption of the first few elements: update the Wasm source in
test_deep_array_accumulation_bool (functions build_acc, first_bool, public fn
main) so that main inspects at least elements 0 through 3 of the result of
build_acc (returning 0 only if all four are false, 1 otherwise), then keep the
Python assertion assert _run(src) == 0; this ensures the test fails if any of
the first four elements are corrupted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ROADMAP.md`:
- Line 11: Update the three occurrences of the test total that currently read
"3,253 tests" to "3,252 tests" so the document matches the PR's reported results
(3,250 → 3,252 and 3,241 passed + 11 skipped = 3,252); locate and replace the
exact sentence fragment "The compiler is complete end-to-end: ... The compiler
has 3,253 tests" (and the two other identical lines noted in the review) so all
three instances show "3,252" consistently.

---

Duplicate comments:
In `@tests/test_codegen.py`:
- Around line 10480-10513: The test only checks index 0 via first_bool(main), so
tighten it to verify a prefix (e.g., indices 0..3) to catch corruption of the
first few elements: update the Wasm source in test_deep_array_accumulation_bool
(functions build_acc, first_bool, public fn main) so that main inspects at least
elements 0 through 3 of the result of build_acc (returning 0 only if all four
are false, 1 otherwise), then keep the Python assertion assert _run(src) == 0;
this ensures the test fails if any of the first four elements are corrupted.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1da7a05d-6097-4f62-badc-bba89295835d

📥 Commits

Reviewing files that changed from the base of the PR and between 1557a8d and f0bd53f.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock, !uv.lock
📒 Files selected for processing (2)
  • ROADMAP.md
  • tests/test_codegen.py

Comment thread ROADMAP.md
#359 fixed in v0.0.109, #464 fixed in this PR.

Co-Authored-By: Claude <noreply@anthropic.invalid>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Type-checked pure program produces incorrect runtime output that isolated probes cannot reproduce

1 participant