Skip to content

Widen GC header size field from 16 to 31 bits (closes #484)#488

Merged
aallan merged 2 commits into
mainfrom
widen-gc-size-field
Apr 17, 2026
Merged

Widen GC header size field from 16 to 31 bits (closes #484)#488
aallan merged 2 commits into
mainfrom
widen-gc-size-field

Conversation

@aallan

@aallan aallan commented Apr 17, 2026

Copy link
Copy Markdown
Owner

Summary

  • Strip the i32.const 65535; i32.and pair from all 5 read sites in vera/codegen/assembly.py — 2 in $alloc free-list walks, 3 in $gc_collect (phase 1 clear-marks advance, phase 2b conservative scan, phase 3 sweep). Header still stores (size << 1) | mark; we just stop throwing away the high 15 bits on readback.
  • Max single allocation is now ~2 GB (bounded by WASM's 4 GB memory ceiling and the mark bit).
  • Uncap the array_map / array_filter stress tests from 8K → 10K elements (80,000 byte output).

Discovered along the way

Filed #487: $alloc grows memory by exactly 1 page regardless of need, so single allocations >~64 KB larger than free heap still trap. Orthogonal to this fix; stays under the existing 10K ceiling so it's not blocking.

Test plan

  • mypy vera/ clean (57 files)
  • pytest tests/ -q — 3,321 passed, 11 skipped
  • 10K array_map and array_filter stress tests now pass (previously compiled but returned corrupt data)
  • Pre-commit hooks green

Closes #484.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Corrected GC header sizing so large allocations at the 65,536‑byte boundary are handled correctly.
  • Documentation

    • Updated known issues to reflect current allocation failure mode and adjusted roadmap/test counts.
  • Tests

    • Increased stress-test input sizes for array map/filter and added a regression for the 65,536‑byte boundary.

`$alloc` has always stored object size as `(size << 1) | mark` in the
full i32 header, but the GC's read sites were masking with `0xFFFF`,
truncating any allocation >= 65536 bytes. The sweeper then
interpreted middle-of-payload bytes as tiny zero-size headers and
linked each 8-byte chunk into the free list, shredding the live
object and returning corrupted pointers to the caller.

Fix: strip the `i32.const 65535; i32.and` pair from all five read
sites in vera/codegen/assembly.py (two in `$alloc` free-list walks,
three in `$gc_collect` — phase 1 clear-marks advance, phase 2b scan,
phase 3 sweep). Max single allocation is now ~2 GB, bounded by
WASM's 4 GB memory ceiling and the leading mark bit.

Stress tests for array_map and array_filter uncapped from 8K back to
10K elements (80,000-byte output allocations). Updated test
docstrings to note the previous cap was #484 workaround, now
resolved. Removed the 16-bit caveat text from the translator
docstrings in calls_arrays.py. Removed the entry from KNOWN_ISSUES.

Testing the fix surfaced a separate bug: `$alloc` grows memory by
exactly 1 page regardless of how many pages the allocation needs,
so single allocations more than ~64 KB larger than free heap space
still trap. Filed as #487 and added to KNOWN_ISSUES.

Closes #484.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@coderabbitai

coderabbitai Bot commented Apr 17, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 97d2ec2d-e347-4cee-95cd-10a4cf617671

📥 Commits

Reviewing files that changed from the base of the PR and between 8f45ec1 and 5df1b51.

📒 Files selected for processing (4)
  • ROADMAP.md
  • TESTING.md
  • tests/test_codegen_monomorphize.py
  • vera/codegen/assembly.py

📝 Walkthrough

Walkthrough

Removed 16-bit masking of GC object-header size everywhere it was applied, treating the header size as a full 31-bit value (derived from header >> 1, bit 0 = mark). Added an upfront trap for oversize allocations; updated tests and documentation to restore larger stress limits and reflect the new behaviour.

Changes

Cohort / File(s) Summary
Changelog & roadmap
CHANGELOG.md, ROADMAP.md
Inserted Unreleased entry documenting GC header-size fix; updated reported test counts in roadmap (3,332 → 3,333) in three places.
Known issues
KNOWN_ISSUES.md
Replaced prior 16-bit header-corruption entry with a different $alloc failure mode description (single-page growth leading to OOB trap for very-large allocations).
Testing metadata
TESTING.md
Adjusted total/passed test counts (3,332 → 3,333 / 3,321 → 3,322) and updated test_codegen_monomorphize.py per-file counts/line numbers.
Unit tests
tests/test_codegen_monomorphize.py
Increased stress inputs from 8,000 → 10,000 for two tests; updated expected values and docstrings; added test_array_map_exact_65536_byte_boundary (8,192 elements) to target the 65,536-byte boundary.
GC/allocator implementation
vera/codegen/assembly.py
Removed i32.const 65535 + i32.and masking in free-list traversal, post-GC retry, mark/sweep size extraction and conservative scan bounds; now derive size via header >> 1. Added an early unreachable trap in $alloc if size >= 0x80000000.
Docstring cleanup
vera/wasm/calls_arrays.py
Removed obsolete size-limit warnings from _translate_array_map and _translate_array_filter docstrings referencing the old 16-bit limitation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

compiler, tests, docs

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly summarises the main change: widening the GC header size field from 16 to 31 bits, which is the core technical fix implemented across assembly.py and verified by stress tests.
Linked Issues check ✅ Passed All coding requirements from #484 are satisfied: the 16-bit mask is removed from five header read sites, the invariant guard traps on oversized allocations, stress tests are restored to 10K elements, and the new exact 65536-byte boundary test is added.
Out of Scope Changes check ✅ Passed All changes are directly scoped to #484: assembly.py fixes the header extraction, test files verify the fix, documentation updates reflect the corrected test counts and known issues.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch widen-gc-size-field

Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms


Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Apr 17, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.40%. Comparing base (bb99efb) to head (5df1b51).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #488   +/-   ##
=======================================
  Coverage   90.40%   90.40%           
=======================================
  Files          58       58           
  Lines       19845    19845           
  Branches      225      225           
=======================================
  Hits        17941    17941           
  Misses       1900     1900           
  Partials        4        4           
Flag Coverage Δ
javascript 50.67% <ø> (ø)
python 95.25% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_codegen_monomorphize.py`:
- Around line 907-928: Add two new tests (e.g., test_array_map_exact_65535_bytes
and test_array_map_exact_65536_bytes) modeled on
test_array_map_large_input_no_stack_overflow that exercise the exact 65,535 and
65,536-byte output boundaries: construct an `@Array` of an element type with a
known fixed byte-size (or use a byte/UInt8 element) and choose the array length
so the serialized/output payload is exactly 65,535 bytes and exactly 65,536
bytes, map over it with array_map and assert the final element via
_run(fn="main"); keep names array_range and array_map and reuse the same
pattern/assert style as the existing test.

In `@vera/codegen/assembly.py`:
- Around line 332-339: The allocator must reject sizes with the high (31st) bit
set before constructing the header so header = size << 1 cannot wrap; in the
$alloc entry (the block that constructs header via "local.get $size; i32.const
1; i32.shl" or equivalent) add a guard that tests (size & 0x80000000) == 0
(e.g., i32.const 0x80000000; i32.and; i32.eqz) and trap or return an OOM/error
if the test fails, ensuring the 31-bit invariant the collector relies on; apply
the same guard to the other header-construction site referenced (the block
around lines 724-728) so both allocation paths enforce the ~2GB bound.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e3ddba29-ad07-4568-ba8f-3f098f21f5f6

📥 Commits

Reviewing files that changed from the base of the PR and between bb99efb and 8f45ec1.

📒 Files selected for processing (6)
  • CHANGELOG.md
  • KNOWN_ISSUES.md
  • TESTING.md
  • tests/test_codegen_monomorphize.py
  • vera/codegen/assembly.py
  • vera/wasm/calls_arrays.py
💤 Files with no reviewable changes (1)
  • vera/wasm/calls_arrays.py

Comment thread tests/test_codegen_monomorphize.py
Comment thread vera/codegen/assembly.py
Two findings, both addressed:

- assembly.py: add a 31-bit invariant guard at the top of $alloc.
  Tests `(size & 0x80000000) != 0` and traps via `unreachable` if
  set.  Prevents silent zero-size headers if a caller ever requests
  >= 2^31 bytes (impossible today given WASM's 4 GB memory cap, but
  makes the invariant the sweeper depends on explicit).  Only added
  at the caller-input site (bump-alloc entry); the free-list reuse
  path reconstructs from node_size read back from a header that was
  already validated at its original $alloc, so it's safe by
  induction.

- tests: add test_array_map_exact_65536_byte_boundary — 8,192 Int
  elements × 8 bytes = 65,536 bytes, the first size that was
  corrupted under the old mask.  Tight regression guard: a future
  partial-mask regression could pass the 10K stress test but trip
  this one.  The suggested matching 65,535-byte test isn't cleanly
  reachable with Int granularity (8-byte aligned), and Byte support
  in array_map is independently broken — int_to_byte(3) returns
  32772 — so only the 65,536 case is tested.

Co-Authored-By: Claude <noreply@anthropic.invalid>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GC object header size field is 16-bit — allocations >65535 bytes corrupt memory

1 participant