
GC object header size field is 16-bit — allocations >65535 bytes corrupt memory #484

@aallan

Description


Bug

The GC's object header packs size into a 16-bit field (masked with 0xFFFF in vera/codegen/assembly.py at lines 339 and 409), but $alloc accepts larger size parameters without complaint. When an allocation is ≥ 65536 bytes:

  1. $alloc stores (size << 1) | mark as the header. For size=80000, that's 0x27100.
  2. On the next GC cycle, $gc_collect reads the header back and extracts size as (header >> 1) & 0xFFFF = 80000 & 0xFFFF = 14464.
  3. The sweep advances by that truncated size (~14472 bytes) and lands in the middle of the live object. It treats the payload there as more headers, reads their low 32 bits as size=0 mark=0, and links every 8-byte chunk into the free list, clobbering the payload with free-list next pointers.
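The three steps above can be reproduced outside the runtime with a small Python model of the header. This is a sketch, not the codegen itself: it assumes the header is a single i32 holding (size << 1) | mark, and, to account for the ~14472-byte stride, it further assumes a 4-byte header and 8-byte-aligned object slots. HEADER_BYTES and ALIGN are hypothetical values, not read from vera/codegen/assembly.py.

```python
# Sketch of the current GC header encoding, modelled in Python.
# Assumptions (hypothetical, not taken from vera/codegen/assembly.py):
#   - the header is one i32 holding (size << 1) | mark
#   - HEADER_BYTES and ALIGN describe how the sweep steps to the next object
I32_MASK = 0xFFFFFFFF
SIZE_MASK_16 = 0xFFFF   # the mask applied today when the size is read back
HEADER_BYTES = 4        # assumed header width
ALIGN = 8               # assumed object alignment


def encode_header(size: int, mark: int) -> int:
    """What $alloc stores: size shifted left by one, mark in bit 0."""
    return ((size << 1) | (mark & 1)) & I32_MASK


def decode_size_16(header: int) -> int:
    """What $gc_collect recovers today: only the low 16 bits of the size."""
    return (header >> 1) & SIZE_MASK_16


def align_up(n: int, a: int = ALIGN) -> int:
    """Round n up to a multiple of a (a must be a power of two)."""
    return (n + a - 1) & ~(a - 1)


def sweep_stride(size: int) -> int:
    """Bytes the sweep skips to reach the next header, under the assumptions above."""
    return align_up(HEADER_BYTES + size)


header = encode_header(80000, 0)
seen = decode_size_16(header)
print(hex(header))           # 0x27100
print(seen)                  # 14464
print(sweep_stride(seen))    # 14472: where the sweep looks for the next header
print(sweep_stride(80000))   # 80008: where the next object really starts
```

Any size below 65536 survives the round trip unchanged, which is why only large objects are affected.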

Reproducer

Add this test to tests/test_codegen_monomorphize.py::TestArrayOperations:

```python
def test_large_allocation_gc_corruption(self) -> None:
    source = '''
public fn main(-> @Int)
  requires(true) ensures(true) effects(pure)
{
  let @Array<Int> = array_range(0, 10000);
  let @Array<Int> = array_map(@Array<Int>.0, fn(@Int -> @Int) effects(pure) { @Int.0 * 2 });
  @Array<Int>.0[9999]
}
'''
    assert _run(source, fn='main') == 19998  # Fails: returns 225512
```

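The array_map output is 10000 × 8 = 80000 bytes. Allocating it triggers memory.grow plus a GC cycle, and the sweep shreds the still-live input array, which is itself 80000 bytes (10000 Ints from array_range) and so carries a truncated header too.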

Impact

  • Affects any single allocation > 65535 bytes: Strings over 64 KB, Array<Int|Nat|Float64> past ~8K elements, Array<String|Array<T>> past ~8K elements (pair stride), Array<Bool|Byte> past ~64K elements.
  • Discovered while implementing #480 (Reimplement higher-order array ops as iterative WASM loops), PR 1 (iterative array_map). The stress test originally used 10K elements and hit this. It was reduced to 8K elements (64 KB) as a workaround; raise it back once the header is widened.
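The per-type thresholds in the Impact list follow from dividing the 16-bit limit by the element width. A quick check in Python, assuming 8 bytes per Int/Nat/Float64 element, 8 bytes per String/Array<T> element (reading "pair stride" as two i32 words, which is an assumption), 1 byte per Bool/Byte, and ignoring any header or length-prefix bytes:

```python
# Largest element count whose payload still fits a given size field.
# Element widths are assumptions inferred from the thresholds quoted above.
SIZE_FIELD_16 = 0xFFFF          # current 16-bit size field
SIZE_FIELD_31 = (1 << 31) - 1   # widened 31-bit field (fix option 1)

ELEMENT_BYTES = {
    "Int / Nat / Float64": 8,
    "String / Array<T> (assumed i32 pointer + i32 length pair)": 8,
    "Bool / Byte": 1,
}


def max_elements(elem_bytes: int, field_max: int = SIZE_FIELD_16) -> int:
    """Floor of field_max / elem_bytes; header and length bytes are ignored."""
    return field_max // elem_bytes


for name, width in ELEMENT_BYTES.items():
    print(f"{name}: {max_elements(width)} elements today, "
          f"{max_elements(width, SIZE_FIELD_31)} with a 31-bit field")
```

The same arithmetic shows why the original 10K-element stress test failed: 10000 × 8 = 80000 exceeds 65535.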

Fix options

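  1. Widen the size field to 31 bits (keep mark in bit 0): use the full i32 after the shift, no mask. Changes two i32.const 65535 / i32.and sites in codegen/assembly.py. The size must then be extracted with i32.shr_u rather than i32.shr_s, because sizes ≥ 1 GiB set bit 31 of the header.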
  2. Grow the header to 8 bytes (mark in one word, size in the other). More work, less efficient, but leaves room for generational bits later.

Option 1 is a ~4-line change and raises the per-object limit to 2 GB (2^31 - 1 bytes), which is more than enough. Suggest this path.
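A Python model of option 1, as a sketch only: it drops the 16-bit mask and emulates the two WASM right shifts to show why the size should be read with an unsigned shift. The bounds check in encode_header is a hypothetical guard standing in for a trap in $alloc; nothing in this issue says $alloc checks sizes today.

```python
# Sketch of fix option 1: a 31-bit size field above a 1-bit mark.
I32_MASK = 0xFFFFFFFF
MAX_OBJECT_SIZE = (1 << 31) - 1   # largest size a 31-bit field can hold


def encode_header(size: int, mark: int) -> int:
    """(size << 1) | mark, with a hypothetical guard instead of silent truncation."""
    if not 0 <= size <= MAX_OBJECT_SIZE:
        raise ValueError(f"object size {size} does not fit in 31 bits")
    return ((size << 1) | (mark & 1)) & I32_MASK


def to_signed_i32(x: int) -> int:
    """Reinterpret an unsigned 32-bit pattern as a signed i32."""
    x &= I32_MASK
    return x - (1 << 32) if x & 0x80000000 else x


def decode_size_shr_u(header: int) -> int:
    """Emulates i32.shr_u: logical shift, bit 31 becomes 0."""
    return (header & I32_MASK) >> 1


def decode_size_shr_s(header: int) -> int:
    """Emulates i32.shr_s: arithmetic shift, bit 31 is copied down."""
    return to_signed_i32(header) >> 1


def decode_mark(header: int) -> int:
    """The mark bit lives in bit 0."""
    return header & 1


small = encode_header(80000, 1)
print(decode_size_shr_u(small), decode_mark(small))  # 80000 1

big = encode_header(1 << 30, 0)   # a 1 GiB object sets bit 31 of the header
print(decode_size_shr_u(big))     # 1073741824
print(decode_size_shr_s(big))     # -1073741824: a signed shift corrupts the size
```

Below 1 GiB both shifts agree, so a signed shift would only misbehave near the top of the new range, which makes it easy to miss in tests.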

Scope

Out of scope for #480 itself (that PR is about iterative combinators, not the allocator). Split into its own small PR right after #480 PR 1 lands, so the stress test can go back to 10K+.

Metadata

Assignees: no one assigned
Labels: bug (Something isn't working)
Projects: none
Milestone: none
Relationships: none
Development: no branches or pull requests
