GC collect itself faults with out-of-bounds memory access under sustained allocation pressure #515

@aallan

Summary

$gc_collect itself faults with out-of-bounds memory access when collecting under sustained allocation pressure. The collector is the top frame of the crashing stack, not the user code that triggered it. A 40×20 Game of Life grid running 200 generations reliably reproduces this on v0.0.119.

Reproducer

Attach life.vera (a 40×20 grid run for 200 generations; it builds each generation via tail-recursive array_append over an immutable Array<Array<Bool>>, and each cell lookup goes through an Option<Nat> from int_to_nat). This produces roughly 14,400 small allocations per generation.

vera run life.vera

Consistently crashes at some generation during next_row_rec:

    0:    0x33e - gc_collect
    1:    0x1b9 - alloc
    2:    0x55f - cell_at
    3:    0x662 - count_neighbors
    4:    0x6c3 - next_cell
    5:    0x9ce - next_row_rec
    ... (deep recursion of next_row_rec)

Caused by:
    0: memory fault at wasm address 0x180000 in linear memory of size 0x180000
    1: wasm trap: out of bounds memory access

The faulting address 0x180000 equals the current memory size, so the GC is touching the first byte past the end of linear memory (the last valid address is 0x17FFFF).

Why this isn't user OOM

  • 40×20 = 800 cells. Grid state is trivially small.
  • The same program at 10×10×10 (presumably a 10×10 grid for 10 generations) runs to completion.
  • The fault is inside $gc_collect, not inside $alloc failing to find a slot.
  • The memory is exactly at the page boundary (0x180000 = 24 × 64 KB).
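The page-boundary arithmetic can be checked directly. This is a minimal sketch that uses only the two numbers from the trap message and the standard 64 KiB WebAssembly page size; the variable names are illustrative.

```python
WASM_PAGE_SIZE = 64 * 1024  # WebAssembly linear-memory page size in bytes

memory_size = 0x180000  # "linear memory of size 0x180000" from the trap
fault_addr = 0x180000   # "memory fault at wasm address 0x180000"

pages, remainder = divmod(memory_size, WASM_PAGE_SIZE)
last_valid_addr = memory_size - 1

print(pages, remainder)              # 24 0
print(hex(last_valid_addr))          # 0x17ffff
print(fault_addr - last_valid_addr)  # 1
```

The faulting address is exactly one past the last valid byte of a 24-page memory.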

The collector itself shouldn't fault regardless of how much pressure the mutator puts on it. A sweep that walks the heap needs to stop at $heap_ptr, not at the linear-memory bound. If the collector is trying to mark beyond the current allocation, it should grow memory on exhaustion rather than read or write past the end.
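To illustrate the bound-check point, here is a toy Python model of a sweep loop. It is not Vera's collector: the header layout (a u32 holding payload size and a mark bit), the helper names, and the off-by-one variant are all assumptions made for illustration. It shows one hypothetical way a sweep could fault at an address equal to the memory size: when the heap exactly fills memory and the loop bound is inclusive.

```python
import struct

HEADER_SIZE = 4  # hypothetical layout: u32 header = (payload_size << 1) | mark_bit

def load_u32(mem, addr):
    """Bounds-checked 32-bit load that fails the way a Wasm i32.load would trap."""
    if addr < 0 or addr + 4 > len(mem):
        raise MemoryError(f"out of bounds memory access at {addr:#x}")
    return struct.unpack_from("<I", mem, addr)[0]

def sweep(mem, heap_start, heap_ptr, inclusive_bound=False):
    """Walk object headers and return the addresses of marked objects.

    inclusive_bound=True models a hypothetical off-by-one (ptr <= heap_ptr).
    """
    live = []
    ptr = heap_start
    while ptr < heap_ptr or (inclusive_bound and ptr == heap_ptr):
        header = load_u32(mem, ptr)
        size, marked = header >> 1, header & 1
        if marked:
            live.append(ptr)
        ptr += HEADER_SIZE + size
    return live

def build_full_heap(n_objects=4, payload=12):
    """Fill memory exactly with n_objects; every second object is marked."""
    mem = bytearray(n_objects * (HEADER_SIZE + payload))
    for i in range(n_objects):
        addr = i * (HEADER_SIZE + payload)
        struct.pack_into("<I", mem, addr, (payload << 1) | (i % 2))
    return mem

mem = build_full_heap()
heap_ptr = len(mem)  # the heap ends exactly at the end of memory

print(sweep(mem, 0, heap_ptr))  # [16, 48]
try:
    sweep(mem, 0, heap_ptr, inclusive_bound=True)
except MemoryError as err:
    print(err)  # out of bounds memory access at 0x40
```

With the strict `ptr < heap_ptr` bound the walk ends cleanly; with the inclusive bound the loop reads a header at the address equal to the memory size, which has the same shape as the trap reported here. Whether this is the actual defect in $gc_collect is not established by this sketch.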

Related

  • #487: $alloc grows memory by only 1 page; single large requests trap. That's about the mutator's allocator path; this is about the collector's sweep path. They may share a root cause (memory growth policy during GC) but have distinct symptoms.
  • #484 fixed the sweeper's 16-bit size-field truncation. This is a separate crash — the sweep loop's bound check is wrong in a different way.

Initial investigation pointer

The fault is inside $gc_collect at WASM offset 0x33e. Inspect _emit_gc_collect in vera/codegen/assembly.py — specifically the Phase 2 mark-seed loop and the Phase 3 sweep loop — and look for any load/store that uses a pointer without an explicit < $heap_ptr bound check, or any memory.grow call missing its return-value check.
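For the memory.grow check, the pattern to look for is a caller that treats growth as infallible. In WebAssembly, memory.grow returns the previous size in pages, or -1 on failure. The sketch below models that contract in Python; the LinearMemory class, the ensure_capacity helper, and the 25-page limit are hypothetical and are not taken from the Vera codebase.

```python
WASM_PAGE_SIZE = 64 * 1024

class LinearMemory:
    """Toy model of a Wasm linear memory with a page limit (hypothetical helper)."""

    def __init__(self, pages, max_pages):
        self.data = bytearray(pages * WASM_PAGE_SIZE)
        self.max_pages = max_pages

    def size_pages(self):
        return len(self.data) // WASM_PAGE_SIZE

    def grow(self, delta_pages):
        """Mirror memory.grow: return the old size in pages, or -1 on failure."""
        old = self.size_pages()
        if old + delta_pages > self.max_pages:
            return -1
        self.data.extend(bytes(delta_pages * WASM_PAGE_SIZE))
        return old

def ensure_capacity(mem, needed_end):
    """Grow until byte offset needed_end fits; report failure instead of assuming success."""
    have = len(mem.data)
    if needed_end <= have:
        return True
    missing_pages = -(-(needed_end - have) // WASM_PAGE_SIZE)  # ceiling division
    return mem.grow(missing_pages) != -1

mem = LinearMemory(pages=24, max_pages=25)        # 24 pages = 0x180000 bytes
print(ensure_capacity(mem, 0x180000 + 16))        # True
print(mem.size_pages())                           # 25
print(ensure_capacity(mem, 26 * WASM_PAGE_SIZE))  # False
```

A caller that ignores the False result (the -1 from grow) and proceeds to load or store at the requested offset would trap with an out-of-bounds access. The helper also computes how many pages are needed rather than growing by a fixed single page.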

Discovered by an agent writing Conway's Game of Life in Vera. Same agent whose earlier feedback drove the Stage 11 primitives push.

Labels: bug