Summary
Map<K, T_heap> values (heap-allocated WASM blocks stored as Python ints in _map_store[handle]) are invisible to the conservative GC scan. A $gc_collect triggered after the map is constructed will reclaim those blocks, leaving map_get returning pointers to freed memory.
Surfaced as an outside-diff observation by CodeRabbit on PR #693 (the #692 host-walker GC rooting fix) — specifically pointing at the JObject branch of vera/wasm/json_serde.py::write_json. The concern is real but pre-dates #692 and applies to the broader Map<K, T_heap> contract, so deferring as a separate issue rather than expanding #693.
Root cause
_alloc_map_wrapper (vera/codegen/api.py) stores its argument dict in a Python-side _map_store[handle]. For Map<String, Json> or Map<String, HtmlNode> the values are i32 heap pointers — but they only exist in Python memory, not WASM linear memory.
The conservative scan in $gc_collect Phase 2a walks the WASM shadow stack and any reachable heap blocks looking for i32-shaped pointers in the heap range. It never enters _map_store. The wrapper ADT carries the raw handle ORed with 0x80000000 (per #578) at body+4, which is structurally outside the heap range and correctly skipped by the scan, but that makes the handle a dead end from a tracing perspective. Heap blocks pointed to only from _map_store[handle] are never marked, so the sweep reclaims them.
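The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not the actual Vera runtime: ToyHeap, the address range, the handle value 7, and the map_store variable are all hypothetical stand-ins; only the 0x80000000 tag and the "handle at body+4" layout come from the issue text.

```python
HEAP_START = 0x1000          # hypothetical start of the toy heap range
HANDLE_TAG = 0x80000000      # tag bit ORed into map handles (per the issue text)


class ToyHeap:
    """Toy heap: each block is a list of i32 words keyed by its address."""

    def __init__(self):
        self.blocks = {}
        self.next_addr = HEAP_START

    def alloc(self, words):
        addr = self.next_addr
        self.blocks[addr] = list(words)
        self.next_addr += 4 * max(len(words), 1)
        return addr

    def collect(self, shadow_stack):
        """Conservative mark-sweep that scans only the shadow stack and
        the words of blocks reachable from it."""
        marked = set()
        worklist = list(shadow_stack)
        while worklist:
            word = worklist.pop()
            # A tagged handle is not a block address, so it is skipped here.
            if word in self.blocks and word not in marked:
                marked.add(word)
                worklist.extend(self.blocks[word])
        freed = [addr for addr in self.blocks if addr not in marked]
        for addr in freed:
            del self.blocks[addr]
        return freed


heap = ToyHeap()
map_store = {}                                   # host-side dict, never scanned

jarray = heap.alloc([1, 2, 3])                   # heap block used as a map value
handle = 7                                       # hypothetical handle number
map_store[handle] = {"key": jarray}              # pointer lives only on the host
wrapper = heap.alloc([0, handle | HANDLE_TAG])   # tagged handle at body+4

freed = heap.collect(shadow_stack=[wrapper])
dangling = map_store[handle]["key"]
print(freed == [jarray], dangling in heap.blocks)
```

In this model the wrapper survives because it is on the shadow stack, while the value block is swept even though the host-side store still holds its address.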
Which Map element types are affected
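✅ Vulnerable: Map<K, Json>, Map<K, HtmlNode>, Map<K, Md*>, any user Map<K, T> where T is heap-allocated (e.g. Result<...>, Option<...>, ADTs, lists, arrays, …)
✅ Safe: Map<K, V_inline> where values are inline scalars (Int, Bool, Float, etc.); those are stored as Python ints/bools and never reach the WASM heap
✅ Safe: Map<K, String>; strings are stored as Python str in _map_store, then re-encoded into WASM memory on map_get. HtmlElement.attrs is in this category, which is why no #692-equivalent bug (html_parse and json_parse trapping with out-of-bounds memory access on inputs that pressure GC during the host-side tree walk) fired there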
Reproducer (sketch, not yet verified)
let @Json = json_parse("{\"key\": [1,2,3,4,5,6,7,8,9,10]}");
match @Json.0 {
  Ok(@Json) -> {
    -- After parse: wrapper_ptr is rooted via @Json; _map_store[handle]["key"]
    -- points to a JArray heap block that is NOT reachable from the WASM scan.
    -- Force GC pressure with a large unrelated alloc:
    let @Array<Int> = array_range(0, 100000);
    -- That alloc grows memory → triggers gc_collect → frees the JArray block
    -- referenced from _map_store["key"].
    let @Option<Json> = json_get(@Json.0, "key");
    -- @Option now wraps a pointer to freed memory. Subsequent access:
    match @Option<Json>.0 {
      Some(@Json) -> {
        let @Int = json_array_length(@Json.0);
        -- Either traps or returns garbage depending on what landed at the freed slot.
        IO.print(int_to_string(@Int.0))
      },
      None -> IO.print("none")
    }
  },
  Err(_) -> IO.print("err")
}
The reporter should verify whether this reliably traps or returns garbage; the contract is broken either way.
Fix options
Option A — host-side WASM container per map entry. In _alloc_map_wrapper, for each value that is a heap pointer, allocate a tiny WASM container holding the i32, store the container pointer in _map_store[handle] instead of the raw int. Conservative scan reaches the container via the wrapper ADT and follows the contained pointer.
Option B — Extend GC tracing to walk _map_store. Phase 2c (which already iterates the wrap-table looking for unreachable wrappers) gains a second sweep: for each REACHABLE Map wrapper, mark every heap-pointer-shaped value in _map_store[handle]. Requires the host to know which values are heap pointers vs inline scalars (currently _map_store is untyped).
Option C — Don't store heap pointers in _map_store. All values are serialised into WASM memory at insertion time, deserialised at access. Heaviest change; possibly best long-term.
Option A is the surgical fix. Option B requires per-handle type info. Option C changes the Map contract.
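As a rough illustration of Option B, the sketch below extends a toy conservative mark phase so that a reachable tagged handle also contributes the values stored under it as roots. Everything here is hypothetical: collect_with_map_roots, the blocks dict, and the heap_valued set (standing in for the per-handle type info that Option B would require) are not part of the real codebase. It also traces handles inline during marking rather than in a separate Phase 2c sweep, which is a simplification.

```python
HANDLE_TAG = 0x80000000   # tag bit ORed into map handles (per the issue text)


def collect_with_map_roots(blocks, shadow_stack, map_store, heap_valued):
    """Toy mark-sweep that also traces host-side map values.

    blocks:      address -> list of i32 words (toy heap)
    map_store:   handle -> {key: value} (host-side store)
    heap_valued: handles whose values are heap pointers (hypothetical type info)
    """
    marked = set()
    seen_handles = set()
    worklist = list(shadow_stack)
    while worklist:
        word = worklist.pop()
        if word in blocks:
            if word not in marked:
                marked.add(word)
                worklist.extend(blocks[word])
        elif word & HANDLE_TAG:
            # Conservative: any word with the tag bit set is treated as a handle.
            handle = word & ~HANDLE_TAG
            if handle in heap_valued and handle not in seen_handles:
                seen_handles.add(handle)
                worklist.extend(map_store.get(handle, {}).values())
    freed = sorted(addr for addr in blocks if addr not in marked)
    for addr in freed:
        del blocks[addr]
    return freed


def make_blocks():
    return {
        0x1000: [1, 2, 3],             # map value block
        0x100C: [0, 7 | HANDLE_TAG],   # wrapper with tagged handle at body+4
        0x1014: [9],                   # unrelated garbage block
    }


store = {7: {"key": 0x1000}}

with_info = collect_with_map_roots(make_blocks(), [0x100C], store, {7})
without_info = collect_with_map_roots(make_blocks(), [0x100C], store, set())
print(with_info, without_info)
```

With the type info present, only the unrelated block is swept; without it, the map value block is swept as well, which mirrors why Option B depends on _map_store becoming typed.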
Why this didn't trip the #692 fix's tests
The four TestHostWalkerGCRooting692 tests and the conformance test exercise json_parse, then immediately match and print. No allocation happens between json_parse returning and the program exiting, so GC never fires post-walk. The val_ptrs in _map_store are technically leaked, but the program ends before anything observes it.
A test that does json_parse(...) → match → trigger_large_alloc → map_get(..., "key") would surface the bug. Adding such a test is part of the work for this issue.
Workaround for users today
For inputs that genuinely require Map<K, T_heap> semantics: keep the map structurally shallow and avoid intermediate allocations between map construction and use. For JSON specifically: prefer JArray over JObject where the data model allows (JArray's backing IS in WASM memory and visible to GC).
Suggested next steps
Verify the reproducer above actually triggers the bug (write a unit test that fires GC between json_parse and json_get).
Pick a fix option (A is the surgical default).
Implement + add a regression test.
Audit other Map<K, T_heap> usage in the standard library / examples for shape-similar code.