Summary
Sibling to #695. Set<T_heap> values stored in _set_store have the same architectural problem as Map<K, T_heap>: heap-allocated WASM blocks held only as Python ints in _set_store[handle] are invisible to the conservative GC scan. A $gc_collect triggered after the set is constructed will reclaim those blocks, leaving subsequent membership checks and iterations returning pointers to freed memory.
Surfaced during the analysis of #695 (which focuses on Map); the Set surface uses the identical mechanism but wasn't called out in the original report because the JObject/HtmlElement parsers exercise Map, not Set.
Root cause
_set_store is declared in vera/codegen/api.py line 2513:
_set_store: dict[int, set[object]] = {}
For Set<JObject> or Set<HtmlNode>, the set elements are i32 heap pointers, but those pointer values exist only in Python memory, not in WASM linear memory. This is the same conservative-scan blind spot as #695: Phase 2a walks WASM heap blocks and never consults _set_store. The bit-31 tag on the wrapper handle makes the scan skip the handle itself correctly, but blocks referenced only from _set_store[handle] look unreachable to the scan and get reclaimed.
Which Set element types are affected
- ❌ Vulnerable: Set<Json>, Set<HtmlNode>, Set<Md*>, and any user Set<T> where T is heap-allocated (ADTs, lists, arrays, user records, …)
- ✅ Safe: Set<T_inline>, where elements are inline scalars (Set<Int>, Set<Bool>, Set<Float>); these are stored as Python primitives, with no WASM heap pointer involved
- ✅ Safe: Set<String>; strings are stored as Python str and re-encoded into WASM memory on iteration
Why this didn't surface earlier
Same reason as #695: existing tests construct a Set, immediately iterate or membership-check, and exit. No allocation between construction and use → no GC pressure → reachability bug latent. A test that does set_insert(heap_value) → trigger_large_alloc → set_contains(...) would surface it.
Fix
This will share a fix with #695. The three options outlined in #695's body (host-side WASM container per entry / extend Phase 2c tracing / serialise values into WASM at insertion) apply symmetrically: the per-entry WASM container approach generalises cleanly to both Map and Set with the same insertion-time hook.
Closing this issue together with #695 in the same PR is appropriate — the codegen surface is shared, the regression test pattern is parallel, and the architectural shape is identical.
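For discussion, one possible reading of the per-entry container option is sketched below. It assumes the host allocates a small container block per inserted element and registers it as a GC root at insertion time. ToyHeap, RootedSetStore, and the root set are hypothetical; #695 may intend a different mechanism for keeping the container reachable.

```python
class ToyHeap:
    """Hypothetical model of WASM linear memory with a mark-and-sweep collector."""

    def __init__(self) -> None:
        self.blocks: dict[int, list[int]] = {}  # address -> words in the block
        self.roots: set[int] = set()            # addresses the scan starts from
        self._next_addr = 0x100

    def alloc(self, words: list[int]) -> int:
        addr = self._next_addr
        self._next_addr += 0x10
        self.blocks[addr] = list(words)
        return addr

    def collect(self) -> None:
        marked: set[int] = set()
        stack = list(self.roots)
        while stack:
            addr = stack.pop()
            if addr in self.blocks and addr not in marked:
                marked.add(addr)
                stack.extend(self.blocks[addr])
        self.blocks = {a: w for a, w in self.blocks.items() if a in marked}


class RootedSetStore:
    """Set store that mirrors each heap element into a rooted container block."""

    def __init__(self, heap: ToyHeap) -> None:
        self.heap = heap
        self.elements: dict[int, set[int]] = {}
        self.containers: dict[int, dict[int, int]] = {}  # handle -> element -> container

    def insert(self, handle: int, ptr: int) -> None:
        elems = self.elements.setdefault(handle, set())
        if ptr in elems:
            return
        elems.add(ptr)
        container = self.heap.alloc([ptr])  # container block points at the element
        self.heap.roots.add(container)      # insertion-time hook: make it a root
        self.containers.setdefault(handle, {})[ptr] = container

    def release(self, handle: int) -> None:
        """Unroot every container of a dead set so its elements can be reclaimed."""
        for container in self.containers.pop(handle, {}).values():
            self.heap.roots.discard(container)
        self.elements.pop(handle, None)


heap = ToyHeap()
store = RootedSetStore(heap)
elem = heap.alloc([])
store.insert(1, elem)
heap.collect()
alive_while_in_set = elem in heap.blocks
store.release(1)
heap.collect()
alive_after_release = elem in heap.blocks
```

The same insert hook would serve a Map store by rooting the value pointer of each entry. The release step matters as much as the insert step: without it, every element ever inserted would stay pinned for the life of the program.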
Related
- #695 — Map sibling; this issue tracks the Set side of the same architectural family
- #570 / #515 / #593 / #692 — same missing-shadow-root bug class but in WAT-emitted code / host walk locals rather than host-side stores
- #578 — bit-31 handle tagging that makes the conservative scan correctly skip the handle field; this issue is the dual concern of "what the handle points to is also unreachable to the scan"