Move Map/Set host stores from Python mirror to bucket-as-truth #706

@aallan

Architectural follow-up to the mirror fix landing in the PR that closes #695 and #705.

Context

The mirror fix resolves the immediate correctness bugs:

  • The conservative GC scan now reaches heap-pointer values via shadow stack → wrapper → WASM-resident bucket array → val_ptr.
  • The Python-side _map_store and _set_store remain the source of truth for actual map/set contents.
  • The bucket array is a write-only mirror, populated by host_attach_bucket (CLI) and imports.vera.attach_bucket_to_wrapper (browser).

The architectural debt: data lives in two places. Drift between the Python store and the WASM bucket is possible if a future change writes to one but not the other. Code paths multiply. Browser parity required reimplementing the population logic in JavaScript.

Goal: move to bucket-as-truth

Delete _map_store and _set_store. Make the WASM bucket array the sole source of truth. Host imports take wrapper_ptr (not opaque handle) and read/write the bucket directly. The wrapper IS the map / set value.

Three places this needs to land

  1. CLI Map host imports (vera/codegen/api.py):

    • host_map_new, host_map_size, _define_map_insert, _define_map_get, _define_map_contains, _define_map_remove, _define_map_keys, _define_map_values — all 8 currently take handle and read _map_store[handle]. Move: take wrapper_ptr, read bucket via wrapper_ptr + 8 (offset to bucket_ptr), use _dict_from_bucket to decode and _build_map_wrapper to return a new wrapper.
    • Delete _map_store, _map_alloc, host_attach_bucket (Map branch).
    • Codegen update (vera/wasm/calls_containers.py): drop _emit_unwrap_handle and _emit_wrap_handle for Map call sites — replace the post-call wrap with a simpler shadow-root sequence that pushes the returned wrapper_ptr onto the shadow stack.
  2. CLI Set host imports (vera/codegen/api.py): same pattern as Map. host_set_new, host_set_size, _define_set_add, _define_set_contains, _define_set_remove, _define_set_to_array.

  3. Browser runtime (vera/browser/runtime.mjs): JS parallel. Delete mapStore and setStore JS Maps; rewrite all imports.vera.map_* and imports.vera.set_* to use the WASM bucket layout. Equivalent encode/decode helpers in JS.
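A minimal sketch of what a wrapper-based host import could look like, as opposed to a `_map_store[handle]` lookup. Only the `wrapper_ptr + 8` offset comes from this issue. The `bytearray` stand-in for WASM linear memory, the function names, the little-endian u32 encoding, and the treatment of `bucket_ptr == 0` as "no bucket attached" are assumptions for illustration. The size read relies on the 8-byte header proposed under "Bucket layout".

```python
import struct

BUCKET_PTR_OFFSET = 8   # from the issue: bucket_ptr lives at wrapper_ptr + 8
COUNT_OFFSET = 4        # proposed header: capacity at +0, count at +4


def read_bucket_ptr(mem, wrapper_ptr):
    """Follow wrapper -> bucket. `mem` stands in for WASM linear memory."""
    (bucket_ptr,) = struct.unpack_from("<I", mem, wrapper_ptr + BUCKET_PTR_OFFSET)
    return bucket_ptr


def host_map_size(mem, wrapper_ptr):
    """Hypothetical wrapper-based map_size: no Python-side store involved."""
    bucket_ptr = read_bucket_ptr(mem, wrapper_ptr)
    if bucket_ptr == 0:
        # Assumption: a zero bucket_ptr means no bucket has been attached.
        return 0
    (count,) = struct.unpack_from("<I", mem, bucket_ptr + COUNT_OFFSET)
    return count


# Usage: a wrapper at address 32 pointing at a bucket at address 128.
mem = bytearray(256)
struct.pack_into("<II", mem, 128, 16, 3)                 # header: capacity 16, count 3
struct.pack_into("<I", mem, 32 + BUCKET_PTR_OFFSET, 128)
size = host_map_size(mem, 32)
```

Because the import receives the wrapper itself, nothing on the host needs to be kept in sync with the bucket, which is the point of the move.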

Bucket layout (preparatory work, not in the mirror PR)

The mirror PR ships with a 12-byte slot layout (key_word_0, key_word_1, val_word) and no bucket header. For the move, the layout needs to grow:

  • Slot size: 20 bytes (occupancy flag at +0, key low/high at +4/+8, val low/high at +12/+16). The occupancy flag lets non-string keys distinguish empty from live slots without relying on key_word_0 == 0 (which fails for Int 0 keys). The val word pair lets string values store (ptr, len) inline without an extra heap allocation.
  • Bucket header: 8 bytes (capacity at +0, count at +4). Lets map_size return in O(1) via header.count instead of scanning slots.
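The proposed layout can be written down with `struct` to check the offsets. This is a sketch only: it assumes little-endian u32 words, that slots follow the header contiguously, and that a flag value of 1 means "live". The helper names are hypothetical and `mem` is a `bytearray` standing in for WASM memory.

```python
import struct

HEADER = struct.Struct("<II")     # capacity at +0, count at +4
SLOT = struct.Struct("<IIIII")    # flag +0, key lo/hi +4/+8, val lo/hi +12/+16
LIVE = 1                          # assumed flag value for an occupied slot

assert HEADER.size == 8 and SLOT.size == 20


def bucket_size(capacity):
    return HEADER.size + capacity * SLOT.size


def slot_offset(bucket_ptr, index):
    # Assumption: slots start immediately after the 8-byte header.
    return bucket_ptr + HEADER.size + index * SLOT.size


def init_bucket(mem, bucket_ptr, capacity):
    HEADER.pack_into(mem, bucket_ptr, capacity, 0)


def put_slot(mem, bucket_ptr, index, key_lo, key_hi, val_lo, val_hi):
    off = slot_offset(bucket_ptr, index)
    was_live = SLOT.unpack_from(mem, off)[0] == LIVE
    SLOT.pack_into(mem, off, LIVE, key_lo, key_hi, val_lo, val_hi)
    if not was_live:
        capacity, count = HEADER.unpack_from(mem, bucket_ptr)
        HEADER.pack_into(mem, bucket_ptr, capacity, count + 1)


def read_slot(mem, bucket_ptr, index):
    return SLOT.unpack_from(mem, slot_offset(bucket_ptr, index))


def map_size(mem, bucket_ptr):
    # O(1): read header.count instead of scanning slots.
    return HEADER.unpack_from(mem, bucket_ptr)[1]


# Usage: an Int 0 key is live even though both key words are zero.
mem = bytearray(64 + bucket_size(4))
init_bucket(mem, 64, 4)
put_slot(mem, 64, 2, 0, 0, 7, 0)
```

With the flag word, slot 2 (key 0) and an untouched slot differ only at +0, which is exactly the case the 12-byte layout cannot represent.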

Decimal exempt

Decimal is value-typed (PyDecimal in Python, BigInt in JS) — no heap pointers in the store entry, so the #695 class of bug cannot apply. The bucket_ptr field on Decimal wrappers stays 0 forever; host_attach_bucket short-circuits when kind == 3. Phase D leaves Decimal alone.

Performance challenge (what blocks the move)

A first-pass implementation was O(N²) in pure-Python decode/encode: every insert decodes and re-encodes the whole bucket, and per-slot reads through caller["memory"].data_ptr(store)[addr:addr+4] cost ~5μs each, so a 10000-element map_insert chain takes tens of minutes, versus low-microsecond constants for the old _map_store dict copy.

The fix is to batch the whole bucket region in one wasmtime memory access plus one struct.unpack / struct.pack call instead of per-i32. The encoders for string keys / string values still need per-string _alloc_string calls, but the slot writes can be batched at the end. Mechanical work, but ~1–2 hours of careful instrumentation to confirm the 10000-chain test (TestHostHandleReclamation573::test_map_chain_reclaims_transients) runs in reasonable time.
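A rough sketch of the batching idea, contrasting per-word reads with a single slice plus one `struct` call. It uses a `bytearray` in place of the wasmtime memory view and the proposed 8-byte header / 20-byte slot layout. The function names are hypothetical, and string key/value allocation is left out.

```python
import struct

HEADER_SIZE = 8
SLOT_SIZE = 20
WORDS_PER_SLOT = 5


def decode_per_word(mem, bucket_ptr):
    """Slow shape: one memory slice per i32 (5 * capacity slices)."""
    capacity = int.from_bytes(mem[bucket_ptr:bucket_ptr + 4], "little")
    slots = []
    for i in range(capacity):
        base = bucket_ptr + HEADER_SIZE + i * SLOT_SIZE
        slots.append(tuple(
            int.from_bytes(mem[base + 4 * w:base + 4 * w + 4], "little")
            for w in range(WORDS_PER_SLOT)
        ))
    return slots


def decode_batched(mem, bucket_ptr):
    """Fast shape: one slice of the slot region, one iter_unpack pass."""
    capacity, _count = struct.unpack_from("<II", mem, bucket_ptr)
    start = bucket_ptr + HEADER_SIZE
    region = bytes(mem[start:start + capacity * SLOT_SIZE])
    return list(struct.iter_unpack("<IIIII", region))


def encode_batched(mem, bucket_ptr, slots):
    """Write header and all slots with a single pack_into call."""
    flat = [word for slot in slots for word in slot]
    count = sum(1 for slot in slots if slot[0])
    struct.pack_into(f"<{2 + len(flat)}I", mem, bucket_ptr,
                     len(slots), count, *flat)


# Usage: round-trip a 3-slot bucket at address 16.
mem = bytearray(16 + HEADER_SIZE + 3 * SLOT_SIZE)
slots = [(1, 0, 0, 7, 0), (0, 0, 0, 0, 0), (1, 5, 0, 40, 2)]
encode_batched(mem, 16, slots)
decoded = decode_batched(mem, 16)
```

Both decoders return the same tuples; the difference is how many times Python crosses into the memory view, which is where the per-slot cost described above accumulates.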

Acceptance

  • _map_store and _set_store deleted.
  • JSON parser path (_alloc_map_wrapper for Map<String, Json>) routes through the new bucket builder.
  • All existing tests pass, including the 10000-element reclamation chain.
  • Browser tests pass.
  • No mapStore / setStore in runtime.mjs.
  • Architecturally, the WASM bucket array is the single source of truth.

Labels: enhancement
