Skip to content

html_parse and json_parse trap with Out-of-bounds memory access on inputs that pressure GC during host-side tree walk #692

@aallan

Description

@aallan

Summary

Both html_parse and json_parse trap with Out-of-bounds memory access on inputs large enough to pressure GC during host-side tree marshalling, escaping their declared Result<HtmlNode, String> / Result<Json, String> contracts. Confirmed root cause: missing shadow-stack rooting in vera.wasm.html_serde.write_html and vera.wasm.json_serde.write_json — the same bug class as #570 (array_map), #515 (fold accumulator), and #593 (closure-return shadow asymmetry), but on the host side rather than WAT-emitted user code.

External report from an agent building a summarise_urls example surfaced the html_parse trap. The agent's bisection ruled out a pure size threshold — same byte count triggers or doesn't depending on what's at both ends of the window — which is exactly what GC-pressure-driven corruption looks like (it depends on what got allocated before the host walker ran).

The parallel write_json bug is proven, not claimed — reproducer and trap output below.

Confirmed reproducers

html_parse (original report)

effect IO { op print(String -> Unit); }

public fn main(@Unit -> @Unit)
  requires(true) ensures(true) effects(<IO>)
{
  let @Result<String, String> = IO.read_file("/path/to/FAQ.md");
  match @Result<String, String>.0 {
    Ok(@String) -> {
      let @Result<HtmlNode, String> = html_parse(@String.0);
      match @Result<HtmlNode, String>.0 {
        Ok(@HtmlNode) -> IO.print("Ok\n"),
        Err(@String) -> IO.print("Err\n")
      }
    },
    Err(@String) -> IO.print("read failed\n")
  }
}

Output:

Error: Out-of-bounds memory access
Source backtrace:
  (suppressed 1 runtime-helper frame above first user code)
  in main  (file.vera:8-22)

Neither Ok nor Err is printed — the trap bypasses both match arms.

json_parse (verified during triage)

Generate a deep nested array (1000 sub-arrays × 50 numbers, ~192 KB input):

import json
data = [list(range(50)) for _ in range(1000)]
text = json.dumps(data)
# embed `text` into a Vera string literal that calls json_parse(@String.0)
effect IO { op print(String -> Unit); }

public fn main(@Unit -> @Unit)
  requires(true) ensures(true) effects(<IO>)
{
  let @String = "<the 192-KB JSON literal>";
  let @Result<Json, String> = json_parse(@String.0);
  match @Result<Json, String>.0 {
    Ok(@Json) -> IO.print("Ok\n"),
    Err(@String) -> IO.print(string_concat("Err: ", @String.0))
  }
}

Output:

Error: Unhandled WASM trap: wasm trap: out of bounds memory access

Same root cause, same fix needed.

Root cause (one bug class, two call sites)

The trap is a wasmtime out-of-bounds access at WASM address 0xfffffffd (= 1 - 4 in signed i32) from inside $alloc's free-list traversal:

loop $fl_loop
  local.get $node     ;; $node = 1 here
  i32.eqz
  br_if $fl_done
  local.get $node
  i32.const 4
  i32.sub             ;; 1 - 4 = 0xFFFFFFFD
  i32.load            ;; <-- TRAP: address 0xfffffffd out of bounds

The free list contains a corrupt node pointer of value 1. 1 is the tag value for HtmlText (and Result.Err). The corruption sequence:

  1. write_html (or write_json) allocates arr_ptr = alloc(child_count * 4) for an array of child pointers.
  2. Recursive call walks children, allocating strings + sub-trees. Eventually one of those allocs triggers $gc_collect (bump-alloc would overflow current memory).
  3. gc_collect scans the WASM shadow stack for live roots. arr_ptr is in a Python local — not on the shadow stack — so GC treats the block as garbage and links it into the free list. The free-list link is written into body[0] of the freed block.
  4. After GC returns, host code continues: writes a HtmlText tag (= 1) into a freshly-allocated body that happens to be at the same address as the freed arr_ptr (because the freed block got picked up by the next alloc). The free-list link in body[0] gets overwritten with 1.
  5. Some later alloc traverses the free list — picks up a block whose body[0] (i.e. node.next) is 1. Sets $gc_free_head = 1.
  6. The next alloc reads $node = 1, tries i32.load (node - 4) = read at 0xFFFFFFFD. Trap.

The exact same vulnerable shape lives in both files:

# vera/wasm/html_serde.py::write_html — element branch
arr_ptr = alloc(caller, child_count * 4)         # <- not rooted
for i, child in enumerate(children):
    child_ptr = write_html(...)                  # <- may GC and free arr_ptr
    write_i32(caller, arr_ptr + i * 4, child_ptr)  # <- writes to freed block

# vera/wasm/json_serde.py::write_json — JArray branch
arr_ptr = alloc(caller, count * 4)               # <- not rooted
for i, elem in enumerate(value):
    elem_ptr = write_json(...)                   # <- may GC and free arr_ptr
    write_i32(caller, arr_ptr + i * 4, elem_ptr)  # <- writes to freed block

name_ptr / wrapper_ptr (in write_html) and intermediate value pointers stored into map_dict (in write_json's JObject branch) have the same exposure.

Empirical confirmation

Patching the host so memory is pre-grown by 256 pages (16 MB) before entering write_html makes the html_parse trap disappear and returns Ok as expected:

mem = caller["memory"]
if mem.size(caller) < 250:
    mem.grow(caller, 256)

This proves the bug is GC-pressure-dependent: when no alloc during the walk actually needs to grow memory, $gc_collect never fires, no stale Python-held pointers get reclaimed, and the walk completes correctly. The same fix works for write_json by symmetry — same pattern, same workaround applies.

Scope

This affects every host walker that allocates more than one WASM object across Python-held pointers:

  • vera/wasm/html_serde.py::write_html — confirmed
  • vera/wasm/json_serde.py::write_json — confirmed
  • Any future host import that builds a complex ADT — same trap waiting

Fix options (sorted by effort)

Option 1 — pre-grow memory in host walkers (small, surgical). Have host_html_parse and host_json_parse estimate an upper bound on tree size from the input length, call caller["memory"].grow(...) to ensure enough headroom, then walk normally. Estimate: 2 * len(input) + 65536 bytes is comfortably conservative for both formats. No WAT changes, no new exports. Cons: over-grows memory for small inputs; doesn't fix the architectural issue; still vulnerable if a future walker forgets to do this.

Option 2 — add $gc_disabled flag + host helpers (medium). Add a (global $gc_disabled (mut i32) (i32.const 0)), modify $alloc to skip gc_collect when set (go straight to memory.grow), and export setter/getter helpers. Host walkers set it before walking, clear after. Cons: requires WAT changes; clearing-on-exception is easy to forget.

Option 3 — export $gc_sp + $gc_stack_limit, push intermediate pointers from the host (clean, larger). Architecturally correct: host walkers push their arr_ptr / name_ptr / wrapper_ptr onto the shadow stack before the next alloc. GC sees them as live. Cons: requires WAT exports + host helpers + careful push/pop discipline in every walker; the abstraction needs to handle exception unwinding.

Recommendation: ship Option 1 immediately to close both reported call sites. File Option 3 as a follow-up for the architectural fix that prevents the next host walker from inheriting the bug.

Test that pins the fix

Two tests/conformance/ entries:

  • html_parse on a checked-in copy of the current FAQ body — asserts Ok (or a well-formed Err).
  • json_parse on a generated nested-JSON fixture sized to trigger memory growth — asserts Ok.

Either Ok or a well-formed Err is contractually acceptable — a trap is not.

Workaround for users today

  • Per the original reporter's note: removing the html_parse call works if the input was always going to be plain text anyway.
  • For genuinely-HTML or genuinely-JSON inputs there is no Vera-level workaround — the pre-flight string_slice size cap the reporter considered would have caught their specific FAQ case but is unsound as a general fix because the bisection shows length alone isn't the trigger.

Files

  • vera/wasm/html_serde.py::write_html — primary site (confirmed)
  • vera/wasm/json_serde.py::write_json — same bug class, same fix needed (confirmed)
  • vera/codegen/api.py::host_html_parse / host_json_parse — would gain the pre-grow call (Option 1)
  • vera/codegen/assembly.py::_emit_alloc — would gain the $gc_disabled check (Option 2)

Original report

The reporter's full write-up is reproduced below.

Full bug report (click to expand)

[report verbatim]

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions