Summary
A Vera program that calls IO.print with a String whose underlying bytes are not valid UTF-8 (typically because an upstream codegen bug corrupted a String's (ptr, len) pair) causes the runtime's host_print host import to raise an unhandled UnicodeDecodeError. wasmtime-py wraps this as a "python exception" cause, and the full Python traceback escapes to the user's terminal, even though the program is valid Vera and the caller did nothing wrong.
A user-level program should NEVER produce a Python traceback. This violates the WasmTrapError contract established by #516 / #522 / #547 (every runtime trap is classified, carries a source backtrace, and surfaces a Vera-native fix paragraph).
Severity
Critical UX bug. The user sees a 30+ line Python traceback ending in UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 123: invalid start byte instead of a Vera-native runtime trap. The traceback exposes wasmtime-py internals (enter_wasm, trampoline, maybe_raise_last_exn) which are noise for the Vera user.
The trigger here is #588 (captured-Array indexing in closure produces silent string corruption that leaks into IO.print), but the right fix is independent: the host-print path must always handle invalid UTF-8 gracefully regardless of why the bytes are corrupt.
Reproducer
Any program that triggers a codegen bug producing corrupt String pointers can hit this. The simplest known reproducer is the full Conway's Game of Life from issue #588:
$ vera run life_full_program.vera
[... ANSI animation runs for one generation ...]
Generation 0 of 200
Traceback (most recent call last):
  File "/.../wasmtime/_func.py", line 234, in enter_wasm
    yield byref(trap)
  File "/.../wasmtime/_func.py", line 103, in __call__
    raise WasmtimeError._from_ptr(error)
wasmtime._error.WasmtimeError: error while executing at wasm backtrace:
    0: 0xf0f - <unknown>!run_loop
    1: 0x1335 - <unknown>!run_loop

Caused by:
    python exception

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/.../vera/cli.py", line 1237, in main
    sys.exit(cmd_run(...))
  File "/.../vera/cli.py", line 600, in cmd_run
    exec_result = execute(...)
  File "/.../vera/codegen/api.py", line 3140, in execute
    raw_result = func(store, *call_args)
  File "/.../wasmtime/_func.py", line 93, in __call__
    with enter_wasm(store) as trap:
  File "/.../wasmtime/_func.py", line 240, in enter_wasm
    maybe_raise_last_exn()
  File "/.../wasmtime/_func.py", line 250, in maybe_raise_last_exn
    raise exn
  File "/.../wasmtime/_func.py", line 199, in trampoline
    pyresults = func(*pyparams)
  File "/.../vera/codegen/api.py", line 1060, in host_print
    text = data.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 123: invalid start byte
Synthetic minimal reproducer (independent of any codegen bug) — call host_print directly via the linker with raw invalid UTF-8 bytes; same crash.
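The decode failure itself can be shown without Vera or wasmtime. The following is a minimal sketch, assuming only that host_print runs data.decode("utf-8") on the bytes it reads from linear memory, as the traceback indicates. The helper name strict_decode and the payload are hypothetical and exist only for illustration; this does not exercise the real linker path.

```python
def strict_decode(data: bytes) -> str:
    """Mirror the strict decode seen in the traceback: data.decode("utf-8")."""
    return data.decode("utf-8")


# Hypothetical payload: valid ASCII followed by 0xC1, a byte that can never
# start (or appear in) a well-formed UTF-8 sequence.
corrupt = b"Generation 0 of 200\n" + b"\xc1"

raised = None
try:
    strict_decode(corrupt)
except UnicodeDecodeError as exc:
    # In the real runtime this exception is raised inside the host callback
    # and is not caught there.
    raised = exc
    print(f"strict decode failed: {exc.reason} at byte offset {exc.start}")
```

Strict mode rejects the whole buffer on the first bad byte, so even the valid prefix is never printed.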
Root cause
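vera/codegen/api.py has five sites that decode UTF-8 from WASM linear memory using strict mode (the default):
host_print (line 1060): IO.print
host_stderr (line 1212): IO.stderr
host_contract_fail (line 1265): runtime contract violation messages
_read_wasm_string helper (line 850): used by read_file, write_file, get_env, etc.
_extract_string (line 3239): String return-value decoder added by #585 ("v0.0.135: fix #584 (Unit fn non-tail), #583 (Array<T> aliases), #568 (url_parse leading colon)")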
All five raise UnicodeDecodeError if the bytes aren't valid UTF-8. wasmtime-py propagates exceptions from host-import callbacks back to the caller as a wrapped "python exception" — the traceback escapes through cmd_run to the user's terminal.
This is purely a defensive-coding gap. There's no architectural reason any of these sites should crash on bad input.
Fix
Two complementary changes:
Per-site errors="replace" for host_print / host_stderr / host_contract_fail / _read_wasm_string / _extract_string. Invalid UTF-8 bytes become U+FFFD (replacement character), output continues, no crash. Standard Unix-tool defensive behaviour for stdout/stderr pipes.
Optionally, a Vera-native trap on first invalid byte for host_print specifically — surfaces the underlying bug instead of silently masking it. But errors="replace" alone closes the user-facing severity (no Python traceback) and lets programs continue running long enough to surface other diagnostic info.
Recommendation: ship errors="replace" first (closes the severity); design a Vera-native diagnostic for the underlying corruption as a follow-up.
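The two options above can be sketched side by side. This is an illustration under stated assumptions, not the actual patch: decode_lossy, decode_or_trap, and InvalidUtf8Trap are hypothetical names, and InvalidUtf8Trap only stands in for whatever Vera-native trap type (such as WasmTrapError) the runtime would really raise.

```python
class InvalidUtf8Trap(Exception):
    """Hypothetical stand-in for a Vera-native trap; not the real WasmTrapError."""


def decode_lossy(data: bytes) -> str:
    """Option 1: replace mode. Invalid bytes become U+FFFD and output continues."""
    return data.decode("utf-8", errors="replace")


def decode_or_trap(data: bytes, site: str) -> str:
    """Option 2: convert the first invalid byte into a classified trap."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first undecodable byte.
        raise InvalidUtf8Trap(
            f"{site}: String contains invalid UTF-8 at byte offset {exc.start} "
            f"(byte 0x{data[exc.start]:02x})"
        ) from None


print(repr(decode_lossy(b"ok\xc1!")))
```

Replace mode keeps the valid bytes around the corruption visible, which is why it suits the first step; the trap variant loses the output but points at the offending byte.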
Acceptance
The Conway's Life reproducer no longer crashes Python; it either:
Continues running with U+FFFD characters in the output (replace mode), or
Traps with a WasmTrapError carrying a Vera-native message.
New regression test in tests/test_codegen.py calling execute() on a program that produces invalid UTF-8 bytes; assertion that no Python exception escapes (only WasmTrapError or a clean stdout with replacement chars).
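The shape of such a regression test might look like the sketch below. It deliberately does not call the real execute(), whose signature is not shown here; host_print_sketch is a hypothetical stand-in for the fixed host_print, and the byte samples are illustrative rather than taken from the Life reproducer.

```python
import io


def host_print_sketch(data: bytes, out: io.StringIO) -> None:
    """Hypothetical stand-in for host_print after the errors="replace" fix."""
    out.write(data.decode("utf-8", errors="replace"))


# Illustrative malformed inputs: a lone invalid start byte, a trailing 0xFF,
# a truncated three-byte sequence, and a stray continuation byte.
INVALID_SAMPLES = [b"\xc1", b"abc\xff", b"\xe2\x82", b"ok\x80ok"]


def test_invalid_utf8_never_raises() -> None:
    for sample in INVALID_SAMPLES:
        out = io.StringIO()
        # Must not raise UnicodeDecodeError (or any other Python exception).
        host_print_sketch(sample, out)
        # The corruption stays visible as a replacement character.
        assert "\ufffd" in out.getvalue()


test_invalid_utf8_never_raises()
```

A real test would drive a compiled program through execute() and assert on captured stdout or on WasmTrapError instead.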
Related