Summary
The TestHostPrintInvalidUtf8589 test class in tests/test_runtime_traps.py has six structural tests (one per affected site) plus one end-to-end synthetic-WAT test. The end-to-end test only covers host_print. The other five sites — host_stderr, host_contract_fail, _read_wasm_string, vera/wasm/markdown.py::_read_string, _extract_string — are pinned only by the structural tests.
Reported by pr-test-analyzer during the #590 review (rated 6/10 importance):
If a refactor moves any of those to call a _safe_decode() helper, all three structural tests break despite preserved behavior — and conversely, a regression that re-introduces a strict decode in a refactored helper called from these sites would not be caught at all. A single behavioral test per site (each is ~20 lines of WAT) would be more durable. Lowest-cost win: parametrize the existing synthetic test over the four import names.
The structural tests catch the most likely regression (someone drops errors="replace" from one site); the missing end-to-end coverage matters when refactoring centralises the decode into a helper.
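The regression the structural tests guard against can be shown in plain Python, independent of wasmtime. This is a minimal sketch; the payload and variable names are illustrative and not taken from the test file.

```python
# Bytes 0xFF and 0xFE can never appear in well-formed UTF-8.
payload = b"ok\xff\xfe"

# Strict decode (the regression): raises UnicodeDecodeError.
try:
    payload.decode("utf-8")
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True

# Lossy decode (the intended behaviour): each bad byte becomes U+FFFD.
lossy = payload.decode("utf-8", errors="replace")

print(strict_raised)  # True
print(ascii(lossy))   # 'ok\ufffd\ufffd'
```

Inside a host import, the strict variant is the one that would surface as a "python exception" cause rather than as decoded output.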
Scope
Five additional end-to-end tests, modelled on test_invalid_utf8_through_host_print_does_not_raise:
- host_stderr: synthetic WAT importing vera.stderr, calling with crafted invalid UTF-8 bytes
- host_contract_fail: synthetic WAT importing vera.contract_fail, similar
- _read_wasm_string (used by read_file / get_env / etc.): call through host_read_file or host_get_env
- vera/wasm/markdown.py::_read_string: call through host_md_render or one of the four md_* host imports
- _extract_string: set up a Vera program with a String-returning fn, manually patch the result memory before execute() returns, verify decoded value is U+FFFD-laced str (not int)
Or: parametrize a single test over an (import_name, type_signature, payload_construction) tuple — five pytest.mark.parametrize rows.
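The parametrized option might look roughly like the sketch below. It assumes a hypothetical stand-in, `lossy_read`, in place of the real decode sites, and a fake linear memory instead of a wasmtime instance; the real test would compile synthetic WAT per row. The site names come from the list above, and the payloads are illustrative.

```python
from typing import List, Tuple

def make_memory(payload: bytes, ptr: int = 16) -> Tuple[bytearray, int, int]:
    """Place payload in a fake linear memory; return (memory, ptr, length)."""
    memory = bytearray(64)
    memory[ptr:ptr + len(payload)] = payload
    return memory, ptr, len(payload)

def lossy_read(memory: bytearray, ptr: int, length: int) -> str:
    """Hypothetical stand-in for a decode site: read (ptr, len), decode lossily."""
    return bytes(memory[ptr:ptr + length]).decode("utf-8", errors="replace")

# One row per remaining decode site; each payload is invalid UTF-8.
# In the real suite these rows could be passed to pytest.mark.parametrize.
CASES: List[Tuple[str, bytes]] = [
    ("host_stderr", b"\xff"),
    ("host_contract_fail", b"ab\xc3"),   # truncated 2-byte sequence
    ("_read_wasm_string", b"\xe2\x82"),  # truncated 3-byte sequence
    ("_read_string", b"\x80abc"),        # stray continuation byte
    ("_extract_string", b"a\xfeb"),
]

results = {}
for site_name, payload in CASES:
    memory, ptr, length = make_memory(payload)
    decoded = lossy_read(memory, ptr, length)
    # Same two properties for every row: a str comes back, and it
    # contains the replacement character.
    assert isinstance(decoded, str), site_name
    assert "\ufffd" in decoded, site_name
    results[site_name] = decoded
```

Keeping the rows as plain data means adding a seventh site later is a one-line change.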
Why it's not blocking
- Today the structural tests catch the most likely regression (manual deletion of errors="replace" from any site).
- The single existing end-to-end test pins the wasmtime-trampoline contract (a Python UnicodeDecodeError inside a host import escapes as a "python exception" cause iff the host decode is strict).
- A refactor that centralises the decodes into a helper would touch all sites at once, and any decent regression suite would catch a defective helper via at least one of the affected programs.
Why it's worth doing
- The structural tests use string greps on source files. Renaming functions, splitting comments across lines, or moving the implementation to a helper all break the structural tests even when behaviour is preserved.
- A future refactor that's purely additive (e.g. introducing a _safe_decode() helper that all five sites use) would benefit from end-to-end tests that survive the refactor, versus structural tests that break.
- ~100 lines of test code; mechanical to write.
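The difference in durability can be sketched without touching the real code base. The snippet below is an illustration under stated assumptions: the source strings, `structural_ok`, and `behavioural_ok` are all hypothetical, and the structural check is modelled as a simple substring search on the call site's source.

```python
# Source of a decode site before and after a helper refactor.
SITE_INLINE = 'def read_string(buf):\n    return buf.decode("utf-8", errors="replace")\n'
SITE_REFACTORED = 'def read_string(buf):\n    return _safe_decode(buf)\n'

# Two versions of the helper: correct, and regressed to a strict decode.
HELPER_GOOD = 'def _safe_decode(buf):\n    return buf.decode("utf-8", errors="replace")\n'
HELPER_STRICT = 'def _safe_decode(buf):\n    return buf.decode("utf-8")\n'

def structural_ok(site_source: str) -> bool:
    """Grep-style check: does the call site mention errors="replace"?"""
    return 'errors="replace"' in site_source

def behavioural_ok(*sources: str) -> bool:
    """Run the code on invalid UTF-8 and check for U+FFFD in the result."""
    namespace = {}
    for source in sources:
        exec(source, namespace)
    try:
        return "\ufffd" in namespace["read_string"](b"\xff")
    except UnicodeDecodeError:
        return False

# Original code: both kinds of test pass.
inline = (structural_ok(SITE_INLINE), behavioural_ok(SITE_INLINE))
# Behaviour-preserving refactor: the structural check breaks anyway.
refactored = (structural_ok(SITE_REFACTORED),
              behavioural_ok(HELPER_GOOD, SITE_REFACTORED))
# Strict helper: only the behavioural check identifies a real defect.
regressed = behavioural_ok(HELPER_STRICT, SITE_REFACTORED)

print(inline, refactored, regressed)  # (True, True) (False, True) False
```

The middle case is the false alarm described above; the last case is the regression a site-level grep cannot see once the decode lives in a helper.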
Acceptance
- Five additional tests in TestHostPrintInvalidUtf8589 (or one parametrized test with five rows) covering the remaining decode sites end-to-end.
- All assert no Python exception escapes wasmtime's trampoline.
- All assert U+FFFD appears in the decoded output.
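For the sites that write output rather than return a value, the two assertions can be combined in one capture helper. A minimal sketch follows, assuming a hypothetical `fake_host_stderr` in place of the real host import and a direct Python call in place of a call through wasmtime.

```python
import contextlib
import io
import sys

def fake_host_stderr(raw: bytes) -> None:
    """Hypothetical stand-in for a host import that decodes and writes to stderr."""
    sys.stderr.write(raw.decode("utf-8", errors="replace"))

def run_and_capture(host_fn, raw: bytes) -> str:
    """Call host_fn with stderr redirected; any exception propagates to the caller."""
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        host_fn(raw)  # a strict decode would raise UnicodeDecodeError here
    return buffer.getvalue()

captured = run_and_capture(fake_host_stderr, b"bad:\xff\xfe")

# Acceptance-style checks: the call returned normally, and the
# replacement character is present in what was written.
assert "\ufffd" in captured
```

Reaching the final assertion at all covers the "no Python exception escapes" criterion; the assertion itself covers the U+FFFD criterion.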
Related