Extend TestHostPrintInvalidUtf8589 with end-to-end tests for the remaining 5 decode sites #592

@aallan

Summary

The TestHostPrintInvalidUtf8589 test class in tests/test_runtime_traps.py has six structural tests (one per affected site) plus one end-to-end synthetic-WAT test. The end-to-end test covers only host_print. The other five sites (host_stderr, host_contract_fail, _read_wasm_string, vera/wasm/markdown.py::_read_string, and _extract_string) are pinned only by the structural tests.

Reported by pr-test-analyzer during the #590 review (rated 6/10 importance):

If a refactor moves any of those to call a _safe_decode() helper, all three structural tests break despite preserved behavior — and conversely, a regression that re-introduces a strict decode in a refactored helper called from these sites would not be caught at all. A single behavioral test per site (each is ~20 lines of WAT) would be more durable. Lowest-cost win: parametrize the existing synthetic test over the four import names.

The structural tests catch the most likely regression (someone drops errors="replace" from one site); the missing end-to-end coverage matters when refactoring centralises the decode into a helper.
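For reference, the behaviour being pinned is the standard difference between strict and lenient decoding in Python. A minimal sketch (the payload bytes are illustrative, not taken from the existing test):

```python
# 0xFF and 0xFE can never appear in well-formed UTF-8.
payload = b"ok \xff\xfe end"

# Strict decoding (the default) raises on the first bad byte.
try:
    payload.decode("utf-8")
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True

# errors="replace" substitutes U+FFFD for each undecodable byte instead.
lenient = payload.decode("utf-8", errors="replace")

print(strict_raised)   # True
print(repr(lenient))   # 'ok \ufffd\ufffd end'
```

An end-to-end test per site checks that the second path, not the first, is what runs when a host import receives such bytes.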

Scope

Five additional end-to-end tests, modelled on test_invalid_utf8_through_host_print_does_not_raise:

  1. host_stderr — synthetic WAT importing vera.stderr, calling with crafted invalid UTF-8 bytes
  2. host_contract_fail — synthetic WAT importing vera.contract_fail, similar
  3. _read_wasm_string (used by read_file / get_env / etc.) — call through host_read_file or host_get_env
  4. vera/wasm/markdown.py::_read_string — call through host_md_render or one of the four md_* host imports
  5. _extract_string — set up a Vera program with a String-returning fn, manually patch the result memory before execute() returns, and verify that the decoded value is a str containing U+FFFD (not an int)

Or: parametrize a single test over an (import_name, type_signature, payload_construction) tuple — five pytest.mark.parametrize rows.
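A rough sketch of how the parametrized variant could build its synthetic WAT, one module per row. The helper name build_wat, the (param i32 i32) signature, and the "main" export are assumptions for illustration; the real import signatures should be copied from the existing host_print test. Only the two import names stated above are used as rows.

```python
def build_wat(import_name: str, payload: bytes) -> str:
    """Return WAT text that passes `payload` to vera.<import_name>.

    Hypothetical helper: assumes the import takes (ptr, len) as two i32s.
    """
    # Escape every byte as \hh so invalid UTF-8 survives in the data segment.
    data = "".join(f"\\{b:02x}" for b in payload)
    return f"""(module
  (import "vera" "{import_name}" (func $host (param i32 i32)))
  (memory (export "memory") 1)
  (data (i32.const 0) "{data}")
  (func (export "main")
    i32.const 0
    i32.const {len(payload)}
    call $host))"""


# Rows a pytest.mark.parametrize decorator could iterate over.
ROWS = [
    ("stderr", b"\xff\xfe"),
    ("contract_fail", b"\xff\xfe"),
]

modules = {name: build_wat(name, payload) for name, payload in ROWS}
```

Each generated module would then be instantiated with the real host imports, and the test would assert that calling main raises no Python exception and that the captured output contains U+FFFD.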

Why it's not blocking

  • Today the structural tests catch the most likely regression (manual deletion of errors="replace" from any site).
  • The single existing end-to-end test pins the wasmtime-trampoline contract (a Python UnicodeDecodeError inside a host import escapes as a "python exception" cause iff the host decode is strict).
  • A refactor that centralises the decodes into a helper would touch all sites at once, and any decent regression suite would catch a defective helper via at least one of the affected programs.

Why it's worth doing

  • The structural tests use string greps on source files. Renaming functions, splitting comments across lines, or moving the implementation to a helper all break the structural tests even when behaviour is preserved.
  • A future refactor that's purely additive (e.g. introducing a _safe_decode() helper that all five sites use) would benefit from end-to-end tests that survive the refactor, whereas the structural tests would break.
  • ~100 lines of test code; mechanical to write.
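The fragility described above can be shown with a toy model. This is a sketch only: read_string, structural_check, and behavioural_check are hypothetical stand-ins, not the project's actual functions or tests.

```python
BEFORE = '''
def read_string(raw):
    return raw.decode("utf-8", errors="replace")
'''

# Behaviour-preserving refactor: the decode moves into a shared helper.
AFTER = '''
def _safe_decode(raw):
    return raw.decode("utf-8", errors="replace")

def read_string(raw):
    return _safe_decode(raw)
'''


def structural_check(source: str) -> bool:
    # Grep-style: does the text of read_string itself mention errors="replace"?
    body = source.split("def read_string", 1)[1]
    return 'errors="replace"' in body


def behavioural_check(source: str) -> bool:
    # Run the code and inspect what it actually returns for invalid UTF-8.
    namespace = {}
    exec(source, namespace)
    return "\ufffd" in namespace["read_string"](b"\xff")


for label, src in (("before", BEFORE), ("after", AFTER)):
    print(label, structural_check(src), behavioural_check(src))
# before True True
# after False True
```

The structural check fails after the refactor even though the function still returns U+FFFD for invalid input, while the behavioural check passes in both cases.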

Acceptance

  • Five additional tests in TestHostPrintInvalidUtf8589 (or one parametrized test with five rows) covering the remaining decode sites end-to-end.
  • All assert no Python exception escapes wasmtime's trampoline.
  • All assert U+FFFD appears in the decoded output.

Labels: enhancement
