Fix #589: host_print never escapes Python UnicodeDecodeError on bad bytes#590
Conversation
…ytes A user-level program should NEVER produce a Python traceback regardless of what the program does. This is the WasmTrapError contract from #516 / #522 / #547 — runtime traps must surface as classified Vera-native errors, not raw wasmtime/Python internals. The host UTF-8 decode paths violated this contract. When an upstream codegen bug produced a corrupt String (ptr, len) pair pointing at non- UTF-8 bytes, host_print's strict-mode `bytes.decode("utf-8")` raised UnicodeDecodeError. wasmtime-py wrapped the exception as a "python exception" cause and the user's CLI saw a 30+ line Python traceback ending in `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1`. Fix is per-site `errors="replace"` for the four affected decode paths: - host_print (vera.print) - host_stderr (vera.stderr) - host_contract_fail (vera.contract_fail) - _read_wasm_string (used by read_file / get_env / etc.) Invalid bytes now surface as U+FFFD replacement characters in the user's output rather than escaping as a Python traceback. The program continues running until either main returns normally or hits a different trap. Surfaced by #588 (captured-Array indexing in closure body produces corrupt String pointers in Conway's Game of Life programs); fixing #588 removes the most common trigger but the defensive-coding hygiene applies regardless of source. Other future codegen bugs that corrupt strings will surface visibly (U+FFFD in output) instead of crashing Python. Tests added (tests/test_runtime_traps.py TestHostPrintInvalidUtf8589): - Four structural assertions (one per affected site) that the fix is in place — `errors="replace"` is the regression we're pinning. - End-to-end test using a synthetic WAT module that imports vera.print and invokes it with raw invalid UTF-8 bytes. Asserts no Python exception escapes wasmtime's trampoline and that U+FFFD is in the decoded output. Independent of any Vera-source codegen bug. Conway's Life from #588 now runs without a Python traceback (corrupt strings render as U+FFFD chars instead of crashing). Doc updates: - CHANGELOG entry under [Unreleased] Fixed (no version bump — rolls into whatever 0.0.136 ends up being once PR #587 lands). - KNOWN_ISSUES.md gets #588 (the upstream bug). #589 doesn't go in KNOWN_ISSUES because this PR closes it. - test count surfaces in TESTING.md and ROADMAP.md updated for the +5 new tests. Co-Authored-By: Claude <noreply@anthropic.invalid>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughHardened UTF‑8 decoding for WASM-to-Python string paths so invalid bytes are surfaced as U+FFFD (replacement character) instead of raising ChangesUTF‑8 decoding resilience and tests (
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #590 +/- ##
=======================================
Coverage 90.97% 90.98%
=======================================
Files 59 59
Lines 22930 22927 -3
Branches 259 259
=======================================
- Hits 20860 20859 -1
+ Misses 2063 2061 -2
Partials 7 7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
@coderabbitai help |
ChatThere are 3 ways to chat with CodeRabbit:
CodeRabbit commands
Other keywords and placeholders
Status, support, documentation and community
|
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
Code-reviewer flagged a 5th decode site missed by the initial pass: vera/wasm/markdown.py::_read_string — invoked from host_md_render / host_md_has_heading / host_md_extract_text / host_md_count_blocks for every Markdown host import. Same failure mode as #589 just routed through Markdown. Now uses errors="replace" with a docstring cross-referencing the api.py sites. Silent-failure-hunter flagged _extract_string at api.py:3259 as inconsistent: it had a try/except UnicodeDecodeError → pointer fallback that silently mutated the return type from str to int. Worse silent failure than U+FFFD because downstream consumers (CLI printer) printed an integer where a string was expected — looks like a valid result. Switched to errors="replace" matching the four sibling sites; value stays a str, invalid bytes surface as U+FFFD. PR-test-analyzer flagged two test-quality issues: - Structural marker for host_print test was anchored on a comment string ("Host function: vera.print"). Anchored on "def host_print(" instead, matching the pattern used for the other three structural tests. Less fragile to comment refactoring. - Class docstring claimed "Tests here are structural assertions" but the class also contains an end-to-end test. Reworded to "primarily structural assertions" plus an explicit note that the e2e test pins the wasmtime-trampoline contract rather than the production path. Added two new structural tests: - test_markdown_read_string_uses_errors_replace - test_extract_string_uses_errors_replace Plus a private _file_body_after helper so future tests in this class can assert on any source file, not just api.py. CHANGELOG entry expanded to mention the markdown.py site and the _extract_string str→int fix. Test counts bumped 3,733 → 3,735 (+2 tests in test_runtime_traps.py). The HTTP/Inference network-decode sites at api.py:756 / 2809 / 2841 were considered but left for a follow-up — they're a different layer (network response, not WASM memory) and deserve their own analysis of whether replace-mode or structured-error is the right choice. Co-Authored-By: Claude <noreply@anthropic.invalid>
Review findings (logged for future reference)Ran Critical — fixed1. Coverage gap: 5th UTF-8 decode site missed. 2. Silent type mutation in Important — fixed3. Structural test anchored on fragile comment string. 4. Class docstring undersells the e2e test. Said "Tests here are structural assertions on the source" but the class also contains Important — partial5. End-to-end coverage thin for sites 2–6. Only Mild — explicitly deferred6. "Test-the-test" concern in 7. HTTP / Inference network-decode sites. 8. Strengths confirmed by all four agents
Final test surface
Total tests on the PR: 3,728 → 3,735 (+7). |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@TESTING.md`:
- Line 9: The test-count row in TESTING.md is inconsistent (it shows 3,735 total
but lists 3,714 passed + 14 skipped = 3,728); run the canonical count command
(pytest tests/ --collect-only -q or python scripts/check_doc_counts.py) to get
the authoritative numbers and then update the table row in TESTING.md so the
total equals the sum of all status buckets (or add the missing status bucket
explicitly), e.g., adjust the total to 3,728 or include the omitted bucket and
its count to reconcile the math.
In `@tests/test_runtime_traps.py`:
- Around line 1866-1894: The current assertions in
test_read_wasm_string_uses_errors_replace and
test_markdown_read_string_uses_errors_replace only check for the literal
'errors="replace"' which can live in docstrings/comments; update each test (the
bodies obtained via _api_body_after and _file_body_after) to assert that the
decode call actually includes the errors argument — e.g. verify the body
contains a decode invocation with errors="replace" (use a regex like
'\.decode\([^)]*errors\s*=\s*["\']replace["\']' or check both '.decode(' and
'errors="replace"' occur near each other) instead of the simple substring check
so the tests fail if the real .decode(..., errors="replace") call is removed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d5fe3429-702c-46b3-ac74-f75a6ded446b
📒 Files selected for processing (7)
CHANGELOG.mdKNOWN_ISSUES.mdROADMAP.mdTESTING.mdtests/test_runtime_traps.pyvera/codegen/api.pyvera/wasm/markdown.py
Two follow-up issues filed from the PR #590 review findings, now logged in KNOWN_ISSUES so they don't get lost in the PR comment thread once this PR merges. #591 — HTTP/Inference network response UTF-8 decode hygiene. Three sites in vera/codegen/api.py use strict-mode bytes.decode("utf-8"): - line 756 (Inference.complete via _call_inference_provider) - line 2809 (Http.get host_http_get) - line 2841 (Http.post host_http_post) All three are wrapped in try/except Exception → Result::Err so they DON'T escape as Python tracebacks (lower severity than #589). But the error message handed to the user contains "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x..." — Python-internals noise rather than a Vera-native diagnostic. Filed in KNOWN_ISSUES under Bugs. #592 — e2e test parametrization for the remaining 5 UTF-8 decode sites. TestHostPrintInvalidUtf8589 has 6 structural source-greps and 1 end-to-end synthetic-WAT test (host_print only). The other 5 sites are pinned only structurally. A future refactor that centralises the decodes into a _safe_decode() helper would break the structural greps even with preserved behaviour. Filed in KNOWN_ISSUES under a new "Test coverage gaps" section since it doesn't fit cleanly into "Refactoring needed" (which is about file decomposition). Co-Authored-By: Claude <noreply@anthropic.invalid>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@KNOWN_ISSUES.md`:
- Line 12: Update the KNOWN_ISSUES.md entry to correct the claim about exception
handling: state that only Http.get and Http.post are wrapped in try/except
Exception (so they return a Vera-level Err), while Inference.complete performs
the UTF-8 decode inside a with block with no exception handler and therefore can
directly raise UnicodeDecodeError; adjust the severity/language to reflect that
Inference.complete is exposed and not protected like Http.get/Http.post and
suggest the intended decision options (`errors="replace"` vs explicit detection
with a Vera-native Err) remain under consideration.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 93c3d27f-8f96-4e9b-9567-321fcdd4876d
📒 Files selected for processing (1)
KNOWN_ISSUES.md
…ertions Both findings on PR #590 verified valid; fixed both with minimal changes. 1. TESTING.md line 9 had stale passed count: "3,735 across 29 files (3,714 passed + 14 skipped)". 3,714 + 14 = 3,728, not 3,735. The earlier sed in the review-fixes commit (c22980f) targeted "3,719 passed" → "3,721 passed" but the original was "3,714 passed", so the substitution missed. Updated to "3,721 passed" so the math reconciles (3,721 + 14 = 3,735, matching pytest's collected count). 2. test_read_wasm_string_uses_errors_replace and test_markdown_read_string_uses_errors_replace asserted only that the substring 'errors="replace"' appeared in the source body. Both sites have docstrings I expanded with prose mentions of `errors="replace"` (explaining the strategy), so the substring would still match if the actual `.decode(...)` call was reverted to strict mode — the tests would pass spuriously. Tightened both to assert on the literal call form '.decode("utf-8", errors="replace")' which appears only in actual call sites (the docstrings use ``errors="replace"`` with markdown backticks, never the leading `.decode(` prefix). Verified by negative test: temporarily reverting markdown.py's call to strict mode triggers the tightened assertion failure as expected. Updated docstrings on both tests to explain why the literal-call form is required. The other four structural tests (host_print / host_stderr / host_contract_fail / _extract_string) already used the literal call form and were not affected. Co-Authored-By: Claude <noreply@anthropic.invalid>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@TESTING.md`:
- Line 82: Update the long "What it covers" summary for the test row to mention
the new UTF-8 hardening suite TestHostPrintInvalidUtf8589: state that it adds
the six structural decode-site assertions plus the synthetic WAT end-to-end
no-traceback regression and that these tests cover invalid-UTF-8 handling in
host printing; edit the row text that lists covered items (the multi-clause
paragraph referencing trap kinds, tee suite, source-mapping, etc.) to append a
short phrase like "and UTF‑8 hardening: TestHostPrintInvalidUtf8589 (6
decode-site assertions + 1 synthetic WAT end-to-end no-traceback test)" so the
documented coverage matches the updated test count.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0976b806-5f60-4f43-a9f1-d92eb81a3930
📒 Files selected for processing (2)
TESTING.mdtests/test_runtime_traps.py
Rabbit finding: the test_runtime_traps.py row had the test count bumped (53→60) and line count bumped (1,784→2,001) to reflect the +7 tests I added for #589, but the long "What it covers" prose paragraph wasn't updated to describe what those new tests do. Reading the row, you'd see "this file grew by 7 tests" with no clue why or what they cover. Appended a clause matching the existing per-suite style (the row already lists "v0.0.123 tee suite", "v0.0.124 source-mapping suite", "v0.0.125 Stage 3 suite") with: - the suite name (TestHostPrintInvalidUtf8589) and issue link (#589) - the 6 structural decode-site assertions naming each affected site (host_print / host_stderr / host_contract_fail / _read_wasm_string / markdown::_read_string / _extract_string) - the 1 synthetic-WAT end-to-end test and its specific contract (Python UnicodeDecodeError escapes wasmtime's trampoline as a "python exception" cause iff the host decode is strict) Pure documentation update. No code changes. Co-Authored-By: Claude <noreply@anthropic.invalid>
apply_fn(closure, ()) on a (Unit -> X) closure failed WASM validation with "expected i64, found i32" at compile time. Same falsy-pitfall class as #584's _fn_ret_types filter: _translate_apply_fn in vera/wasm/closures.py defaulted the value-arg's WASM type to "i64" when _infer_expr_wasm_type returned None, but None means BOTH "couldn't infer" AND "Unit has no representation" - conflating the two cases. Result: a UnitLit pushed nothing onto the stack but registered a phantom i64 param in the call_indirect signature. The closure-lift side correctly skipped Unit params, so the two sides disagreed and validation rejected the call (the func_table_idx landing where the phantom value-arg was expected). Fix is three lines: change "arg_wasm_types.append(wt or 'i64')" to "elif wt is not None: arg_wasm_types.append(wt)" so Unit args contribute zero entries to the call_indirect sig. Discovered while smoke-testing #514 / #535 closure-capture work on v0.0.135 to confirm those v0.0.119-era bugs were closed. Both #514 and #535 ARE closed; this is a separate adjacent codegen bug. Conformance: tests/conformance/ch05_unit_arg_closure.vera covers three positions: - No-capture Unit-arg closure - Int-capture Unit-arg closure - Array<Bool>-capture Unit-arg closure (exercises #514's heap- capture work + #586's fix together) Conformance count: 84 -> 85. Total tests 3,735 -> 3,740 (+5 from the parametrized conformance runner: parse, check, verify, run, fmt). Rebased on top of PR #590 (which merged #589's UTF-8 hardening into [Unreleased] without a version bump). The rebase rolls #589's [Unreleased] entry into [0.0.136] alongside this PR's #586 entry, so the v0.0.136 release ships both fixes together. HISTORY.md row for v0.0.136 updated to mention both #586 and #589. Doc updates: - CHANGELOG entry under [0.0.136] Fixed for both #586 and #589 - HISTORY.md Stage 11 row for v0.0.136 names both issues - All count surfaces (TESTING.md, ROADMAP.md, etc.) bumped to 3,740 tests / 85 conformance - uv.lock regenerated Co-Authored-By: Claude <noreply@anthropic.invalid>
…ep test Both findings valid; fixed both. 1. TESTING.md row 82 was missing the new TestHostSleepKeyboardInterrupt suite from its coverage prose, even though the test count and line count were already bumped. Appended a clause matching the existing per-suite style: names the suite, links #595 (the related residual issue this guards the in-our-control half of), summarises both tests in the class. 2. test_host_sleep_keyboard_interrupt_propagates_as_vera_exit was a "test-the-test" — it defined a LOCAL host_sleep_under_test mirror of the production code, patched time.sleep, and asserted the conversion. If the production code regressed (KeyboardInterrupt guard removed), this test still passed because it was checking the LOCAL copy. Replaced with test_host_sleep_keyboard_interrupt_end_to_end that: - Compiles a real Vera program calling IO.sleep(120) - Monkey-patches time.sleep to raise KeyboardInterrupt - Runs via execute() (the real production entrypoint) - Asserts ExecuteResult.exit_code == 130 - Asserts pre-sleep IO.print preserved (#522 contract) - Asserts post-sleep IO.print did NOT run Negative-test verified: temporarily reverting the production guard makes this test fail (wasmtime crashes when KeyboardInterrupt escapes the trampoline). Restoring the guard makes it pass again. So the test now genuinely pins the production path. Same "test-the-test" gap rabbit caught on PR #590's #589 e2e test. Fixing this one symmetrically — for #589 the gap was less tractable (would have required exporting wasmtime linker setup); here it was just a 30-line rewrite using execute() directly. test_runtime_traps.py grows from 2,100 to 2,138 lines (the e2e is slightly longer than the local-helper form). Test count unchanged (still 62: structural + e2e is the same shape as before). Co-Authored-By: Claude <noreply@anthropic.invalid>
Summary
Closes #589.
A user-level Vera program triggered a 30+ line Python traceback when an upstream codegen bug produced a corrupt
String(ptr, len)pair pointing at non-UTF-8 bytes.host_printdecoded the bytes with strict-mode UTF-8, raisedUnicodeDecodeError, and wasmtime-py wrapped the exception as a "python exception" cause that escaped through wasmtime's trampoline back to the user's CLI:This violates the WasmTrapError contract from #516 / #522 / #547 — runtime traps must surface as classified Vera-native errors, not raw wasmtime/Python internals. A user-level program should never produce a Python traceback regardless of what the program does.
Surfaced in the wild by #588 (captured-Array indexing in a closure body produces corrupt String pointers in Conway's Game of Life programs). Fixing #588 removes the most common trigger but the defensive-coding gap is independent of source: any future codegen bug that corrupts a String would crash here too.
Fix
Per-site
errors="replace"for the four affected decode paths invera/codegen/api.py:host_printIO.printhost_stderrIO.stderrhost_contract_fail_read_wasm_stringIO.read_file/get_env/ etc.Invalid bytes now surface as U+FFFD replacement characters in the user's output rather than escaping as a Python traceback. The program continues running until either
mainreturns normally or hits a different trap.The
_extract_stringsite added by #585 (String-return decoder) already had atry/except UnicodeDecodeErrorguard with a pointer-fallback, so it didn't need changing.Test plan
TestHostPrintInvalidUtf8589test class intests/test_runtime_traps.py:errors="replace"is used. If a future change accidentally drops it, these fail.vera.printand invokes it with raw invalid UTF-8 bytes (b"hello \xc1 world"). Asserts no Python exception escapes wasmtime's trampoline and that U+FFFD appears in the decoded output between the valid-byte halves.█�Fpatterns (the�is U+FFFD) instead of crashing.pytest tests/— 3,719 passed, 14 skipped (5 new, 0 regressions).mypy vera/clean.Doc updates
CHANGELOG.md— entry under[Unreleased] / Fixed(no version bump — PR v0.0.136: fix #586 (apply_fn(closure, ()) on Unit-arg closure trips WASM validation) #587 is also pending and will bump to 0.0.136; whichever lands first claims that version, the other rolls into 0.0.137).KNOWN_ISSUES.md— adds Indexing a captured Array<T> inside a closure body produces invalid WASM #588 (the upstream captured-Array-indexing bug). host_print / host_stderr / host_contract_fail crash with raw Python UnicodeDecodeError on invalid UTF-8 bytes #589 is NOT added because this PR closes it.TESTING.mdandROADMAP.md— test count surfaces updated for the +5 new tests (3,728 → 3,733;test_runtime_traps.py53 → 58 tests, 1,784 → 1,943 lines).Coordination with other open PRs
fix #586 apply_fn Unit-arg) is also open against main with a version bump to 0.0.136. This PR does NOT bump version and lives only in[Unreleased]. Whichever merges first claims 0.0.136; the second rebases and gets 0.0.137. Files touched are disjoint (v0.0.136: fix #586 (apply_fn(closure, ()) on Unit-arg closure trips WASM validation) #587:vera/wasm/closures.py; this:vera/codegen/api.py+tests/test_runtime_traps.py) so no merge conflict.Why not also classify as a Vera-native trap?
A future improvement could detect the first invalid byte in
host_printand raise aWasmTrapErrorwith a Vera-native fix paragraph ("your String value points at corrupt memory; this typically indicates a codegen bug or a manual_allocmisuse"). That would surface the underlying corruption more loudly. But:errors="replace"alone.Recommend shipping
errors="replace"first; queue the diagnostic improvement as a separate issue.🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests
Documentation