Fix #589: host_print never escapes Python UnicodeDecodeError on bad bytes by aallan · Pull Request #590 · aallan/vera

aallan · 2026-05-06T16:05:27Z

Summary

Closes #589.

A user-level Vera program triggered a 30+ line Python traceback when an upstream codegen bug produced a corrupt String (ptr, len) pair pointing at non-UTF-8 bytes. host_print decoded the bytes with strict-mode UTF-8, raised UnicodeDecodeError, and wasmtime-py wrapped the exception as a "python exception" cause that escaped through wasmtime's trampoline back to the user's CLI:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 123: invalid start byte

This violates the WasmTrapError contract from #516 / #522 / #547 — runtime traps must surface as classified Vera-native errors, not raw wasmtime/Python internals. A user-level program should never produce a Python traceback regardless of what the program does.

Surfaced in the wild by #588 (captured-Array indexing in a closure body produces corrupt String pointers in Conway's Game of Life programs). Fixing #588 removes the most common trigger but the defensive-coding gap is independent of source: any future codegen bug that corrupts a String would crash here too.

Fix

Per-site errors="replace" for the four affected decode paths in vera/codegen/api.py:

Site	Used by
`host_print`	`IO.print`
`host_stderr`	`IO.stderr`
`host_contract_fail`	runtime contract violation messages
`_read_wasm_string`	`IO.read_file` / `get_env` / etc.

Invalid bytes now surface as U+FFFD replacement characters in the user's output rather than escaping as a Python traceback. The program continues running until either main returns normally or hits a different trap.

The _extract_string site added by #585 (String-return decoder) already had a try/except UnicodeDecodeError guard with a pointer-fallback, so it didn't need changing.

Test plan

New TestHostPrintInvalidUtf8589 test class in tests/test_runtime_traps.py:
- Four structural assertions (one per affected site) pinning that errors="replace" is used. If a future change accidentally drops it, these fail.
- End-to-end test using a synthetic WAT module that imports vera.print and invokes it with raw invalid UTF-8 bytes (b"hello \xc1 world"). Asserts no Python exception escapes wasmtime's trampoline and that U+FFFD appears in the decoded output between the valid-byte halves.
User reproducer (Conway's Life from Indexing a captured Array<T> inside a closure body produces invalid WASM #588) verified manually: no more Python traceback. Corrupt strings render as █�F patterns (the � is U+FFFD) instead of crashing.
pytest tests/ — 3,719 passed, 14 skipped (5 new, 0 regressions).
mypy vera/ clean.
All pre-commit hooks pass (doc-counts, version-sync, limitations-sync, site assets all consistent).

Doc updates

CHANGELOG.md — entry under [Unreleased] / Fixed (no version bump — PR v0.0.136: fix #586 (apply_fn(closure, ()) on Unit-arg closure trips WASM validation) #587 is also pending and will bump to 0.0.136; whichever lands first claims that version, the other rolls into 0.0.137).
KNOWN_ISSUES.md — adds Indexing a captured Array<T> inside a closure body produces invalid WASM #588 (the upstream captured-Array-indexing bug). host_print / host_stderr / host_contract_fail crash with raw Python UnicodeDecodeError on invalid UTF-8 bytes #589 is NOT added because this PR closes it.
TESTING.md and ROADMAP.md — test count surfaces updated for the +5 new tests (3,728 → 3,733; test_runtime_traps.py 53 → 58 tests, 1,784 → 1,943 lines).

Coordination with other open PRs

PR v0.0.136: fix #586 (apply_fn(closure, ()) on Unit-arg closure trips WASM validation) #587 (fix #586 apply_fn Unit-arg) is also open against main with a version bump to 0.0.136. This PR does NOT bump version and lives only in [Unreleased]. Whichever merges first claims 0.0.136; the second rebases and gets 0.0.137. Files touched are disjoint (v0.0.136: fix #586 (apply_fn(closure, ()) on Unit-arg closure trips WASM validation) #587: vera/wasm/closures.py; this: vera/codegen/api.py + tests/test_runtime_traps.py) so no merge conflict.

Why not also classify as a Vera-native trap?

A future improvement could detect the first invalid byte in host_print and raise a WasmTrapError with a Vera-native fix paragraph ("your String value points at corrupt memory; this typically indicates a codegen bug or a manual _alloc misuse"). That would surface the underlying corruption more loudly. But:

The user-facing severity (Python traceback escape) is closed by errors="replace" alone.
A loud trap blocks programs that legitimately produce non-UTF-8 output (e.g. binary protocols, raw byte streams). Replacement-char display is the standard Unix-tool behaviour.
The trap classification can land as a follow-up once we have more data on what programs hit this.

Recommend shipping errors="replace" first; queue the diagnostic improvement as a separate issue.

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Host I/O and WASM string reads now tolerate invalid UTF‑8 by surfacing replacement characters (U+FFFD) instead of raising decode errors.
Tests
- Added tests covering UTF‑8 decoding safety across multiple host I/O sites and end‑to‑end behaviour; test counts updated to 3,735.
Documentation
- ROADMAP and TESTING metrics updated; KNOWN_ISSUES added for array‑indexing-in-closure WASM bug and strict UTF‑8 decoding of network responses.

…ytes A user-level program should NEVER produce a Python traceback regardless of what the program does. This is the WasmTrapError contract from #516 / #522 / #547 — runtime traps must surface as classified Vera-native errors, not raw wasmtime/Python internals. The host UTF-8 decode paths violated this contract. When an upstream codegen bug produced a corrupt String (ptr, len) pair pointing at non- UTF-8 bytes, host_print's strict-mode `bytes.decode("utf-8")` raised UnicodeDecodeError. wasmtime-py wrapped the exception as a "python exception" cause and the user's CLI saw a 30+ line Python traceback ending in `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1`. Fix is per-site `errors="replace"` for the four affected decode paths: - host_print (vera.print) - host_stderr (vera.stderr) - host_contract_fail (vera.contract_fail) - _read_wasm_string (used by read_file / get_env / etc.) Invalid bytes now surface as U+FFFD replacement characters in the user's output rather than escaping as a Python traceback. The program continues running until either main returns normally or hits a different trap. Surfaced by #588 (captured-Array indexing in closure body produces corrupt String pointers in Conway's Game of Life programs); fixing #588 removes the most common trigger but the defensive-coding hygiene applies regardless of source. Other future codegen bugs that corrupt strings will surface visibly (U+FFFD in output) instead of crashing Python. Tests added (tests/test_runtime_traps.py TestHostPrintInvalidUtf8589): - Four structural assertions (one per affected site) that the fix is in place — `errors="replace"` is the regression we're pinning. - End-to-end test using a synthetic WAT module that imports vera.print and invokes it with raw invalid UTF-8 bytes. Asserts no Python exception escapes wasmtime's trampoline and that U+FFFD is in the decoded output. Independent of any Vera-source codegen bug. Conway's Life from #588 now runs without a Python traceback (corrupt strings render as U+FFFD chars instead of crashing). Doc updates: - CHANGELOG entry under [Unreleased] Fixed (no version bump — rolls into whatever 0.0.136 ends up being once PR #587 lands). - KNOWN_ISSUES.md gets #588 (the upstream bug). #589 doesn't go in KNOWN_ISSUES because this PR closes it. - test count surfaces in TESTING.md and ROADMAP.md updated for the +5 new tests. Co-Authored-By: Claude <noreply@anthropic.invalid>

coderabbitai · 2026-05-06T16:05:40Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 43ac6f2c-8fab-4892-afef-69c4b511087d

📥 Commits

Reviewing files that changed from the base of the PR and between c07ff83 and 9d8a0f2.

📒 Files selected for processing (1)

TESTING.md

📝 Walkthrough

Walkthrough

Hardened UTF‑8 decoding for WASM-to-Python string paths so invalid bytes are surfaced as U+FFFD (replacement character) instead of raising UnicodeDecodeError. Adds targeted tests covering host_print, host_stderr, host_contract_fail, WASM string reads and markdown read paths, and updates documentation and test metrics to 3,735 tests.

Changes

UTF‑8 decoding resilience and tests (#589)

Layer / File(s)	Summary
Data Shape / Helper `vera/codegen/api.py`, `vera/wasm/markdown.py`	WASM-linear-memory string readers now decode bytes using UTF‑8 with `errors="replace"` (changes in `_read_wasm_string` and `_read_string`) so invalid sequences become U+FFFD rather than raising.
Core Implementation `vera/codegen/api.py`	Host-side IO handlers (`host_print`, `host_stderr`, `host_contract_fail`) and the post-return `String` decoding in `Execute` switched to `errors="replace"`, removing strict-decode failure modes and the prior pointer-fallback.
Integration / Consumer `vera/codegen/api.py`	Call sites updated where decoded text is appended to stdout/stderr buffers or recorded as contract violation text to consume replacement-character-decoded strings.
Tests `tests/test_runtime_traps.py`	Added `TestHostPrintInvalidUtf8589` with tests verifying `errors="replace"` is used across `host_print`, `host_stderr`, `host_contract_fail`, `_read_wasm_string`, markdown's `_read_string`, extract-string paths, and an end‑to‑end scenario that ensures no Python `UnicodeDecodeError` escapes.
Docs / Changelog / Metrics `CHANGELOG.md`, `TESTING.md`, `ROADMAP.md`	`CHANGELOG.md` records the Unreleased Fixed entry. `TESTING.md` and `ROADMAP.md` updated test counts and related metrics (total tests now 3,735).
Known Issues Register `KNOWN_ISSUES.md`	Added bug entries describing the captured `Array<T>` closure-indexing invalid-WASM issue (`#588`) and strict-UTF8 decoding in network response handling (`#591`).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

aallan/vera#592: Overlaps — adds end-to-end tests and hardens the same decoding sites (host_print, host_stderr, host_contract_fail, _read_wasm_string, markdown._read_string, return-string decoding).

Possibly related PRs

aallan/vera#533: Modifies similar runtime I/O and execute paths in vera/codegen/api.py; likely related to decoding/host-I/O changes.

Suggested labels

compiler, tests, docs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 68.42% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically identifies the main fix: preventing Python UnicodeDecodeError from escaping host_print on invalid UTF-8 bytes, directly referencing issue `#589`.
Linked Issues check	✅ Passed	The changeset fully addresses `#589`'s objectives: host_print, host_stderr, host_contract_fail, _read_wasm_string, _extract_string and markdown._read_string now use errors='replace' to prevent UnicodeDecodeError escapes, with comprehensive structural tests and synthetic end-to-end WAT validation. `#586` is informational only (deferred).
Out of Scope Changes check	✅ Passed	All changes are in-scope: the defensive UTF-8 decoding fix to core affected sites, test coverage (structural assertions and end-to-end synthetic test), and documentation updates (CHANGELOG, KNOWN_ISSUES, TESTING.md, ROADMAP.md). `#586`'s call_indirect Unit-arg fix is noted as deferred.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/fix-host-print-utf8-crash

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-05-06T16:07:06Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.98%. Comparing base (c5c0263) to head (9d8a0f2).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #590   +/-   ##
=======================================
  Coverage   90.97%   90.98%           
=======================================
  Files          59       59           
  Lines       22930    22927    -3     
  Branches      259      259           
=======================================
- Hits        20860    20859    -1     
+ Misses       2063     2061    -2     
  Partials        7        7

Flag	Coverage Δ
javascript	`57.36% <ø> (ø)`
python	`94.81% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

aallan · 2026-05-06T17:36:09Z

@coderabbitai full review

coderabbitai · 2026-05-06T17:36:14Z

✅ Actions performed

Full review triggered.

aallan · 2026-05-06T17:49:51Z

@coderabbitai help

coderabbitai · 2026-05-06T17:50:08Z

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

CodeRabbit commands

These commands are invoked using PR/Issue comments.

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai rate limit to show your current review rate limit status.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai evaluate custom pre-merge check --instructions <custom-checks-instructions> --name <custom-checks-title> [--mode <error|warning>] to validate the custom pre-merge checks instructions. Defaults to error when --mode is omitted.
@coderabbitai ignore pre-merge checks to override pre-merge checks and get an approval on PR.
@coderabbitai run pre-merge checks to run pre-merge checks on the pull request.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai generate unit tests to generate unit tests for this PR.
@coderabbitai resolve merge conflict to automatically resolve merge conflicts.
@coderabbitai autofix to automatically fix issues identified in unresolved review comments.
@coderabbitai resolve to resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai generate configuration to create a PR that adds the current resolved configuration as .coderabbit.yaml (or show it if already present).
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, support, documentation and community

Visit our status page to check the current availability of CodeRabbit.
Create a ticket on our support page for assistance with any issues or questions.
Visit our documentation site for detailed information on how to use CodeRabbit.
Join our Discord community to connect with other users and get help from the community.
Follow us on X/Twitter for updates and announcements.

aallan · 2026-05-06T17:51:11Z

@coderabbitai full review

coderabbitai · 2026-05-06T17:51:16Z

✅ Actions performed

Full review triggered.

Code-reviewer flagged a 5th decode site missed by the initial pass: vera/wasm/markdown.py::_read_string — invoked from host_md_render / host_md_has_heading / host_md_extract_text / host_md_count_blocks for every Markdown host import. Same failure mode as #589 just routed through Markdown. Now uses errors="replace" with a docstring cross-referencing the api.py sites. Silent-failure-hunter flagged _extract_string at api.py:3259 as inconsistent: it had a try/except UnicodeDecodeError → pointer fallback that silently mutated the return type from str to int. Worse silent failure than U+FFFD because downstream consumers (CLI printer) printed an integer where a string was expected — looks like a valid result. Switched to errors="replace" matching the four sibling sites; value stays a str, invalid bytes surface as U+FFFD. PR-test-analyzer flagged two test-quality issues: - Structural marker for host_print test was anchored on a comment string ("Host function: vera.print"). Anchored on "def host_print(" instead, matching the pattern used for the other three structural tests. Less fragile to comment refactoring. - Class docstring claimed "Tests here are structural assertions" but the class also contains an end-to-end test. Reworded to "primarily structural assertions" plus an explicit note that the e2e test pins the wasmtime-trampoline contract rather than the production path. Added two new structural tests: - test_markdown_read_string_uses_errors_replace - test_extract_string_uses_errors_replace Plus a private _file_body_after helper so future tests in this class can assert on any source file, not just api.py. CHANGELOG entry expanded to mention the markdown.py site and the _extract_string str→int fix. Test counts bumped 3,733 → 3,735 (+2 tests in test_runtime_traps.py). The HTTP/Inference network-decode sites at api.py:756 / 2809 / 2841 were considered but left for a follow-up — they're a different layer (network response, not WASM memory) and deserve their own analysis of whether replace-mode or structured-error is the right choice. Co-Authored-By: Claude <noreply@anthropic.invalid>

aallan · 2026-05-06T18:03:22Z

Review findings (logged for future reference)

Ran /pr-review-toolkit:review-pr (4 agents in parallel: code-reviewer, pr-test-analyzer, comment-analyzer, silent-failure-hunter). Findings + actions taken below. Fixes ship in the next commit on this branch.

Critical — fixed

1. Coverage gap: 5th UTF-8 decode site missed. vera/wasm/markdown.py:386 _read_string decodes WASM memory bytes with strict UTF-8. Invoked from host_md_render / host_md_has_heading / host_md_extract_text / host_md_count_blocks (vera/codegen/api.py:1413, 1430, 1447, 1466). The exact #589 failure mode — corrupt String (ptr, len) → UnicodeDecodeError escaping wasmtime's trampoline → 30-line Python traceback — is reproducible through any of the four md_* host imports. Reported by code-reviewer (confidence 90). Fixed: applied errors="replace" with a docstring cross-referencing the api.py sites. New test_markdown_read_string_uses_errors_replace structural test.

2. Silent type mutation in _extract_string. vera/codegen/api.py:3259 had a try/except UnicodeDecodeError → pointer fallback that silently mutated the return type from str to int when bytes weren't valid UTF-8. Worse silent failure than visible U+FFFD because downstream consumers (CLI printer at minimum) printed an integer where a string was expected — looks like a valid result. Reported by silent-failure-hunter. Fixed: switched to errors="replace" matching the four sibling sites; value stays a str, invalid bytes become U+FFFD. New test_extract_string_uses_errors_replace structural test.

Important — fixed

3. Structural test anchored on fragile comment string. test_host_print_uses_errors_replace originally anchored on the comment "Host function: vera.print" rather than the function definition. Comment refactoring would silently break the test even though behavior is preserved. Reported by pr-test-analyzer. Fixed: anchored on def host_print( matching the pattern used for the other three structural tests.

4. Class docstring undersells the e2e test. Said "Tests here are structural assertions on the source" but the class also contains test_invalid_utf8_through_host_print_does_not_raise, an end-to-end synthetic-WAT test. Reported by comment-analyzer. Fixed: reworded to "primarily structural assertions" plus an explicit note about the e2e test's role (it pins the wasmtime-trampoline contract, not the production path).

Important — partial

5. End-to-end coverage thin for sites 2–6. Only host_print has a synthetic-WAT end-to-end test. host_stderr / host_contract_fail / _read_wasm_string / markdown.py::_read_string / _extract_string are covered only by structural greps. Reported by pr-test-analyzer (rated 6). Partial fix: added structural tests for the two new sites (markdown + _extract_string). Full parametrized end-to-end coverage of all six sites is left as a follow-up — the structural tests catch the regression of dropping errors="replace" from any single site, and the e2e test pins the wasmtime-trampoline mechanism. The cost-benefit of duplicating the e2e setup five more times didn't seem high enough to block this PR.

Mild — explicitly deferred

6. "Test-the-test" concern in test_invalid_utf8_through_host_print_does_not_raise. The end-to-end test defines its own host_print closure that hardcodes errors="replace" — if a contributor drops errors="replace" from vera/codegen/api.py:1076, the structural test catches it but this end-to-end test still passes (it's asserting Python's stdlib behaviour, not the production code's). Reported by pr-test-analyzer (rated 8). Deferred: the structural tests are what protect against the production regression; the e2e test is now correctly framed in the class docstring (post-fix #4) as a wasmtime-trampoline contract test. Wiring up the production host_print from inside a test would require either exporting the linker setup from execute() (large refactor) or duplicating it in the test (worse than the current state).

7. HTTP / Inference network-decode sites. vera/codegen/api.py:756 (Inference response), :2809 and :2841 (Http response) decode network data with strict UTF-8. Same pattern, but a different layer — these are external API responses, not WASM memory. The replace-vs-structured-error tradeoff is different (a misbehaving remote API is a real failure mode the user should know about, vs. a corrupt WASM-internal pointer which is a compiler bug). Deferred: noted in commit message; deserves its own analysis as a separate issue.

8. _read_wasm_string debug breadcrumb for FileNotFoundError disambiguation. When a corrupt path string degrades to U+FFFD and then Path(...).read_text() returns FileNotFoundError, the user can't tell whether the path was corrupt or just wrong. silent-failure-hunter suggested a debug log breadcrumb. Deferred as over-engineering for now — the U+FFFD chars are preserved in the path string, so the FileNotFoundError message should already contain them as a tell.

Strengths confirmed by all four agents

Per-site errors="replace" is the correct choice vs. raising a structured WasmTrapError (preserves the WasmTrapError contract from Runtime traps need Vera-native diagnostics, not raw wasmtime stack traces #516 / IO.print output lost when program traps (stdout fully buffered) #522 / #516 Stage 3 — per-kind Fix paragraphs for runtime traps #547 without forcing every consumer to handle classified errors; matches Unix-tool behaviour for stdout/stderr pipes).
KNOWN_ISSUES.md correctly tracks Indexing a captured Array<T> inside a closure body produces invalid WASM #588 (the open upstream cause) and not host_print / host_stderr / host_contract_fail crash with raw Python UnicodeDecodeError on invalid UTF-8 bytes #589 (closed by this PR).
All five issue cross-references (Runtime traps need Vera-native diagnostics, not raw wasmtime stack traces #516 / IO.print output lost when program traps (stdout fully buffered) #522 / #516 Stage 3 — per-kind Fix paragraphs for runtime traps #547 / Indexing a captured Array<T> inside a closure body produces invalid WASM #588 / host_print / host_stderr / host_contract_fail crash with raw Python UnicodeDecodeError on invalid UTF-8 bytes #589) verified accurate by comment-analyzer.
The PR's own framing (Why-not-also-classify-as-Vera-native-trap section in the description) correctly anticipates the structural-vs-behavioral question.

Final test surface

TestHostPrintInvalidUtf8589 now has 7 tests:

6 structural assertions (one per affected site: 4 in api.py + markdown.py::_read_string + _extract_string)
1 end-to-end synthetic-WAT test (host_print path)

Total tests on the PR: 3,728 → 3,735 (+7).

aallan · 2026-05-06T20:33:03Z

@coderabbitai full review

coderabbitai · 2026-05-06T20:33:10Z

✅ Actions performed

Full review triggered.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@TESTING.md`:
- Line 9: The test-count row in TESTING.md is inconsistent (it shows 3,735 total
but lists 3,714 passed + 14 skipped = 3,728); run the canonical count command
(pytest tests/ --collect-only -q or python scripts/check_doc_counts.py) to get
the authoritative numbers and then update the table row in TESTING.md so the
total equals the sum of all status buckets (or add the missing status bucket
explicitly), e.g., adjust the total to 3,728 or include the omitted bucket and
its count to reconcile the math.

In `@tests/test_runtime_traps.py`:
- Around line 1866-1894: The current assertions in
test_read_wasm_string_uses_errors_replace and
test_markdown_read_string_uses_errors_replace only check for the literal
'errors="replace"' which can live in docstrings/comments; update each test (the
bodies obtained via _api_body_after and _file_body_after) to assert that the
decode call actually includes the errors argument — e.g. verify the body
contains a decode invocation with errors="replace" (use a regex like
'\.decode\([^)]*errors\s*=\s*["\']replace["\']' or check both '.decode(' and
'errors="replace"' occur near each other) instead of the simple substring check
so the tests fail if the real .decode(..., errors="replace") call is removed.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d5fe3429-702c-46b3-ac74-f75a6ded446b

📥 Commits

Reviewing files that changed from the base of the PR and between c5c0263 and c22980f.

📒 Files selected for processing (7)

CHANGELOG.md
KNOWN_ISSUES.md
ROADMAP.md
TESTING.md
tests/test_runtime_traps.py
vera/codegen/api.py
vera/wasm/markdown.py

Two follow-up issues filed from the PR #590 review findings, now logged in KNOWN_ISSUES so they don't get lost in the PR comment thread once this PR merges. #591 — HTTP/Inference network response UTF-8 decode hygiene. Three sites in vera/codegen/api.py use strict-mode bytes.decode("utf-8"): - line 756 (Inference.complete via _call_inference_provider) - line 2809 (Http.get host_http_get) - line 2841 (Http.post host_http_post) All three are wrapped in try/except Exception → Result::Err so they DON'T escape as Python tracebacks (lower severity than #589). But the error message handed to the user contains "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x..." — Python-internals noise rather than a Vera-native diagnostic. Filed in KNOWN_ISSUES under Bugs. #592 — e2e test parametrization for the remaining 5 UTF-8 decode sites. TestHostPrintInvalidUtf8589 has 6 structural source-greps and 1 end-to-end synthetic-WAT test (host_print only). The other 5 sites are pinned only structurally. A future refactor that centralises the decodes into a _safe_decode() helper would break the structural greps even with preserved behaviour. Filed in KNOWN_ISSUES under a new "Test coverage gaps" section since it doesn't fit cleanly into "Refactoring needed" (which is about file decomposition). Co-Authored-By: Claude <noreply@anthropic.invalid>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@KNOWN_ISSUES.md`:
- Line 12: Update the KNOWN_ISSUES.md entry to correct the claim about exception
handling: state that only Http.get and Http.post are wrapped in try/except
Exception (so they return a Vera-level Err), while Inference.complete performs
the UTF-8 decode inside a with block with no exception handler and therefore can
directly raise UnicodeDecodeError; adjust the severity/language to reflect that
Inference.complete is exposed and not protected like Http.get/Http.post and
suggest the intended decision options (`errors="replace"` vs explicit detection
with a Vera-native Err) remain under consideration.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 93c3d27f-8f96-4e9b-9567-321fcdd4876d

📥 Commits

Reviewing files that changed from the base of the PR and between c22980f and fd4843d.

📒 Files selected for processing (1)

KNOWN_ISSUES.md

…ertions Both findings on PR #590 verified valid; fixed both with minimal changes. 1. TESTING.md line 9 had stale passed count: "3,735 across 29 files (3,714 passed + 14 skipped)". 3,714 + 14 = 3,728, not 3,735. The earlier sed in the review-fixes commit (c22980f) targeted "3,719 passed" → "3,721 passed" but the original was "3,714 passed", so the substitution missed. Updated to "3,721 passed" so the math reconciles (3,721 + 14 = 3,735, matching pytest's collected count). 2. test_read_wasm_string_uses_errors_replace and test_markdown_read_string_uses_errors_replace asserted only that the substring 'errors="replace"' appeared in the source body. Both sites have docstrings I expanded with prose mentions of `errors="replace"` (explaining the strategy), so the substring would still match if the actual `.decode(...)` call was reverted to strict mode — the tests would pass spuriously. Tightened both to assert on the literal call form '.decode("utf-8", errors="replace")' which appears only in actual call sites (the docstrings use ``errors="replace"`` with markdown backticks, never the leading `.decode(` prefix). Verified by negative test: temporarily reverting markdown.py's call to strict mode triggers the tightened assertion failure as expected. Updated docstrings on both tests to explain why the literal-call form is required. The other four structural tests (host_print / host_stderr / host_contract_fail / _extract_string) already used the literal call form and were not affected. Co-Authored-By: Claude <noreply@anthropic.invalid>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@TESTING.md`:
- Line 82: Update the long "What it covers" summary for the test row to mention
the new UTF-8 hardening suite TestHostPrintInvalidUtf8589: state that it adds
the six structural decode-site assertions plus the synthetic WAT end-to-end
no-traceback regression and that these tests cover invalid-UTF-8 handling in
host printing; edit the row text that lists covered items (the multi-clause
paragraph referencing trap kinds, tee suite, source-mapping, etc.) to append a
short phrase like "and UTF‑8 hardening: TestHostPrintInvalidUtf8589 (6
decode-site assertions + 1 synthetic WAT end-to-end no-traceback test)" so the
documented coverage matches the updated test count.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0976b806-5f60-4f43-a9f1-d92eb81a3930

📥 Commits

Reviewing files that changed from the base of the PR and between fd4843d and c07ff83.

📒 Files selected for processing (2)

TESTING.md
tests/test_runtime_traps.py

Rabbit finding: the test_runtime_traps.py row had the test count bumped (53→60) and line count bumped (1,784→2,001) to reflect the +7 tests I added for #589, but the long "What it covers" prose paragraph wasn't updated to describe what those new tests do. Reading the row, you'd see "this file grew by 7 tests" with no clue why or what they cover. Appended a clause matching the existing per-suite style (the row already lists "v0.0.123 tee suite", "v0.0.124 source-mapping suite", "v0.0.125 Stage 3 suite") with: - the suite name (TestHostPrintInvalidUtf8589) and issue link (#589) - the 6 structural decode-site assertions naming each affected site (host_print / host_stderr / host_contract_fail / _read_wasm_string / markdown::_read_string / _extract_string) - the 1 synthetic-WAT end-to-end test and its specific contract (Python UnicodeDecodeError escapes wasmtime's trampoline as a "python exception" cause iff the host decode is strict) Pure documentation update. No code changes. Co-Authored-By: Claude <noreply@anthropic.invalid>

apply_fn(closure, ()) on a (Unit -> X) closure failed WASM validation with "expected i64, found i32" at compile time. Same falsy-pitfall class as #584's _fn_ret_types filter: _translate_apply_fn in vera/wasm/closures.py defaulted the value-arg's WASM type to "i64" when _infer_expr_wasm_type returned None, but None means BOTH "couldn't infer" AND "Unit has no representation" - conflating the two cases. Result: a UnitLit pushed nothing onto the stack but registered a phantom i64 param in the call_indirect signature. The closure-lift side correctly skipped Unit params, so the two sides disagreed and validation rejected the call (the func_table_idx landing where the phantom value-arg was expected). Fix is three lines: change "arg_wasm_types.append(wt or 'i64')" to "elif wt is not None: arg_wasm_types.append(wt)" so Unit args contribute zero entries to the call_indirect sig. Discovered while smoke-testing #514 / #535 closure-capture work on v0.0.135 to confirm those v0.0.119-era bugs were closed. Both #514 and #535 ARE closed; this is a separate adjacent codegen bug. Conformance: tests/conformance/ch05_unit_arg_closure.vera covers three positions: - No-capture Unit-arg closure - Int-capture Unit-arg closure - Array<Bool>-capture Unit-arg closure (exercises #514's heap- capture work + #586's fix together) Conformance count: 84 -> 85. Total tests 3,735 -> 3,740 (+5 from the parametrized conformance runner: parse, check, verify, run, fmt). Rebased on top of PR #590 (which merged #589's UTF-8 hardening into [Unreleased] without a version bump). The rebase rolls #589's [Unreleased] entry into [0.0.136] alongside this PR's #586 entry, so the v0.0.136 release ships both fixes together. HISTORY.md row for v0.0.136 updated to mention both #586 and #589. Doc updates: - CHANGELOG entry under [0.0.136] Fixed for both #586 and #589 - HISTORY.md Stage 11 row for v0.0.136 names both issues - All count surfaces (TESTING.md, ROADMAP.md, etc.) bumped to 3,740 tests / 85 conformance - uv.lock regenerated Co-Authored-By: Claude <noreply@anthropic.invalid>

…ep test Both findings valid; fixed both. 1. TESTING.md row 82 was missing the new TestHostSleepKeyboardInterrupt suite from its coverage prose, even though the test count and line count were already bumped. Appended a clause matching the existing per-suite style: names the suite, links #595 (the related residual issue this guards the in-our-control half of), summarises both tests in the class. 2. test_host_sleep_keyboard_interrupt_propagates_as_vera_exit was a "test-the-test" — it defined a LOCAL host_sleep_under_test mirror of the production code, patched time.sleep, and asserted the conversion. If the production code regressed (KeyboardInterrupt guard removed), this test still passed because it was checking the LOCAL copy. Replaced with test_host_sleep_keyboard_interrupt_end_to_end that: - Compiles a real Vera program calling IO.sleep(120) - Monkey-patches time.sleep to raise KeyboardInterrupt - Runs via execute() (the real production entrypoint) - Asserts ExecuteResult.exit_code == 130 - Asserts pre-sleep IO.print preserved (#522 contract) - Asserts post-sleep IO.print did NOT run Negative-test verified: temporarily reverting the production guard makes this test fail (wasmtime crashes when KeyboardInterrupt escapes the trampoline). Restoring the guard makes it pass again. So the test now genuinely pins the production path. Same "test-the-test" gap rabbit caught on PR #590's #589 e2e test. Fixing this one symmetrically — for #589 the gap was less tractable (would have required exporting wasmtime linker setup); here it was just a 30-line rewrite using execute() directly. test_runtime_traps.py grows from 2,100 to 2,138 lines (the e2e is slightly longer than the local-helper form). Test count unchanged (still 62: structural + e2e is the same shape as before). Co-Authored-By: Claude <noreply@anthropic.invalid>

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

Comment thread TESTING.md Outdated

Comment thread tests/test_runtime_traps.py

This was referenced May 6, 2026

HTTP / Inference network-response decode uses strict UTF-8 (Python error noise leaks into Result Err) #591

Closed

Extend TestHostPrintInvalidUtf8589 with end-to-end tests for the remaining 5 decode sites #592

Open

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

Comment thread KNOWN_ISSUES.md

coderabbitai Bot reviewed May 6, 2026

View reviewed changes

Comment thread TESTING.md Outdated

aallan merged commit 2c046cb into main May 6, 2026
19 checks passed

aallan deleted the claude/fix-host-print-utf8-crash branch May 6, 2026 21:04

coderabbitai Bot mentioned this pull request May 11, 2026

Tier A bug burn-down: #633 + #634 + #556 + #591 #649

Merged

6 tasks

Conversation

aallan commented May 6, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Fix

Test plan

Doc updates

Coordination with other open PRs

Why not also classify as a Vera-native trap?

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

❌ Failed checks (1 warning)

Review ran into problems

Uh oh!

codecov Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

aallan commented May 6, 2026

Uh oh!

coderabbitai Bot commented May 6, 2026

Uh oh!

aallan commented May 6, 2026

Uh oh!

coderabbitai Bot commented May 6, 2026

Chat

CodeRabbit commands

Other keywords and placeholders

Status, support, documentation and community

Uh oh!

aallan commented May 6, 2026

Uh oh!

coderabbitai Bot commented May 6, 2026

Uh oh!

aallan commented May 6, 2026

Review findings (logged for future reference)

Critical — fixed

Important — fixed

Important — partial

Mild — explicitly deferred

Strengths confirmed by all four agents

Final test surface

Uh oh!

aallan commented May 6, 2026

Uh oh!

coderabbitai Bot commented May 6, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aallan commented May 6, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 6, 2026 •

edited

Loading

codecov Bot commented May 6, 2026 •

edited

Loading