Skip to content

Fix #589: host_print never escapes Python UnicodeDecodeError on bad bytes#590

Merged
aallan merged 5 commits into
mainfrom
claude/fix-host-print-utf8-crash
May 6, 2026
Merged

Fix #589: host_print never escapes Python UnicodeDecodeError on bad bytes#590
aallan merged 5 commits into
mainfrom
claude/fix-host-print-utf8-crash

Conversation

@aallan

@aallan aallan commented May 6, 2026

Copy link
Copy Markdown
Owner

Summary

Closes #589.

A user-level Vera program triggered a 30+ line Python traceback when an upstream codegen bug produced a corrupt String (ptr, len) pair pointing at non-UTF-8 bytes. host_print decoded the bytes with strict-mode UTF-8, raised UnicodeDecodeError, and wasmtime-py wrapped the exception as a "python exception" cause that escaped through wasmtime's trampoline back to the user's CLI:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 123: invalid start byte

This violates the WasmTrapError contract from #516 / #522 / #547 — runtime traps must surface as classified Vera-native errors, not raw wasmtime/Python internals. A user-level program should never produce a Python traceback regardless of what the program does.

Surfaced in the wild by #588 (captured-Array indexing in a closure body produces corrupt String pointers in Conway's Game of Life programs). Fixing #588 removes the most common trigger but the defensive-coding gap is independent of source: any future codegen bug that corrupts a String would crash here too.

Fix

Per-site errors="replace" for the four affected decode paths in vera/codegen/api.py:

Site Used by
host_print IO.print
host_stderr IO.stderr
host_contract_fail runtime contract violation messages
_read_wasm_string IO.read_file / get_env / etc.

Invalid bytes now surface as U+FFFD replacement characters in the user's output rather than escaping as a Python traceback. The program continues running until either main returns normally or hits a different trap.

The _extract_string site added by #585 (String-return decoder) already had a try/except UnicodeDecodeError guard with a pointer-fallback, so it didn't need changing.

Test plan

  • New TestHostPrintInvalidUtf8589 test class in tests/test_runtime_traps.py:
    • Four structural assertions (one per affected site) pinning that errors="replace" is used. If a future change accidentally drops it, these fail.
    • End-to-end test using a synthetic WAT module that imports vera.print and invokes it with raw invalid UTF-8 bytes (b"hello \xc1 world"). Asserts no Python exception escapes wasmtime's trampoline and that U+FFFD appears in the decoded output between the valid-byte halves.
  • User reproducer (Conway's Life from Indexing a captured Array<T> inside a closure body produces invalid WASM #588) verified manually: no more Python traceback. Corrupt strings render as █�F patterns (the is U+FFFD) instead of crashing.
  • pytest tests/ — 3,719 passed, 14 skipped (5 new, 0 regressions).
  • mypy vera/ clean.
  • All pre-commit hooks pass (doc-counts, version-sync, limitations-sync, site assets all consistent).

Doc updates

Coordination with other open PRs

Why not also classify as a Vera-native trap?

A future improvement could detect the first invalid byte in host_print and raise a WasmTrapError with a Vera-native fix paragraph ("your String value points at corrupt memory; this typically indicates a codegen bug or a manual _alloc misuse"). That would surface the underlying corruption more loudly. But:

  1. The user-facing severity (Python traceback escape) is closed by errors="replace" alone.
  2. A loud trap blocks programs that legitimately produce non-UTF-8 output (e.g. binary protocols, raw byte streams). Replacement-char display is the standard Unix-tool behaviour.
  3. The trap classification can land as a follow-up once we have more data on what programs hit this.

Recommend shipping errors="replace" first; queue the diagnostic improvement as a separate issue.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Host I/O and WASM string reads now tolerate invalid UTF‑8 by surfacing replacement characters (U+FFFD) instead of raising decode errors.
  • Tests

    • Added tests covering UTF‑8 decoding safety across multiple host I/O sites and end‑to‑end behaviour; test counts updated to 3,735.
  • Documentation

    • ROADMAP and TESTING metrics updated; KNOWN_ISSUES added for array‑indexing-in-closure WASM bug and strict UTF‑8 decoding of network responses.

…ytes

A user-level program should NEVER produce a Python traceback regardless
of what the program does. This is the WasmTrapError contract from
#516 / #522 / #547 — runtime traps must surface as classified Vera-native
errors, not raw wasmtime/Python internals.

The host UTF-8 decode paths violated this contract. When an upstream
codegen bug produced a corrupt String (ptr, len) pair pointing at non-
UTF-8 bytes, host_print's strict-mode `bytes.decode("utf-8")` raised
UnicodeDecodeError. wasmtime-py wrapped the exception as a "python
exception" cause and the user's CLI saw a 30+ line Python traceback
ending in `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1`.

Fix is per-site `errors="replace"` for the four affected decode paths:

  - host_print (vera.print)
  - host_stderr (vera.stderr)
  - host_contract_fail (vera.contract_fail)
  - _read_wasm_string (used by read_file / get_env / etc.)

Invalid bytes now surface as U+FFFD replacement characters in the user's
output rather than escaping as a Python traceback. The program continues
running until either main returns normally or hits a different trap.

Surfaced by #588 (captured-Array indexing in closure body produces
corrupt String pointers in Conway's Game of Life programs); fixing #588
removes the most common trigger but the defensive-coding hygiene applies
regardless of source. Other future codegen bugs that corrupt strings
will surface visibly (U+FFFD in output) instead of crashing Python.

Tests added (tests/test_runtime_traps.py TestHostPrintInvalidUtf8589):

  - Four structural assertions (one per affected site) that the fix is
    in place — `errors="replace"` is the regression we're pinning.
  - End-to-end test using a synthetic WAT module that imports vera.print
    and invokes it with raw invalid UTF-8 bytes. Asserts no Python
    exception escapes wasmtime's trampoline and that U+FFFD is in the
    decoded output. Independent of any Vera-source codegen bug.

Conway's Life from #588 now runs without a Python traceback (corrupt
strings render as U+FFFD chars instead of crashing).

Doc updates:
  - CHANGELOG entry under [Unreleased] Fixed (no version bump — rolls
    into whatever 0.0.136 ends up being once PR #587 lands).
  - KNOWN_ISSUES.md gets #588 (the upstream bug). #589 doesn't go in
    KNOWN_ISSUES because this PR closes it.
  - test count surfaces in TESTING.md and ROADMAP.md updated for the
    +5 new tests.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@coderabbitai

coderabbitai Bot commented May 6, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 43ac6f2c-8fab-4892-afef-69c4b511087d

📥 Commits

Reviewing files that changed from the base of the PR and between c07ff83 and 9d8a0f2.

📒 Files selected for processing (1)
  • TESTING.md

📝 Walkthrough

Walkthrough

Hardened UTF‑8 decoding for WASM-to-Python string paths so invalid bytes are surfaced as U+FFFD (replacement character) instead of raising UnicodeDecodeError. Adds targeted tests covering host_print, host_stderr, host_contract_fail, WASM string reads and markdown read paths, and updates documentation and test metrics to 3,735 tests.

Changes

UTF‑8 decoding resilience and tests (#589)

Layer / File(s) Summary
Data Shape / Helper
vera/codegen/api.py, vera/wasm/markdown.py
WASM-linear-memory string readers now decode bytes using UTF‑8 with errors="replace" (changes in _read_wasm_string and _read_string) so invalid sequences become U+FFFD rather than raising.
Core Implementation
vera/codegen/api.py
Host-side IO handlers (host_print, host_stderr, host_contract_fail) and the post-return String decoding in Execute switched to errors="replace", removing strict-decode failure modes and the prior pointer-fallback.
Integration / Consumer
vera/codegen/api.py
Call sites updated where decoded text is appended to stdout/stderr buffers or recorded as contract violation text to consume replacement-character-decoded strings.
Tests
tests/test_runtime_traps.py
Added TestHostPrintInvalidUtf8589 with tests verifying errors="replace" is used across host_print, host_stderr, host_contract_fail, _read_wasm_string, markdown's _read_string, extract-string paths, and an end‑to‑end scenario that ensures no Python UnicodeDecodeError escapes.
Docs / Changelog / Metrics
CHANGELOG.md, TESTING.md, ROADMAP.md
CHANGELOG.md records the Unreleased Fixed entry. TESTING.md and ROADMAP.md updated test counts and related metrics (total tests now 3,735).
Known Issues Register
KNOWN_ISSUES.md
Added bug entries describing the captured Array<T> closure-indexing invalid-WASM issue (#588) and strict-UTF8 decoding in network response handling (#591).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • aallan/vera#592: Overlaps — adds end-to-end tests and hardens the same decoding sites (host_print, host_stderr, host_contract_fail, _read_wasm_string, markdown._read_string, return-string decoding).

Possibly related PRs

  • aallan/vera#533: Modifies similar runtime I/O and execute paths in vera/codegen/api.py; likely related to decoding/host-I/O changes.

Suggested labels

compiler, tests, docs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 68.42% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically identifies the main fix: preventing Python UnicodeDecodeError from escaping host_print on invalid UTF-8 bytes, directly referencing issue #589.
Linked Issues check ✅ Passed The changeset fully addresses #589's objectives: host_print, host_stderr, host_contract_fail, _read_wasm_string, _extract_string and markdown._read_string now use errors='replace' to prevent UnicodeDecodeError escapes, with comprehensive structural tests and synthetic end-to-end WAT validation. #586 is informational only (deferred).
Out of Scope Changes check ✅ Passed All changes are in-scope: the defensive UTF-8 decoding fix to core affected sites, test coverage (structural assertions and end-to-end synthetic test), and documentation updates (CHANGELOG, KNOWN_ISSUES, TESTING.md, ROADMAP.md). #586's call_indirect Unit-arg fix is noted as deferred.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/fix-host-print-utf8-crash

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented May 6, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.98%. Comparing base (c5c0263) to head (9d8a0f2).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #590   +/-   ##
=======================================
  Coverage   90.97%   90.98%           
=======================================
  Files          59       59           
  Lines       22930    22927    -3     
  Branches      259      259           
=======================================
- Hits        20860    20859    -1     
+ Misses       2063     2061    -2     
  Partials        7        7           
Flag Coverage Δ
javascript 57.36% <ø> (ø)
python 94.81% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@aallan

aallan commented May 6, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 6, 2026

Copy link
Copy Markdown
✅ Actions performed

Full review triggered.

@aallan

aallan commented May 6, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai help

@coderabbitai

coderabbitai Bot commented May 6, 2026

Copy link
Copy Markdown

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

CodeRabbit commands

These commands are invoked using PR/Issue comments.

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai rate limit to show your current review rate limit status.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai evaluate custom pre-merge check --instructions <custom-checks-instructions> --name <custom-checks-title> [--mode <error|warning>] to validate the custom pre-merge checks instructions. Defaults to error when --mode is omitted.
  • @coderabbitai ignore pre-merge checks to override pre-merge checks and get an approval on PR.
  • @coderabbitai run pre-merge checks to run pre-merge checks on the pull request.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve merge conflict to automatically resolve merge conflicts.
  • @coderabbitai autofix to automatically fix issues identified in unresolved review comments.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai generate configuration to create a PR that adds the current resolved configuration as .coderabbit.yaml (or show it if already present).
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, support, documentation and community

  • Visit our status page to check the current availability of CodeRabbit.
  • Create a ticket on our support page for assistance with any issues or questions.
  • Visit our documentation site for detailed information on how to use CodeRabbit.
  • Join our Discord community to connect with other users and get help from the community.
  • Follow us on X/Twitter for updates and announcements.

@aallan

aallan commented May 6, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 6, 2026

Copy link
Copy Markdown
✅ Actions performed

Full review triggered.

Code-reviewer flagged a 5th decode site missed by the initial pass:
vera/wasm/markdown.py::_read_string — invoked from host_md_render /
host_md_has_heading / host_md_extract_text / host_md_count_blocks for
every Markdown host import.  Same failure mode as #589 just routed
through Markdown.  Now uses errors="replace" with a docstring
cross-referencing the api.py sites.

Silent-failure-hunter flagged _extract_string at api.py:3259 as
inconsistent: it had a try/except UnicodeDecodeError → pointer fallback
that silently mutated the return type from str to int.  Worse silent
failure than U+FFFD because downstream consumers (CLI printer) printed
an integer where a string was expected — looks like a valid result.
Switched to errors="replace" matching the four sibling sites; value
stays a str, invalid bytes surface as U+FFFD.

PR-test-analyzer flagged two test-quality issues:
  - Structural marker for host_print test was anchored on a comment
    string ("Host function: vera.print").  Anchored on "def host_print("
    instead, matching the pattern used for the other three structural
    tests.  Less fragile to comment refactoring.
  - Class docstring claimed "Tests here are structural assertions"
    but the class also contains an end-to-end test.  Reworded to
    "primarily structural assertions" plus an explicit note that
    the e2e test pins the wasmtime-trampoline contract rather than
    the production path.

Added two new structural tests:
  - test_markdown_read_string_uses_errors_replace
  - test_extract_string_uses_errors_replace

Plus a private _file_body_after helper so future tests in this class
can assert on any source file, not just api.py.

CHANGELOG entry expanded to mention the markdown.py site and the
_extract_string str→int fix.  Test counts bumped 3,733 → 3,735
(+2 tests in test_runtime_traps.py).

The HTTP/Inference network-decode sites at api.py:756 / 2809 / 2841
were considered but left for a follow-up — they're a different layer
(network response, not WASM memory) and deserve their own analysis
of whether replace-mode or structured-error is the right choice.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@aallan

aallan commented May 6, 2026

Copy link
Copy Markdown
Owner Author

Review findings (logged for future reference)

Ran /pr-review-toolkit:review-pr (4 agents in parallel: code-reviewer, pr-test-analyzer, comment-analyzer, silent-failure-hunter). Findings + actions taken below. Fixes ship in the next commit on this branch.

Critical — fixed

1. Coverage gap: 5th UTF-8 decode site missed. vera/wasm/markdown.py:386 _read_string decodes WASM memory bytes with strict UTF-8. Invoked from host_md_render / host_md_has_heading / host_md_extract_text / host_md_count_blocks (vera/codegen/api.py:1413, 1430, 1447, 1466). The exact #589 failure mode — corrupt String (ptr, len)UnicodeDecodeError escaping wasmtime's trampoline → 30-line Python traceback — is reproducible through any of the four md_* host imports. Reported by code-reviewer (confidence 90). Fixed: applied errors="replace" with a docstring cross-referencing the api.py sites. New test_markdown_read_string_uses_errors_replace structural test.

2. Silent type mutation in _extract_string. vera/codegen/api.py:3259 had a try/except UnicodeDecodeError → pointer fallback that silently mutated the return type from str to int when bytes weren't valid UTF-8. Worse silent failure than visible U+FFFD because downstream consumers (CLI printer at minimum) printed an integer where a string was expected — looks like a valid result. Reported by silent-failure-hunter. Fixed: switched to errors="replace" matching the four sibling sites; value stays a str, invalid bytes become U+FFFD. New test_extract_string_uses_errors_replace structural test.

Important — fixed

3. Structural test anchored on fragile comment string. test_host_print_uses_errors_replace originally anchored on the comment "Host function: vera.print" rather than the function definition. Comment refactoring would silently break the test even though behavior is preserved. Reported by pr-test-analyzer. Fixed: anchored on def host_print( matching the pattern used for the other three structural tests.

4. Class docstring undersells the e2e test. Said "Tests here are structural assertions on the source" but the class also contains test_invalid_utf8_through_host_print_does_not_raise, an end-to-end synthetic-WAT test. Reported by comment-analyzer. Fixed: reworded to "primarily structural assertions" plus an explicit note about the e2e test's role (it pins the wasmtime-trampoline contract, not the production path).

Important — partial

5. End-to-end coverage thin for sites 2–6. Only host_print has a synthetic-WAT end-to-end test. host_stderr / host_contract_fail / _read_wasm_string / markdown.py::_read_string / _extract_string are covered only by structural greps. Reported by pr-test-analyzer (rated 6). Partial fix: added structural tests for the two new sites (markdown + _extract_string). Full parametrized end-to-end coverage of all six sites is left as a follow-up — the structural tests catch the regression of dropping errors="replace" from any single site, and the e2e test pins the wasmtime-trampoline mechanism. The cost-benefit of duplicating the e2e setup five more times didn't seem high enough to block this PR.

Mild — explicitly deferred

6. "Test-the-test" concern in test_invalid_utf8_through_host_print_does_not_raise. The end-to-end test defines its own host_print closure that hardcodes errors="replace" — if a contributor drops errors="replace" from vera/codegen/api.py:1076, the structural test catches it but this end-to-end test still passes (it's asserting Python's stdlib behaviour, not the production code's). Reported by pr-test-analyzer (rated 8). Deferred: the structural tests are what protect against the production regression; the e2e test is now correctly framed in the class docstring (post-fix #4) as a wasmtime-trampoline contract test. Wiring up the production host_print from inside a test would require either exporting the linker setup from execute() (large refactor) or duplicating it in the test (worse than the current state).

7. HTTP / Inference network-decode sites. vera/codegen/api.py:756 (Inference response), :2809 and :2841 (Http response) decode network data with strict UTF-8. Same pattern, but a different layer — these are external API responses, not WASM memory. The replace-vs-structured-error tradeoff is different (a misbehaving remote API is a real failure mode the user should know about, vs. a corrupt WASM-internal pointer which is a compiler bug). Deferred: noted in commit message; deserves its own analysis as a separate issue.

8. _read_wasm_string debug breadcrumb for FileNotFoundError disambiguation. When a corrupt path string degrades to U+FFFD and then Path(...).read_text() returns FileNotFoundError, the user can't tell whether the path was corrupt or just wrong. silent-failure-hunter suggested a debug log breadcrumb. Deferred as over-engineering for now — the U+FFFD chars are preserved in the path string, so the FileNotFoundError message should already contain them as a tell.

Strengths confirmed by all four agents

Final test surface

TestHostPrintInvalidUtf8589 now has 7 tests:

  • 6 structural assertions (one per affected site: 4 in api.py + markdown.py::_read_string + _extract_string)
  • 1 end-to-end synthetic-WAT test (host_print path)

Total tests on the PR: 3,728 → 3,735 (+7).

@aallan

aallan commented May 6, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 6, 2026

Copy link
Copy Markdown
✅ Actions performed

Full review triggered.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@TESTING.md`:
- Line 9: The test-count row in TESTING.md is inconsistent (it shows 3,735 total
but lists 3,714 passed + 14 skipped = 3,728); run the canonical count command
(pytest tests/ --collect-only -q or python scripts/check_doc_counts.py) to get
the authoritative numbers and then update the table row in TESTING.md so the
total equals the sum of all status buckets (or add the missing status bucket
explicitly), e.g., adjust the total to 3,728 or include the omitted bucket and
its count to reconcile the math.

In `@tests/test_runtime_traps.py`:
- Around line 1866-1894: The current assertions in
test_read_wasm_string_uses_errors_replace and
test_markdown_read_string_uses_errors_replace only check for the literal
'errors="replace"' which can live in docstrings/comments; update each test (the
bodies obtained via _api_body_after and _file_body_after) to assert that the
decode call actually includes the errors argument — e.g. verify the body
contains a decode invocation with errors="replace" (use a regex like
'\.decode\([^)]*errors\s*=\s*["\']replace["\']' or check both '.decode(' and
'errors="replace"' occur near each other) instead of the simple substring check
so the tests fail if the real .decode(..., errors="replace") call is removed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d5fe3429-702c-46b3-ac74-f75a6ded446b

📥 Commits

Reviewing files that changed from the base of the PR and between c5c0263 and c22980f.

📒 Files selected for processing (7)
  • CHANGELOG.md
  • KNOWN_ISSUES.md
  • ROADMAP.md
  • TESTING.md
  • tests/test_runtime_traps.py
  • vera/codegen/api.py
  • vera/wasm/markdown.py

Comment thread TESTING.md Outdated
Comment thread tests/test_runtime_traps.py
Two follow-up issues filed from the PR #590 review findings, now logged
in KNOWN_ISSUES so they don't get lost in the PR comment thread once
this PR merges.

#591 — HTTP/Inference network response UTF-8 decode hygiene. Three
sites in vera/codegen/api.py use strict-mode bytes.decode("utf-8"):

  - line 756 (Inference.complete via _call_inference_provider)
  - line 2809 (Http.get host_http_get)
  - line 2841 (Http.post host_http_post)

All three are wrapped in try/except Exception → Result::Err so they
DON'T escape as Python tracebacks (lower severity than #589). But the
error message handed to the user contains "UnicodeDecodeError: 'utf-8'
codec can't decode byte 0x..." — Python-internals noise rather than a
Vera-native diagnostic. Filed in KNOWN_ISSUES under Bugs.

#592 — e2e test parametrization for the remaining 5 UTF-8 decode
sites. TestHostPrintInvalidUtf8589 has 6 structural source-greps and
1 end-to-end synthetic-WAT test (host_print only). The other 5 sites
are pinned only structurally. A future refactor that centralises the
decodes into a _safe_decode() helper would break the structural
greps even with preserved behaviour. Filed in KNOWN_ISSUES under a
new "Test coverage gaps" section since it doesn't fit cleanly into
"Refactoring needed" (which is about file decomposition).

Co-Authored-By: Claude <noreply@anthropic.invalid>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@KNOWN_ISSUES.md`:
- Line 12: Update the KNOWN_ISSUES.md entry to correct the claim about exception
handling: state that only Http.get and Http.post are wrapped in try/except
Exception (so they return a Vera-level Err), while Inference.complete performs
the UTF-8 decode inside a with block with no exception handler and therefore can
directly raise UnicodeDecodeError; adjust the severity/language to reflect that
Inference.complete is exposed and not protected like Http.get/Http.post and
suggest the intended decision options (`errors="replace"` vs explicit detection
with a Vera-native Err) remain under consideration.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 93c3d27f-8f96-4e9b-9567-321fcdd4876d

📥 Commits

Reviewing files that changed from the base of the PR and between c22980f and fd4843d.

📒 Files selected for processing (1)
  • KNOWN_ISSUES.md

Comment thread KNOWN_ISSUES.md
…ertions

Both findings on PR #590 verified valid; fixed both with minimal changes.

1. TESTING.md line 9 had stale passed count: "3,735 across 29 files
   (3,714 passed + 14 skipped)". 3,714 + 14 = 3,728, not 3,735. The
   earlier sed in the review-fixes commit (c22980f) targeted "3,719
   passed" → "3,721 passed" but the original was "3,714 passed", so
   the substitution missed. Updated to "3,721 passed" so the math
   reconciles (3,721 + 14 = 3,735, matching pytest's collected count).

2. test_read_wasm_string_uses_errors_replace and
   test_markdown_read_string_uses_errors_replace asserted only that
   the substring 'errors="replace"' appeared in the source body. Both
   sites have docstrings I expanded with prose mentions of
   `errors="replace"` (explaining the strategy), so the substring
   would still match if the actual `.decode(...)` call was reverted
   to strict mode — the tests would pass spuriously.

   Tightened both to assert on the literal call form
   '.decode("utf-8", errors="replace")' which appears only in actual
   call sites (the docstrings use ``errors="replace"`` with markdown
   backticks, never the leading `.decode(` prefix). Verified by
   negative test: temporarily reverting markdown.py's call to strict
   mode triggers the tightened assertion failure as expected.

   Updated docstrings on both tests to explain why the literal-call
   form is required.

The other four structural tests (host_print / host_stderr /
host_contract_fail / _extract_string) already used the literal call
form and were not affected.

Co-Authored-By: Claude <noreply@anthropic.invalid>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@TESTING.md`:
- Line 82: Update the long "What it covers" summary for the test row to mention
the new UTF-8 hardening suite TestHostPrintInvalidUtf8589: state that it adds
the six structural decode-site assertions plus the synthetic WAT end-to-end
no-traceback regression and that these tests cover invalid-UTF-8 handling in
host printing; edit the row text that lists covered items (the multi-clause
paragraph referencing trap kinds, tee suite, source-mapping, etc.) to append a
short phrase like "and UTF‑8 hardening: TestHostPrintInvalidUtf8589 (6
decode-site assertions + 1 synthetic WAT end-to-end no-traceback test)" so the
documented coverage matches the updated test count.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0976b806-5f60-4f43-a9f1-d92eb81a3930

📥 Commits

Reviewing files that changed from the base of the PR and between fd4843d and c07ff83.

📒 Files selected for processing (2)
  • TESTING.md
  • tests/test_runtime_traps.py

Comment thread TESTING.md Outdated
Rabbit finding: the test_runtime_traps.py row had the test count
bumped (53→60) and line count bumped (1,784→2,001) to reflect the
+7 tests I added for #589, but the long "What it covers" prose
paragraph wasn't updated to describe what those new tests do.
Reading the row, you'd see "this file grew by 7 tests" with no clue
why or what they cover.

Appended a clause matching the existing per-suite style (the row
already lists "v0.0.123 tee suite", "v0.0.124 source-mapping suite",
"v0.0.125 Stage 3 suite") with:

  - the suite name (TestHostPrintInvalidUtf8589) and issue link (#589)
  - the 6 structural decode-site assertions naming each affected
    site (host_print / host_stderr / host_contract_fail /
    _read_wasm_string / markdown::_read_string / _extract_string)
  - the 1 synthetic-WAT end-to-end test and its specific contract
    (Python UnicodeDecodeError escapes wasmtime's trampoline as a
    "python exception" cause iff the host decode is strict)

Pure documentation update.  No code changes.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@aallan aallan merged commit 2c046cb into main May 6, 2026
19 checks passed
@aallan aallan deleted the claude/fix-host-print-utf8-crash branch May 6, 2026 21:04
aallan added a commit that referenced this pull request May 6, 2026
apply_fn(closure, ()) on a (Unit -> X) closure failed WASM validation
with "expected i64, found i32" at compile time.  Same falsy-pitfall
class as #584's _fn_ret_types filter: _translate_apply_fn in
vera/wasm/closures.py defaulted the value-arg's WASM type to "i64"
when _infer_expr_wasm_type returned None, but None means BOTH
"couldn't infer" AND "Unit has no representation" - conflating the
two cases.  Result: a UnitLit pushed nothing onto the stack but
registered a phantom i64 param in the call_indirect signature.  The
closure-lift side correctly skipped Unit params, so the two sides
disagreed and validation rejected the call (the func_table_idx
landing where the phantom value-arg was expected).

Fix is three lines: change "arg_wasm_types.append(wt or 'i64')" to
"elif wt is not None: arg_wasm_types.append(wt)" so Unit args
contribute zero entries to the call_indirect sig.

Discovered while smoke-testing #514 / #535 closure-capture work on
v0.0.135 to confirm those v0.0.119-era bugs were closed.  Both #514
and #535 ARE closed; this is a separate adjacent codegen bug.

Conformance: tests/conformance/ch05_unit_arg_closure.vera covers
three positions:
  - No-capture Unit-arg closure
  - Int-capture Unit-arg closure
  - Array<Bool>-capture Unit-arg closure (exercises #514's heap-
    capture work + #586's fix together)

Conformance count: 84 -> 85.  Total tests 3,735 -> 3,740 (+5 from the
parametrized conformance runner: parse, check, verify, run, fmt).

Rebased on top of PR #590 (which merged #589's UTF-8 hardening into
[Unreleased] without a version bump).  The rebase rolls #589's
[Unreleased] entry into [0.0.136] alongside this PR's #586 entry, so
the v0.0.136 release ships both fixes together.  HISTORY.md row for
v0.0.136 updated to mention both #586 and #589.

Doc updates:
  - CHANGELOG entry under [0.0.136] Fixed for both #586 and #589
  - HISTORY.md Stage 11 row for v0.0.136 names both issues
  - All count surfaces (TESTING.md, ROADMAP.md, etc.) bumped to
    3,740 tests / 85 conformance
  - uv.lock regenerated

Co-Authored-By: Claude <noreply@anthropic.invalid>
aallan added a commit that referenced this pull request May 7, 2026
…ep test

Both findings valid; fixed both.

1. TESTING.md row 82 was missing the new TestHostSleepKeyboardInterrupt
   suite from its coverage prose, even though the test count and line
   count were already bumped. Appended a clause matching the existing
   per-suite style: names the suite, links #595 (the related residual
   issue this guards the in-our-control half of), summarises both
   tests in the class.

2. test_host_sleep_keyboard_interrupt_propagates_as_vera_exit was a
   "test-the-test" — it defined a LOCAL host_sleep_under_test mirror
   of the production code, patched time.sleep, and asserted the
   conversion. If the production code regressed (KeyboardInterrupt
   guard removed), this test still passed because it was checking
   the LOCAL copy.

   Replaced with test_host_sleep_keyboard_interrupt_end_to_end that:
     - Compiles a real Vera program calling IO.sleep(120)
     - Monkey-patches time.sleep to raise KeyboardInterrupt
     - Runs via execute() (the real production entrypoint)
     - Asserts ExecuteResult.exit_code == 130
     - Asserts pre-sleep IO.print preserved (#522 contract)
     - Asserts post-sleep IO.print did NOT run

   Negative-test verified: temporarily reverting the production
   guard makes this test fail (wasmtime crashes when KeyboardInterrupt
   escapes the trampoline). Restoring the guard makes it pass again.
   So the test now genuinely pins the production path.

   Same "test-the-test" gap rabbit caught on PR #590's #589 e2e
   test. Fixing this one symmetrically — for #589 the gap was less
   tractable (would have required exporting wasmtime linker setup);
   here it was just a 30-line rewrite using execute() directly.

test_runtime_traps.py grows from 2,100 to 2,138 lines (the e2e is
slightly longer than the local-helper form). Test count unchanged
(still 62: structural + e2e is the same shape as before).

Co-Authored-By: Claude <noreply@anthropic.invalid>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

host_print / host_stderr / host_contract_fail crash with raw Python UnicodeDecodeError on invalid UTF-8 bytes

1 participant