HTTP / Inference network-response decode uses strict UTF-8 (Python error noise leaks into Result Err) #591

@aallan

Summary

Three network-response UTF-8 decode sites in vera/codegen/api.py use strict-mode bytes.decode("utf-8"). If a remote API returns a response containing non-UTF-8 bytes, the user sees a Python UnicodeDecodeError message leaked into a Vera-level Result::Err string rather than something Vera-native and actionable.

This is a lower-severity sibling of #589. The sites are all wrapped in try/except Exception handlers that route to _alloc_result_err_string, so the failure does NOT escape as a raw Python traceback (the way #589 did via host_print). But the error message handed to the user contains Python-internals noise (UnicodeDecodeError: 'utf-8' codec can't decode byte 0x... in position N), which doesn't help them understand what went wrong with their Inference.complete / Http.get / Http.post call.
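To make the leak concrete, here is a minimal sketch of the pattern described above. It is not the code in vera/codegen/api.py: `alloc_result_err_string` and `host_http_get_sketch` are hypothetical stand-ins, and the exact formatting of the real Err string is an assumption.

```python
def alloc_result_err_string(message: str) -> tuple[str, str]:
    """Hypothetical stand-in for the real _alloc_result_err_string helper."""
    return ("Err", message)


def host_http_get_sketch(raw_body: bytes) -> tuple[str, str]:
    """Mimics the shape of the affected sites: strict decode inside a broad except."""
    try:
        # bytes.decode defaults to errors="strict", so invalid bytes raise.
        return ("Ok", raw_body.decode("utf-8"))
    except Exception as exc:
        # The Python exception text becomes the user-facing error payload.
        return alloc_result_err_string(str(exc))


# 0xE9 is Latin-1 "é"; followed by '"' it is not a valid UTF-8 sequence.
leaked = host_http_get_sketch(b'{"name": "caf\xe9"}')
print(leaked)
```

The Err payload carries the codec name, the offending byte and a byte offset, none of which maps to anything in the calling Vera program.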

Found during the #590 (/pr-review-toolkit:review-pr) follow-up: I grepped for ALL strict-decode sites across the codebase to make sure #589's fix was complete. These three are the remaining ones; they're a different layer (network data, not WASM memory) so they were deliberately not bundled into #590.

Affected sites

| File:line | Used by | Trigger |
| --- | --- | --- |
| vera/codegen/api.py:756 | Inference.complete (host_inference_complete) | Remote LLM API returns non-UTF-8 response |
| vera/codegen/api.py:2809 | Http.get (host_http_get) | URL response body is non-UTF-8 |
| vera/codegen/api.py:2841 | Http.post (host_http_post) | URL response body is non-UTF-8 |

Why it's lower severity than #589

  • The bytes come from a remote server, not from a Vera-program-internal codegen bug. A misbehaving remote API is a real failure mode that the user should know about.
  • The try/except Exception wrapping means no Python traceback escapes — the user sees a Result::Err(...) they can match on, not a wasmtime-trampoline-wrapped Python crash.
  • Practical trigger probability is very low: HTTP/JSON APIs almost universally use UTF-8 with explicit Content-Type. A misconfigured server is the only realistic source.

Why it should still be fixed

  • The error message contains Python-internals noise that the Vera user can't map back to anything actionable in their program.
  • Option A: setting errors="replace" (matching the #589 sites) would let the user see the actual response body with U+FFFD substitutions, surfacing API misbehaviour in a way they can debug.
  • Option B: detect invalid UTF-8 explicitly and surface a Vera-native Err("response body was not valid UTF-8"). This preserves the failure signal but strips the Python noise.
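The two candidate fixes can be sketched side by side. The function names and the ("Ok" / "Err", payload) tuple shape are illustrative assumptions, not the real host-function signatures in vera/codegen/api.py.

```python
def decode_replace(raw_body: bytes) -> str:
    """Lossy decode: keep the data, substituting U+FFFD for invalid bytes."""
    return raw_body.decode("utf-8", errors="replace")


def decode_or_native_err(raw_body: bytes) -> tuple[str, str]:
    """Strict decode, but with a Vera-native message instead of Python's."""
    try:
        return ("Ok", raw_body.decode("utf-8"))
    except UnicodeDecodeError:
        # Catch only the decode failure and drop the codec/offset details.
        return ("Err", "response body was not valid UTF-8")


bad = b"caf\xe9"  # Latin-1 "café"; the trailing 0xE9 is invalid UTF-8
print(repr(decode_replace(bad)))
print(decode_or_native_err(bad))
```

The lossy variant returns a usable string with a replacement character where the bad byte was; the strict variant discards the body but gives the caller a stable message to match on.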

Trade-off

The structured-error path (option B) preserves the diagnostic signal but loses the actual response bytes. The errors="replace" path (option A) loses the diagnostic signal (the user might not notice U+FFFD chars in their string) but preserves data. Most plausible answer: option A for Http.get / Http.post (user wants the data); option B for Inference.complete (where a non-UTF-8 response from an LLM API is genuinely broken and should fail loudly).

Worth deciding before fixing.

Reproducer (synthetic)

Hard to reproduce naturally: it would need a mock HTTP server returning Content-Type: application/json; charset=utf-8 with bytes that aren't valid UTF-8. The structural concern is verifiable by inspection of the three lines without needing a runtime trigger.
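A mock server of that kind can be sketched with the Python standard library alone. This is an assumption about how a reproducer might look, not an existing test in the repository; `BadUtf8Handler` and `fetch_bad_body` are hypothetical names, and the sketch exercises only the raw decode, not the Vera host functions.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Claims UTF-8 in the header, but 0xE9 followed by '"' is not valid UTF-8.
BAD_BODY = b'{"text": "caf\xe9"}'


class BadUtf8Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(BAD_BODY)))
        self.end_headers()
        self.wfile.write(BAD_BODY)

    def log_message(self, format, *args):
        pass  # keep the reproducer quiet


def fetch_bad_body() -> bytes:
    """Serve BAD_BODY on an ephemeral local port and fetch it once."""
    server = HTTPServer(("127.0.0.1", 0), BadUtf8Handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        # Empty ProxyHandler so environment proxy settings are ignored.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            return response.read()
    finally:
        server.shutdown()
        server.server_close()


body = fetch_bad_body()
try:
    body.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError as exc:
    strict_decode_failed = True
    print("strict decode failed:", exc)
```

Pointing a Vera program's Http.get at such a server would be the end-to-end version of this check.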

Acceptance

  • All three sites use either errors="replace" or an explicit invalid-UTF-8 detection that surfaces a Vera-native Err.
  • Existing structural tests in tests/test_runtime_traps.py::TestHostPrintInvalidUtf8589 are extended to cover these three sites for regression-pinning.
  • No new bug introduced for valid-UTF-8 responses (which is the overwhelming common case).
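One possible shape for a structural regression check is an AST scan that flags `.decode(...)` calls with no error-handling argument. This is a sketch under assumptions: the existing tests in TestHostPrintInvalidUtf8589 may work differently, `find_strict_utf8_decodes` is a hypothetical helper, and the scan is crude (it would accept an explicit errors="strict", and it would not suit option B, where a strict decode sits inside a dedicated except UnicodeDecodeError).

```python
import ast


def find_strict_utf8_decodes(source: str) -> list[int]:
    """Return line numbers of `.decode(...)` calls that pass no errors argument."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not (isinstance(func, ast.Attribute) and func.attr == "decode"):
            continue
        has_errors_kw = any(kw.arg == "errors" for kw in node.keywords)
        # bytes.decode(encoding, errors): a second positional is the errors mode.
        has_errors_positional = len(node.args) >= 2
        if not (has_errors_kw or has_errors_positional):
            hits.append(node.lineno)
    return sorted(hits)


# The sample starts with a newline, so the first assignment is on line 2.
SAMPLE = '''
a = raw.decode("utf-8")
b = raw.decode("utf-8", errors="replace")
c = raw.decode("utf-8", "replace")
'''

print(find_strict_utf8_decodes(SAMPLE))  # → [2]
```

A test could run this over the source of vera/codegen/api.py and assert that the three host functions no longer appear in the result.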

Labels: bug