You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Three network-response UTF-8 decode sites in vera/codegen/api.py use strict-mode bytes.decode("utf-8"). If a remote API returns a response containing non-UTF-8 bytes, the user sees a Python UnicodeDecodeError message leaked into a Vera-level Result::Err string rather than something Vera-native and actionable.
This is a lower-severity sibling of #589 — the sites are all wrapped in try/except Exception → _alloc_result_err_string, so the failure does NOT escape as a raw Python traceback (the way #589 did via host_print). But the error message handed to the user contains Python-internals noise (UnicodeDecodeError: 'utf-8' codec can't decode byte 0x... in position N) which doesn't help them understand what went wrong with their Inference.complete / Http.get / Http.post call.
Found during the #590 (/pr-review-toolkit:review-pr) follow-up: I grepped for ALL strict-decode sites across the codebase to make sure #589's fix was complete. These three are the remaining ones; they're a different layer (network data, not WASM memory) so they were deliberately not bundled into #590.
The bytes come from a remote server, not from a Vera-program-internal codegen bug. A misbehaving remote API is a real failure mode that the user should know about.
The try/except Exception wrapping means no Python traceback escapes — the user sees a Result::Err(...) they can match on, not a wasmtime-trampoline-wrapped Python crash.
Practical trigger probability is very low: HTTP/JSON APIs almost universally use UTF-8 with explicit Content-Type. A misconfigured server is the only realistic source.
Why it should still be fixed
The error message contains Python-internals noise that the Vera user can't map back to anything actionable in their program.
Alternative: detect invalid UTF-8 explicitly and surface a Vera-native Err("response body was not valid UTF-8") — preserves the failure signal but strips Python noise.
Trade-off
The structured-error path (option B) preserves the diagnostic signal but loses the actual response bytes. The errors="replace" path (option A) loses the diagnostic signal (the user might not notice U+FFFD chars in their string) but preserves data. Most plausible answer: option A for Http.get / Http.post (user wants the data); option B for Inference.complete (where a non-UTF-8 response from an LLM API is genuinely broken and should fail loudly).
Worth deciding before fixing.
Reproducer (synthetic)
Hard to reproduce naturally — would need a mock HTTP server returning Content-Type: application/json; charset=utf-8 with bytes that aren't valid UTF-8. The structural concern is verifiable by inspection of the four lines without needing a runtime trigger.
Acceptance
All three sites use either errors="replace" or an explicit invalid-UTF-8 detection that surfaces a Vera-native Err.
Existing structural tests in tests/test_runtime_traps.py::TestHostPrintInvalidUtf8589 extended to cover these three sites for regression-pinning.
No new bug introduced for valid-UTF-8 responses (which is the overwhelming common case).
Summary
Three network-response UTF-8 decode sites in
vera/codegen/api.pyuse strict-modebytes.decode("utf-8"). If a remote API returns a response containing non-UTF-8 bytes, the user sees a PythonUnicodeDecodeErrormessage leaked into a Vera-levelResult::Errstring rather than something Vera-native and actionable.This is a lower-severity sibling of #589 — the sites are all wrapped in
try/except Exception→_alloc_result_err_string, so the failure does NOT escape as a raw Python traceback (the way #589 did viahost_print). But the error message handed to the user contains Python-internals noise (UnicodeDecodeError: 'utf-8' codec can't decode byte 0x... in position N) which doesn't help them understand what went wrong with theirInference.complete/Http.get/Http.postcall.Found during the #590 (
/pr-review-toolkit:review-pr) follow-up: I grepped for ALL strict-decode sites across the codebase to make sure #589's fix was complete. These three are the remaining ones; they're a different layer (network data, not WASM memory) so they were deliberately not bundled into #590.Affected sites
vera/codegen/api.py:756Inference.complete(host_inference_complete)vera/codegen/api.py:2809Http.get(host_http_get)vera/codegen/api.py:2841Http.post(host_http_post)Why it's lower severity than #589
try/except Exceptionwrapping means no Python traceback escapes — the user sees aResult::Err(...)they can match on, not a wasmtime-trampoline-wrapped Python crash.Why it should still be fixed
errors="replace"(matching the host_print / host_stderr / host_contract_fail crash with raw Python UnicodeDecodeError on invalid UTF-8 bytes #589 sites) would let the user see the actual response body with U+FFFD substitutions, surfacing API misbehaviour in a way they can debug.Err("response body was not valid UTF-8")— preserves the failure signal but strips Python noise.Trade-off
The structured-error path (option B) preserves the diagnostic signal but loses the actual response bytes. The
errors="replace"path (option A) loses the diagnostic signal (the user might not notice U+FFFD chars in their string) but preserves data. Most plausible answer: option A forHttp.get/Http.post(user wants the data); option B forInference.complete(where a non-UTF-8 response from an LLM API is genuinely broken and should fail loudly).Worth deciding before fixing.
Reproducer (synthetic)
Hard to reproduce naturally — would need a mock HTTP server returning
Content-Type: application/json; charset=utf-8with bytes that aren't valid UTF-8. The structural concern is verifiable by inspection of the four lines without needing a runtime trigger.Acceptance
errors="replace"or an explicit invalid-UTF-8 detection that surfaces a Vera-nativeErr.tests/test_runtime_traps.py::TestHostPrintInvalidUtf8589extended to cover these three sites for regression-pinning.Related