Origin: Surfaced when the Windows CI matrix entries (added in PR #639 closing #637) ran for the first time. ~9 tests fail across tests/test_codegen.py, tests/test_codegen_monomorphize.py, tests/test_codegen_closures.py, and tests/test_html.py with two flavours of the same root cause:
UnicodeEncodeError: 'charmap' codec can't encode character '→' in position N: character maps to <undefined> — Python on Windows tries to write the right-arrow → to a stream whose default encoding is cp1252 (Windows' legacy code page).
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position N: invalid start byte — a file written with cp1252 (where 0x97 is em-dash —) is then read back assuming UTF-8.
test_html.py::test_all_vera_blocks_check (4 sub-blocks) — likely same root cause
test_html.py::test_all_vera_blocks_verify (4 sub-blocks) — likely same root cause
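Both flavours can be reproduced on any platform by naming the codecs explicitly rather than relying on the Windows default. The following is a minimal sketch; the helper names `try_encode` and `try_decode` are illustrative and not part of the project.

```python
ARROW = "\u2192"    # right arrow, as used in diagnostics
EM_DASH = "\u2014"  # em-dash, as found in CHANGELOG/spec prose


def try_encode(text: str, encoding: str):
    """Return the encoded bytes, or the exception if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or the exception if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


# Flavour 1: cp1252 has no slot for the right arrow.
encode_result = try_encode(ARROW, "cp1252")

# Flavour 2: cp1252 stores the em-dash as the single byte 0x97,
# which is not a valid first byte of a UTF-8 sequence.
dash_bytes = EM_DASH.encode("cp1252")
decode_result = try_decode(dash_bytes, "utf-8")
```

Here `encode_result` holds a `UnicodeEncodeError` and `decode_result` holds a `UnicodeDecodeError`, matching the two messages seen in CI.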
The pattern
Python on Windows uses locale.getpreferredencoding() for open() calls without an explicit encoding= argument. On en-US Windows that's cp1252; in other locales it's whatever the system default is. cp1252 can't represent most of Unicode; UTF-8 can.
Vera's source files contain Unicode characters in:
Test fixtures with → (arrow) in error messages, contract violations, and trap diagnostics
HTML examples in docs/index.html
Regenerated llms.txt / SKILL.md outputs that contain em-dashes from CHANGELOG/spec prose
Any open() call in the code that touches these files without encoding='utf-8' will fail on Windows. The pre-PR-#639 matrix never tested Windows so the bugs lay dormant.
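To confirm which default a given machine or CI runner actually uses, a short diagnostic along these lines may help. It is a sketch only: `describe_text_defaults` and `encodable` are hypothetical helper names, and the values reported depend entirely on the environment.

```python
import locale
import sys


def describe_text_defaults() -> dict:
    """Collect the settings that decide how open() encodes text by default."""
    return {
        # Encoding used by open() when no encoding= argument is given.
        "preferred_encoding": locale.getpreferredencoding(False),
        # True when UTF-8 mode is active (PYTHONUTF8=1 or -X utf8).
        "utf8_mode": bool(sys.flags.utf8_mode),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    info = describe_text_defaults()
    print(info)
    print("arrow encodable by default:", encodable("\u2192", info["preferred_encoding"]))
```

Running this on a Windows runner without UTF-8 mode should show whether the default is one that cannot represent the arrow.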
Recommended fix
Two complementary approaches; the project should pick one to land first:
Option A — explicit encoding='utf-8' everywhere
Audit the codebase for open(...) calls without an explicit encoding kwarg. Add encoding='utf-8' to every text-mode open. This is the universal Python pattern and the recommended fix per PEP 597.
    # Bad on Windows:
    with open(path) as f:
        return f.read()

    # Good everywhere:
    with open(path, encoding='utf-8') as f:
        return f.read()
A grep gives the audit surface (rough estimate: ~50-100 sites across vera/, scripts/, tests/).
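As an alternative to a plain grep, the audit could be done on the syntax tree, which avoids matching comments and strings. The sketch below is one possible approach, not the project's tooling; `find_open_without_encoding` and `audit_tree` are hypothetical names. It only looks at calls to the builtin `open` and assumes the mode is a string literal.

```python
import ast
from pathlib import Path


def _is_binary_mode(call: ast.Call) -> bool:
    """Return True when the call passes a literal mode string containing 'b'."""
    mode = call.args[1] if len(call.args) >= 2 else None
    for kw in call.keywords:
        if kw.arg == "mode":
            mode = kw.value
    return (
        isinstance(mode, ast.Constant)
        and isinstance(mode.value, str)
        and "b" in mode.value
    )


def find_open_without_encoding(source: str, filename: str = "<string>") -> list[int]:
    """Return line numbers of text-mode open() calls that lack an encoding."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        if not (isinstance(node.func, ast.Name) and node.func.id == "open"):
            continue
        # encoding is the fourth positional parameter of open().
        has_encoding = len(node.args) >= 4 or any(
            kw.arg == "encoding" for kw in node.keywords
        )
        # A **kwargs expansion may carry encoding; skip rather than guess.
        has_star_kwargs = any(kw.arg is None for kw in node.keywords)
        if has_encoding or has_star_kwargs or _is_binary_mode(node):
            continue
        hits.append(node.lineno)
    return sorted(hits)


def audit_tree(root: Path) -> dict[str, list[int]]:
    """Map each .py file under root to the offending line numbers."""
    report = {}
    for path in sorted(root.rglob("*.py")):
        try:
            source = path.read_text(encoding="utf-8")
            hits = find_open_without_encoding(source, str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue
        if hits:
            report[str(path)] = hits
    return report
```

This does not cover `Path.read_text()`, `Path.write_text()`, `io.open`, or `subprocess` calls with `text=True`, which also fall back to the locale encoding and would need their own checks.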
Option B — set PYTHONUTF8=1 in CI
Add PYTHONUTF8: 1 to the test job's environment, which forces Python to default to UTF-8 regardless of locale (PEP 540). Covers all open() calls without code changes.
    - name: Run tests
      env:
        PYTHONUTF8: 1
      run: pytest -v -n auto
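The effect of the environment variable can be checked locally by launching a child interpreter with it set. This is a sketch under the assumption that `sys.executable` points at the interpreter used for the tests; `defaults_under` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def defaults_under(env_overrides: dict) -> str:
    """Run a child interpreter with extra environment variables and
    return its UTF-8 mode flag and default text encoding as one line."""
    env = dict(os.environ)
    env.update(env_overrides)
    code = (
        "import locale, sys; "
        "print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        check=True,
    )
    # The child prints only ASCII, so decoding as ASCII is safe here.
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    print("with PYTHONUTF8=1:", defaults_under({"PYTHONUTF8": "1"}))
    print("with PYTHONUTF8=0:", defaults_under({"PYTHONUTF8": "0"}))
```

With `PYTHONUTF8=1` the child should report UTF-8 mode as enabled and a UTF-8 default encoding, whatever the host locale is.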
Trade-offs:
A is more robust. Code that explicitly says encoding='utf-8' works regardless of environment. Users running vera on Windows without PYTHONUTF8=1 still hit the bug under Option B.
B is faster to land (one CI config change vs. ~50 code changes). Good as a stopgap; Option A is the proper fix.
Source files in tests/conformance/*.vera — these are read by the parser which already handles UTF-8 explicitly via lark.
Acceptance criteria
All ~9 failing tests pass on windows-latest, 3.{11,12,13}.
Either Option A is fully applied (preferred) or Option B is applied + Option A queued as follow-up.
A regression sentinel: a test that explicitly opens a file containing → and — without PYTHONUTF8=1 and expects the right behaviour (catches future regressions of Option A's coverage).
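One possible shape for the regression sentinel is sketched below. It runs a child interpreter with UTF-8 mode explicitly disabled and round-trips a file containing both characters. The `write_text`/`read_text` calls inside `SNIPPET` are stand-ins; a real sentinel would call the project's own reader and writer functions, whose names are not given here. `run_sentinel` is likewise a hypothetical name.

```python
import os
import subprocess
import sys
from pathlib import Path

# Raw string: the child interpreter, not this one, expands the \u escapes,
# so the command line stays pure ASCII.
SNIPPET = r"""
import sys
from pathlib import Path

path = Path(sys.argv[1])
text = "a \u2192 b \u2014 c"
path.write_text(text, encoding="utf-8")   # stand-in for the project's writer
assert path.read_bytes() == text.encode("utf-8")
assert path.read_text(encoding="utf-8") == text  # stand-in for the reader
"""


def run_sentinel(path: Path) -> subprocess.CompletedProcess:
    """Round-trip the arrow and em-dash through a file with UTF-8 mode off."""
    env = dict(os.environ)
    env["PYTHONUTF8"] = "0"  # make sure Option B is not masking the result
    return subprocess.run(
        [sys.executable, "-c", SNIPPET, str(path)],
        env=env,
        capture_output=True,
    )


def test_unicode_roundtrip_without_utf8_mode(tmp_path):
    """pytest entry point; tmp_path is pytest's built-in fixture."""
    proc = run_sentinel(tmp_path / "sentinel.txt")
    assert proc.returncode == 0, proc.stderr
```

On Linux and macOS the locale default is usually already UTF-8, so this check only has real teeth on the windows-latest entries of the matrix.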
Failing tests (sample)
test_array_flatten_with_empty_inners — encode →
test_array_sort_by_options — encode →
test_array_sort_by_stability — decode 0x97 (em-dash)
test_modulecall_provenance_emits_guard_and_traps — encode →
test_rhs_only_provenance_emits_guard_and_traps — encode →
test_array_map_type_change_int_to_bool — encode →
test_non_contiguous_int_capture_tail_shape — decode 0x97
test_html.py::test_all_vera_blocks_check (4 sub-blocks) — likely same root cause
test_html.py::test_all_vera_blocks_verify (4 sub-blocks) — likely same root cause
Out of scope
.wasm binaries (already always UTF-8 — WASM standard requires it).
Source files in tests/conformance/*.vera — these are read by the parser which already handles UTF-8 explicitly via lark.
Pairs with
/dev/stdin Windows handling (sibling).
\U escape investigation (probably collapses into this issue).