Skip to content

Windows-strict CI: cp1252 fix + drop advisory + bump to v0.0.143#646

Merged
aallan merged 6 commits into
mainfrom
claude/issue-641-utf8-encoding
May 10, 2026
Merged

Windows-strict CI: cp1252 fix + drop advisory + bump to v0.0.143#646
aallan merged 6 commits into
mainfrom
claude/issue-641-utf8-encoding

Conversation

@aallan

@aallan aallan commented May 10, 2026

Copy link
Copy Markdown
Owner

Summary

Closes #641. The "now works on Windows" release — closes the third and final sub-issue from PR #639's matrix-expansion follow-up cluster, drops the continue-on-error advisory, and bumps to v0.0.143.

What changes

Three things in one PR — appropriate for the milestone framing (Windows goes from untested to strict gate):

  1. Default cp1252 encoding causes Windows test failures — audit file I/O for explicit UTF-8 #641 fix. Set PYTHONUTF8=1 in the CI test job env (PEP 540 — Python's text-mode open() defaults to UTF-8 regardless of locale). Closes ~9 failing tests across test_codegen.py, test_codegen_monomorphize.py, test_codegen_closures.py, test_html.py (all 'charmap' codec can't encode '→' / 'utf-8' can't decode 0x97 flavours of the same root cause). Also added explicit encoding='utf-8' to vera/parser.py's grammar load — the load-bearing site that runs on every parse.

  2. Drop continue-on-error advisory. PR Add windows-latest to CI test matrix in advisory mode (closes #637) #639 added Windows entries with continue-on-error: ${{ matrix.os == 'windows-latest' }} so failures didn't block merges while sub-issues were being fixed. All three sub-issues (Vera CLI's /dev/stdin is Unix-only — add Windows-compatible stdin handling #640 stdin via PR Read stdin via sys.stdin for cross-platform support (closes #640) #644, Default cp1252 encoding causes Windows test failures — audit file I/O for explicit UTF-8 #641 cp1252 here, test_io_read_file_roundtrip: Windows path embedded in Vera string literal triggers \U escape #642 path escapes via PR Sanitise Windows paths in IO test fixtures (closes #642) #643) now closed. Matrix is now fully strict across all 9 entries.

  3. Version bump 0.0.142 → 0.0.143. This is the Windows-compatibility milestone — the project went from "untested on Windows" to "Windows is a strict gate" in three sequential PRs.

Why Option B (PYTHONUTF8=1) instead of Option A (audit each call site)

Per the original #641 issue body, two options were on the table:

  • A: explicit encoding='utf-8' at every text-mode call site (~30 sites across vera/, scripts/, tests/)
  • B: PYTHONUTF8=1 in CI as the stopgap

This PR goes with B for the CI fix because it's a single-line change vs. ~30 mechanical edits, and the user asked to "burn through" the cluster. #645 is filed to track Option A as the durable follow-up — explicit per-site encoding is what protects users running on Windows without PYTHONUTF8=1 in their shell environment.

Test plan

Sibling PRs

This PR depends on the other two having landed (or at least having their fixes in main) so the matrix can be flipped to strict cleanly. If those land first and CI is green, this one's clean.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved stdin handling and enforced UTF‑8 text processing on Windows for more reliable test/run behaviour
    • Normalised Windows temporary paths in tests to avoid escape-related failures
  • Tests

    • Updated temporary-file handling in tests to reliably run CLI checks across platforms
  • Chores

    • Bumped project version to v0.0.143
    • Windows tests no longer run in advisory mode and now block merges alongside other platforms
  • Documentation

    • Release notes, changelog and history updated for v0.0.143

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 10, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Version 0.0.143 is released: parser now loads grammar.lark with explicit UTF‑8, CI test job sets PYTHONUTF8=1 and no longer treats Windows results as advisory, tests adjust tempfile lifecycles for subprocess compatibility, and version/docs/changelog are updated.

Changes

v0.0.143 Release: Windows Encoding, CI Strictness & Test Fixes

Layer / File(s) Summary
Parser UTF-8 Grammar Loading
vera/parser.py
_get_parser() now reads grammar.lark with read_text(encoding="utf-8"); Lark configuration unchanged.
CI Workflow
.github/workflows/ci.yml
jobs.test adds job-level env: PYTHONUTF8: 1 and removes continue-on-error: ${{ matrix.os == 'windows-latest' }} so Windows failures fail the job.
Test lifecycle changes
tests/test_html.py, TESTING.md
test_all_vera_blocks_check and test_all_vera_blocks_verify now use NamedTemporaryFile(..., delete=False), close the handle before invoking subprocesses, and unlink the file in finally; TESTING.md test-line count and fixture conventions updated.
Documentation & Local Guidance
CLAUDE.md, TESTING.md
Added shell pitfalls and cross-platform test-fixture guidance (NamedTemporaryFile rules, POSIX path formatting in Vera literals, and explicit encoding="utf-8").
Release Notes & Known Issues
CHANGELOG.md, HISTORY.md, KNOWN_ISSUES.md
Inserted v0.0.143 release entry and history row; updated compare links; replaced prior Windows advisory test-gap entries with requirement to use explicit encoding='utf-8' in text-mode file I/O.
Version Metadata
pyproject.toml, vera/__init__.py, README.md
Bumped version from 0.0.1420.0.143 in project metadata, exported constants, and project status text.

Sequence Diagram(s)

sequenceDiagram
  participant CI
  participant TestRunner
  participant VeraParser
  participant Filesystem
  CI->>TestRunner: start test job (PYTHONUTF8=1)
  TestRunner->>VeraParser: import -> _get_parser()
  VeraParser->>Filesystem: read_text("grammar.lark", encoding="utf-8")
  Filesystem-->>VeraParser: UTF-8 grammar
  TestRunner->>TestRunner: run tests (tempfile closed before subprocess)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

  • aallan/vera#643: Related — similar fixes to sanitize Windows file paths in tests.
  • aallan/vera#639: Related — that PR added Windows advisory behaviour which this PR removes.
  • aallan/vera#638: Related — overlaps KNOWN_ISSUES edits replaced by this PR.

Suggested labels

compiler, tests, ci, docs

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the three main changes: enabling UTF-8 on Windows (cp1252 fix), removing the continue-on-error advisory, and version bump to v0.0.143.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/issue-641-utf8-encoding

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented May 10, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.97%. Comparing base (6922fe6) to head (884aa0e).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #646   +/-   ##
=======================================
  Coverage   90.97%   90.97%           
=======================================
  Files          59       59           
  Lines       23137    23137           
  Branches      259      259           
=======================================
  Hits        21048    21048           
  Misses       2082     2082           
  Partials        7        7           
Flag Coverage Δ
javascript 57.36% <ø> (ø)
python 94.76% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@HISTORY.md`:
- Line 331: Update the top header count in HISTORY.md to match the corrected
total of active development days: change the header line that currently reads
"56" to "57" so it matches the body line "Total: **810+ commits, 143 tagged
releases, 57 active development days.**"; locate and edit the header/title text
(the first few lines of HISTORY.md) to replace the outdated day-count with 57.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 55fb920e-8581-46c3-94cd-e58eb8ed9a20

📥 Commits

Reviewing files that changed from the base of the PR and between f148e54 and eb0fbc8.

⛔ Files ignored due to path filters (6)
  • docs/index.html is excluded by !docs/**
  • docs/index.md is excluded by !docs/**
  • docs/llms-full.txt is excluded by !docs/**
  • docs/llms.txt is excluded by !docs/**
  • docs/sitemap.xml is excluded by !docs/**
  • uv.lock is excluded by !**/*.lock, !uv.lock
📒 Files selected for processing (8)
  • .github/workflows/ci.yml
  • CHANGELOG.md
  • HISTORY.md
  • KNOWN_ISSUES.md
  • README.md
  • pyproject.toml
  • vera/__init__.py
  • vera/parser.py

Comment thread HISTORY.md
Closes the third sub-issue from PR #639's matrix-expansion follow-up
cluster (#641 cp1252 file I/O encoding) and flips the Windows
matrix entries from advisory to strict.

This is the "now works on Windows" release.

Set PYTHONUTF8=1 in the CI test job environment so Python's
text-mode open() defaults to UTF-8 regardless of locale (PEP 540).
Covers all CI-side encoding failures with a single config change
rather than auditing ~30 individual call sites.  Closes ~9 failing
tests across test_codegen.py, test_codegen_monomorphize.py,
test_codegen_closures.py, test_html.py.

Also added explicit encoding='utf-8' to vera/parser.py's grammar
load — the load-bearing site that runs on every parse, so users
on Windows without PYTHONUTF8=1 in their shell still get correct
grammar loading (defense-in-depth).

Filed #645 for the broader audit (every text-mode open() /
read_text() / write_text() call site for explicit encoding) as
the durable fix.

Drop continue-on-error advisory mode

PR #639 added Windows entries with continue-on-error: ${{ matrix.os
== 'windows-latest' }} so failures didn't block merges while #640,
sub-issues now closed (PR for #642 sanitises Windows paths in
test fixtures; PR for #640 reads sys.stdin cross-platform; this
PR closes #641).  The continue-on-error line is removed in this
release; the matrix is now fully strict across all 9 entries
(ubuntu / macos / windows × 3.11 / 3.12 / 3.13).

Version bump 0.0.142 → 0.0.143

This is the Windows-compatibility milestone — the project went
from "untested on Windows" to "Windows is a strict gate" in three
sequential PRs (#643 / #644 / this).  Marking with a minor version
bump and "now works on Windows" framing in CHANGELOG.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@aallan aallan force-pushed the claude/issue-641-utf8-encoding branch from eb0fbc8 to 676606f Compare May 10, 2026 22:34
Header line 3 said 56 active development days; footer line 331
already correctly read 57 after v0.0.143 added a day.  Header was
the stale half.  Trivial single-word fix.

Co-Authored-By: Claude <noreply@anthropic.invalid>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@HISTORY.md`:
- Line 290: Update the v0.0.143 entry in HISTORY.md (the line containing
"v0.0.143 | 10 May") to remove the incorrect issue reference `#641` from the list
of closed issues so it only lists `#640` and `#642`; keep the rest of the text
intact (including the notes about Windows CI, `continue-on-error` advisory, and
the matrix) or remove the `#641` citation entirely if you prefer.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6c514633-6d12-40c6-8f0c-fb8fa8658704

📥 Commits

Reviewing files that changed from the base of the PR and between 676606f and 74948cb.

📒 Files selected for processing (1)
  • HISTORY.md

Comment thread HISTORY.md
`tests/test_html.py::test_all_vera_blocks_check` and
`test_all_vera_blocks_verify` used `tempfile.NamedTemporaryFile(...,
delete=True)` then ran `vera <subcmd> f.name` via subprocess while
the `with` block held the file handle open.  On Windows you can't
open a file twice while one handle is still held; the subprocess
hit a Python traceback (truncated to "Traceback (most recent call
last):" by the test's first-line error capture).

This is the classic Windows `NamedTemporaryFile(delete=True)`
gotcha — Unix allows concurrent file handles, Windows doesn't.
All other test fixtures in the suite already use `delete=False`
and clean up explicitly; only test_html.py had the legacy
`delete=True` pattern.

Fix: convert both sites to `delete=False` + manual `Path.unlink()`
in a `finally` block.  Closes the last 2 Windows-only test
failures on PR #646; matrix should now go fully green across all
9 strict entries.

Co-Authored-By: Claude <noreply@anthropic.invalid>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_html.py`:
- Around line 112-114: tempfile.NamedTemporaryFile is invoked twice in
tests/test_html.py (the NamedTemporaryFile call that assigns to f around lines
where mode="w", suffix=".vera", dir=str(ROOT), delete=False, and the second
similar call later) without an explicit encoding, which can cause non-ASCII
writes to fail on some locales; update both calls to include encoding='utf-8' so
text-mode writes use UTF-8 (i.e., add encoding='utf-8' to the
NamedTemporaryFile(...) arguments for the NamedTemporaryFile that creates the
Vera temp files).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1fde7034-e146-43a7-9026-94865157c15f

📥 Commits

Reviewing files that changed from the base of the PR and between 74948cb and 58c9dd5.

📒 Files selected for processing (2)
  • TESTING.md
  • tests/test_html.py

Comment thread tests/test_html.py
The post-#637 Windows CI rollout (PRs #639/#643/#644/#646) hit
three test-fixture footguns — `tempfile.NamedTemporaryFile`'s
delete-while-open behaviour, Windows paths embedded in Vera
string literals tripping the `\U` escape, and cp1252 default
file I/O encoding.  Each had a portable workaround.

Added a "Test Fixture Conventions" section to TESTING.md
(immediately before "Adding Tests") with the three rules + their
canonical wrong/right code snippets.  Each rule cites the test
that surfaced the bug + the PR that fixed it.

Added a brief "Cross-platform pitfalls" section to CLAUDE.md's
project-orientation footer that points at the TESTING.md section.
The CLAUDE.md entry is a one-bullet pointer per rule, not a
full repeat — keeping the substantive content in TESTING.md as
the single source of truth.

These notes will save the next contributor (or agent) from
re-discovering the same footguns.

Co-Authored-By: Claude <noreply@anthropic.invalid>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CLAUDE.md`:
- Around line 183-185: The doc incorrectly suggests using
tmp_path.replace(os.sep, "/") on a pathlib.Path; replace any occurrences or
examples that call tmp_path.replace(...) with tmp_path.as_posix() so the pytest
tmp_path Path is converted to a POSIX string properly (e.g., where the guidance
talks about "Paths embedded into Vera string literals" use tmp_path.as_posix());
update the wording/examples around tmp_path to show .as_posix() and ensure any
surrounding explanation mentions converting the Path to a string for Vera
literals.

In `@TESTING.md`:
- Around line 337-338: Update TESTING.md so its Overview and CI Pipeline matrix
rows reflect that Windows is now part of the active CI matrix: find the phrases
"6 combinations (Ubuntu/macOS)" and the Overview/CI Pipeline rows that enumerate
platform combinations and change them to include Windows (and adjust the total
count accordingly), and update the section text that claims only Ubuntu/macOS to
state the current active matrix includes Windows as well; ensure the matrix
labels and counts are consistent with the new "Windows rollout" paragraph added
earlier in the file.
- Around line 371-372: The example uses tmp_path.replace(os.sep, "/") which
calls pathlib.Path.replace (wrong signature) and raises a TypeError; change the
conversion to POSIX to use the Path method tmp_path.as_posix() instead so
vera_path is a POSIX string (used when building source =
f'IO.read_file("{vera_path}")'); update the reference of vera_path to be
tmp_path.as_posix() everywhere.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 23b46a03-beb7-4be6-be3c-94ed07627c08

📥 Commits

Reviewing files that changed from the base of the PR and between 58c9dd5 and c8c063a.

📒 Files selected for processing (2)
  • CLAUDE.md
  • TESTING.md

Comment thread CLAUDE.md
Comment thread TESTING.md
Comment thread TESTING.md Outdated
aallan and others added 2 commits May 11, 2026 00:01
…deRabbit)

Both `NamedTemporaryFile` calls in test_html.py omitted the
`encoding=` kwarg.  Per the convention I just added to TESTING.md
("File I/O without explicit encoding falls back to the locale
default"), the explicit form is the durable fix — CI sets
PYTHONUTF8=1 as a backstop, but a local-developer run on
Windows without that env var would hit cp1252 fallbacks if the
HTML code blocks ever contain non-ASCII characters.

Trivial — and consistent with the rule the same PR documents.

Co-Authored-By: Claude <noreply@anthropic.invalid>
Three findings on PR #646's docs additions:

1. CLAUDE.md and TESTING.md examples used `tmp_path.replace(os.sep,
   "/")` to get POSIX form.  Works on str, but BROKEN on
   pathlib.Path (Path.replace is the rename method — different
   signature, would silently move the file).  Variable name
   `tmp_path` evokes pytest's Path-typed fixture.  Switched to
   `Path(tmp_path).as_posix()` which works for both str and Path.
   The actual test code at tests/test_codegen.py:7760 / 7787
   still uses str.replace because the variables there are str
   (from `tempfile.NamedTemporaryFile().name` and `os.path.join`)
   — that's correct in context, no change needed.

2. TESTING.md Overview row line 18 said "6 combinations
   (Ubuntu/macOS)" — pre-Windows.  CI Pipeline section line 484
   said the same.  Both updated to "9 combinations
   (Ubuntu/macOS/Windows)".

Same shape as the README test-count drift caught in PR #631 —
matrix-related docs across multiple files don't auto-sync.
Worth flagging the doc-counts script's coverage gap as a
follow-up if it recurs.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@aallan aallan merged commit 2407624 into main May 10, 2026
23 checks passed
@aallan aallan deleted the claude/issue-641-utf8-encoding branch May 10, 2026 23:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Default cp1252 encoding causes Windows test failures — audit file I/O for explicit UTF-8

1 participant