Skip to content

fix(tests): pin UTF-8 encoding in test_auxiliary_config_bridge — salvage of #21473#22460

Merged
kshitijk4poor merged 1 commit into
mainfrom
salvage/test-utf8-21473
May 9, 2026
Merged

fix(tests): pin UTF-8 encoding in test_auxiliary_config_bridge — salvage of #21473#22460
kshitijk4poor merged 1 commit into
mainfrom
salvage/test-utf8-21473

Conversation

@kshitijk4poor

Copy link
Copy Markdown
Collaborator

Summary

Salvage of #21473 by @KvnGz — pins encoding="utf-8" on 3 Path.read_text() calls in tests/agent/test_auxiliary_config_bridge.py that read in-tree source files (gateway/run.py, cli.py). On Western Windows installs the default locale is cp1252, so any non-ASCII byte in those source files (em-dash in a comment, etc.) crashes the read with UnicodeDecodeError: 'charmap' codec can't decode byte .... Linux CI doesn't catch this; Windows contributors hit it on every test run.

Matches the existing precedent in hermes_cli/doctor.py:363.

Why a salvage

#21473 was 161 commits behind main. Although the diff applies cleanly with no stale-branch reverts (the PR is small and surgical), I ran the salvage path so the commit lands against fresh main rather than 161 commits stale, and so I could rerun the suite end-to-end. @KvnGz's commit is preserved with original authorship via plain git cherry-pick (no --author rewrite needed).

Verification

$ git diff origin/main..HEAD --stat
 tests/agent/test_auxiliary_config_bridge.py | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)
  • scripts/run_tests.sh tests/agent/test_auxiliary_config_bridge.py — 19/19 pass on the salvage branch
  • git diff shows pure additions of encoding="utf-8" and explanatory comments, no other edits
  • Diff verified clean against current main (git diff origin/main..pr-21473 | git apply --check returned 0)

Sibling-area note

I also audited every other .read_text() in tests/ while reviewing — there are ~30 more without encoding=, but they all read test-controlled JSON / state files that won't contain non-ASCII bytes. The 3 calls in this PR are the only ones reading repo source files (which DO contain em-dashes). Author's scope is correctly minimal.

Credit

Original work by @KvnGz in #21473. Their commit is preserved here with original authorship (obafemiferanmi1999@gmail.com, mapped to KvnGz via #22458).

Notes for reviewer

Depends on #22458 (chore(release): add KvnGz to AUTHOR_MAP) being merged first.

Merge with --rebase to preserve @KvnGz's authorship in the commit history.

Closes #21473

Three tests in tests/agent/test_auxiliary_config_bridge.py read
in-tree source files (gateway/run.py and cli.py) via
Path.read_text() with no encoding argument.  The default falls
back to the system locale, which on Western Windows installs is
cp1252, and the read fails as soon as the source contains any
byte that isn't valid cp1252 (e.g. an em-dash in a comment):

    UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f
    in position 41190: character maps to <undefined>

Linux CI doesn't catch this because the default Linux locale is
UTF-8.  Windows contributors hit it on every run of the test suite.

Pin encoding="utf-8" on the three call sites that read repo
source files.  This matches the existing precedent in
hermes_cli/doctor.py:363, where the same pattern (with an
explanatory comment) was applied to fix the .env read on
non-UTF-8 Windows locales.

Affected tests now pass on Windows + Python 3.12:
  - TestGatewayBridgeCodeParity.test_gateway_has_auxiliary_bridge
  - TestGatewayBridgeCodeParity.test_gateway_no_compression_env_bridge
  - TestCLIDefaultsHaveAuxiliaryKeys.test_cli_defaults_can_merge_auxiliary
@kshitijk4poor kshitijk4poor merged commit 3801825 into main May 9, 2026
13 of 15 checks passed
@kshitijk4poor kshitijk4poor deleted the salvage/test-utf8-21473 branch May 9, 2026 09:47
@github-actions

github-actions Bot commented May 9, 2026

Copy link
Copy Markdown
Contributor

🔎 Lint report: salvage/test-utf8-21473 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 7876 on HEAD, 7876 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4171 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant