fix: reconfigure MCP stdin/stdout to UTF-8 on Windows (fixes #363) #400
mvalentsev wants to merge 3 commits into
PR Review: fix: reconfigure MCP stdin/stdout to UTF-8 on Windows (fixes #363)
Executive Summary
Affected Areas:
Business Impact: Unblocks all Windows users encountering non-ASCII content via MCP (#363)
Flow Changes:
All Ratings
PR Health
High Priority Issues (Must fix before merge)
None. The fix is correct and well-scoped.
Medium Priority Issues (Should fix, not blocking)
[Error Handling] #1: Silent failure swallows reconfigure errors with no diagnostic
Location: If
  except Exception:
- pass # older Python or non-reconfigurable stream
+ logger.warning("Could not reconfigure stdio to UTF-8: %s", e)
Note: requires renaming the except clause to
[Bug] #2:
Applied both: stderr is now reconfigured alongside stdin/stdout, and the except clause logs the reason instead of silently passing. Edit 2026-05-03: two follow-ups landed since the above.
web3guru888 left a comment
Solid fix. The Windows codepage issue is real — non-English users on Windows get mojibake that eventually crashes ChromaDB's tokenizer downstream. The guard (sys.platform == "win32") is clean, and the try/except means it gracefully degrades.
Two notes:
- errors="strict": Good call setting strict mode explicitly rather than a lossy handler such as "replace". With MCP, you want to fail fast on encoding issues rather than silently corrupt JSON data.
- stderr too: Nice that you also reconfigure stderr. If the MCP server logs a non-ASCII palace path or drawer content in an error message, that would have crashed on Windows too.
This is a no-op on Linux/macOS, so zero risk to existing setups. We run on Linux but cross-platform compat is important for the ecosystem.
The `python -m mempalace.fact_checker --stdin` entry point reads non-ASCII text through the system ANSI codepage (cp1252/cp1251/cp950) on Windows, which mojibakes characters before claim-extraction sees them. Reconfigure stdin/stdout/stderr to UTF-8 with `errors="strict"`, wrapped in try/except so a replaced stream (Jupyter, test harness) logs a warning rather than crashing the CLI. Mirrors the same fix shipped for `mcp_server.py:main()` (MemPalace#400) and `hooks_cli.py:run_hook()` (MemPalace#1280) -- this is the third and last stdin-reading entry point in the package.
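A minimal sketch of the guard this commit message describes. The helper name `force_utf8_strict`, the logger name, and the injectable `streams` and `platform` parameters are assumptions made for illustration; none of them is taken from the MemPalace source. The stand-in `io.StringIO` plays the role of a replaced stream (Jupyter, test harness) that has no `reconfigure` method.

```python
import io
import logging
import sys

logger = logging.getLogger("stdio_sketch")  # hypothetical logger name


def force_utf8_strict(streams=None, platform=None):
    """Reconfigure text streams to UTF-8/strict on Windows; warn on failure.

    `streams` and `platform` are injectable only so the sketch can be
    exercised off Windows. Returns the number of streams reconfigured.
    """
    platform = sys.platform if platform is None else platform
    if platform != "win32":
        return 0  # no-op off Windows
    if streams is None:
        streams = [sys.stdin, sys.stdout, sys.stderr]
    changed = 0
    for stream in streams:
        try:
            # TextIOWrapper.reconfigure exists since Python 3.7; replaced
            # streams such as io.StringIO do not provide it.
            stream.reconfigure(encoding="utf-8", errors="strict")
            changed += 1
        except Exception as exc:
            logger.warning("Could not reconfigure stdio to UTF-8: %s", exc)
    return changed


# Usage with stand-in streams instead of the real sys.stdin/stdout/stderr.
real = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
replaced = io.StringIO()  # stands in for a Jupyter or test-harness stream
changed = force_utf8_strict([real, replaced], platform="win32")
```

The replaced stream only produces a warning, so the CLI keeps running with whatever encoding that stream already had.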
Rebased on develop now that #1060 has landed. Repositioned this PR's scope: with If the lighter
…ce#363)
On Windows, Python defaults stdin to the system ANSI codepage (cp1251, cp1252, etc.). When an MCP client sends UTF-8 JSON with non-ASCII characters (Cyrillic, CJK, Hebrew, emoji), the bytes are decoded through the wrong codepage and produce mojibake with surrogate escapes. These broken strings pass Python type checks but crash the HuggingFace tokenizer inside ChromaDB's embedding function with "TextInputSequence must be str in query".
The fix calls sys.stdin.reconfigure(encoding="utf-8") at the top of main(), guarded by sys.platform == "win32" so it only runs where needed. stdout gets the same treatment so JSON responses with non-ASCII content serialize cleanly.
Fixes MemPalace#363
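The wrong-codepage half of this failure can be reproduced on any platform with plain `bytes.decode`. The sample string is an arbitrary choice for illustration, and the sketch does not attempt to reproduce the surrogate escapes or the tokenizer crash mentioned above.

```python
original = "привет"                      # Cyrillic sample text
wire_bytes = original.encode("utf-8")    # what an MCP client puts on the wire

# Decoding UTF-8 bytes through an ANSI codepage yields mojibake:
# each 2-byte UTF-8 sequence turns into two unrelated characters.
mojibake = wire_bytes.decode("cp1251")

# Decoding with the codec the sender actually used round-trips cleanly.
correct = wire_bytes.decode("utf-8")

print(repr(mojibake))
print(repr(correct))
```

The mojibake value is still a perfectly valid `str`, which is why a type check alone does not catch the problem.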
…tests
The previous block reconfigured stdin / stdout / stderr to UTF-8 with
errors='strict'. On Windows, a malformed byte from a misbehaving MCP
client (or a stray BOM) then raises UnicodeDecodeError inside
sys.stdin.readline(), bypassing the inner try/except for json.loads
and killing the whole server on the first bad byte -- worse than the
pre-PR cp1252 behavior, which produced wrong-but-valid str downstream.
Split the policy:
stdin -> surrogateescape (lone surrogates pass through; json.loads
then raises a recoverable parse error -> -32700 instead
of process death)
stdout -> strict (we control writes here; failures are real bugs)
stderr -> strict (same)
Iterate over the three streams via getattr(sys, name) so a test
harness that strips a stream entirely degrades gracefully.
Add two mocked tests via _ReconfigurableStdio to pin the per-stream
policy AND verify the off-Windows branch is a no-op. Closes the
'no test added' coverage gap from pre-merge review.
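The difference between the two stdin policies can be sketched with plain `bytes.decode`, assuming a read loop whose try/except wraps only `json.loads`. The `parse_line` helper and the sample request are hypothetical; -32700 is the JSON-RPC 2.0 "Parse error" code.

```python
import json

# A stray invalid byte in front of an otherwise valid JSON-RPC line.
bad_line = b'\xff{"jsonrpc": "2.0", "id": 1, "method": "ping"}\n'


def parse_line(raw: bytes, errors: str):
    """Decode one line as UTF-8 with the given error handler, then parse it.

    Returns the parsed request, or a JSON-RPC error object when the text
    is not valid JSON. UnicodeDecodeError is deliberately not caught here,
    mirroring a read loop whose try/except only wraps json.loads.
    """
    text = raw.decode("utf-8", errors=errors)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {"jsonrpc": "2.0", "id": None,
                "error": {"code": -32700, "message": "Parse error"}}


# surrogateescape: the bad byte becomes U+DCFF and json.loads rejects the line.
recovered = parse_line(bad_line, "surrogateescape")

# strict: decoding itself raises before json.loads is ever reached.
try:
    parse_line(bad_line, "strict")
    strict_outcome = "parsed"
except UnicodeDecodeError:
    strict_outcome = "UnicodeDecodeError"
```

One caveat worth checking against the real server: when the stray byte lands inside a JSON string literal, `json.loads` accepts the resulting lone surrogate, so the request parses and the surrogate travels downstream instead of producing -32700.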
Move the per-stream policy loop out of mcp_server.main() into mempalace/_stdio.py so the same helper can serve other Windows entry points (CLI, fact_checker) without copy-pasting the iteration. Behaviour unchanged: stdin=surrogateescape, stdout=strict, stderr=strict; on_failure callback routes failures through logger.warning as before.
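A rough sketch of what such a shared helper could look like. The function name `reconfigure_stdio`, the `STDIO_POLICY` table, and the `platform` and `module` parameters are assumptions for illustration and are not the actual contents of mempalace/_stdio.py; only the per-stream policy and the `on_failure` callback come from the commit messages.

```python
import io
import sys
import types

# Per-stream error policy as described in the commit messages.
STDIO_POLICY = {"stdin": "surrogateescape", "stdout": "strict", "stderr": "strict"}


def reconfigure_stdio(on_failure=None, platform=None, module=sys):
    """Apply the per-stream UTF-8 policy on Windows.

    `platform` and `module` are injectable for testing only (assumptions
    of this sketch). Returns the names of the streams that were changed.
    """
    platform = sys.platform if platform is None else platform
    if platform != "win32":
        return []  # no-op off Windows
    changed = []
    for name, errors in STDIO_POLICY.items():
        stream = getattr(module, name, None)  # stream may be stripped or None
        if stream is None:
            continue
        try:
            stream.reconfigure(encoding="utf-8", errors=errors)
            changed.append(name)
        except Exception as exc:
            if on_failure is not None:
                on_failure(name, exc)
    return changed


# Usage with stand-in streams: stderr is missing, stdout cannot be reconfigured.
fake_sys = types.SimpleNamespace(
    stdin=io.TextIOWrapper(io.BytesIO(), encoding="cp1252"),
    stdout=io.StringIO(),
)
failures = []
changed = reconfigure_stdio(
    on_failure=lambda name, exc: failures.append(name),
    platform="win32",
    module=fake_sys,
)
```

A caller such as `main()` could pass an `on_failure` that forwards to `logger.warning`, which keeps the helper itself free of logging setup.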
What does this PR do?
Fixes #363. On Windows, Python defaults `sys.stdin` to the system ANSI codepage (cp1251, cp1252, etc.), not UTF-8. When an MCP client sends JSON containing non-ASCII characters, the bytes get decoded through the wrong codepage and produce mojibake with surrogate escapes. These strings pass Python type checks but crash the HuggingFace tokenizer inside ChromaDB with `TypeError: TextInputSequence must be str in query`.
The fix adds `sys.stdin.reconfigure(encoding="utf-8")` at the top of `main()`, guarded by `sys.platform == "win32"`. `stdout` and `stderr` get the same treatment so non-ASCII JSON responses serialize cleanly. On non-Windows platforms the code is a no-op.
This effectively makes `mempalace_search` work for every non-English speaker on Windows (Russian, Chinese, Hebrew, etc.), where it currently crashes on the very first query containing non-ASCII text.
How to test
On Windows with a non-English locale:
On Linux/macOS: no behavior change (guard skips reconfigure).
ruff check mempalace/mcp_server.py
python -c "import mempalace.mcp_server; print('OK')"
Checklist
ruff check .)
`reconfigure` available since 3.7)
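The effect of the `reconfigure` call described in this PR can be approximated without a Windows console by wrapping a byte buffer in an `io.TextIOWrapper` that starts on an ANSI codepage. The payload and variable names below are made up for illustration; this is a sketch of the mechanism, not a test from the PR.

```python
import io
import json

payload = {"query": "привет мир"}
wire = (json.dumps(payload, ensure_ascii=False) + "\n").encode("utf-8")

# Stand-in for Windows stdin: a text wrapper that starts on an ANSI codepage.
stdin_like = io.TextIOWrapper(io.BytesIO(wire), encoding="cp1251")

# The fix: switch the wrapper to UTF-8 before anything has been read.
stdin_like.reconfigure(encoding="utf-8")
request = json.loads(stdin_like.readline())

# Without the reconfigure step the same bytes decode to mojibake.
legacy = io.TextIOWrapper(io.BytesIO(wire), encoding="cp1251")
garbled = json.loads(legacy.readline())
```

The encoding can only be changed before the first read from the stream, which fits the placement of the call at the top of `main()`.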