fix: comprehensive Windows and CJK support for MCP server#512
fix: comprehensive Windows and CJK support for MCP server#512JerryLiu720 wants to merge 1 commit into
Conversation
web3guru888
left a comment
There was a problem hiding this comment.
Solid fix. The three-part approach is the right one:
stdin surrogateescape / stdout replace: This is the correct pairing. surrogateescape on input is tolerant — it lets you ingest bytes that aren't valid UTF-8 without crashing, and you can propagate or discard them later. replace on output is safe — it won't produce invalid JSON (lone surrogates in a JSON string will break parsers on the other end). The hasattr guard for non-buffered environments (tests, IDEs) is also good defensive practice.
ensure_ascii=False in both serialization points: Important to get both — if you fix one and miss the other you get inconsistent behavior. Catching both handle_request (tool result) and the main loop response write was the right call.
One minor note: in the MCP protocol, the JSON-RPC layer is supposed to be clean UTF-8. If errors='replace' substitutes a \uFFFD character somewhere inside a content string, that's technically correct from an encoding standpoint, but clients may not expect replacement characters in tool outputs. It might be worth a brief note in the PR description (or a comment in the code) explaining that replace is the fallback for unrecoverable surrogates and that clean UTF-8 content will pass through unmodified.
But that's a documentation concern, not a correctness issue. The behavior is right. This closes #503 cleanly.
|
@web3guru888 Thank you for the detailed review! Good point about the replace documentation. The code comment already explains that replace prevents lone surrogates from breaking JSON-RPC client parsers, and clean UTF-8 content passes through unmodified - only unrecoverable surrogates get replaced with �. Appreciate the confirmation that the approach is correct and that this closes #503 cleanly! |
|
Fair point — I missed that the code comment already explains the rationale. The logic is sound: |
b32b506 to
f619ae9
Compare
9696911 to
6aee219
Compare
This PR addresses Windows encoding issues and improves non-ASCII character handling throughout the MCP server. **Changes:** 1. **stdin/stdout UTF-8 enforcement (MemPalace#503)** - Windows defaults to cp936/GBK, causing crashes with CJK content - Wrapped stdin with 'surrogateescape' error handler (tolerant input) - Wrapped stdout with 'replace' error handler (safe output) - Added hasattr() checks for edge cases (IDE/test environments) 2. **MCP tool result serialization** - Added ensure_ascii=False to tool result json.dumps() (line 907) - Preserves Unicode characters in tool responses 3. **JSON-RPC response serialization (MemPalace#359)** - Added ensure_ascii=False to main loop json.dumps() (line 949) - Prevents Chinese characters from being escaped to \uXXXX **Technical Details:** - stdin uses 'surrogateescape': handles malformed input gracefully - stdout uses 'replace': prevents lone surrogates in JSON-RPC responses (per recommendation from MemPalace#503 discussion to avoid breaking client parsers) - All json.dumps() calls now preserve Unicode with ensure_ascii=False Fixes MemPalace#503 Related to MemPalace#359 (already has PR MemPalace#425 with tests) Made-with: Cursor
6aee219 to
4ecc085
Compare
|
Hi, thanks for the contribution. This PR has merge conflicts with Could you rebase onto If this change is no longer relevant, feel free to close the PR. (This message is part of a periodic backlog pass, sent to all open PRs that match this state.) |
Summary
Comprehensive fix for Windows encoding issues and improved non-ASCII character handling in the MCP server.
Problem
\uXXXXformat, making output hard to readSolution
This PR addresses three critical areas in
mcp_server.py:1. stdin/stdout UTF-8 enforcement (lines 925-947)
errors='surrogateescape'for tolerant input handlingerrors='replace'for safe output (prevents lone surrogates in JSON-RPC)hasattr()checks for edge cases (IDE/test environments)2. MCP tool result serialization (line 907)
ensure_ascii=Falseto preserve Unicode in tool responses3. JSON-RPC response serialization (line 961)
ensure_ascii=Falseto prevent character escapingRelationship to PR #400
This PR complements #400 but addresses different aspects:
PR #400:
reconfigure()witherrors="strict"This PR:
TextIOWrapperwith differentiated error handlerssurrogateescape(tolerant input, per Bug: MCP server fails with surrogate error on Windows when writing CJK content #503 discussion recommendations)replace(safe output, prevents JSON-RPC client crashes)ensure_ascii=False)Both approaches are valid, but this PR follows the error handling strategy recommended by experts in the #503 discussion thread.
Technical Details
surrogateescapeto handle malformed input gracefullyreplaceto prevent lone surrogates from breaking JSON-RPC client parsersjson.dumps()calls now preserve Unicode charactersTesting
python3 -m py_compileImpact
\uXXXXescapes)Related Issues
Checklist