fix: remove Unicode Private Use Area characters to prevent rendering issues #120
Merged
Jeomon merged 1 commit into CursorTouch:main on Mar 22, 2026
Pull request overview
This PR introduces a text-sanitization utility to strip Unicode Private Use Area (PUA) characters from Snapshot UI text, preventing client-side rendering artifacts and reducing token waste in downstream LLM usage.
Changes:
- Added `remove_private_use_chars()` (with a precompiled regex covering BMP + Supplementary PUA-A/B ranges) to `desktop.utils`.
- Applied PUA stripping to Snapshot's `interactive_elements` and `scrollable_elements` strings before building the response.
- Added unit tests for all PUA ranges and verified non-PUA Unicode is preserved; adjusted a forward-ref/union type annotation style in `TreeState`.
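
A minimal sketch of what such a helper could look like, assuming a single precompiled character class over the three PUA ranges listed in this PR. The function name comes from the PR; the pattern name `_PUA_RE` and the sample string are illustrative and may differ from the merged code.

```python
import re

# Assumed pattern: one character class spanning the three PUA blocks.
_PUA_RE = re.compile(
    "["
    "\uE000-\uF8FF"          # BMP Private Use Area
    "\U000F0000-\U000FFFFD"  # Supplementary Private Use Area-A
    "\U00100000-\U0010FFFD"  # Supplementary Private Use Area-B
    "]"
)


def remove_private_use_chars(text: str) -> str:
    """Return ``text`` with every Private Use Area code point removed."""
    return _PUA_RE.sub("", text)


# Hypothetical element name: an icon-font glyph followed by a readable label.
sample = "\uEA60 Explorer (Ctrl+Shift+E)"
cleaned = remove_private_use_chars(sample)
```

In this sketch only the glyph is removed, so the space that followed it stays in `cleaned`; whether surrounding whitespace should also be trimmed is a separate decision that the PR description does not cover.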
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `src/windows_mcp/desktop/utils.py` | Adds the PUA-stripping helper + exports it via `__all__`. |
| `src/windows_mcp/tools/_snapshot_helpers.py` | Sanitizes interactive/scrollable element text before formatting the Snapshot response. |
| `src/windows_mcp/tree/views.py` | Updates optional forward-ref annotations to the project's preferred quoted-union style. |
| `tests/test_desktop_utils.py` | Adds targeted tests for BMP + Supplementary PUA ranges and mixed/unicode-preservation cases. |
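
The range coverage described for the tests can be cross-checked against Python's own Unicode database, where private-use code points carry the general category `Co`. The sketch below is not the test file from this PR: the pattern is re-declared so the block is self-contained, and the names `PUA_RE`, `is_private_use`, and `check_boundaries` are hypothetical.

```python
import re
import unicodedata

# Re-declared here so the sketch stands alone; assumed to match the PR's ranges.
PUA_RE = re.compile("[\uE000-\uF8FF\U000F0000-\U000FFFFD\U00100000-\U0010FFFD]")

# First and last code point of each PUA block named in the PR.
PUA_BOUNDARIES = [0xE000, 0xF8FF, 0xF0000, 0xFFFFD, 0x100000, 0x10FFFD]
# Nearby code points that are not private-use and should be preserved.
NON_PUA_NEIGHBOURS = [0xF900, 0xEFFFF, 0xFFFFE, 0xFFFFF]


def is_private_use(code_point: int) -> bool:
    """True when the Unicode database classifies the code point as private-use."""
    return unicodedata.category(chr(code_point)) == "Co"


def check_boundaries() -> None:
    """Assert that the regex and the Unicode database agree at the block edges."""
    for cp in PUA_BOUNDARIES:
        assert is_private_use(cp)
        assert PUA_RE.sub("", f"a{chr(cp)}b") == "ab"
    for cp in NON_PUA_NEIGHBOURS:
        text = f"a{chr(cp)}b"
        assert not is_private_use(cp)
        assert PUA_RE.sub("", text) == text


check_boundaries()
```

Checking both edges of every block guards against off-by-one mistakes such as ending a supplementary range at `U+FFFFF` instead of `U+FFFFD`.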
Jeomon approved these changes on Mar 22, 2026
Summary
As discussed in #117, some applications (e.g. VS Code) embed Unicode Private Use Area (PUA) characters (`U+E000..U+F8FF`, `U+F0000..U+FFFFD`, `U+100000..U+10FFFD`) in the Automation Element `Name` property of certain UI elements, for instance icon-font glyphs in the navigation bar.

These characters serve no informational purpose in the Snapshot output but can cause rendering artifacts on the client side and waste tokens in downstream LLM usage.

This PR adds a `remove_private_use_chars` utility that strips all PUA characters from the interactive and scrollable element text before returning Snapshot results.

Changes
- `src/windows_mcp/desktop/utils.py`: Add `remove_private_use_chars()` with a pre-compiled regex covering all three Unicode PUA blocks; update module docstring and `__all__`
- `src/windows_mcp/tools/_snapshot_helpers.py`: Apply `remove_private_use_chars` to `interactive_elements` and `scrollable_elements` in `build_snapshot_response`
- `src/windows_mcp/tree/views.py`: Fix type annotation style (`"X" | None` → `"X | None"`)
- `tests/test_desktop_utils.py`: Add tests covering BMP PUA, Supplementary PUA-A/B, mixed content, and non-PUA Unicode preservation
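
The PR does not state why the quoted-union form is preferred. One practical difference under standard Python semantics is that `"X" | None` applies `|` to a `str` and `None`, which raises `TypeError` whenever the expression is actually evaluated, while `"X | None"` stays a plain string until a type checker or `typing.get_type_hints` resolves it. The sketch below illustrates this with a hypothetical `Node` class standing in for the real `TreeState` types.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a class with an optional self-reference."""

    name: str
    # Whole union quoted: the annotation is just a string at runtime.
    parent: "Node | None" = None


def unquoted_union_fails() -> bool:
    """Evaluate the old-style expression directly and report whether it raises."""
    try:
        "Node" | None  # str does not support `|` with None
    except TypeError:
        return True
    return False


root = Node("root")
child = Node("child", parent=root)
```

Here `unquoted_union_fails()` returns `True`, while the `Node` class with the fully quoted annotation is defined and instantiated without error.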