fix: remove Unicode Private Use Area characters to prevent rendering issues #120

Merged
Jeomon merged 1 commit into CursorTouch:main from JezaChen:fix/remove_unicode_private_use_area_characters on Mar 22, 2026

@JezaChen
Collaborator

Summary

As discussed in #117, some applications (e.g. VS Code) embed Unicode Private Use Area (PUA) characters (U+E000..U+F8FF, U+F0000..U+FFFFD, U+100000..U+10FFFD) in the Automation Element Name property of certain UI elements — for instance, icon-font glyphs in the navigation bar.

These characters serve no informational purpose in the Snapshot output but can:

  • Cause rendering issues on the client side (displayed as tofu/replacement glyphs)
  • Waste tokens when sent to LLMs

This PR adds a remove_private_use_chars utility that strips all PUA characters from the interactive and scrollable element text before returning Snapshot results.
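A minimal sketch of what such a helper could look like, assuming a single pre-compiled character class over the three PUA ranges listed above. The pattern name `_PUA_RE` and the exact signature are illustrative assumptions, not the code merged in this PR.

```python
import re

# Assumed pattern: one character class spanning the three PUA ranges
# named in the summary (BMP, Supplementary PUA-A, Supplementary PUA-B).
_PUA_RE = re.compile(
    "["
    "\uE000-\uF8FF"            # BMP Private Use Area
    "\U000F0000-\U000FFFFD"    # Supplementary Private Use Area-A
    "\U00100000-\U0010FFFD"    # Supplementary Private Use Area-B
    "]"
)


def remove_private_use_chars(text: str) -> str:
    """Return text with every PUA code point deleted (hypothetical sketch)."""
    return _PUA_RE.sub("", text)


# An icon-font glyph in front of a label is dropped; the label is untouched.
label = remove_private_use_chars("\uEA60Explorer")
```

Compiling the pattern once at import time keeps the per-call cost to a single substitution, which matters when the helper runs over every element name in a snapshot.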

Changes

  • src/windows_mcp/desktop/utils.py — Add remove_private_use_chars() with a pre-compiled regex covering all three Unicode PUA blocks; update module docstring and __all__
  • src/windows_mcp/tools/_snapshot_helpers.py — Apply remove_private_use_chars to interactive_elements and scrollable_elements in build_snapshot_response
  • src/windows_mcp/tree/views.py — Fix type annotation style ("X" | None → "X | None")
  • tests/test_desktop_utils.py — Add tests covering BMP PUA, Supplementary PUA-A/B, mixed content, and non-PUA Unicode preservation
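The test categories listed above could be sketched roughly as follows. The helper defined here is a stand-in with an assumed shape, and the test names are hypothetical, not copied from tests/test_desktop_utils.py. The last test is an extra cross-check that is not mentioned in the PR: it compares the regex ranges against the Unicode general category "Co" (private use) from the standard library's unicodedata module.

```python
import re
import unicodedata

# Stand-in for the helper under test (assumed shape, not the PR's code).
_PUA_RE = re.compile("[\uE000-\uF8FF\U000F0000-\U000FFFFD\U00100000-\U0010FFFD]")


def remove_private_use_chars(text: str) -> str:
    return _PUA_RE.sub("", text)


def test_bmp_pua_removed():
    # First and last code points of the BMP block.
    assert remove_private_use_chars("\uE000Files\uF8FF") == "Files"


def test_supplementary_pua_removed():
    # One code point from PUA-A and one from PUA-B.
    assert remove_private_use_chars("A\U000F0000B\U0010FFFDC") == "ABC"


def test_mixed_content_keeps_surrounding_text():
    # Only the glyph is removed; the following space is left alone.
    assert remove_private_use_chars("\uEA60 Search") == " Search"


def test_non_pua_unicode_preserved():
    # CJK, accented Latin, an emoji, and U+FFFD are all outside the PUA.
    text = "\u4e2d\u6587 caf\u00e9 \U0001F642 \uFFFD"
    assert remove_private_use_chars(text) == text


def test_ranges_match_unicode_category_co():
    # Every code point is removed if and only if its category is "Co".
    for cp in range(0x110000):
        ch = chr(cp)
        is_pua = unicodedata.category(ch) == "Co"
        assert (remove_private_use_chars(ch) == "") == is_pua
```

The exhaustive check walks the whole code space, so it is slower than the targeted cases; it is included here only to show how the range boundaries could be verified independently.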

Copilot AI review requested due to automatic review settings March 22, 2026 08:19

Copilot AI left a comment

Pull request overview

This PR introduces a text-sanitization utility to strip Unicode Private Use Area (PUA) characters from Snapshot UI text, preventing client-side rendering artifacts and reducing token waste in downstream LLM usage.

Changes:

  • Added remove_private_use_chars() (with a precompiled regex covering BMP + Supplementary PUA-A/B ranges) to desktop.utils.
  • Applied PUA stripping to Snapshot’s interactive_elements and scrollable_elements strings before building the response.
  • Added unit tests for all PUA ranges and verified non-PUA Unicode is preserved; adjusted a forward-ref/union type annotation style in TreeState.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/windows_mcp/desktop/utils.py | Adds the PUA-stripping helper + exports it via __all__. |
| src/windows_mcp/tools/_snapshot_helpers.py | Sanitizes interactive/scrollable element text before formatting the Snapshot response. |
| src/windows_mcp/tree/views.py | Updates optional forward-ref annotations to the project’s preferred quoted-union style. |
| tests/test_desktop_utils.py | Adds targeted tests for BMP + Supplementary PUA ranges and mixed/unicode-preservation cases. |

@Jeomon Jeomon merged commit 38c74d0 into CursorTouch:main Mar 22, 2026
3 of 4 checks passed