fix(encoding): correct UTF-8 handling for accented characters #68

Merged: dbfx merged 4 commits into main from fix/utf8-encoding-accented-chars on Mar 24, 2026

dbfx (Contributor) commented Mar 24, 2026

Summary

  • Root cause: Windows PowerShell outputs UTF-16-LE, and native CLI tools (reg.exe, netsh, pnputil, schtasks, sfc, dism) use the system's legacy code page (the OEM code page, e.g. CP850, or the ANSI code page, e.g. CP1252, in Western European locales). Node.js decodes stdout as UTF-8 by default, so accented characters such as the "è" in "Système" come out garbled in drive labels, registry entries, and elsewhere.
  • Adds src/main/services/exec-utf8.ts, a central encoding utility with psUtf8() for PowerShell and execNativeUtf8() for native tools. execNativeUtf8() also escapes cmd.exe arguments to prevent shell injection from dynamic values (registry names, task paths, WiFi names, etc.).
  • Updates all 25 files across the main process that execute PowerShell or native Windows tools to use these helpers
  • Replaces ASCII-only regex patterns (/^[A-Za-z0-9...]/) with Unicode-aware ones (/^[\p{L}\p{N}...]/u) so that accented display names in startup items and task names are no longer silently rejected
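The decoding mismatch behind the root cause can be reproduced in Node.js on any platform. This is a minimal sketch, not code from this PR; it assumes `latin1` as a stand-in for a Western European legacy code page (latin1 and CP1252 both encode "è" as the single byte 0xE8):

```typescript
import { Buffer } from "node:buffer";

// Bytes a legacy-code-page tool would write for "Système".
// Assumption: latin1 stands in for CP1252; both encode "è" as the single byte 0xE8.
const legacyBytes = Buffer.from("Système", "latin1");

// What Node.js does by default: interpret the child's stdout bytes as UTF-8.
// 0xE8 announces a 3-byte UTF-8 sequence, but the next byte ("m", 0x6D) is not a
// continuation byte, so the decoder emits U+FFFD for the lone 0xE8 and moves on.
const misDecoded = legacyBytes.toString("utf8");
console.log(JSON.stringify(misDecoded)); // "Syst\ufffdme" in JSON form

// Decoding with the matching code page recovers the original text.
const correctlyDecoded = legacyBytes.toString("latin1");
console.log(correctlyDecoded); // Système
```

Forcing both sides to agree on UTF-8 (rather than teaching Node every possible code page) is the approach the helpers in this PR take.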

Changes by category

PowerShell UTF-8 (psUtf8() wrapper) — forces [Console]::OutputEncoding = UTF-8 before every PS command:

  • cli.ts, index.ts, ipc/index.ts, ipc/debloater.ipc.ts, ipc/disk-analyzer.ipc.ts, ipc/game-mode.ipc.ts, ipc/malware-scanner.ipc.ts, ipc/recycle-bin.ipc.ts, ipc/registry-cleaner.ipc.ts, ipc/service-manager.ipc.ts, ipc/shortcut-cleaner.ipc.ts, ipc/startup-manager.ipc.ts, platform/win32/commands.ts, platform/win32/security.ts, platform/win32/network.ts, services/cloud-agent.ts, services/perf-monitor.ts, services/program-uninstaller.ts, services/restore-point.ts, services/software-updater.ts, services/uninstall-leftovers.ts
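A psUtf8()-style wrapper can be as small as a string prefix. In this sketch the exact PowerShell statement is an assumption (a BOM-less `UTF8Encoding`, built with `::new`, which needs PowerShell 5.0 or later), and `powershellArgs()` is a hypothetical helper; the real exec-utf8.ts may differ:

```typescript
// Assumption: a BOM-less UTF-8 encoding ($false), so stdout does not start with EF BB BF.
const PS_UTF8_PREFIX =
  "[Console]::OutputEncoding = [System.Text.UTF8Encoding]::new($false); ";

// Prepend the encoding switch so everything the script writes to stdout is UTF-8,
// which is what Node.js assumes when it decodes the child's output.
function psUtf8(script: string): string {
  return PS_UTF8_PREFIX + script;
}

// Hypothetical helper: argv for powershell.exe. Only builds the array; nothing is spawned.
function powershellArgs(script: string): string[] {
  return ["-NoProfile", "-NonInteractive", "-Command", psUtf8(script)];
}

console.log(
  powershellArgs("Get-Volume | Select-Object -ExpandProperty FileSystemLabel"),
);
```

Keeping the prefix in one function means the 25 call sites only change from `script` to `psUtf8(script)`.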

Native tool UTF-8 (execNativeUtf8()) — runs via cmd /c chcp 65001 with escaped arguments:

  • reg.exe: registry-cleaner, startup-manager, privacy-shield, network-cleanup, malware-scanner, program-uninstaller, uninstall-leftovers
  • schtasks: index.ts, registry-cleaner, privacy-shield
  • pnputil: driver-manager
  • netsh: platform/win32/network
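The first version of execNativeUtf8() (chcp 65001, then the tool with escaped arguments) can be sketched as follows. The helper names are hypothetical, and the escaping strategy (quote for the target program's argv parser, then caret-escape every cmd.exe metacharacter including those quotes) follows the approach of the cross-spawn npm package rather than code from this PR:

```typescript
// Characters cmd.exe can treat specially. Assumption: list taken from the
// cross-spawn package's escaping code, not from this PR.
const CMD_META = /([()\][%!^"`<>&|;, *?])/g;

// Hypothetical helper: escape one dynamic argument for `cmd /c`.
function cmdEscapeArg(arg: string): string {
  // Step 1: quote for the target program's own argv parser (MSVCRT rules):
  // backslashes that precede a quote are doubled and the quote itself is escaped...
  let s = arg.replace(/(\\*)"/g, '$1$1\\"');
  // ...and trailing backslashes are doubled so they cannot escape the closing quote.
  s = s.replace(/(\\*)$/, "$1$1");
  s = `"${s}"`;
  // Step 2: caret-escape every metacharacter, including the quotes just added, so
  // cmd.exe never enters its own quote mode and strips each ^ before running the tool.
  return s.replace(CMD_META, "^$1");
}

// Hypothetical helper: switch the console code page to UTF-8 (65001), then run the tool.
// ">nul" keeps chcp's "Active code page" message out of the captured stdout.
function buildChcpCommandLine(tool: string, args: readonly string[]): string {
  return `chcp 65001 >nul && ${tool} ${args.map(cmdEscapeArg).join(" ")}`;
}

console.log(
  buildChcpCommandLine("schtasks", ["/Query", "/TN", "\\Tâches (perso)\\Sauvegarde"]),
);
```

The later commits in this PR move away from escaping and pass values through environment variables instead, so no user-controlled text is interpolated into the cmd.exe command line.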

Streaming UTF-8 (StringDecoder) — prevents multi-byte character corruption from Buffer chunk splitting:

  • disk-analyzer.ipc.ts (SFC + DISM progress streaming)
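Why StringDecoder matters for streamed SFC/DISM output: a pipe can split a multi-byte UTF-8 character across two `data` events. This sketch simulates that split with Node's `string_decoder` module; the function names are illustrative, not the PR's:

```typescript
import { Buffer } from "node:buffer";
import { StringDecoder } from "node:string_decoder";

// Decode a sequence of stdout chunks, carrying an incomplete multi-byte
// sequence over to the next chunk instead of corrupting it.
function decodeChunks(chunks: Buffer[]): string {
  const decoder = new StringDecoder("utf8");
  let out = "";
  for (const chunk of chunks) out += decoder.write(chunk);
  return out + decoder.end(); // flush anything still buffered
}

// Naive approach: decode every chunk on its own.
function decodeChunksNaively(chunks: Buffer[]): string {
  return chunks.map((c) => c.toString("utf8")).join("");
}

// "è" is two bytes in UTF-8 (0xC3 0xA8); split the buffer between them.
const bytes = Buffer.from("Système", "utf8");
const chunks = [bytes.subarray(0, 5), bytes.subarray(5)];

console.log(decodeChunks(chunks)); // Système
console.log(JSON.stringify(decodeChunksNaively(chunks))); // contains U+FFFD replacement characters
```

In a child process the decoder would be fed from `child.stdout.on("data", ...)`; calling `child.stdout.setEncoding("utf8")` is an alternative, since readable streams use a StringDecoder internally for that.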

Unicode regex — allows accented characters in display names/task names:

  • startup-manager.ipc.ts (display name validation, isSafeTaskName)
  • registry-cleaner.ipc.ts (SAFE_TASK_PATH_RE)
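The regex change amounts to swapping explicit ASCII ranges for Unicode property escapes. The allowed punctuation (space, `.`, `_`, `-`) and the `isSafeDisplayName()` helper are assumptions for illustration; the PR's actual patterns may allow a different set:

```typescript
// Before: ASCII-only. "é", "è", etc. are outside A-Z/a-z, so the match fails.
const ASCII_NAME_RE = /^[A-Za-z0-9 ._-]+$/;

// After: \p{L} matches any Unicode letter and \p{N} any number.
// Property escapes such as \p{L} are only recognized with the `u` flag.
const UNICODE_NAME_RE = /^[\p{L}\p{N} ._-]+$/u;

function isSafeDisplayName(name: string): boolean {
  // Normalize first: a decomposed "é" (e + U+0301 COMBINING ACUTE ACCENT) contains a
  // combining mark (\p{M}), which \p{L} does not match; NFC composes it to U+00E9.
  return UNICODE_NAME_RE.test(name.normalize("NFC"));
}

console.log(ASCII_NAME_RE.test("Données")); // false
console.log(isSafeDisplayName("Données")); // true
```

Normalizing before testing is a design choice of this sketch: names read from the registry are usually precomposed, but NFC keeps a decomposed spelling from being rejected again.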

Test plan

  • Verify drive labels with accented characters display correctly on the dashboard storage overview
  • Verify registry cleaner scans/fixes work with entries containing accented characters
  • Verify startup manager lists items with non-ASCII names (e.g. "Données", "Système")
  • Verify WiFi profile names with spaces still display and can be deleted
  • Verify SFC/DISM disk repair progress streaming works without garbled output
  • Run npx tsc --noEmit — no new type errors introduced

🤖 Generated with Claude Code

…all tools

Windows PowerShell outputs UTF-16-LE and native tools (reg.exe, netsh,
pnputil, schtasks, sfc, dism) use the system's legacy code page (e.g.
OEM CP850 or ANSI CP1252). Node.js decodes stdout as UTF-8 by default,
so accented characters like "Système" turn into garbled text in drive
labels and elsewhere.

- Add exec-utf8.ts: central utility with psUtf8() for PowerShell and
  execNativeUtf8() for native tools (chcp 65001 + cmd.exe arg escaping)
- Wrap all PowerShell -Command calls with psUtf8() (25 files)
- Route all reg/schtasks/pnputil/netsh calls through execNativeUtf8()
- Fix SFC/DISM streaming with StringDecoder for multi-byte UTF-8 chunks
- Update ASCII-only regex patterns to Unicode-aware (\p{L}\p{N}/u) so
  accented display names are no longer silently rejected
- Harden execNativeUtf8 against cmd.exe shell injection by escaping
  metacharacters in dynamic arguments (registry names, task paths, etc.)

Closes #66

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fb12913d7a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Address PR review feedback:

P1 — execNativeUtf8 now detects % in arguments and falls back to direct
execFileAsync (no cmd.exe shell) to avoid %VAR% expansion corrupting
literal percent sequences like %APPDATA%\App\app.exe in registry values.
The chcp 65001 code-page switch is skipped for these calls, but % in
arguments occurs almost exclusively in write operations whose output is
plain ASCII.

P2 — Revert WiFi profile name validation to original (block only " and
control chars). Shell metacharacters like ( ) & are now safely handled
by cmdEscapeArg inside execNativeUtf8, so names like "Home (5G)" are
no longer incorrectly filtered out.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
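Both review fixes reduce to small predicates. A minimal sketch; the function names and the exact control-character range (C0 controls plus DEL) are assumptions, not taken from the PR:

```typescript
type Strategy = "cmd-chcp" | "direct";

// P1: any "%" in an argument means cmd.exe might expand it as %VAR%, so the call
// bypasses cmd.exe (and therefore the chcp 65001 switch) and uses execFile directly.
function chooseStrategy(args: readonly string[]): Strategy {
  return args.some((a) => a.includes("%")) ? "direct" : "cmd-chcp";
}

// P2: WiFi profile names are rejected only for a double quote or a control character.
// Assumption: "control characters" means C0 controls (U+0000-U+001F) plus DEL (U+007F).
const QUOTE_OR_CONTROL_RE = /["\u0000-\u001f\u007f]/;

function isValidWifiProfileName(name: string): boolean {
  return !QUOTE_OR_CONTROL_RE.test(name);
}

console.log(
  chooseStrategy([
    "add",
    "HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Run",
    "/v",
    "App",
    "/d",
    "%APPDATA%\\App\\app.exe",
  ]),
); // direct
console.log(isValidWifiProfileName("Home (5G)")); // true
```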

chatgpt-codex-connector (bot) left a comment

Reviewed commit: 96154d978d

Restructure execNativeUtf8 to pass arguments through temporary
environment variables (__KA0, __KA1, …) instead of concatenating
user-controlled data into the cmd.exe command string. The command
line now contains only hardcoded %__KAn% references, eliminating
the CodeQL "Uncontrolled command line" critical finding.

Also adds a tool whitelist and fixes 3 test files that were missing
vi.mock for the new exec-utf8 module.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
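The environment-variable indirection can be sketched as a pure function that returns the fixed command line plus the environment to hand to the child process. The names are hypothetical and the whitelist holds the native tools listed in this PR. The checks rejecting double quotes, line breaks, NUL, and empty values are extra assumptions of this sketch, motivated by cmd.exe expanding %VAR% before it parses quotes and `&`, `|`, `<`, `>`:

```typescript
// Tools the helper may launch (the native tools this PR routes through execNativeUtf8).
const ALLOWED_TOOLS: ReadonlySet<string> = new Set(["reg", "schtasks", "pnputil", "netsh"]);

interface IndirectCommand {
  commandLine: string; // fixed text plus %__KAn% references only
  env: Record<string, string>; // the dynamic argument values themselves
}

function buildIndirectCommand(tool: string, args: readonly string[]): IndirectCommand {
  if (!ALLOWED_TOOLS.has(tool)) {
    throw new Error(`tool not allowed: ${tool}`);
  }
  const env: Record<string, string> = {};
  const refs: string[] = [];
  args.forEach((value, i) => {
    // Assumption: cmd.exe expands %VAR% before it parses quotes and & | < >, so the
    // quotes around each reference are what keep such characters literal. A value
    // containing a double quote could close that quote early, so it is refused,
    // as are line breaks and NUL.
    if (/["\r\n\0]/.test(value)) {
      throw new Error(`unsupported character in argument ${i}`);
    }
    // Assumption: an empty variable may be treated as undefined, leaving the
    // literal text %__KAn% in the command, so empty values are refused too.
    if (value === "") {
      throw new Error(`empty argument ${i}`);
    }
    const name = `__KA${i}`;
    env[name] = value;
    refs.push(`"%${name}%"`);
  });
  return { commandLine: ["chcp 65001 >nul &&", tool, ...refs].join(" "), env };
}

const cmd = buildIndirectCommand("reg", ["query", "HKCU\\Software\\Café & Co"]);
console.log(cmd.commandLine); // chcp 65001 >nul && reg "%__KA0%" "%__KA1%"
console.log(cmd.env);
```

On Windows the returned `env` would be merged with `process.env` and passed to `execFile()` along with the cmd.exe invocation; the command string itself never contains user data.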

chatgpt-codex-connector (bot) left a comment

Reviewed commit: f5cde65189

Add /v:off flag to cmd.exe calls in execNativeUtf8 to explicitly
disable delayed expansion. Without this, systems with cmd /v:on
defaults would re-expand ! characters in environment variable
values, potentially corrupting Wi-Fi names, registry data, or
other arguments containing exclamation marks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
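With /v:off in place, the cmd.exe invocation might be assembled as below. The `/d`, `/s /c`, and outer-quote convention mirrors what Node.js itself uses for `shell: true` on Windows; the `buildCmdInvocation()` name and the object shape are assumptions, not the PR's code:

```typescript
interface CmdInvocation {
  file: string;
  args: string[];
  windowsVerbatimArguments: boolean;
}

function buildCmdInvocation(commandLine: string): CmdInvocation {
  return {
    file: "cmd.exe",
    args: [
      "/d", // skip AutoRun commands configured in the registry
      "/v:off", // disable delayed expansion: "!" in expanded values stays literal
      "/s", // with /c: strip only the outer pair of quotes added below
      "/c",
      `"${commandLine}"`,
    ],
    // Hand the arguments to cmd.exe untouched; cmd.exe does not parse its command
    // line with the MSVCRT quoting rules Node.js would otherwise apply.
    windowsVerbatimArguments: true,
  };
}

const inv = buildCmdInvocation('chcp 65001 >nul && reg "%__KA0%" "%__KA1%"');
console.log(inv.file, inv.args.join(" "));
```

On win32 the result could be passed as `execFile(inv.file, inv.args, { env, windowsVerbatimArguments: inv.windowsVerbatimArguments, encoding: "utf8" })`.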
dbfx merged commit 38f3e2d into main on Mar 24, 2026 (8 checks passed)
dbfx deleted the fix/utf8-encoding-accented-chars branch on March 24, 2026 at 11:55