fix(exec): decode Windows command output with codepage-aware streaming by vincentkoc · Pull Request #73751 · openclaw/openclaw

vincentkoc · 2026-04-28T18:48:59Z

Summary

Fix Windows exec output mojibake for CJK locales by decoding captured command output with the active Windows code page instead of assuming UTF-8.
Preserve decoder state across stdout/stderr chunks or decode buffered byte streams so split GBK/Shift_JIS characters are not replaced.
Add focused regression coverage for CP936/GBK Chinese output and split multibyte chunks.

Credit

This carries forward the useful approach from @slepybear's source PR #64661, which could not be safely updated directly because the branch is closed, maintainer_can_modify=false, checks failed, and bot review identified unresolved decoder-state issues.

Linked Issues

Fixes/addresses the Windows CJK exec mojibake family reported in #40613, #50519, and #56462.

Validation

pnpm -s vitest run src/infra/windows-encoding.test.ts src/process/exec.windows.test.ts
pnpm check:changed

Preflight Note

Hydrate the mentioned #72393 before opening a replacement PR. If #72393 is already a live maintainer-owned replacement with equivalent coverage and clean review gates, route this cluster to that PR instead of creating a duplicate.

ProjectClownfish replacement details:

Cluster: ghcrawl-156777-autonomous-smoke
Source PRs: fix(exec): apply Windows codepage-aware output decoding in runExec and spawn-based callers [AI-assisted] #64661
Credit: Credit @slepybear and source PR fix(exec): apply Windows codepage-aware output decoding in runExec and spawn-based callers [AI-assisted] #64661 for the original Windows codepage-aware decoding approach.; Mention that fix(exec): apply Windows codepage-aware output decoding in runExec and spawn-based callers [AI-assisted] #64661 could not be safely updated because it is closed, maintainer_can_modify=false, had relevant failing checks, and had actionable bot-review findings.; Preserve reporter context from [Bug]: Windows 11 exec/system-message Chinese output becomes mojibake #40613, [Bug]: Windows exec tool produces garbled Chinese characters due to hardcoded UTF-8 encoding #50519, and [Bug]: Windows exec tool garbled Chinese characters (confirmed in 2026.3.24) #56462 in the PR body.
Validation: pnpm -s vitest run src/infra/windows-encoding.test.ts src/process/exec.windows.test.ts; pnpm check:changed

aisle-research-bot · 2026-04-28T18:49:05Z

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

#	Severity	Title
1	🔵 Low	Possible execution of attacker-controlled cmd.exe via UNC SystemRoot in Windows codepage detection
2	🔵 Low	Possible execution of attacker-controlled cmd.exe via UNC SystemRoot in memory-host SDK Windows output decoder

1. 🔵 Possible execution of attacker-controlled cmd.exe via UNC SystemRoot in Windows codepage detection

Property	Value
Severity	Low
CWE	CWE-427
Location	`src/infra/windows-encoding.ts:43-81`

Description

resolveWindowsConsoleEncoding() spawns cmd.exe to run chcp, attempting to avoid PATH hijacking by constructing an absolute path from process.env.SystemRoot.

However, normalizeWindowsSystemRoot() accepts UNC absolute paths (e.g. \\server\\share\\Windows) because path.win32.isAbsolute() treats UNC paths as absolute and the subsequent checks only enforce that the last path segment is Windows and that its parent is the parsed root.

If an attacker can influence the process environment (e.g., launching OpenClaw from a tainted environment), they can set SystemRoot to a UNC share containing a malicious System32\\cmd.exe. The code will then execute that binary during encoding detection.

Vulnerable code:

const result = spawnSync(resolveTrustedWindowsCmdPath(), ["/d", "/s", "/c", "chcp"], ...);
...
if (!trimmed || trimmed.includes("\0") || !pathWin32.isAbsolute(trimmed)) {
  return null;
}
...
// UNC like \\server\share\Windows passes these checks
pathWin32.basename(normalized).toLowerCase() === "windows" &&
pathWin32.dirname(normalized).toLowerCase() === parsed.root.toLowerCase();

Recommendation

Harden normalizeWindowsSystemRoot() to only accept local drive roots (and reject UNC/device/extended-length paths), since this value is used to construct an executable path.

For example:

function normalizeWindowsSystemRoot(value: unknown): string | null {
  if (typeof value !== "string") return null;
  const trimmed = value.trim();
  if (!trimmed || trimmed.includes("\0")) return null;

  const normalized = pathWin32.normalize(trimmed);
  const parsed = pathWin32.parse(normalized);

  // Require local drive root like C:\
  if (!/^[A-Za-z]:\\$/.test(parsed.root)) return null;

  if (
    pathWin32.basename(normalized).toLowerCase() !== "windows" ||
    pathWin32.dirname(normalized).toLowerCase() !== parsed.root.toLowerCase()
  ) {
    return null;
  }
  return normalized;
}

Alternatively, avoid shelling out to cmd.exe for codepage detection (e.g., use Win32 APIs via a small native helper) to remove this execution sink entirely.

2. 🔵 Possible execution of attacker-controlled cmd.exe via UNC SystemRoot in memory-host SDK Windows output decoder

Property	Value
Severity	Low
CWE	CWE-427
Location	`packages/memory-host-sdk/src/host/windows-output-decoder.ts:116-155`

Description

The memory-host SDK’s Windows output decoder spawns cmd.exe (via spawnSync) to run chcp and infer the console codepage.

resolveTrustedWindowsCmdPath() derives the executable path from process.env.SystemRoot after normalizeWindowsSystemRoot() validation. That validation still allows UNC absolute paths (e.g. \\server\\share\\Windows), enabling execution of \\server\\share\\Windows\\System32\\cmd.exe if the environment is tainted.

Vulnerable code:

const result = spawnSync(resolveTrustedWindowsCmdPath(), ["/d", "/s", "/c", "chcp"], ...);

The intent is to avoid PATH hijacking, but UNC acceptance can reintroduce binary planting risk.

Recommendation

Reject UNC/device/extended-length roots when deriving an executable path from SystemRoot.

Example hardening:

const parsed = pathWin32.parse(normalized);
if (!/^[A-Za-z]:\\$/.test(parsed.root)) return null; // reject UNC like \\server\share\

Or remove the need to spawn cmd.exe by using a Windows API-based codepage lookup.

Analyzed PR: #73751 at commit 5bcb170

_{Last updated on: 2026-04-29T01:36:59Z}

clawsweeper · 2026-04-28T18:52:47Z

Codex review: keeping this open for maintainer follow-up; there is still a little grit to resolve.

Keep this PR open. Automated cleanup is blocked because the PR is member-authored and carries the protected maintainer label; beyond policy, current main only clearly covers the core exec/process path while the PR still targets local TUI and QMD command output surfaces that decode chunks as UTF-8 today.

Best possible solution:

Keep this PR open for maintainer review. The best path is to either land a hardened version that covers local TUI and QMD decoding with portable tests, or have a maintainer explicitly narrow the scope to the already-merged #72393 core exec fix and close this PR manually.

What I checked:

Protected maintainer PR: Provided GitHub context lists authorAssociation as MEMBER and labels include maintainer, which the cleanup policy says must stay open for explicit maintainer judgment. (5bcb17019215)
Core exec path already codepage-aware on current main: runExec requests Buffer output and decodes stdout/stderr through decodeWindowsOutputBuffer; runCommandWithTimeout buffers chunks and decodes the concatenated buffers before resolving. (src/process/exec.ts:142, 7e5c3753f6f7)
Core regression coverage exists: Current main has Windows regression tests for GBK stdout/stderr in runExec and split GBK chunks in runCommandWithTimeout. (src/process/exec.windows.test.ts:373, 7e5c3753f6f7)
Remaining local TUI surface: The local TUI shell still appends buf.toString("utf8") per stdout/stderr chunk, so current main has not implemented the PR's local-shell decoding target. (src/tui/tui-local-shell.ts:120, 7e5c3753f6f7)
Remaining QMD surface: runCliCommand in the memory-host SDK still decodes stdout/stderr chunks with data.toString("utf8"), so current main has not implemented the PR's QMD decoding target. (packages/memory-host-sdk/src/host/qmd-process.ts:128, 7e5c3753f6f7)
PR discussion has security-sensitive review findings: The Aisle review flagged low-severity command-resolution risk in the PR's Windows codepage detection changes: deriving cmd.exe from SystemRoot while accepting UNC-style roots could execute attacker-controlled binaries from a tainted environment. (src/infra/windows-encoding.ts, 5bcb17019215)

Likely related people:

vincentkoc: Related PR fix(process): decode Windows command output with console codepage awareness #72393, authored by vincentkoc and merged before this PR, introduced the current-main core Windows codepage-aware exec/process fix that this PR builds on; this is history beyond merely opening the current PR. (role: recent maintainer / merged replacement author; confidence: medium; commits: d1eedefc22a2; files: src/process/exec.ts, src/infra/windows-encoding.ts, src/process/exec.windows.test.ts)
Peter Steinberger steipete@gmail.com: Local git blame in this checkout attributes the relevant current-main lines in src/process/exec.ts, src/infra/windows-encoding.ts, src/tui/tui-local-shell.ts, and packages/memory-host-sdk/src/host/qmd-process.ts to the grafted commit 955b4df0; routing confidence is limited because the local history is compacted. (role: current checkout blame / recent maintainer; confidence: low; commits: 955b4df0939f; files: src/process/exec.ts, src/infra/windows-encoding.ts, src/tui/tui-local-shell.ts)
slepybear: The PR body credits fix(exec): apply Windows codepage-aware output decoding in runExec and spawn-based callers [AI-assisted] #64661 by slepybear for the original Windows codepage-aware decoding approach across exec, local shell, and QMD; useful for context, though that PR was closed unmerged. (role: adjacent source contributor; confidence: low; commits: 48818dce42b2; files: src/process/exec.ts, src/tui/tui-local-shell.ts, packages/memory-host-sdk/src/host/qmd-process.ts)

Remaining risk / open question:

Protected author association and maintainer label block automated closure even if a maintainer later decides fix(process): decode Windows command output with console codepage awareness #72393 is enough for the intended product scope.
The PR touches Windows command execution and codepage detection; Aisle flagged possible attacker-controlled cmd.exe execution through UNC SystemRoot handling in two proposed decoder helpers.
Current main still has local TUI and QMD UTF-8-per-chunk decoding, so closing this PR as implemented would drop potentially useful remaining coverage.
The PR appears to duplicate Windows output decoder logic into the memory-host SDK, which needs maintainer review for divergence and security hardening.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 7e5c3753f6f7.

greptile-apps · 2026-04-28T18:55:06Z

Greptile Summary

This PR adds stateful, codepage-aware stream decoders to exec.ts, tui-local-shell.ts, and qmd-process.ts, replacing the previous buffer-accumulate-then-UTF-8-decode pattern that mojibaked CJK output on GBK/Shift_JIS Windows consoles. The core decoder logic in windows-output-decoder.ts correctly handles split multibyte chunks and UTF-8 → legacy fallback.

P1: The new runCliCommand test in qmd-process.test.ts will fail on every non-Windows CI runner. resolveWindowsConsoleEncoding() returns null before calling spawnSync when process.platform !== \"win32\", so the spawnSyncMock setup is never exercised — decoders fall back to UTF-8, and the GBK bytes produce garbage instead of \"测试\". The fix used in tui-local-shell.test.ts (injecting createOutputDecoder: () => createWindowsOutputDecoder({ platform: \"win32\", windowsEncoding: \"gbk\" })) should be applied here by adding a decoder-factory injection point to runCliCommand's params.

Confidence Score: 3/5

Production code is sound but the qmd-process test will fail on non-Windows CI, blocking clean merge.

One P1 finding: the runCliCommand test in qmd-process.test.ts is broken on all non-Windows runners because resolveWindowsConsoleEncoding bypasses the spawnSyncMock via an early platform guard, making the GBK assertion impossible to pass. The runtime fix (stateful decoders) is correct; only the test infrastructure needs a decoder-factory injection seam like tui-local-shell already has.

packages/memory-host-sdk/src/host/qmd-process.test.ts and packages/memory-host-sdk/src/host/qmd-process.ts (missing decoder-factory injection)

Prompt To Fix All With AI

This is a comment left during a code review.
Path: packages/memory-host-sdk/src/host/qmd-process.test.ts
Line: 139-163

Comment:
**Test will always fail on non-Windows CI because `resolveWindowsConsoleEncoding` returns early**

`resolveWindowsConsoleEncoding()` in `windows-output-decoder.ts` guards behind `process.platform !== "win32"` and returns `null` before ever calling `spawnSync`. This means `spawnSyncMock.mockReturnValue({ stdout: "Active code page: 936", stderr: "" })` is never reached on Linux/macOS, `legacyDecoder` is set to `null`, and `decode()` falls through to `buffer.toString("utf8")`. The GBK bytes `[0xb2, 0xe2, 0xca, 0xd4]` produce replacement characters instead of `"测试"`, so the assertion fails on every non-Windows runner.

The sibling test in `tui-local-shell.test.ts` avoids this correctly by injecting `createOutputDecoder: () => createWindowsOutputDecoder({ platform: "win32", windowsEncoding: "gbk" })`. `runCliCommand` needs the same injection seam — add an optional decoder factory to its parameter type in `qmd-process.ts` and pass it into the decoders — or the test must pass explicit platform/encoding overrides before creating the decoders.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "fix(exec): decode Windows command output..." | Re-trigger Greptile}

greptile-apps · 2026-04-28T18:55:10Z

+  it("decodes Windows codepage output across split qmd stdout and stderr chunks", async () => {
+    spawnSyncMock.mockReturnValue({ stdout: "Active code page: 936", stderr: "" });
+    const child = createMockChild();
+    spawnMock.mockImplementationOnce(() => {
+      queueMicrotask(() => {
+        child.stdout.emit("data", Buffer.from([0xb2]));
+        child.stdout.emit("data", Buffer.from([0xe2, 0xca]));
+        child.stdout.emit("data", Buffer.from([0xd4]));
+        child.stderr.emit("data", Buffer.from([0xa3]));
+        child.stderr.emit("data", Buffer.from([0xbb]));
+        child.emit("close", 0);
+      });
+      return child;
+    });
+
+    await expect(
+      runCliCommand({
+        commandSummary: "qmd query",
+        spawnInvocation: { command: "qmd", argv: ["query"] },
+        env: process.env,
+        cwd: tempDir,
+        maxOutputChars: 1_000,
+      }),
+    ).resolves.toEqual({ stdout: "测试", stderr: "；" });
+  });


Test will always fail on non-Windows CI because resolveWindowsConsoleEncoding returns early

resolveWindowsConsoleEncoding() in windows-output-decoder.ts guards behind process.platform !== "win32" and returns null before ever calling spawnSync. This means spawnSyncMock.mockReturnValue({ stdout: "Active code page: 936", stderr: "" }) is never reached on Linux/macOS, legacyDecoder is set to null, and decode() falls through to buffer.toString("utf8"). The GBK bytes [0xb2, 0xe2, 0xca, 0xd4] produce replacement characters instead of "测试", so the assertion fails on every non-Windows runner.

The sibling test in tui-local-shell.test.ts avoids this correctly by injecting createOutputDecoder: () => createWindowsOutputDecoder({ platform: "win32", windowsEncoding: "gbk" }). runCliCommand needs the same injection seam — add an optional decoder factory to its parameter type in qmd-process.ts and pass it into the decoders — or the test must pass explicit platform/encoding overrides before creating the decoders.

Prompt To Fix With AI

This is a comment left during a code review. Path: packages/memory-host-sdk/src/host/qmd-process.test.ts Line: 139-163 Comment: **Test will always fail on non-Windows CI because `resolveWindowsConsoleEncoding` returns early** `resolveWindowsConsoleEncoding()` in `windows-output-decoder.ts` guards behind `process.platform !== "win32"` and returns `null` before ever calling `spawnSync`. This means `spawnSyncMock.mockReturnValue({ stdout: "Active code page: 936", stderr: "" })` is never reached on Linux/macOS, `legacyDecoder` is set to `null`, and `decode()` falls through to `buffer.toString("utf8")`. The GBK bytes `[0xb2, 0xe2, 0xca, 0xd4]` produce replacement characters instead of `"测试"`, so the assertion fails on every non-Windows runner. The sibling test in `tui-local-shell.test.ts` avoids this correctly by injecting `createOutputDecoder: () => createWindowsOutputDecoder({ platform: "win32", windowsEncoding: "gbk" })`. `runCliCommand` needs the same injection seam — add an optional decoder factory to its parameter type in `qmd-process.ts` and pass it into the decoders — or the test must pass explicit platform/encoding overrides before creating the decoders. How can I resolve this? If you propose a fix, please make it concise.

vincentkoc added clownfish:human-review labels Apr 28, 2026

openclaw-barnacle Bot added the size: M label Apr 28, 2026

vincentkoc added the clawsweeper Tracked by ClawSweeper automation label Apr 28, 2026

openclaw-barnacle Bot added the maintainer Maintainer-authored PR label Apr 28, 2026

greptile-apps Bot reviewed Apr 28, 2026

View reviewed changes

vincentkoc removed clownfish:human-review labels Apr 28, 2026

vincentkoc and others added 2 commits April 29, 2026 01:23

fix(exec): decode Windows command output with codepage-aware streams

512ef7f

fix(clownfish): address review for ghcrawl-156777-autonomous-smoke (1)

5bcb170

openclaw-clownfish Bot force-pushed the clownfish/ghcrawl-156777-autonomous-smoke branch from d7c85e0 to 5bcb170 Compare April 29, 2026 01:34

BradGroux mentioned this pull request Apr 29, 2026

WORKING: All Microsoft Issues and PRs (refresh) #74163

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(exec): decode Windows command output with codepage-aware streaming#73751

fix(exec): decode Windows command output with codepage-aware streaming#73751
vincentkoc wants to merge 2 commits intomainfrom
clownfish/ghcrawl-156777-autonomous-smoke

vincentkoc commented Apr 28, 2026

Uh oh!

aisle-research-bot Bot commented Apr 28, 2026 •

edited

Loading

Description

Recommendation

Description

Recommendation

Uh oh!

clawsweeper Bot commented Apr 28, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Apr 28, 2026

Uh oh!

greptile-apps Bot Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

vincentkoc commented Apr 28, 2026

Summary

Credit

Linked Issues

Validation

Preflight Note

Uh oh!

aisle-research-bot Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔒 Aisle Security Analysis

Description

Recommendation

Description

Recommendation

Uh oh!

clawsweeper Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps Bot commented Apr 28, 2026

Greptile Summary

Confidence Score: 3/5

Uh oh!

greptile-apps Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aisle-research-bot Bot commented Apr 28, 2026 •

edited

Loading

clawsweeper Bot commented Apr 28, 2026 •

edited

Loading