Skip to content

fix(exec): decode Windows command output with codepage-aware streaming#73751

Open
vincentkoc wants to merge 2 commits intomainfrom
clownfish/ghcrawl-156777-autonomous-smoke
Open

fix(exec): decode Windows command output with codepage-aware streaming#73751
vincentkoc wants to merge 2 commits intomainfrom
clownfish/ghcrawl-156777-autonomous-smoke

Conversation

@vincentkoc
Copy link
Copy Markdown
Member

Summary

  • Fix Windows exec output mojibake for CJK locales by decoding captured command output with the active Windows code page instead of assuming UTF-8.
  • Preserve decoder state across stdout/stderr chunks or decode buffered byte streams so split GBK/Shift_JIS characters are not replaced.
  • Add focused regression coverage for CP936/GBK Chinese output and split multibyte chunks.

Credit

This carries forward the useful approach from @slepybear's source PR #64661, which could not be safely updated directly because the branch is closed, maintainer_can_modify=false, checks failed, and bot review identified unresolved decoder-state issues.

Linked Issues

Fixes/addresses the Windows CJK exec mojibake family reported in #40613, #50519, and #56462.

Validation

  • pnpm -s vitest run src/infra/windows-encoding.test.ts src/process/exec.windows.test.ts
  • pnpm check:changed

Preflight Note

Hydrate the mentioned #72393 before opening a replacement PR. If #72393 is already a live maintainer-owned replacement with equivalent coverage and clean review gates, route this cluster to that PR instead of creating a duplicate.

ProjectClownfish replacement details:

@aisle-research-bot
Copy link
Copy Markdown

aisle-research-bot Bot commented Apr 28, 2026

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

# Severity Title
1 🔵 Low Possible execution of attacker-controlled cmd.exe via UNC SystemRoot in Windows codepage detection
2 🔵 Low Possible execution of attacker-controlled cmd.exe via UNC SystemRoot in memory-host SDK Windows output decoder
1. 🔵 Possible execution of attacker-controlled cmd.exe via UNC SystemRoot in Windows codepage detection
Property Value
Severity Low
CWE CWE-427
Location src/infra/windows-encoding.ts:43-81

Description

resolveWindowsConsoleEncoding() spawns cmd.exe to run chcp, attempting to avoid PATH hijacking by constructing an absolute path from process.env.SystemRoot.

However, normalizeWindowsSystemRoot() accepts UNC absolute paths (e.g. \\server\\share\\Windows) because path.win32.isAbsolute() treats UNC paths as absolute and the subsequent checks only enforce that the last path segment is Windows and that its parent is the parsed root.

If an attacker can influence the process environment (e.g., launching OpenClaw from a tainted environment), they can set SystemRoot to a UNC share containing a malicious System32\\cmd.exe. The code will then execute that binary during encoding detection.

Vulnerable code:

const result = spawnSync(resolveTrustedWindowsCmdPath(), ["/d", "/s", "/c", "chcp"], ...);
...
if (!trimmed || trimmed.includes("\0") || !pathWin32.isAbsolute(trimmed)) {
  return null;
}
...
​// UNC like \\server\share\Windows passes these checks
pathWin32.basename(normalized).toLowerCase() === "windows" &&
pathWin32.dirname(normalized).toLowerCase() === parsed.root.toLowerCase();

Recommendation

Harden normalizeWindowsSystemRoot() to only accept local drive roots (and reject UNC/device/extended-length paths), since this value is used to construct an executable path.

For example:

function normalizeWindowsSystemRoot(value: unknown): string | null {
  if (typeof value !== "string") return null;
  const trimmed = value.trim();
  if (!trimmed || trimmed.includes("\0")) return null;

  const normalized = pathWin32.normalize(trimmed);
  const parsed = pathWin32.parse(normalized);// Require local drive root like C:\
  if (!/^[A-Za-z]:\\$/.test(parsed.root)) return null;

  if (
    pathWin32.basename(normalized).toLowerCase() !== "windows" ||
    pathWin32.dirname(normalized).toLowerCase() !== parsed.root.toLowerCase()
  ) {
    return null;
  }
  return normalized;
}

Alternatively, avoid shelling out to cmd.exe for codepage detection (e.g., use Win32 APIs via a small native helper) to remove this execution sink entirely.

2. 🔵 Possible execution of attacker-controlled cmd.exe via UNC SystemRoot in memory-host SDK Windows output decoder
Property Value
Severity Low
CWE CWE-427
Location packages/memory-host-sdk/src/host/windows-output-decoder.ts:116-155

Description

The memory-host SDK’s Windows output decoder spawns cmd.exe (via spawnSync) to run chcp and infer the console codepage.

resolveTrustedWindowsCmdPath() derives the executable path from process.env.SystemRoot after normalizeWindowsSystemRoot() validation. That validation still allows UNC absolute paths (e.g. \\server\\share\\Windows), enabling execution of \\server\\share\\Windows\\System32\\cmd.exe if the environment is tainted.

Vulnerable code:

const result = spawnSync(resolveTrustedWindowsCmdPath(), ["/d", "/s", "/c", "chcp"], ...);

The intent is to avoid PATH hijacking, but UNC acceptance can reintroduce binary planting risk.

Recommendation

Reject UNC/device/extended-length roots when deriving an executable path from SystemRoot.

Example hardening:

const parsed = pathWin32.parse(normalized);
if (!/^[A-Za-z]:\\$/.test(parsed.root)) return null; // reject UNC like \\server\share\

Or remove the need to spawn cmd.exe by using a Windows API-based codepage lookup.


Analyzed PR: #73751 at commit 5bcb170

Last updated on: 2026-04-29T01:36:59Z

@vincentkoc vincentkoc added the clawsweeper Tracked by ClawSweeper automation label Apr 28, 2026
@openclaw-barnacle openclaw-barnacle Bot added the maintainer Maintainer-authored PR label Apr 28, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented Apr 28, 2026

Codex review: keeping this open for maintainer follow-up; there is still a little grit to resolve.

Keep this PR open. Automated cleanup is blocked because the PR is member-authored and carries the protected maintainer label; beyond policy, current main only clearly covers the core exec/process path while the PR still targets local TUI and QMD command output surfaces that decode chunks as UTF-8 today.

Best possible solution:

Keep this PR open for maintainer review. The best path is to either land a hardened version that covers local TUI and QMD decoding with portable tests, or have a maintainer explicitly narrow the scope to the already-merged #72393 core exec fix and close this PR manually.

What I checked:

  • Protected maintainer PR: Provided GitHub context lists authorAssociation as MEMBER and labels include maintainer, which the cleanup policy says must stay open for explicit maintainer judgment. (5bcb17019215)
  • Core exec path already codepage-aware on current main: runExec requests Buffer output and decodes stdout/stderr through decodeWindowsOutputBuffer; runCommandWithTimeout buffers chunks and decodes the concatenated buffers before resolving. (src/process/exec.ts:142, 7e5c3753f6f7)
  • Core regression coverage exists: Current main has Windows regression tests for GBK stdout/stderr in runExec and split GBK chunks in runCommandWithTimeout. (src/process/exec.windows.test.ts:373, 7e5c3753f6f7)
  • Remaining local TUI surface: The local TUI shell still appends buf.toString("utf8") per stdout/stderr chunk, so current main has not implemented the PR's local-shell decoding target. (src/tui/tui-local-shell.ts:120, 7e5c3753f6f7)
  • Remaining QMD surface: runCliCommand in the memory-host SDK still decodes stdout/stderr chunks with data.toString("utf8"), so current main has not implemented the PR's QMD decoding target. (packages/memory-host-sdk/src/host/qmd-process.ts:128, 7e5c3753f6f7)
  • PR discussion has security-sensitive review findings: The Aisle review flagged low-severity command-resolution risk in the PR's Windows codepage detection changes: deriving cmd.exe from SystemRoot while accepting UNC-style roots could execute attacker-controlled binaries from a tainted environment. (src/infra/windows-encoding.ts, 5bcb17019215)

Likely related people:

  • vincentkoc: Related PR fix(process): decode Windows command output with console codepage awareness #72393, authored by vincentkoc and merged before this PR, introduced the current-main core Windows codepage-aware exec/process fix that this PR builds on; this is history beyond merely opening the current PR. (role: recent maintainer / merged replacement author; confidence: medium; commits: d1eedefc22a2; files: src/process/exec.ts, src/infra/windows-encoding.ts, src/process/exec.windows.test.ts)
  • Peter Steinberger steipete@gmail.com: Local git blame in this checkout attributes the relevant current-main lines in src/process/exec.ts, src/infra/windows-encoding.ts, src/tui/tui-local-shell.ts, and packages/memory-host-sdk/src/host/qmd-process.ts to the grafted commit 955b4df0; routing confidence is limited because the local history is compacted. (role: current checkout blame / recent maintainer; confidence: low; commits: 955b4df0939f; files: src/process/exec.ts, src/infra/windows-encoding.ts, src/tui/tui-local-shell.ts)
  • slepybear: The PR body credits fix(exec): apply Windows codepage-aware output decoding in runExec and spawn-based callers [AI-assisted] #64661 by slepybear for the original Windows codepage-aware decoding approach across exec, local shell, and QMD; useful for context, though that PR was closed unmerged. (role: adjacent source contributor; confidence: low; commits: 48818dce42b2; files: src/process/exec.ts, src/tui/tui-local-shell.ts, packages/memory-host-sdk/src/host/qmd-process.ts)

Remaining risk / open question:

  • Protected author association and maintainer label block automated closure even if a maintainer later decides fix(process): decode Windows command output with console codepage awareness #72393 is enough for the intended product scope.
  • The PR touches Windows command execution and codepage detection; Aisle flagged possible attacker-controlled cmd.exe execution through UNC SystemRoot handling in two proposed decoder helpers.
  • Current main still has local TUI and QMD UTF-8-per-chunk decoding, so closing this PR as implemented would drop potentially useful remaining coverage.
  • The PR appears to duplicate Windows output decoder logic into the memory-host SDK, which needs maintainer review for divergence and security hardening.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 7e5c3753f6f7.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 28, 2026

Greptile Summary

This PR adds stateful, codepage-aware stream decoders to exec.ts, tui-local-shell.ts, and qmd-process.ts, replacing the previous buffer-accumulate-then-UTF-8-decode pattern that mojibaked CJK output on GBK/Shift_JIS Windows consoles. The core decoder logic in windows-output-decoder.ts correctly handles split multibyte chunks and UTF-8 → legacy fallback.

  • P1: The new runCliCommand test in qmd-process.test.ts will fail on every non-Windows CI runner. resolveWindowsConsoleEncoding() returns null before calling spawnSync when process.platform !== \"win32\", so the spawnSyncMock setup is never exercised — decoders fall back to UTF-8, and the GBK bytes produce garbage instead of \"测试\". The fix used in tui-local-shell.test.ts (injecting createOutputDecoder: () => createWindowsOutputDecoder({ platform: \"win32\", windowsEncoding: \"gbk\" })) should be applied here by adding a decoder-factory injection point to runCliCommand's params.

Confidence Score: 3/5

Production code is sound but the qmd-process test will fail on non-Windows CI, blocking clean merge.

One P1 finding: the runCliCommand test in qmd-process.test.ts is broken on all non-Windows runners because resolveWindowsConsoleEncoding bypasses the spawnSyncMock via an early platform guard, making the GBK assertion impossible to pass. The runtime fix (stateful decoders) is correct; only the test infrastructure needs a decoder-factory injection seam like tui-local-shell already has.

packages/memory-host-sdk/src/host/qmd-process.test.ts and packages/memory-host-sdk/src/host/qmd-process.ts (missing decoder-factory injection)

Prompt To Fix All With AI
This is a comment left during a code review.
Path: packages/memory-host-sdk/src/host/qmd-process.test.ts
Line: 139-163

Comment:
**Test will always fail on non-Windows CI because `resolveWindowsConsoleEncoding` returns early**

`resolveWindowsConsoleEncoding()` in `windows-output-decoder.ts` guards behind `process.platform !== "win32"` and returns `null` before ever calling `spawnSync`. This means `spawnSyncMock.mockReturnValue({ stdout: "Active code page: 936", stderr: "" })` is never reached on Linux/macOS, `legacyDecoder` is set to `null`, and `decode()` falls through to `buffer.toString("utf8")`. The GBK bytes `[0xb2, 0xe2, 0xca, 0xd4]` produce replacement characters instead of `"测试"`, so the assertion fails on every non-Windows runner.

The sibling test in `tui-local-shell.test.ts` avoids this correctly by injecting `createOutputDecoder: () => createWindowsOutputDecoder({ platform: "win32", windowsEncoding: "gbk" })`. `runCliCommand` needs the same injection seam — add an optional decoder factory to its parameter type in `qmd-process.ts` and pass it into the decoders — or the test must pass explicit platform/encoding overrides before creating the decoders.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(exec): decode Windows command output..." | Re-trigger Greptile

Comment on lines +139 to +163
it("decodes Windows codepage output across split qmd stdout and stderr chunks", async () => {
spawnSyncMock.mockReturnValue({ stdout: "Active code page: 936", stderr: "" });
const child = createMockChild();
spawnMock.mockImplementationOnce(() => {
queueMicrotask(() => {
child.stdout.emit("data", Buffer.from([0xb2]));
child.stdout.emit("data", Buffer.from([0xe2, 0xca]));
child.stdout.emit("data", Buffer.from([0xd4]));
child.stderr.emit("data", Buffer.from([0xa3]));
child.stderr.emit("data", Buffer.from([0xbb]));
child.emit("close", 0);
});
return child;
});

await expect(
runCliCommand({
commandSummary: "qmd query",
spawnInvocation: { command: "qmd", argv: ["query"] },
env: process.env,
cwd: tempDir,
maxOutputChars: 1_000,
}),
).resolves.toEqual({ stdout: "测试", stderr: ";" });
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Test will always fail on non-Windows CI because resolveWindowsConsoleEncoding returns early

resolveWindowsConsoleEncoding() in windows-output-decoder.ts guards behind process.platform !== "win32" and returns null before ever calling spawnSync. This means spawnSyncMock.mockReturnValue({ stdout: "Active code page: 936", stderr: "" }) is never reached on Linux/macOS, legacyDecoder is set to null, and decode() falls through to buffer.toString("utf8"). The GBK bytes [0xb2, 0xe2, 0xca, 0xd4] produce replacement characters instead of "测试", so the assertion fails on every non-Windows runner.

The sibling test in tui-local-shell.test.ts avoids this correctly by injecting createOutputDecoder: () => createWindowsOutputDecoder({ platform: "win32", windowsEncoding: "gbk" }). runCliCommand needs the same injection seam — add an optional decoder factory to its parameter type in qmd-process.ts and pass it into the decoders — or the test must pass explicit platform/encoding overrides before creating the decoders.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/memory-host-sdk/src/host/qmd-process.test.ts
Line: 139-163

Comment:
**Test will always fail on non-Windows CI because `resolveWindowsConsoleEncoding` returns early**

`resolveWindowsConsoleEncoding()` in `windows-output-decoder.ts` guards behind `process.platform !== "win32"` and returns `null` before ever calling `spawnSync`. This means `spawnSyncMock.mockReturnValue({ stdout: "Active code page: 936", stderr: "" })` is never reached on Linux/macOS, `legacyDecoder` is set to `null`, and `decode()` falls through to `buffer.toString("utf8")`. The GBK bytes `[0xb2, 0xe2, 0xca, 0xd4]` produce replacement characters instead of `"测试"`, so the assertion fails on every non-Windows runner.

The sibling test in `tui-local-shell.test.ts` avoids this correctly by injecting `createOutputDecoder: () => createWindowsOutputDecoder({ platform: "win32", windowsEncoding: "gbk" })`. `runCliCommand` needs the same injection seam — add an optional decoder factory to its parameter type in `qmd-process.ts` and pass it into the decoders — or the test must pass explicit platform/encoding overrides before creating the decoders.

How can I resolve this? If you propose a fix, please make it concise.

@openclaw-clownfish openclaw-clownfish Bot force-pushed the clownfish/ghcrawl-156777-autonomous-smoke branch from d7c85e0 to 5bcb170 Compare April 29, 2026 01:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clawsweeper Tracked by ClawSweeper automation maintainer Maintainer-authored PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant