Skip to content

fix(process): decode Windows command output with console codepage awareness#72393

Merged
vincentkoc merged 2 commits intomainfrom
clownfish/ghcrawl-199248-agentic-merge
Apr 27, 2026
Merged

fix(process): decode Windows command output with console codepage awareness#72393
vincentkoc merged 2 commits intomainfrom
clownfish/ghcrawl-199248-agentic-merge

Conversation

@vincentkoc
Copy link
Copy Markdown
Member

Summary

  • Fix Windows CJK mojibake for exec output by preserving raw bytes and decoding with a Windows console-codepage-aware helper.
  • Prefer valid UTF-8 output when tools emit UTF-8 on Windows, and only fall back to detected console encodings when needed.
  • Avoid per-chunk multibyte corruption by using a stateful decoder or whole-buffer decode for chunked output paths.

Credit

This replacement carries forward and credits ideas and tests from #43611 by @iready, #50586 by @kevinten10, #50885 by @zhangyongjie1997, #56538 by @knightplat-blip / heiqishi666, and #64661/#64709 by @slepybear.

Validation

  • pnpm check:changed

Fixes #50519. Supersedes #43611, #50586, #50885, #56538, #64661, and #64709 after this PR lands and validation passes.

ProjectClownfish replacement details:

@aisle-research-bot
Copy link
Copy Markdown

aisle-research-bot Bot commented Apr 26, 2026

🔒 Aisle Security Analysis

We found 3 potential security issue(s) in this PR:

# Severity Title
1 🟠 High Windows search-order hijack and potential hang in console codepage detection via spawnSync("cmd.exe")
2 🟡 Medium Unbounded stdout/stderr buffering in runCommandWithTimeout can cause memory exhaustion
3 🟡 Medium Terminal/log escape injection via unsanitized decoded child-process output in verbose logging
1. 🟠 Windows search-order hijack and potential hang in console codepage detection via spawnSync("cmd.exe")
Property Value
Severity High
CWE CWE-427
Location src/infra/windows-encoding.ts:39-43

Description

resolveWindowsConsoleEncoding() executes cmd.exe using a non-absolute executable name and without a timeout/maxBuffer.

Impacts:

  • Binary planting / search-order hijack (RCE): On Windows, CreateProcess resolves cmd.exe via the standard search order (application directory, current working directory, system directories, then PATH). If an attacker can influence the working directory or place a malicious cmd.exe earlier in the search order, the library may execute attacker-controlled code.
  • Denial of service: spawnSync() is blocking. If the resolved cmd.exe (or chcp) hangs (e.g., via malicious binary, policy restrictions, or environmental issues), the process can block indefinitely because no timeout is set.

Vulnerable code:

const result = spawnSync("cmd.exe", ["/d", "/s", "/c", "chcp"], {
  windowsHide: true,
  encoding: "utf8",
  stdio: ["ignore", "pipe", "pipe"],
});

Recommendation

Avoid search-order resolution and add a bounded execution time.

  • Prefer an absolute path to cmd.exe (e.g. via process.env.ComSpec with validation, or %SystemRoot%\\System32\\cmd.exe).
  • Add a timeout and a conservative maxBuffer.

Example fix:

import path from "node:path";

function getSystemCmdExe(): string {
  const systemRoot = process.env.SystemRoot || process.env.windir;
  if (systemRoot) {
    return path.join(systemRoot, "System32", "cmd.exe");
  }// Fallback (still better than relying on CWD); consider hard-failing if missing.
  return process.env.ComSpec || "cmd.exe";
}

const result = spawnSync(getSystemCmdExe(), ["/d", "/s", "/c", "chcp"], {
  windowsHide: true,
  encoding: "utf8",
  stdio: ["ignore", "pipe", "pipe"],
  timeout: 2000,
  maxBuffer: 64 * 1024,
});

If this code can run in elevated/service contexts, consider also setting a safe cwd and sanitizing PATH for this specific call.

2. 🟡 Unbounded stdout/stderr buffering in runCommandWithTimeout can cause memory exhaustion
Property Value
Severity Medium
CWE CWE-400
Location src/process/exec.ts:300-435

Description

The runCommandWithTimeout() helper accumulates all child process stdout/stderr into in-memory Buffer[] arrays and concatenates them at process completion.

  • The child output is appended without any size limit (stdoutChunks.push(...), stderrChunks.push(...)).
  • On completion, Buffer.concat(stdoutChunks) and Buffer.concat(stderrChunks) allocate contiguous buffers sized to the entire output.
  • There is no maxBuffer/output cap, truncation, or early termination when output grows too large.

If an attacker can influence the executed command (or cause a spawned tool/plugin to emit大量 output), they can produce arbitrarily large stdout/stderr quickly and trigger high memory usage / out-of-memory termination (denial of service).

Vulnerable code:

child.stdout?.on("data", (d) => {
  stdoutChunks.push(Buffer.isBuffer(d) ? d : Buffer.from(d));
});
...
stdout: decodeWindowsOutputBuffer({
  buffer: Buffer.concat(stdoutChunks),
  windowsEncoding,
}),

Recommendation

Introduce an explicit output cap (similar to Node's execFile maxBuffer) and stop buffering once the limit is exceeded.

Example approach:

const maxBuffer = options.maxBuffer ?? 1024 * 1024; // 1 MiB default (pick an appropriate default)
let stdoutLen = 0;
let stderrLen = 0;

child.stdout?.on("data", (d: Buffer) => {
  const b = Buffer.isBuffer(d) ? d : Buffer.from(d);
  stdoutLen += b.length;
  if (stdoutLen > maxBuffer) {
    child.kill("SIGKILL");
    return;
  }
  stdoutChunks.push(b);
});

child.stderr?.on("data", (d: Buffer) => {
  const b = Buffer.isBuffer(d) ? d : Buffer.from(d);
  stderrLen += b.length;
  if (stderrLen > maxBuffer) {
    child.kill("SIGKILL");
    return;
  }
  stderrChunks.push(b);
});

Alternatively, stream output to a temporary file or expose a streaming API/callback instead of buffering all output in memory.

3. 🟡 Terminal/log escape injection via unsanitized decoded child-process output in verbose logging
Property Value
Severity Medium
CWE CWE-117
Location src/process/exec.ts:156-165

Description

The runExec function logs decoded stdout/stderr from executed commands directly when verbose logging is enabled. After this change, output is decoded using Windows code pages (e.g., GBK/Shift-JIS) via decodeWindowsOutputBuffer, which can turn previously-invalid UTF-8 byte sequences into valid text that includes control characters.

This allows a child process (or any untrusted command being executed) to inject:

  • ANSI escape sequences (e.g., \x1b]... / \x1b[...m) that may manipulate the terminal (clear screen, move cursor, change colors, set window title, etc.)
  • Control characters/newlines that can forge/mislead log output (log injection)

Vulnerable code paths:

  • decodedStdout/decodedStderr are derived from raw command output buffers
  • Values are passed to logDebug / logError without neutralizing control characters

Vulnerable code:

const decodedStdout = decodeWindowsOutputBuffer({ buffer: stdout, windowsEncoding });
const decodedStderr = decodeWindowsOutputBuffer({ buffer: stderr, windowsEncoding });
if (shouldLogVerbose()) {
  if (decodedStdout.trim()) {
    logDebug(decodedStdout.trim());
  }
  if (decodedStderr.trim()) {
    logError(decodedStderr.trim());
  }
}

Recommendation

Neutralize control characters / ANSI escape sequences before emitting child-process output to terminals or persistent logs.

Option A (safe-by-default for logs): escape non-printables and newlines.

function sanitizeForLog(s: string): string {// Remove ANSI escapes and other C0 controls except \n/\t if desired.// Consider a well-tested lib like `strip-ansi` for ANSI removal.
  const withoutAnsi = s.replace(/\x1B\[[0-?]*[ -/]*[@-~]/g, "");// Replace other control chars (including CR) to prevent log forging.
  return withoutAnsi.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F\r]/g, (c) =>
    `\\x${c.charCodeAt(0).toString(16).padStart(2, "0")}`,
  );
}

logDebug(sanitizeForLog(decodedStdout.trim()));
logError(sanitizeForLog(decodedStderr.trim()));

Option B (terminal-focused): strip ANSI escapes and disallow OSC sequences too (window-title, hyperlinks), then emit.

Also consider logging output with clear delimiters (prefix each line) to mitigate log forging.


Analyzed PR: #72393 at commit d1eedef

Last updated on: 2026-04-27T06:08:56Z

@openclaw-barnacle openclaw-barnacle Bot added size: L maintainer Maintainer-authored PR labels Apr 26, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 26, 2026

Greptile Summary

This PR refactors Windows console encoding detection and CJK mojibake repair into a dedicated src/infra/windows-encoding.ts module, then threads it through runExec, runCommandWithTimeout, and the supervisor ChildAdapter. The buffered-output paths collect raw bytes and decode once with UTF-8-first / codepage-fallback semantics; the streaming ChildAdapter path uses a new stateful createWindowsOutputDecoder that handles multi-byte sequences split across chunks.

Confidence Score: 4/5

Safe to merge; all findings are P2 style/robustness suggestions with no present defects on the changed paths.

The encoding logic is well-tested with both unit and integration tests, the stateful-decoder invariant holds for all exercised cases, and the refactoring correctly removes the duplicate implementation from invoke.ts. Only P2 concerns remain: a redundant undefined guard, an implicit invariant in the streaming decoder that should be commented, a pre-existing regex ordering issue in parseWindowsCodePage, and a shared-state footgun in the adapter if onStdout is ever called more than once.

src/infra/windows-encoding.ts (streaming decoder invariant comment) and src/process/supervisor/adapters/child.ts (shared decoder state)

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/infra/windows-encoding.ts
Line: 145-151

Comment:
**Redundant undefined guard in `getTrailingIncompleteUtf8Bytes` loop**

The `buffer[index] !== undefined` check in the while condition is redundant — the `index >= 0` guard already ensures the access is in bounds for a `Buffer`, and TypeScript types `Buffer` indexed access as `number` (not `number | undefined`). Removing it makes the intent clearer.

```suggestion
  while (
    index >= 0 &&
    buffer[index] >= 0x80 &&
    buffer[index] <= 0xbf &&
    continuationBytes < 3
  ) {
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/infra/windows-encoding.ts
Line: 111-114

Comment:
**`decode` passes raw chunk, not `replayBuffer`, to `utf8Decoder`**

Inside the `try` block, `utf8Decoder.decode(buffer, { stream: true })` receives only the new chunk, while `getTrailingIncompleteUtf8Bytes(replayBuffer)` uses the concatenated `[pendingUtf8Bytes + buffer]`. This works correctly because the stateful `utf8Decoder` already holds `pendingUtf8Bytes` in its internal incomplete-sequence buffer (that's the maintained invariant), so feeding it just `buffer` is equivalent to feeding it `replayBuffer`. However, this dependency is non-obvious and worth a comment to prevent future regressions — e.g., resetting the `utf8Decoder` instance anywhere would silently break the invariant.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/infra/windows-encoding.ts
Line: 20

Comment:
**`parseWindowsCodePage` matches the first 3-5 digit number, not the last**

The regex `/\b(\d{3,5})\b/` picks the first match in the combined `stdout + "\n" + stderr` string. If `stderr` happens to contain a 3-5 digit number before the code page (e.g., `"Error 100:"`) the wrong value would be captured, leaving `cachedWindowsConsoleEncoding` as `null` (since 100 is not in `WINDOWS_CODEPAGE_ENCODING_MAP`). Using a more anchored pattern — such as matching the last number or keying on the colon-delimited format — would be more robust, though this issue is pre-existing from the original `invoke.ts` implementation.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/process/supervisor/adapters/child.ts
Line: 112-113

Comment:
**`stdoutDecoder`/`stderrDecoder` are shared across multiple `onStdout`/`onStderr` calls**

The decoders are created once at the adapter level. If `onStdout` or `onStderr` is ever invoked more than once (e.g., by two consumers of the same `ChildAdapter`), each call registers a new `"data"` listener on `child.stdout`, but both share the same stateful `stdoutDecoder`. The first listener to handle a `"data"` event will consume the decoder's state, leaving the second listener's call to `stdoutDecoder.decode(chunk)` to return empty output. The current call sites appear to invoke these at most once per adapter, so this is not an active bug, but the lack of a guard or comment makes this fragile as consumers grow.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(clownfish): address review for ghcra..." | Re-trigger Greptile

Comment thread src/infra/windows-encoding.ts
Comment thread src/infra/windows-encoding.ts
Comment thread src/infra/windows-encoding.ts
Comment thread src/process/supervisor/adapters/child.ts Outdated
@vincentkoc vincentkoc force-pushed the clownfish/ghcrawl-199248-agentic-merge branch from aeaf469 to d1eedef Compare April 27, 2026 06:06
@vincentkoc vincentkoc merged commit e6d2c9b into main Apr 27, 2026
66 checks passed
@vincentkoc vincentkoc deleted the clownfish/ghcrawl-199248-agentic-merge branch April 27, 2026 06:11
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
…reness (openclaw#72393)

* fix(process): decode Windows command output with console codepage awareness

* fix(clownfish): address review for ghcrawl-199248-agentic-merge (1)
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
…reness (openclaw#72393)

* fix(process): decode Windows command output with console codepage awareness

* fix(clownfish): address review for ghcrawl-199248-agentic-merge (1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clawsweeper Tracked by ClawSweeper automation maintainer Maintainer-authored PR size: L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Windows exec tool produces garbled Chinese characters due to hardcoded UTF-8 encoding

1 participant