Skip to content

fix(exec): apply Windows codepage-aware output decoding in runExec and spawn-based callers [AI-assisted]#64661

Closed
slepybear wants to merge 1 commit intoopenclaw:mainfrom
slepybear:fix/windows-codepage-aware-decoding
Closed

fix(exec): apply Windows codepage-aware output decoding in runExec and spawn-based callers [AI-assisted]#64661
slepybear wants to merge 1 commit intoopenclaw:mainfrom
slepybear:fix/windows-codepage-aware-decoding

Conversation

@slepybear
Copy link
Copy Markdown
Contributor

Summary

  • Problem: runExec() in src/process/exec.ts used encoding: "utf8" which causes Node.js to decode stdout/stderr internally as UTF-8 before returning. On Windows with non-UTF-8 console codepages (e.g. GBK/936, Big5/950, Shift_JIS/932), CJK characters in command output are garbled because the raw byte buffer is lost and cannot be re-decoded with the correct encoding. The same issue existed in tui-local-shell.ts and qmd-process.ts which used .toString("utf8") directly on spawn data chunks.
  • Why it matters: Windows users in CJK locales (Chinese, Japanese, Korean) consistently see garbled output from exec tool calls. This is reported in [Bug]: Windows exec tool garbled Chinese characters (confirmed in 2026.3.24) #56462 and [Bug]: Windows exec tool produces garbled Chinese characters due to hardcoded UTF-8 encoding #50519.
  • What changed: Replaced encoding: "utf8" with encoding: "buffer" in runExec() so execFileAsync returns raw Buffers, then applied decodeCapturedOutputBuffer() (from the existing src/infra/windows-encoding.ts shared module) to correctly decode output based on the detected Windows console encoding. Applied the same fix to tui-local-shell.ts and qmd-process.ts. Also fixed runCommandWithTimeout() which was already using chunk-based collection but was missing the decodeCapturedOutputBuffer() call in resolveFromClose.
  • What did NOT change (scope boundary): The windows-encoding.ts shared module was not modified. invoke.ts and child.ts were already correctly using decodeCapturedOutputBuffer() and were not touched. No new dependencies were added.

Change Type

  • Bug fix

Scope

  • Skills / tool execution

Linked Issue/PR

Root Cause

  • Root cause: runExec() passed encoding: "utf8" to execFileAsync, which instructs Node.js to decode the child process output as UTF-8 before returning it. On Windows systems using codepages like GBK (936), Big5 (950), or Shift_JIS (932), the child process outputs bytes in the local codepage encoding. Node.js decodes these bytes as UTF-8, producing garbled text (mojibake). Since the raw bytes are discarded during this internal decode, there is no way to re-decode them afterward.
  • Missing detection / guardrail: The existing decodeCapturedOutputBuffer() function in src/infra/windows-encoding.ts already handles this correctly by detecting the Windows console codepage and using the appropriate TextDecoder. However, runExec() and two spawn-based callers bypassed this function by using encoding: "utf8" or .toString("utf8") directly.

Testing

  • pnpm test src/process/exec.windows.test.ts -> 14 tests pass (including 3 new encoding tests)
  • pnpm test src/infra/windows-encoding.test.ts -> 18 tests pass
  • pnpm check -> passed (our files: 0 warnings, 0 errors)
  • New test cases:
    • decodes GBK output in runExec on Windows -- verifies GBK-encoded bytes are correctly decoded
    • decodes Shift_JIS output in runExec on Windows -- verifies Shift_JIS-encoded bytes are correctly decoded
    • returns UTF-8 output unchanged on non-Windows in runExec -- verifies no behavior change on macOS/Linux

AI Usage Disclosure

  • AI used for: code analysis, identifying encoding gaps across codebase, generating test structure
  • AI model: Claude
  • Degree of testing: fully tested (unit tests with mocked Windows platform + GBK/Shift_JIS encoding)
  • Human verification: all test expectations verified against actual GBK/Shift_JIS byte sequences; code changes reviewed for Buffer handling correctness

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a04e95a251

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/process/exec.ts
Comment on lines +8 to +11
import {
decodeCapturedOutputBuffer,
resolveWindowsConsoleEncoding,
} from "../infra/windows-encoding.js";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 Badge Point encoding imports to a real module

This commit introduces imports from ../infra/windows-encoding.js, but that module does not exist in the tree (the only definitions are in src/node-host/invoke.ts), so loading src/process/exec.ts now fails module resolution before any runtime branch checks. Because runExec is used broadly, this can break command execution on all platforms, not just Windows.

Useful? React with 👍 / 👎.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 11, 2026

Greptile Summary

This PR applies Windows codepage-aware output decoding to runExec(), tui-local-shell.ts, and qmd-process.ts by switching to encoding: "buffer" and calling decodeCapturedOutputBuffer() from a shared src/infra/windows-encoding.ts module, fixing garbled CJK output on non-UTF-8 Windows consoles. It also bundles unrelated test additions for the kimi-coding and qianfan extension plugins from a prior commit.

  • src/infra/windows-encoding.ts does not exist in HEAD (git cat-file -e HEAD:src/infra/windows-encoding.ts → "does not exist in 'HEAD'"). All three changed production files now import decodeCapturedOutputBuffer and resolveWindowsConsoleEncoding from this missing module. This will cause TypeScript compilation failures (pnpm tsgo/pnpm build) and runtime import errors. The file needs to be created — the implementation already exists as private functions inside src/node-host/invoke.ts (lines 123–165) and should be extracted into the shared module.
  • When execFileAsync rejects with a non-zero exit code, the ExecFileException error's .stdout/.stderr properties are now raw Buffer objects rather than decoded strings (changed by encoding: "buffer"). No current callers read those properties from caught errors, but this is a silent behavioral change worth documenting.

Confidence Score: 2/5

Not safe to merge: the shared module src/infra/windows-encoding.ts referenced by all three changed production files does not exist in HEAD, breaking TypeScript compilation and runtime imports.

The encoding logic is correct and follows the established pattern in invoke.ts. However, src/infra/windows-encoding.ts is confirmed absent from the git tree — git cat-file -e HEAD:src/infra/windows-encoding.ts returns 'does not exist in HEAD'. All three production files import from this missing path, so pnpm tsgo/pnpm build will fail. The PR cannot ship without creating and committing the shared module.

A new file src/infra/windows-encoding.ts must be created — extract resolveWindowsConsoleEncoding and decodeCapturedOutputBuffer from src/node-host/invoke.ts (lines 123–165) and update invoke.ts to import from the shared module.

Comments Outside Diff (2)

  1. src/process/exec.ts, line 174-179 (link)

    P2 Error path: .stdout/.stderr on thrown errors are now raw Buffers

    With encoding: "buffer", when execFileAsync rejects because the command exits non-zero, the ExecFileException's .stdout and .stderr properties are Buffer objects rather than decoded strings. Before this PR they were UTF-8 strings. No current callers read those properties from a caught error (verified), but this is a silent interface change worth a comment.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/process/exec.ts
    Line: 174-179
    
    Comment:
    **Error path: `.stdout`/`.stderr` on thrown errors are now raw Buffers**
    
    With `encoding: "buffer"`, when `execFileAsync` rejects because the command exits non-zero, the `ExecFileException`'s `.stdout` and `.stderr` properties are `Buffer` objects rather than decoded strings. Before this PR they were UTF-8 strings. No current callers read those properties from a caught error (verified), but this is a silent interface change worth a comment.
    
    How can I resolve this? If you propose a fix, please make it concise.
  2. src/process/exec.windows.test.ts, line 82-86 (link)

    P2 Stale ExecCall callback type

    The ExecCall type still declares the callback as (err: Error | null, stdout: string, stderr: string) => void — the pre-buffer shape. The new encoding tests (lines 363–446) use a different mock callback shape (err, result: { stdout: Buffer; stderr: Buffer }) => void. Consider aligning the type or defining a separate ExecCallBuffer variant.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/process/exec.windows.test.ts
    Line: 82-86
    
    Comment:
    **Stale `ExecCall` callback type**
    
    The `ExecCall` type still declares the callback as `(err: Error | null, stdout: string, stderr: string) => void` — the pre-buffer shape. The new encoding tests (lines 363–446) use a different mock callback shape `(err, result: { stdout: Buffer; stderr: Buffer }) => void`. Consider aligning the type or defining a separate `ExecCallBuffer` variant.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/process/exec.ts
Line: 8-11

Comment:
**Missing shared module `src/infra/windows-encoding.ts`**

`git cat-file -e HEAD:src/infra/windows-encoding.ts` returns "does not exist in 'HEAD'", and `git ls-files src/infra/ | grep windows` confirms the file is not tracked. All three production files changed in this PR import `decodeCapturedOutputBuffer` and `resolveWindowsConsoleEncoding` from this path, so `pnpm tsgo`/`pnpm build` will fail and the runtime will throw "Cannot find module". The implementation already exists as private functions in `src/node-host/invoke.ts` (lines 123–165) — those should be extracted into `src/infra/windows-encoding.ts` and committed as part of this PR.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/process/exec.ts
Line: 174-179

Comment:
**Error path: `.stdout`/`.stderr` on thrown errors are now raw Buffers**

With `encoding: "buffer"`, when `execFileAsync` rejects because the command exits non-zero, the `ExecFileException`'s `.stdout` and `.stderr` properties are `Buffer` objects rather than decoded strings. Before this PR they were UTF-8 strings. No current callers read those properties from a caught error (verified), but this is a silent interface change worth a comment.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/process/exec.windows.test.ts
Line: 82-86

Comment:
**Stale `ExecCall` callback type**

The `ExecCall` type still declares the callback as `(err: Error | null, stdout: string, stderr: string) => void` — the pre-buffer shape. The new encoding tests (lines 363–446) use a different mock callback shape `(err, result: { stdout: Buffer; stderr: Buffer }) => void`. Consider aligning the type or defining a separate `ExecCallBuffer` variant.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(exec): apply Windows codepage-aware ..." | Re-trigger Greptile

Comment thread src/process/exec.ts
Comment on lines +8 to +11
import {
decodeCapturedOutputBuffer,
resolveWindowsConsoleEncoding,
} from "../infra/windows-encoding.js";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Missing shared module src/infra/windows-encoding.ts

git cat-file -e HEAD:src/infra/windows-encoding.ts returns "does not exist in 'HEAD'", and git ls-files src/infra/ | grep windows confirms the file is not tracked. All three production files changed in this PR import decodeCapturedOutputBuffer and resolveWindowsConsoleEncoding from this path, so pnpm tsgo/pnpm build will fail and the runtime will throw "Cannot find module". The implementation already exists as private functions in src/node-host/invoke.ts (lines 123–165) — those should be extracted into src/infra/windows-encoding.ts and committed as part of this PR.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/process/exec.ts
Line: 8-11

Comment:
**Missing shared module `src/infra/windows-encoding.ts`**

`git cat-file -e HEAD:src/infra/windows-encoding.ts` returns "does not exist in 'HEAD'", and `git ls-files src/infra/ | grep windows` confirms the file is not tracked. All three production files changed in this PR import `decodeCapturedOutputBuffer` and `resolveWindowsConsoleEncoding` from this path, so `pnpm tsgo`/`pnpm build` will fail and the runtime will throw "Cannot find module". The implementation already exists as private functions in `src/node-host/invoke.ts` (lines 123–165) — those should be extracted into `src/infra/windows-encoding.ts` and committed as part of this PR.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2298b52de7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

const next = appendOutputWithCap(stdout, data.toString("utf8"), params.maxOutputChars);
const next = appendOutputWithCap(
stdout,
decodeCapturedOutputBuffer({ buffer: data, windowsEncoding }),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve decoder state across stdout chunks

The new decodeCapturedOutputBuffer({ buffer: data, windowsEncoding }) call decodes each data event independently, so Windows multibyte encodings (for example GBK/Shift_JIS) still corrupt output when a character is split across chunk boundaries. In that case each partial byte sequence is decoded as replacement characters, which can break downstream parsing and error reporting even though the full byte stream is valid; this path should use a per-stream TextDecoder with streaming enabled or decode after reassembling bytes at close.

Useful? React with 👍 / 👎.

@slepybear slepybear force-pushed the fix/windows-codepage-aware-decoding branch from adf0951 to 48818dc Compare April 15, 2026 07:57
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 48818dce42

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +124 to +128
child.stdout.on("data", (buf: Buffer) => {
stdout = appendWithCap(
stdout,
decodeCapturedOutputBuffer({ buffer: buf, windowsEncoding }),
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Decode local-shell output with a stateful decoder

decodeCapturedOutputBuffer is called on each data chunk independently, which instantiates a fresh decoder per chunk. On Windows code pages like GBK/Shift_JIS, multibyte characters that straddle chunk boundaries are decoded as replacement characters (), so CJK local-shell output can still be corrupted even when the full byte stream is valid. This needs streaming decode state (single TextDecoder with stream: true) or byte buffering until process close.

Useful? React with 👍 / 👎.

@slepybear
Copy link
Copy Markdown
Contributor Author

@obviyus @vincentkoc Would you mind taking a look at this PR? It fixes garbled CJK output on Windows consoles. All 41 CI checks pass. Thanks!

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented Apr 27, 2026

Closing this as duplicate or superseded after Codex automated review.

Close as superseded. Current main still has the Windows UTF-8-only exec/spawn decoding paths, so this is not implemented on main, but #72393 is an open maintainer-authored, maintainer-labeled replacement that explicitly supersedes this PR, credits @slepybear's contribution, and addresses the remaining review concern around split multibyte output chunks.

Best possible solution:

Close this PR as superseded by #72393. Continue maintainer review and validation on #72393 as the canonical Windows command-output decoding fix, preserving @slepybear's credit there and folding in any remaining spawn-caller coverage if maintainers decide those surfaces are in scope.

What I checked:

So I’m closing this here and keeping the remaining discussion on the canonical linked item.

Codex Review notes: model gpt-5.5, reasoning high; reviewed against 2dba9e6a765a.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Windows exec tool garbled Chinese characters (confirmed in 2026.3.24)

1 participant