fix(exec): apply Windows codepage-aware output decoding in runExec and spawn-based callers [AI-assisted]#64661
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a04e95a251
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| import { | ||
| decodeCapturedOutputBuffer, | ||
| resolveWindowsConsoleEncoding, | ||
| } from "../infra/windows-encoding.js"; |
There was a problem hiding this comment.
Point encoding imports to a real module
This commit introduces imports from ../infra/windows-encoding.js, but that module does not exist in the tree (the only definitions are in src/node-host/invoke.ts), so loading src/process/exec.ts now fails module resolution before any runtime branch checks. Because runExec is used broadly, this can break command execution on all platforms, not just Windows.
Useful? React with 👍 / 👎.
Greptile SummaryThis PR applies Windows codepage-aware output decoding to
Confidence Score: 2/5Not safe to merge: the shared module The encoding logic is correct and follows the established pattern in A new file
|
| import { | ||
| decodeCapturedOutputBuffer, | ||
| resolveWindowsConsoleEncoding, | ||
| } from "../infra/windows-encoding.js"; |
There was a problem hiding this comment.
Missing shared module
src/infra/windows-encoding.ts
git cat-file -e HEAD:src/infra/windows-encoding.ts returns "does not exist in 'HEAD'", and git ls-files src/infra/ | grep windows confirms the file is not tracked. All three production files changed in this PR import decodeCapturedOutputBuffer and resolveWindowsConsoleEncoding from this path, so pnpm tsgo/pnpm build will fail and the runtime will throw "Cannot find module". The implementation already exists as private functions in src/node-host/invoke.ts (lines 123–165) — those should be extracted into src/infra/windows-encoding.ts and committed as part of this PR.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/process/exec.ts
Line: 8-11
Comment:
**Missing shared module `src/infra/windows-encoding.ts`**
`git cat-file -e HEAD:src/infra/windows-encoding.ts` returns "does not exist in 'HEAD'", and `git ls-files src/infra/ | grep windows` confirms the file is not tracked. All three production files changed in this PR import `decodeCapturedOutputBuffer` and `resolveWindowsConsoleEncoding` from this path, so `pnpm tsgo`/`pnpm build` will fail and the runtime will throw "Cannot find module". The implementation already exists as private functions in `src/node-host/invoke.ts` (lines 123–165) — those should be extracted into `src/infra/windows-encoding.ts` and committed as part of this PR.
How can I resolve this? If you propose a fix, please make it concise.a04e95a to
2298b52
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2298b52de7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const next = appendOutputWithCap(stdout, data.toString("utf8"), params.maxOutputChars); | ||
| const next = appendOutputWithCap( | ||
| stdout, | ||
| decodeCapturedOutputBuffer({ buffer: data, windowsEncoding }), |
There was a problem hiding this comment.
Preserve decoder state across stdout chunks
The new decodeCapturedOutputBuffer({ buffer: data, windowsEncoding }) call decodes each data event independently, so Windows multibyte encodings (for example GBK/Shift_JIS) still corrupt output when a character is split across chunk boundaries. In that case each partial byte sequence is decoded as replacement characters, which can break downstream parsing and error reporting even though the full byte stream is valid; this path should use a per-stream TextDecoder with streaming enabled or decode after reassembling bytes at close.
Useful? React with 👍 / 👎.
2298b52 to
adf0951
Compare
adf0951 to
48818dc
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 48818dce42
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| child.stdout.on("data", (buf: Buffer) => { | ||
| stdout = appendWithCap( | ||
| stdout, | ||
| decodeCapturedOutputBuffer({ buffer: buf, windowsEncoding }), | ||
| ); |
There was a problem hiding this comment.
Decode local-shell output with a stateful decoder
decodeCapturedOutputBuffer is called on each data chunk independently, which instantiates a fresh decoder per chunk. On Windows code pages like GBK/Shift_JIS, multibyte characters that straddle chunk boundaries are decoded as replacement characters (�), so CJK local-shell output can still be corrupted even when the full byte stream is valid. This needs streaming decode state (single TextDecoder with stream: true) or byte buffering until process close.
Useful? React with 👍 / 👎.
|
@obviyus @vincentkoc Would you mind taking a look at this PR? It fixes garbled CJK output on Windows consoles. All 41 CI checks pass. Thanks! |
|
Closing this as duplicate or superseded after Codex automated review. Close as superseded. Current main still has the Windows UTF-8-only exec/spawn decoding paths, so this is not implemented on main, but #72393 is an open maintainer-authored, maintainer-labeled replacement that explicitly supersedes this PR, credits @slepybear's contribution, and addresses the remaining review concern around split multibyte output chunks. Best possible solution: Close this PR as superseded by #72393. Continue maintainer review and validation on #72393 as the canonical Windows command-output decoding fix, preserving @slepybear's credit there and folding in any remaining spawn-caller coverage if maintainers decide those surfaces are in scope. What I checked:
So I’m closing this here and keeping the remaining discussion on the canonical linked item. Codex Review notes: model gpt-5.5, reasoning high; reviewed against 2dba9e6a765a. |
Summary
runExec()insrc/process/exec.tsusedencoding: "utf8"which causes Node.js to decode stdout/stderr internally as UTF-8 before returning. On Windows with non-UTF-8 console codepages (e.g. GBK/936, Big5/950, Shift_JIS/932), CJK characters in command output are garbled because the raw byte buffer is lost and cannot be re-decoded with the correct encoding. The same issue existed intui-local-shell.tsandqmd-process.tswhich used.toString("utf8")directly on spawn data chunks.encoding: "utf8"withencoding: "buffer"inrunExec()soexecFileAsyncreturns raw Buffers, then applieddecodeCapturedOutputBuffer()(from the existingsrc/infra/windows-encoding.tsshared module) to correctly decode output based on the detected Windows console encoding. Applied the same fix totui-local-shell.tsandqmd-process.ts. Also fixedrunCommandWithTimeout()which was already using chunk-based collection but was missing thedecodeCapturedOutputBuffer()call inresolveFromClose.windows-encoding.tsshared module was not modified.invoke.tsandchild.tswere already correctly usingdecodeCapturedOutputBuffer()and were not touched. No new dependencies were added.Change Type
Scope
Linked Issue/PR
Root Cause
runExec()passedencoding: "utf8"toexecFileAsync, which instructs Node.js to decode the child process output as UTF-8 before returning it. On Windows systems using codepages like GBK (936), Big5 (950), or Shift_JIS (932), the child process outputs bytes in the local codepage encoding. Node.js decodes these bytes as UTF-8, producing garbled text (mojibake). Since the raw bytes are discarded during this internal decode, there is no way to re-decode them afterward.decodeCapturedOutputBuffer()function insrc/infra/windows-encoding.tsalready handles this correctly by detecting the Windows console codepage and using the appropriateTextDecoder. However,runExec()and two spawn-based callers bypassed this function by usingencoding: "utf8"or.toString("utf8")directly.Testing
pnpm test src/process/exec.windows.test.ts-> 14 tests pass (including 3 new encoding tests)pnpm test src/infra/windows-encoding.test.ts-> 18 tests passpnpm check-> passed (our files: 0 warnings, 0 errors)decodes GBK output in runExec on Windows-- verifies GBK-encoded bytes are correctly decodeddecodes Shift_JIS output in runExec on Windows-- verifies Shift_JIS-encoded bytes are correctly decodedreturns UTF-8 output unchanged on non-Windows in runExec-- verifies no behavior change on macOS/LinuxAI Usage Disclosure