fix: Windows stale-cwd causes exit 126 on all subsequent commands#13397
Open
Stellarrysss wants to merge 1 commit into
Open
fix: Windows stale-cwd causes exit 126 on all subsequent commands#13397Stellarrysss wants to merge 1 commit into
Stellarrysss wants to merge 1 commit into
Conversation
The terminal tool's command wrapper prefixes every user command with
`cd '<self.cwd>' || exit 126`. When `self.cwd` points at a directory
that has been deleted (temp build artefact, checkpoint cleanup, or an
intermediate path from an earlier `cd ... && rm -rf <same>`), the cd
fails and the script exits 126 before running anything. Because the
`pwd -P > <cwd_file>` line runs *after* the eval, the cwd file is
never refreshed, so `self.cwd` stays wedged on the bad path and every
subsequent `execute()` produces the same 126 — until the process
restarts.
This manifests as the well-known "every terminal command returns
exit 126" bug on Windows, where Git Bash path translation and temp
directory cleanup together make stale-cwd scenarios especially common.
Fix is two complementary layers:
1. Python-side self-heal in `BaseEnvironment.execute()`. Before each
command, `_is_cwd_stale(effective_cwd)` checks whether the path
can be stat'd. If stale, `self.cwd` is reset to the newly-captured
`self._startup_cwd` and a warning is logged. On Windows we handle
both `C:\Users\...` and Git Bash `/c/Users/...` forms via a small
`_posix_drive_to_win` helper; POSIX-only paths like `/tmp/...` on
Windows (whose MSYS mapping we don't resolve in Python) fall
through to the second layer.
2. Shell-side fallback in `_wrap_command`. The single
`cd '<cwd>' || exit 126` becomes
`cd '<cwd>' 2>/dev/null || { cd '<startup_cwd>' 2>/dev/null && echo '...' >&2; } || exit 126`.
If the Python check missed the stale path (e.g. race with an
external cleanup, or a POSIX-only path), bash falls back to the
startup cwd and emits a stderr breadcrumb so the caller sees what
happened. Only exit 126 if *both* cwds fail.
Non-breaking on Linux/Mac: `_is_cwd_stale` uses `os.path.isdir`
there; shell fallback is a superset of the prior behaviour.
Tests (Windows, Python 3.12):
- `_posix_drive_to_win`: /c/Users conv, /d/foo conv, /tmp/x unchanged (pass)
- `_is_cwd_stale`: empty=stale, live=ok, nonexistent Win=stale,
`/c/<bogus>`=stale, `/tmp/<bogus>`=not-declared (pass)
- integration: fresh env `echo hello` → rc=0 (pass)
- integration: cd into tempdir, delete tempdir externally, run
`echo recovered` → rc=0, self.cwd reset to startup (pass)
- integration: force `self.cwd = /tmp/<not-here>` (Python can't
verify) → shell fallback kicks in, rc=0 (pass)
- integration: consecutive commands after self-heal remain stable (pass)
This was referenced Apr 22, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix the "every terminal command returns exit 126" wedge that users on Windows hit after a prior command cd's into a directory that later gets deleted (temp build dir, checkpoint, intermediate tempdir). The root cause is the shell wrapper's
cd '<self.cwd>' || exit 126guard combined withpwd -P > <cwd_file>running after the user command — so a failing cd short-circuits the write, andself.cwdstays permanently wedged on the bad path until Hermes restarts.Root cause
BaseEnvironment._wrap_commandbuilds a per-execute bash script of the shape:When
self.cwdpoints at a deleted directory,cdfails, the script exits 126 beforepwd -P > <cwd_file>, soself._update_cwd()reads the old (stale) path back next turn — and every subsequentexecute()repeats the 126 exit until the whole Python process restarts.On Windows this scenario is especially common because:
os.getcwd()(used at construction time) returnsC:\Users\...whilepwd -Preturns/c/Users/...— two representations that only the first command's bash successfully normalises; if that fails for any reason, drift can leaveself.cwdin a form Git Bash can't resolve.cdinto ephemeral build/checkpoint temp dirs that are later cleaned.Fix
Two complementary layers, both in
tools/environments/base.py:Layer 1 — Python-side self-heal in
BaseEnvironment.execute().A new module-level
_is_cwd_stale(cwd)helper callsos.path.isdir(or, on Windows, translates Git Bash's/c/Users/...toC:/Users/...via_posix_drive_to_winbefore callingisdir). When the check reports stale,effective_cwdis reset toself._startup_cwd(captured at__init__) and alogger.warningis emitted. Conservative on Windows: POSIX-only paths like/tmp/...(whose MSYS mapping isn't reconstructed in Python) are not declared stale — the shell-side fallback handles them.Layer 2 — Shell-side fallback in
_wrap_command.The single
cd '<cwd>' || exit 126is replaced withIf the Python check missed (race with external cleanup, or a path Python couldn't resolve), bash falls back to startup cwd and emits a stderr breadcrumb so the caller can see what happened. The final
exit 126still fires only when both cwds fail — preserving the original signal when the environment is genuinely broken.self._startup_cwdis captured once inBaseEnvironment.__init__alongsideself.cwd.Non-breaking on Linux/Mac
_is_cwd_staleon non-Windows falls straight through toos.path.isdir— same semantics as a plain fs check._posix_drive_to_winis a no-op for paths that don't match/<letter>/....cd || exit 126: same sentinel on double failure, no new failure modes._startup_cwdis purely internal; no public API change.Test results
All passing on Windows 11 / Python 3.12 / Git for Windows bash:
_posix_drive_to_win:/c/Users/...→C:/Users/...,/tmp/xunchanged (4 cases)_is_cwd_stale: empty=stale, live=ok, nonexistent Win=stale,/c/<bogus>=stale,/tmp/<bogus>=not-declaredecho hello→ rc=0cd <tempdir>, delete tempdir externally,echo recovered→ rc=0,self.cwdreset to startupself.cwd = /tmp/<not-here>(Python can't verify on Windows),echo shell_fallback_worked→ rc=0 (shell fallback fires)Total: 10/10 unit + 4/4 integration = 14 assertions, all pass. On a wedged environment without the patch, every command returns 126 including the trivial
echo hello.Related / side finding
While writing the regression test I noticed
_wait_for_processusesselect.select([pipe_fd], ...)which raisesOSErroron Windows (select there only supports sockets) — this is orthogonal to the exit-126 bug but does mean local terminal commands returnoutput=""even whenrc=0on Windows. That's a separate issue I can file a follow-up for if useful; it didn't affect the 126 diagnosis because users see the return code, not the captured output.