Skip to content

fix(terminal): rewrite A && B & to A && { B & } (salvage #12207)#12724

Merged
teknium1 merged 1 commit into
mainfrom
hermes/hermes-dbe41156
Apr 19, 2026
Merged

fix(terminal): rewrite A && B & to A && { B & } (salvage #12207)#12724
teknium1 merged 1 commit into
mainfrom
hermes/hermes-dbe41156

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Salvage of #12207 by @handsdiff onto current main.

Summary

Prevents the bash subshell leak where A && B & is parsed as {A && B} & — forking a subshell that waits forever on B when B is a long-lived process. After #8340 fixed the drain-hang symptom, this leak still produced orphan bash processes reparented to init. Rewriting to A && { B & } (brace group, no subshell fork) eliminates the leak.

Changes

  • tools/terminal_tool.py: _rewrite_compound_background() — shell-aware walk (quotes, parens, brace groups, &>/>& redirects) that rewrites A && B & / A || B & at depth 0 into A && { B & } / A || { B & }.
  • tools/environments/base.py: call _rewrite_compound_background() from execute() after _prepare_command so it runs on every backend (local, ssh, docker, modal).
  • tests/tools/test_terminal_compound_background.py: 34 tests — rewrites, preserved cases, redirects, quoting, parens, idempotence, edge cases.

Validation

Before After
cd /tmp && python3 -m http.server PORT &>/dev/null & leaked bash subshell (PPID=1, stuck in wait4) + server only server (PPID=1), no bash leak
elapsed 0.24s (from #8340) 0.24s
_rewrite_compound_background unit cases n/a 8/8 expected transforms correct
Targeted pytest n/a 70 passed (compound_background + foreground_timeout_cap + terminal_tool + base_environment)

Also complements the #12723 guardrail (salvaged earlier today from #10810): the guard blocks the model from calling these patterns in foreground at all, while this PR ensures that if the rewrite path is taken for any reason, no subshell leaks.

Root cause (from @handsdiff's PR body)

bash binds && tighter than &, so A && B & forks a subshell for the compound and backgrounds it. The subshell runs B in foreground and blocks in wait4 forever when B doesn't exit — leaking as an orphan when the outer bash exits.

Closes #12207 with authorship preserved via cherry-pick.

…ll leak

bash parses `A && B &` with `&&` tighter than `&`, so it forks a subshell
for the compound and backgrounds the subshell. Inside the subshell, B
runs foreground, so the subshell waits for B. When B is a process that
doesn't naturally exit (`python3 -m http.server`, `yes > /dev/null`, a
long-running daemon), the subshell is stuck in `wait4` forever and leaks
as an orphan reparented to init.

Observed in production: agents running `cd X && python3 -m http.server
8000 &>/dev/null & sleep 1 && curl ...` as a "start a local server, then
verify it" one-liner. Outer bash exits cleanly; the subshell never does.
Across ~3 days of use, 8 unique stuck-terminal events and 7 leaked
bash+server pairs accumulated on the fleet, with some sessions appearing
hung from the user's perspective because the subshell's open stdout pipe
kept the terminal tool's drain thread blocked.

This is distinct from the `set +m` fix in 933fbd8 (which addressed
interactive-shell job-control waiting at exit). `set +m` doesn't help
here because `bash -c` is non-interactive and job control is already
off; the problem is the subshell's own internal wait for its foreground
B, not the outer shell's job-tracking.

The fix: walk the command shell-aware (respecting quotes, parens, brace
groups, `&>`/`>&` redirects), find `A && B &` / `A || B &` at depth 0
and rewrite the tail to `A && { B & }`. Brace groups don't fork a
subshell — they run in the current shell. `B &` inside the group is a
simple background (no subshell wait). The outer `&` is absorbed into
the group, so the compound no longer needs an explicit subshell.

`&&` error-propagation is preserved exactly: if A fails, `&&`
short-circuits and B never runs.

- Skips quoted strings, comment lines, and `(…)` subshells
- Handles `&>/dev/null`, `2>&1`, `>&2` without mistaking them for `&`
- Resets chain state at `;`, `|`, and newlines
- Tracks brace depth so already-rewritten output is idempotent
- Walks using the existing `_read_shell_token` tokenizer, matching the
  pattern of `_rewrite_real_sudo_invocations`

Called once from `BaseEnvironment.execute` right after
`_prepare_command`, so it runs for every backend (local, ssh, docker,
modal, etc.) with no per-backend plumbing.

34 new tests covering rewrite cases, preservation cases, redirect
edge-cases, quoting/parens/backticks, idempotency, and empty/edge
inputs. End-to-end verified on a test VM: the exact vela-incident
command now returns in ~1.3s with no leaked bash, only the intentional
backgrounded server.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@teknium1 teknium1 merged commit abfc184 into main Apr 19, 2026
3 of 4 checks passed
@teknium1 teknium1 deleted the hermes/hermes-dbe41156 branch April 19, 2026 23:53
handsdiff added a commit to handsdiff/hermes-agent that referenced this pull request Apr 24, 2026
…12724; slate-dg-proxy-hook superseded

- NousResearch#12207 (compound-background-subshell-leak) cherry-picked upstream as abfc184
  via PR NousResearch#12724 with authorship preserved. Moved to "Merged" section;
  removed from rebase workflow and sequential-merge loop.
- feat/slate-dg-proxy-hook (no PR) added under "Closed / superseded":
  abandoned prototype that built the Discord zero-secrets transport inside
  hermes-agent as an opt-in SLATE_DG_PROXY hook. Provisioner-side .pth
  install (hermes-provisioner/dg_patch.py) superseded it — zero hermes-agent
  patches required. Also mis-cut off fork main so unmergeable as upstream PR.
- Dropped stale fix/compressor-tool-args-valid-json rebase line (branch
  already deleted; the "removed" note below was orphaned).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants