Bug: PTY Shell aborts orphan nested background processes causing OS-level resource exhaustion #20941

@KumarADITHYA123

What happened?

OVERVIEW

While auditing the cross-platform execution lifecycles in packages/core/src/services/shellExecutionService.ts and process-utils.ts for a GSoC prototype, I identified a severe resource exhaustion vector in how node-pty sessions are torn down.

Note: This report is distinct from PR #20916 and Issue #15945. PR #20916 addresses a file descriptor leak (via missing dispose() lifecycle hooks). This issue addresses OS-level processes that survive an abort: background jobs are orphaned because the abort signals only the shell's PID instead of its whole process tree.

When the CLI executes shell operations via child_process, it correctly isolates and terminates the entire process group. However, the primary shell execution path (executeWithPty) utilizes node-pty.

During process abort, the teardown logic falls back to pty.kill('SIGKILL'). On POSIX, that signal reaches only the PTY's session leader (e.g., the bash wrapper). On Windows, it is routed to winpty-agent.exe. If the LLM agent generates a command that spawns background sub-processes, killing the PTY session leader orphans all nested background jobs.
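The difference between signaling one PID and signaling a process group can be sketched as follows. This is a minimal illustration, not the project's implementation: `killGroup` and its injectable `kill` parameter are hypothetical names, and only `process.kill` with a negative PID is standard Node.js behavior on POSIX. It reaches only children that are still in the shell's process group, so jobs that moved to their own group (for example under interactive job control) would still escape.

```typescript
// Hypothetical helper for illustration only (not taken from process-utils.ts).
// On POSIX, passing a negative PID to process.kill signals the whole process
// group whose ID is |pid|, instead of a single process.
export function killGroup(
  pid: number,
  signal: NodeJS.Signals = 'SIGKILL',
  // Injectable so the sketch can be exercised without killing real processes.
  kill: (pid: number, signal: NodeJS.Signals) => unknown = (p, s) =>
    process.kill(p, s),
): boolean {
  // Guard: -0 would target our own group and -1 would target every process
  // we are allowed to signal, so refuse anything that is not a normal PID.
  if (!Number.isInteger(pid) || pid <= 1) {
    return false;
  }
  try {
    kill(-pid, signal);
    return true;
  } catch {
    // Typically ESRCH (group already gone) or EPERM (not permitted).
    return false;
  }
}
```

By contrast, signaling the positive `pid` affects only the shell itself, which is the behavior described above.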

Over extended agentic pairing sessions, the CLI will silently bleed Node event loop resources and leak orphaned OS-level processes (attached to dead TTYs), eventually resulting in resource exhaustion.

STEPS TO REPRODUCE

1. Start the Gemini CLI and allow the agent to invoke the Shell tool.
2. Direct the agent to execute a command that spawns persistent background child processes (e.g., a compound background sleep chain).
3. Trigger a hard abort (Ctrl+C), invoking the abortSignal and the subsequent killProcessGroup mechanism.
4. Inspect the OS process manager. Note that the primary PTY shell leader is destroyed, but the nested background children remain alive, silently consuming CPU and memory.
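The final inspection step can also be done programmatically. The sketch below is an assumption about how one might check it, not part of the CLI: `isAlive` is a hypothetical helper that relies on the documented Node.js behavior of `process.kill(pid, 0)`, which tests for a process's existence without delivering a signal.

```typescript
// Hypothetical helper for illustration: report whether a PID still exists.
// Signal 0 performs the existence/permission check but sends nothing.
export function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user;
    // any other error (typically ESRCH) is treated as "not running".
    return (err as NodeJS.ErrnoException).code === 'EPERM';
  }
}
```

After the abort, calling `isAlive` on the PID of a background child recorded before the abort should return `true` if the behavior described in this report is present.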

What did you expect to happen?

EXPECTED BEHAVIOR AND FIX

A properly sandboxed developer tool must guarantee that no process it spawned survives an abort.

To resolve this, we must ensure killProcessGroup fully collapses the tree when a PTY is aborted.

The POSIX path currently implements a group kill (process.kill(-pid)), but the Windows PTY path explicitly skips /t tree-killing and relies solely on pty.kill().
To fix this cleanly, we must extend KillOptions.pty to expose an optional pid, allowing the Windows path to invoke the native taskkill /pid <pid> /f /t and terminate all descendant processes of the shell.
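A rough sketch of the proposed Windows branch is shown below. It is an assumption-laden illustration, not the actual patch: `PtyLike`, `taskkillArgs`, `killPtyTree`, and the injectable `run` parameter are hypothetical names, and the real `KillOptions` shape in process-utils.ts may differ. The `taskkill` flags (`/pid`, `/f`, `/t`) are the standard Windows ones named above.

```typescript
import { spawn } from 'node:child_process';

// Hypothetical minimal view of a PTY handle; the optional pid mirrors the
// proposal to expose `pid?` on KillOptions.pty.
interface PtyLike {
  pid?: number;
  kill(signal?: string): void;
}

// Arguments for: taskkill /pid <pid> /f /t  (force-kill the whole tree).
export function taskkillArgs(pid: number): string[] {
  return ['/pid', String(pid), '/f', '/t'];
}

export function killPtyTree(
  pty: PtyLike,
  platform: string = process.platform,
  // Injectable runner so the sketch can be tested without spawning taskkill.
  run: (command: string, args: string[]) => void = (command, args) => {
    const child = spawn(command, args, { stdio: 'ignore', windowsHide: true });
    child.on('error', () => {
      // Ignore spawn failures in this sketch; real code should log them.
    });
  },
): 'taskkill' | 'pty.kill' {
  if (platform === 'win32' && pty.pid !== undefined) {
    run('taskkill', taskkillArgs(pty.pid));
    return 'taskkill';
  }
  if (platform === 'win32') {
    // Assumption: node-pty rejects named signals on Windows, so pass none.
    pty.kill();
  } else {
    pty.kill('SIGKILL');
  }
  return 'pty.kill';
}
```

Keeping `pid` optional means callers that cannot supply it fall back to the existing `pty.kill()` behavior instead of failing.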

Client information

Gemini CLI Version: 0.30.0-nightly
Node Version: v20.19.0
OS: Cross-platform (Windows)

Login information

This is a local OS-level process management bug. It triggers regardless of authentication state.

Anything else we need to know?

A single-commit fix, including the updated TypeScript interface and tests, is ready. Please let me know if you would like me to open and link the PR.

Labels

area/core: Issues related to User Interface, OS Support, Core Functionality
help wanted: We will accept PRs from all issues marked as "help wanted". Thanks for your support!
priority/p2: Important but can be addressed in a future release.
