[codex] fix(tui): reap dashboard slash workers#32483
Closed
stardust-mem wants to merge 1 commit into
Closed
Conversation
Contributor
|
Superseded by #42132 (merged), which closes the slash_worker subprocess leak via two guards: process-group kill on PTY teardown + a cross-platform parent-death watchdog in the worker. Closing as resolved — thanks for tackling this; the merged fix salvaged the process-group-kill and watchdog approaches with contributor authorship preserved. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #32377 by tightening dashboard PTY teardown so
tui_gateway.slash_workerdescendants do not survive repeated/api/ptydisconnects.Root Cause
The dashboard PTY close path signalled only the PTY root process. If that process exited while descendants such as the TUI gateway's slash worker were still alive, the worker could become orphaned and accumulate under the dashboard service cgroup. Separately,
_SlashWorker.close()did not wait after SIGKILL, and the worker process had no Linux parent-death guard for abrupt gateway exits.Changes
PtyBridgeand send SIGHUP -> SIGTERM -> SIGKILL to that group during close.PR_SET_PDEATHSIGsetup intui_gateway.slash_workeritself, avoidingpreexec_fnfrom the multithreaded gateway parent._SlashWorker.close()wait after SIGKILL and close stdio pipes in all exit paths.Validation
pytest tests/hermes_cli/test_pty_bridge.py::TestPtyBridgeClose::test_close_terminates_child_process_group -qpytest tests/tui_gateway/test_protocol.py::test_slash_worker_child_installs_linux_parent_death_signal tests/tui_gateway/test_protocol.py::test_slash_worker_close_waits_after_kill -qpytest tests/hermes_cli/test_pty_bridge.py tests/hermes_cli/test_web_server.py::TestPtyWebSocket -qpytest tests/tui_gateway/test_protocol.py -qpytest tests/test_tui_gateway_server.py::test_session_create_close_race_does_not_orphan_worker tests/test_tui_gateway_server.py::test_session_create_no_race_keeps_worker_alive -qpython -m py_compile hermes_cli/pty_bridge.py tui_gateway/server.py tui_gateway/slash_worker.pygit diff --checkNote:
pytest tests/test_tui_gateway_server.py::test_browser_manage_connect_default_local_reports_launch_hint -qfails in this checkout independently of these changes; the assertion expects a browser-launch hint that the current local environment/code path does not emit.