fix(v0.8.10): bug cluster (#558 #420 #421)#570
Conversation
Sandboxed shell sessions on macOS were rejecting reads/writes to
~/.cargo/registry/{cache,index,src} and ~/.cargo/git, making
`cargo build`/`cargo publish` unrunnable from inside the TUI's shell tool
(hit while shipping v0.8.9).
* Resolve cargo home via `CARGO_HOME` env (cargo's own override) with a
`$HOME/.cargo` fallback. New helper `resolve_cargo_home()` is shared by
the policy generator and the param table to keep them in lockstep —
emit one without the other and `sandbox-exec` refuses to load the
profile.
* Always allow read access on `(param "CARGO_HOME")`. Grant write access
to the `registry/` and `git/` subpaths whenever the policy isn't
read-only — those directories must be mutable for `cargo` to populate
them on a cache miss.
* Skip the cargo block entirely when neither `CARGO_HOME` nor `HOME` is
set so we never reference an undefined `(param ...)`. (Practically
only fires in stripped CI containers.)
Two tests cover the policy/param sync — one with HOME set, one with
both vars cleared — using a module-local `ENV_LOCK` mutex to serialize
env mutation, mirroring the pattern landed in `main.rs` at d06eaed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stdio MCP child processes were getting SIGKILL'd via tokio's `kill_on_drop(true)` on TUI exit. The contract calls for SIGTERM so well-behaved servers can flush pending state before dying. Changes: * New `async fn shutdown(&mut self)` on `McpTransport` (default no-op). `StdioTransport` overrides it to send SIGTERM via `libc::kill` and await child exit up to a 2-second grace window before letting drop fire SIGKILL as the backstop. Graceful path on Unix; on Windows the `kill_on_drop` (TerminateProcess) path remains unchanged because there's no SIGTERM-equivalent. * New `Drop` on `StdioTransport` sends SIGTERM as a fallback for code paths that didn't call `shutdown` explicitly. Drop is sync, so the signal arrives microseconds before tokio's own Child drop fires SIGKILL, but it still gives MCP servers that handle SIGTERM idempotently a chance to start cleanup. * New `McpPool::shutdown_all` walks every connection, calls the async shutdown, and clears the pool. * The agent engine's run loop calls `shutdown_all` on `Op::Shutdown` before the pool drops so graceful exit is the default path. Best-effort — if the pool isn't initialized or the lock is contended, the Drop fallback still sends SIGTERM. Test: `stdio_transport_shutdown_terminates_child` spawns a real `cat` child, calls `shutdown`, asserts the call returns within the grace window, and confirms the pid is reaped (`kill(pid, 0)` returns ESRCH). Unix-only — Windows already exercised by the kill_on_drop path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…421) Shell-spawned children survive the TUI on abnormal exit (panic without unwind, SIGKILL of the parent, OOM). The existing cooperative cancel path SIGKILLs the whole process group via the cancellation token, but that only fires when the parent gets to run its drop / cleanup code. A crashed parent leaves children orphaned to init. * New `install_parent_death_signal` helper called on every shell Command setup. On Linux it adds a `pre_exec` hook that runs `prctl(PR_SET_PDEATHSIG, SIGTERM)` immediately after fork — the kernel then sends SIGTERM to the child the moment our process exits, even on SIGKILL of the TUI itself. * All three Command spawn sites in `tools/shell.rs` (one-shot, wait, interactive) get the same hook. * Documented the macOS / Windows gap: those platforms have no kernel equivalent. The cooperative path still handles normal shutdown; abnormal exit there is tracked as a watchdog follow-up per the issue's acceptance criteria. The pre_exec body is `unsafe`-marked because it runs in the post-fork async-signal-safe window. The closure only calls `libc::prctl` with stack-allocated constants; no heap, no locks. Errno is surfaced via `std::io::Error::last_os_error` but the spawn is not aborted — losing the safety net is strictly less bad than failing the user's command. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request implements graceful shutdown for MCP servers using SIGTERM and a grace period, updates the macOS sandbox policy to allow access to Cargo home directories, and adds parent-death signaling on Linux for shell-spawned processes. The review feedback identifies that the parent-death signal helper is missing from the background process spawning path and suggests parallelizing the MCP connection shutdown to prevent sequential delays during exit.
| /// handles normal shutdown; abnormal exit can leak children — tracked as a | ||
| /// follow-up watchdog item per the original issue's acceptance criteria. | ||
| #[cfg(target_os = "linux")] | ||
| fn install_parent_death_signal(cmd: &mut Command) { |
There was a problem hiding this comment.
The install_parent_death_signal helper is correctly applied to synchronous and interactive shell executions, but it appears to be missing from the background execution path in spawn_background_sandboxed (specifically for the non-TTY branch using std::process::Command around line 991). Background processes should also be configured to receive SIGTERM on parent death to ensure they are reaped if the TUI exits abnormally (e.g., due to a panic or SIGKILL).
| pub async fn shutdown_all(&mut self) { | ||
| let names: Vec<String> = self.connections.keys().cloned().collect(); | ||
| for name in names { | ||
| if let Some(conn) = self.connections.get_mut(&name) { | ||
| conn.transport.shutdown().await; | ||
| } | ||
| } | ||
| self.connections.clear(); | ||
| } |
There was a problem hiding this comment.
The shutdown of MCP connections is currently performed sequentially. If multiple servers are configured and some are slow to respond or hang, this could delay the TUI shutdown process for up to STDIO_SHUTDOWN_GRACE * N seconds. Consider parallelizing the shutdown process using join_all or a JoinSet to ensure a faster exit.
pub async fn shutdown_all(&mut self) {
let futures = self.connections.drain().map(|(_, mut conn)| async move {
conn.transport.shutdown().await;
});
futures_util::future::join_all(futures).await;
}…me pool Sub-agent UI labels rotate through `WHALE_NICKNAMES`. The list was English-only — every spawn produced "Blue", "Humpback", etc. Adding Simplified-Chinese names (蓝鲸, 座头鲸, 抹香鲸, …) interleaved with the English ones doubles the pool size and gives a roughly even mix on each new spawn, with the same wraparound behavior at index >= 48. Goal is friendly variety, not strict locale matching — a CN-locale user still gets some English names and vice versa. Pure cosmetic; no behavioral or persistence-format change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR targets the v0.8.10 patch release bug cluster by improving sandbox compatibility for Cargo on macOS, adding graceful shutdown for stdio-based MCP servers, and reducing leaked shell-spawned subprocesses on Linux via parent-death signaling.
Changes:
- Extend the macOS seatbelt sandbox policy to allow Cargo registry/git cache writes (when sandbox policy permits writes) and add tests to keep SBPL params/policy in sync.
- Add a
shutdown()hook toMcpTransport, implement SIGTERM + grace-window shutdown forStdioTransport, and wire pool-wide shutdown into the engine’s shutdown path. - On Linux, install
PR_SET_PDEATHSIG(SIGTERM)for shell-spawned children viapre_execto improve cleanup on abnormal parent exit.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
crates/tui/src/tools/shell.rs |
Adds Linux-only parent-death signal configuration for shell-spawned child processes. |
crates/tui/src/sandbox/seatbelt.rs |
Adds Cargo cache write allowances (registry/git) to the macOS seatbelt profile and new env-mutation-serialized tests. |
crates/tui/src/mcp.rs |
Introduces transport-level graceful shutdown and pool-wide shutdown for stdio MCP children (SIGTERM + bounded wait). |
crates/tui/src/core/engine.rs |
Calls MCP pool graceful shutdown on engine exit. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // only calls `libc::prctl` with stack-allocated constant arguments and | ||
| // does not touch heap memory or the parent's locks. Both requirements | ||
| // (async-signal-safe + no allocation in the post-fork window) are met. | ||
| unsafe { | ||
| cmd.pre_exec(|| { | ||
| let result = libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGTERM, 0, 0, 0); | ||
| if result == -1 { | ||
| // Surface the errno but do not abort the spawn — the child | ||
| // will simply lose the parent-death cleanup safety net. | ||
| Err(std::io::Error::last_os_error()) | ||
| } else { | ||
| Ok(()) | ||
| } |
| Some(PathBuf::from(home).join(".cargo")) | ||
| } | ||
|
|
| impl Drop for StdioTransport { | ||
| fn drop(&mut self) { | ||
| send_sigterm(&self.child); | ||
| } |
| let names: Vec<String> = self.connections.keys().cloned().collect(); | ||
| for name in names { | ||
| if let Some(conn) = self.connections.get_mut(&name) { | ||
| conn.transport.shutdown().await; | ||
| } |
| // Best-effort: pool may not exist (no MCP configured) and the lock | ||
| // can fail under contention; either way the kill_on_drop fallback | ||
| // still reaps the children. |
- add /memory help and clearer invalid-subcommand guidance - register /memory in shared slash-command help - align memory docs with current behavior and config - add focused tests for help and discovery
First-run users hit Welcome → API key → Trust → Tips with no obvious way to discover that a Chinese / Japanese / Portuguese UI exists. Issue #566 surfaced this from a Chinese user. The TUI already has full translations for `en`, `ja`, `zh-Hans`, `pt-BR` (plus `auto` detection from `LC_ALL` / `LANG`); the only gap was discoverability. * New `OnboardingState::Language` variant inserted between Welcome and ApiKey. `Welcome → Language → ApiKey/Trust/Tips` is the new flow; `Esc` from Language returns to Welcome. * New `tui/onboarding/language.rs` panel renders the picker with hotkeys 1-5 for `auto` / `en` / `ja` / `zh-Hans` / `pt-BR`. Each row shows the native name (日本語, 简体中文, …) plus an English label so the user doesn't have to read the target language to pick it. The currently persisted setting is highlighted with a filled bullet. * Selecting a hotkey calls the new `App::set_locale_from_onboarding` which writes through `Settings::set("locale", …)` + `Settings::save` and re-resolves `app.ui_locale` immediately so the rest of onboarding renders in the chosen language. Pressing Enter keeps the current setting (defaults to `auto`). * `onboarding_step` now reports `1/N` … `N/N` correctly with the new step inserted (Welcome=1, Language=2, ApiKey=3 if needed, …). * Doesn't expand the supported-locale set — the QA-pending list in `localization::PLANNED_QA_LOCALES` is unchanged. We only show what ships with full coverage today. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
) * fix(sandbox): allow ~/.cargo/registry under macOS seatbelt (Hmbown#558) Sandboxed shell sessions on macOS were rejecting reads/writes to ~/.cargo/registry/{cache,index,src} and ~/.cargo/git, making `cargo build`/`cargo publish` unrunnable from inside the TUI's shell tool (hit while shipping v0.8.9). * Resolve cargo home via `CARGO_HOME` env (cargo's own override) with a `$HOME/.cargo` fallback. New helper `resolve_cargo_home()` is shared by the policy generator and the param table to keep them in lockstep — emit one without the other and `sandbox-exec` refuses to load the profile. * Always allow read access on `(param "CARGO_HOME")`. Grant write access to the `registry/` and `git/` subpaths whenever the policy isn't read-only — those directories must be mutable for `cargo` to populate them on a cache miss. * Skip the cargo block entirely when neither `CARGO_HOME` nor `HOME` is set so we never reference an undefined `(param ...)`. (Practically only fires in stripped CI containers.) Two tests cover the policy/param sync — one with HOME set, one with both vars cleared — using a module-local `ENV_LOCK` mutex to serialize env mutation, mirroring the pattern landed in `main.rs` at d06eaed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(mcp): graceful SIGTERM shutdown for stdio servers (Hmbown#420) Stdio MCP child processes were getting SIGKILL'd via tokio's `kill_on_drop(true)` on TUI exit. The contract calls for SIGTERM so well-behaved servers can flush pending state before dying. Changes: * New `async fn shutdown(&mut self)` on `McpTransport` (default no-op). `StdioTransport` overrides it to send SIGTERM via `libc::kill` and await child exit up to a 2-second grace window before letting drop fire SIGKILL as the backstop. Graceful path on Unix; on Windows the `kill_on_drop` (TerminateProcess) path remains unchanged because there's no SIGTERM-equivalent. * New `Drop` on `StdioTransport` sends SIGTERM as a fallback for code paths that didn't call `shutdown` explicitly. Drop is sync, so the signal arrives microseconds before tokio's own Child drop fires SIGKILL, but it still gives MCP servers that handle SIGTERM idempotently a chance to start cleanup. * New `McpPool::shutdown_all` walks every connection, calls the async shutdown, and clears the pool. * The agent engine's run loop calls `shutdown_all` on `Op::Shutdown` before the pool drops so graceful exit is the default path. Best-effort — if the pool isn't initialized or the lock is contended, the Drop fallback still sends SIGTERM. Test: `stdio_transport_shutdown_terminates_child` spawns a real `cat` child, calls `shutdown`, asserts the call returns within the grace window, and confirms the pid is reaped (`kill(pid, 0)` returns ESRCH). Unix-only — Windows already exercised by the kill_on_drop path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(shell): set PR_SET_PDEATHSIG on Linux to reap orphaned children (Hmbown#421) Shell-spawned children survive the TUI on abnormal exit (panic without unwind, SIGKILL of the parent, OOM). The existing cooperative cancel path SIGKILLs the whole process group via the cancellation token, but that only fires when the parent gets to run its drop / cleanup code. A crashed parent leaves children orphaned to init. * New `install_parent_death_signal` helper called on every shell Command setup. On Linux it adds a `pre_exec` hook that runs `prctl(PR_SET_PDEATHSIG, SIGTERM)` immediately after fork — the kernel then sends SIGTERM to the child the moment our process exits, even on SIGKILL of the TUI itself. * All three Command spawn sites in `tools/shell.rs` (one-shot, wait, interactive) get the same hook. * Documented the macOS / Windows gap: those platforms have no kernel equivalent. The cooperative path still handles normal shutdown; abnormal exit there is tracked as a watchdog follow-up per the issue's acceptance criteria. The pre_exec body is `unsafe`-marked because it runs in the post-fork async-signal-safe window. The closure only calls `libc::prctl` with stack-allocated constants; no heap, no locks. Errno is surfaced via `std::io::Error::last_os_error` but the spawn is not aborted — losing the safety net is strictly less bad than failing the user's command. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(subagent): interleave Chinese whale names with English in nickname pool Sub-agent UI labels rotate through `WHALE_NICKNAMES`. The list was English-only — every spawn produced "Blue", "Humpback", etc. Adding Simplified-Chinese names (蓝鲸, 座头鲸, 抹香鲸, …) interleaved with the English ones doubles the pool size and gives a roughly even mix on each new spawn, with the same wraparound behavior at index >= 48. Goal is friendly variety, not strict locale matching — a CN-locale user still gets some English names and vice versa. Pure cosmetic; no behavioral or persistence-format change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * style: cargo fmt for seatbelt cargo home block * memory: polish help and docs (Hmbown#569) - add /memory help and clearer invalid-subcommand guidance - register /memory in shared slash-command help - align memory docs with current behavior and config - add focused tests for help and discovery * feat(onboarding): language picker step before API key (Hmbown#566) First-run users hit Welcome → API key → Trust → Tips with no obvious way to discover that a Chinese / Japanese / Portuguese UI exists. Issue Hmbown#566 surfaced this from a Chinese user. The TUI already has full translations for `en`, `ja`, `zh-Hans`, `pt-BR` (plus `auto` detection from `LC_ALL` / `LANG`); the only gap was discoverability. * New `OnboardingState::Language` variant inserted between Welcome and ApiKey. `Welcome → Language → ApiKey/Trust/Tips` is the new flow; `Esc` from Language returns to Welcome. * New `tui/onboarding/language.rs` panel renders the picker with hotkeys 1-5 for `auto` / `en` / `ja` / `zh-Hans` / `pt-BR`. Each row shows the native name (日本語, 简体中文, …) plus an English label so the user doesn't have to read the target language to pick it. The currently persisted setting is highlighted with a filled bullet. * Selecting a hotkey calls the new `App::set_locale_from_onboarding` which writes through `Settings::set("locale", …)` + `Settings::save` and re-resolves `app.ui_locale` immediately so the rest of onboarding renders in the chosen language. Pressing Enter keeps the current setting (defaults to `auto`). * `onboarding_step` now reports `1/N` … `N/N` correctly with the new step inserted (Welcome=1, Language=2, ApiKey=3 if needed, …). * Doesn't expand the supported-locale set — the QA-pending list in `localization::PLANNED_QA_LOCALES` is unchanged. We only show what ships with full coverage today. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: 20bytes <133551439+20bytes@users.noreply.github.com>
Summary
Bugs scoped for the v0.8.10 patch release plus two UX improvements that surfaced during the work.
Bug fixes
~/.cargo/registry/{cache,index,src}and~/.cargo/git. Read access always; write access whenever the policy isn't read-only. Hit while shipping v0.8.9 —cargo publishfrom inside the TUI's shell tool would get sandbox-denied. Resolution honorsCARGO_HOMEenv (cargo's own override) with$HOME/.cargofallback. Skipped entirely when neither var is set so we never reference an undefined(param ...).async fn shutdownonMcpTransport(default no-op) overridden onStdioTransportto send SIGTERM and wait up to 2s for graceful exit before tokio'skill_on_dropfires SIGKILL as the backstop. NewMcpPool::shutdown_allwalks every connection. Wired into the engine'sOp::Shutdownpath so graceful exit is the default. A Drop fallback onStdioTransportstill SIGTERMs even on abnormal exit paths.PR_SET_PDEATHSIG(SIGTERM)on Linux viapre_exec. The kernel sends SIGTERM the moment the parent (TUI) exits, even on SIGKILL of the parent process — closing the leak window the cooperative-cancellation path can't cover. macOS / Windows don't have a kernel equivalent;kill_on_drop+ the existing process_group SIGKILL on cancellation still cover normal shutdown there. Watchdog work for those platforms tracked as a follow-up.UX
Blue/蓝鲸/Humpback/座头鲸/ …). Pure cosmetic for the sub-agent labeling pool; doubles the size and gives a roughly even mix on each new spawn. Same wraparound behavior at index ≥ 48.OnboardingState::Languagestep inserted between Welcome and ApiKey. Users see hotkeys 1-5 forauto/en/ja/zh-Hans/pt-BRwith each row showing the native name (日本語, 简体中文, …) plus an English label so the target language is reachable without already speaking it. Selection persists immediately to~/.deepseek/settings.tomlviaSettings::set("locale", …)and re-resolvesapp.ui_localeso the rest of onboarding renders in the chosen language. Esc returns to Welcome; Enter keeps the current setting (defaults toauto).Closed (already fixed in v0.8.8 — verification pass)
crates/tui/src/tui/ui.rs:177(configurable timeout, 500ms default, logs warning on timeout). Closed with verification.crates/tui/src/tui/widgets/footer.rs:300(every status-line color reads fromapp.ui_theme.*). Closed with verification.Test plan
cargo fmt --all -- --checkcleancargo clippy --workspace --all-targets --all-features --locked -- -D warningscleancargo test --workspace --all-features --locked— 2018 main tests + crate suites green; new tests added:test_cargo_home_paths_emitted_in_policy_and_params_when_home_set—(param "CARGO_HOME")policy/params stay in lockstep with HOME settest_cargo_home_skipped_when_no_env— both omitted when neither HOME nor CARGO_HOME is set (sandbox-exec would refuse to load otherwise)stdio_transport_shutdown_terminates_child— spawns a realcat, callsshutdown, asserts return within grace window and that pid is reaped (kill(pid, 0)returns ESRCH)ENV_LOCKmutex serializes env-mutating tests, mirroring the pattern landed inmain.rsat d06eaed🤖 Generated with Claude Code