fix(agents): dispatch subagent spawn in process #90612
Codex review: needs maintainer review before merge.
Reviewed June 7, 2026, 5:50 AM ET / 09:50 UTC.
Summary
PR surface: Source +115, Tests +167, Docs +2. Total +284 across 7 files.
Reproducibility: no. No high-confidence live reproduction was established here. Source inspection shows current main still uses
Review metrics: none identified.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security
Review details
Best possible solution: Land the focused in-process dispatch fix after required CI and maintainer acceptance of the proof override, keeping the WebSocket fallback for callers outside the gateway process.
Do we have a high-confidence way to reproduce the issue? No high-confidence live reproduction was established here. Source inspection shows current main still uses
Is this the best way to solve the issue? Yes, this appears to be the best bounded fix for the reported self-connection failure mode: reuse the existing in-process gateway dispatch seam when a gateway context exists, while leaving broader event-loop starvation work out of scope.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against bab18d567b0c.
Label changes:
Label justifications:
Evidence reviewed
PR surface: Source +115, Tests +167, Docs +2. Total +284 across 7 files.
What I checked:
Likely related people:
Land-ready proof for final head
Maintainer fixups added here:
Local proof:
CI proof:
Known proof gap: no live saturated gateway/Feishu run. The override is for the narrow in-process dispatch/timer behavior covered by focused gateway/subagent tests plus full CI.
* fix(agents): dispatch subagent spawn in process
* docs: update subagent gateway dispatch note
* fix(gateway): keep in-process dispatch timeout budget
* test(gateway): avoid promise executor timer returns

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
Summary
Route `sessions_spawn`'s internal gateway RPC through in-process dispatch when the caller is already running inside the gateway process.

Background
A Feishu-delivered profiling request exposed a `sessions_spawn` timeout while several agent runs were active in the same gateway process. The observed failure was not a Feishu delivery problem: the message send path eventually returned a successful message id. The failing path was the subagent spawn control plane: `sessions_spawn` called back into the same gateway over `ws://127.0.0.1:<port>`, while the gateway event loop was saturated by active agent work. That self-connection pattern can delay the WebSocket handshake and the timeout timer itself, so a configured 60s timeout can surface much later.

Announce delivery already avoids this class of failure by dispatching the gateway method in process. This PR applies the same ownership boundary to subagent spawn's internal gateway calls.
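
To illustrate why a timeout can surface later than configured, here is a minimal toy model of a run-to-completion event loop with a virtual clock. It is not code from this PR: `simulateTimerFire` and the task durations are hypothetical, chosen only to show that a timer callback cannot run until the task currently holding the loop has finished.

```typescript
// Toy model only (hypothetical helper, not part of the PR).
// Timers are checked between tasks, never in the middle of one.

/**
 * Returns the virtual time (ms) at which a timer scheduled for `timerDueMs`
 * actually fires, given back-to-back synchronous tasks that each hold the
 * event loop for the listed duration.
 */
function simulateTimerFire(timerDueMs: number, taskDurationsMs: number[]): number {
  let now = 0;
  for (const duration of taskDurationsMs) {
    if (now >= timerDueMs) {
      // The loop is free between tasks and the timer is due: it fires now,
      // which is late whenever `now` is already past the due time.
      return now;
    }
    // The task runs to completion; nothing else can run in the meantime.
    now += duration;
  }
  // Task queue drained: the timer fires at its due time, or immediately if overdue.
  return Math.max(now, timerDueMs);
}

// Illustrative numbers: a nominal 60s timeout while four 25s tasks run back to back.
const firedAt = simulateTimerFire(60000, [25000, 25000, 25000, 25000]); // 75000
```

In this model the third task starts at 50s, before the timer is due, and holds the loop until 75s, so the nominal 60s timer fires 15s late. With an idle loop (`[]`) it fires on time at 60000.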
What changed
* `hasInProcessGatewayContext()` lets callers detect whether in-process gateway dispatch is available.
* `callSubagentGateway()` now uses `dispatchGatewayMethodInProcess()` when a gateway context exists, avoiding loopback WebSocket self-connections for in-process `sessions_spawn` work.
* Callers outside the gateway process keep the `callGateway()` WebSocket path as a fallback.
* `sessions.delete` and `sessions.patch` force a synthetic admin client, while ordinary `agent` calls keep the normal write-scoped behavior.
* `timeoutMs` bounds the initial response wait. Accepted/final two-phase calls still return the initial accepted response immediately when `expectFinal` is not requested, and wait for the final response only when it is requested.

What this does not do
Verification
* `node scripts/run-vitest.mjs src/agents/subagent-spawn.test.ts src/gateway/server-plugins.test.ts`
* `pnpm run tsgo:core`
* `pnpm run tsgo:test:src`
* `git diff --check`
* `.agents/skills/autoreview/scripts/autoreview --mode commit --commit HEAD --prompt "Review the final rebased sessions_spawn in-process dispatch commit. Focus on gateway scope preservation, timeout semantics, fallback behavior, and subagent cleanup."`

Real behavior proof
Behavior addressed: `sessions_spawn` no longer self-connects to the local gateway WebSocket when it is already executing inside the gateway process.
Real environment tested: Local OpenClaw source checkout on macOS, rebased onto current `upstream/main`.
Exact steps or command run after this patch: `node scripts/run-vitest.mjs src/agents/subagent-spawn.test.ts src/gateway/server-plugins.test.ts`; `pnpm run tsgo:core`; `pnpm run tsgo:test:src`; `git diff --check`; final autoreview command listed above.
Evidence after fix: The targeted Vitest run passed 2 shards covering `src/gateway/server-plugins.test.ts` and `src/agents/subagent-spawn.test.ts`; `tsgo:core` and `tsgo:test:src` completed successfully; final autoreview reported no accepted/actionable findings.
Observed result after fix: In-process subagent spawn calls dispatch through `dispatchGatewayMethodInProcess`; admin cleanup calls retain synthetic admin scope; initial in-process responses time out if no response is sent, but accepted two-phase responses still return before handler completion unless final output is requested.
What was not tested: A live saturated-gateway repro with multiple concurrent real model streams and Feishu/Telegram delivery was not run in this PR.
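
For readers skimming the routing and scope rules described above, the following is a rough sketch of the decision, not the actual implementation. The function names `hasInProcessGatewayContext`, `dispatchGatewayMethodInProcess`, and `callGateway` and the method names come from this PR's text; every signature, the `GatewayDeps` interface, the `ClientScope` and `DispatchPlan` types, and `planSubagentGatewayCall` are assumptions made for illustration. Timeout handling is left out.

```typescript
// Sketch under stated assumptions: all signatures and types here are hypothetical.

type ClientScope = "admin" | "write";

type DispatchPlan =
  | { transport: "in-process"; clientScope: ClientScope }
  | { transport: "websocket" };

/** Decide transport and synthetic client scope for one subagent gateway call. */
function planSubagentGatewayCall(method: string, hasInProcessContext: boolean): DispatchPlan {
  if (!hasInProcessContext) {
    // Outside the gateway process: fall back to the WebSocket path.
    return { transport: "websocket" };
  }
  // The PR text names these two cleanup methods as forcing a synthetic admin client;
  // other calls (for example `agent`) keep write scope.
  const needsAdmin = method === "sessions.delete" || method === "sessions.patch";
  return { transport: "in-process", clientScope: needsAdmin ? "admin" : "write" };
}

// Dependencies are injected so the sketch stays self-contained.
// In real use R would likely be a Promise; that is an assumption.
interface GatewayDeps<R> {
  hasInProcessGatewayContext(): boolean;
  dispatchGatewayMethodInProcess(method: string, params: unknown, scope: ClientScope): R;
  callGateway(method: string, params: unknown): R;
}

function callSubagentGatewaySketch<R>(deps: GatewayDeps<R>, method: string, params: unknown): R {
  const plan = planSubagentGatewayCall(method, deps.hasInProcessGatewayContext());
  if (plan.transport === "in-process") {
    // No loopback WebSocket connection is opened on this path.
    return deps.dispatchGatewayMethodInProcess(method, params, plan.clientScope);
  }
  return deps.callGateway(method, params);
}
```

Separating the pure planning step from the call keeps the scope rule easy to unit test without a running gateway, which matches the focused-test approach listed under Verification.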