fix(gateway): protect in-flight subagents from busy-mode interrupts #32076
Merged
…30170)

When a user sends a conversational follow-up while delegate_task is running, gateway/run.py calls running_agent.interrupt(event.text) on the PARENT agent. AIAgent.interrupt() then cascades synchronously through self._active_children and calls interrupt() on every child subagent, aborting in-flight delegate_task work. The user sees the fallback cascade with no root cause in the gateway log, and minutes of subagent progress are destroyed: the exact failure mode reported in the issue.

Add GatewayRunner._agent_has_active_subagents(running_agent), a static helper that returns True iff the parent is currently driving subagents via delegate_task. The helper is type-defensive: it ignores truthy MagicMock auto-attributes (so it doesn't accidentally fire in every test mock that hits the busy path), the _AGENT_PENDING_SENTINEL placeholder, and missing locks.

Wire the helper into both interrupt branches:

1. _handle_active_session_busy_message, the adapter-level busy handler. When busy_input_mode == 'interrupt' AND the parent has active subagents, demote to 'queue' semantics: skip the parent.interrupt() call, merge the message into the pending queue, and surface a dedicated ack ("⏳ Subagent working — your message is queued for when it finishes (use /stop to cancel everything).") so the operator knows the message wasn't lost and discovers the explicit escape hatch.
2. The PRIORITY interrupt branch inside _handle_message, the non-command fast path. Same rationale, same demotion. Routes through _queue_or_replace_pending_event so the next-turn pickup stays unchanged.

Explicit /stop and /new commands take a completely different path (_interrupt_and_clear_session in the slash-command dispatch at line ~6771) and are NOT affected by this guard, so the operator still has a way to force-cancel everything when they actually mean it.

Configured 'queue' and 'steer' modes are also untouched: 'queue' already does the right thing, and 'steer' goes through running_agent.steer(), which does NOT cascade to children (so subagents survive a steer too).

This is Phase 1 of the fix outlined in #30170: the minimum viable change that stops subagent loss. Phase 2 (delegation-aware steer forwarding to active children) and Phase 3 (async delegation, #11508) are intentionally out of scope.

Refs #30170.
17 new tests in tests/gateway/test_subagent_protection_30170.py pin down both the detection helper and the demotion behaviour:

* TestAgentHasActiveSubagents: 11 cases covering the precision and defensiveness of _agent_has_active_subagents:
  - returns False for None, _AGENT_PENDING_SENTINEL, and stub agents that lack the _active_children attribute;
  - returns False for an empty list (the steady state of an idle AIAgent);
  - returns True for one or many children;
  - works when _active_children_lock is None (test stubs);
  - rejects truthy MagicMock auto-attributes; this is the regression guard for "every MagicMock-based gateway test suddenly demotes to queue mode" (which is how this was originally found);
  - accepts list/tuple/set as the children container.
* TestBusyHandlerDemotesInterruptForSubagents: 6 cases driving _handle_active_session_busy_message directly:
  - parent.interrupt is NOT called when subagents are active, and the message is still merged into the pending queue;
  - ack copy mentions "Subagent working", "queued", and the /stop escape hatch, and does NOT mention "Interrupting";
  - with no subagents, behaviour is byte-identical to the pre-#30170 interrupt path (parent.interrupt called with the user text, ack says "Interrupting");
  - configured queue mode keeps its vanilla "Queued for the next turn" ack (the #30170 demotion-specific copy must NOT fire);
  - configured steer mode still routes to running_agent.steer() even when subagents are active (the guard is interrupt-only);
  - _AGENT_PENDING_SENTINEL does not trigger demotion.
Refs #30170.
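The MagicMock regression guard is easiest to see next to a naive truthiness check. The sketch below is illustrative only: `naive_has_children` and `strict_has_children` are hypothetical names, not functions from the test file, and the tests are plain assert-style functions that pytest would collect but that also run on their own.

```python
from unittest.mock import MagicMock


def naive_has_children(agent) -> bool:
    # Naive check: any truthy attribute counts as "has children".
    return bool(getattr(agent, "_active_children", None))


def strict_has_children(agent) -> bool:
    # Type-defensive check: only a real, non-empty container counts.
    children = getattr(agent, "_active_children", None)
    return isinstance(children, (list, tuple, set)) and len(children) > 0


def test_magicmock_auto_attribute_is_rejected():
    agent = MagicMock()  # _active_children was never assigned
    # MagicMock fabricates the attribute on access, and the fabricated
    # mock is truthy, so the naive check reports a false positive.
    assert naive_has_children(agent) is True
    assert strict_has_children(agent) is False


def test_explicit_children_are_detected():
    agent = MagicMock()
    agent._active_children = [object()]
    assert strict_has_children(agent) is True


if __name__ == "__main__":
    test_magicmock_auto_attribute_is_rejected()
    test_explicit_children_are_detected()
```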
Contributor
🔎 Lint report:
| Rule | Count |
|---|---|
| unresolved-attribute | 2 |
| invalid-argument-type | 1 |
| unresolved-import | 1 |
First entries
tests/gateway/test_subagent_protection_30170.py:41: [unresolved-attribute] unresolved-attribute: Unresolved attribute `constants` on type `ModuleType`
tests/gateway/test_subagent_protection_30170.py:46: [unresolved-attribute] unresolved-attribute: Unresolved attribute `ChatType` on type `ModuleType`
tests/gateway/test_subagent_protection_30170.py:65: [invalid-argument-type] invalid-argument-type: Argument is incorrect: Expected `Platform`, found `MagicMock`
tests/gateway/test_subagent_protection_30170.py:35: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
✅ Fixed issues: none
Unchanged: 4934 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
Collaborator
The cherry-pick comment referenced 'line ~6771' for the /stop handler, but on current main the handler is at a different offset. Remove the hard-coded line number — the 'above' reference is sufficient.
This was referenced May 25, 2026
Summary
Queues conversational follow-ups instead of cascading AIAgent.interrupt() through _active_children when the parent is running delegate_task. Fixes #30170.

Salvage of PR #30183 by @xxxigm (cherry-picked onto current main, conflict resolved). @BROCCOLO1D (#30241) and @dieutx (#30819) also submitted fixes for the same issue; they are credited in the triage comment.

Root cause
AIAgent.interrupt() unconditionally cascades to every child in _active_children. When a user sends a follow-up message during delegation, the gateway calls running_agent.interrupt(text) → parent propagates to children → subagent work destroyed.

Changes
- gateway/run.py: new _agent_has_active_subagents() static helper that returns True iff the agent has a non-empty _active_children list (type-defensive, MagicMock-safe, lock-optional)
- gateway/run.py: warm path (_handle_active_session_busy_message) demotes interrupt → queue when subagents are active
- gateway/run.py: cold PRIORITY path (_handle_message) applies the same guard before the running_agent.interrupt() call
- gateway/run.py: delegation-specific ack: "⏳ Subagent working — your message is queued for when it finishes (use /stop to cancel everything)."
- tests/gateway/test_subagent_protection_30170.py: 17 regression tests

What is NOT changed
- /stop and /new still cascade the full interrupt chain (they go through _interrupt_and_clear_session, not the busy-message handler)
- queue and steer modes are unchanged: the guard only fires when effective_mode == "interrupt"

Validation
- test_subagent_protection_30170.py
- test_busy_session_ack.py
- test_busy_session_auth_bypass.py

Commits
- 5eea4b014, @xxxigm: gateway-level subagent protection (cherry-picked from PR #30183)
- f9804080f, @xxxigm: regression tests (cherry-picked from PR #30183)
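For readers who want the root cause and the guard side by side, here is a toy model of both. It is a sketch under stated assumptions, not the gateway code: `ToyAgent` and `handle_busy_message` are hypothetical names, the pending queue is a plain list, steer mode is left out, and the "Interrupting" and "Queued for the next turn" acks are shortened to the fragments quoted in the test description.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SUBAGENT_ACK = (
    "⏳ Subagent working — your message is queued for when it finishes "
    "(use /stop to cancel everything)."
)


@dataclass
class ToyAgent:
    """Hypothetical stand-in for AIAgent; models only the interrupt cascade."""
    name: str
    interrupted: bool = False
    _active_children: List["ToyAgent"] = field(default_factory=list)

    def interrupt(self, text: Optional[str] = None) -> None:
        self.interrupted = True
        # Root cause: the interrupt unconditionally reaches every child.
        for child in list(self._active_children):
            child.interrupt(text)


def handle_busy_message(agent: ToyAgent, text: str, configured_mode: str,
                        pending: List[str]) -> str:
    """Return the ack for a message that arrives while the agent is busy."""
    demoted = configured_mode == "interrupt" and len(agent._active_children) > 0
    effective_mode = "queue" if demoted else configured_mode

    if effective_mode == "interrupt":
        agent.interrupt(text)      # unchanged path when no subagents run
        return "Interrupting"
    if effective_mode == "queue":
        pending.append(text)       # message is kept for the next turn
        return SUBAGENT_ACK if demoted else "Queued for the next turn"
    raise ValueError(f"mode not modelled in this sketch: {configured_mode}")


# Usage: a delegating parent is protected, an idle agent is interrupted.
child = ToyAgent("child")
parent = ToyAgent("parent", _active_children=[child])
queue: List[str] = []
ack = handle_busy_message(parent, "any update?", "interrupt", queue)
assert ack == SUBAGENT_ACK and queue == ["any update?"]
assert not parent.interrupted and not child.interrupted
```

Calling parent.interrupt() directly in this model still marks the child as interrupted, which mirrors why /stop and /new keep their force-cancel behaviour.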