fix(gateway): protect in-flight subagents from busy-mode interrupts#32076

Merged: daimon-nous[bot] merged 3 commits into main from fix/30170-delegate-interrupt-guard on May 25, 2026

@daimon-nous daimon-nous Bot commented May 25, 2026

Contributor

Summary

Queues conversational follow-ups instead of cascading AIAgent.interrupt() through _active_children when the parent is running delegate_task. Fixes #30170.

Salvage of PR #30183 by @xxxigm (cherry-picked onto current main, conflict resolved). @BROCCOLO1D (#30241) and @dieutx (#30819) also submitted fixes for the same issue — credited in the triage comment.

Root cause

AIAgent.interrupt() unconditionally cascades to every child in _active_children. When a user sends a follow-up message during delegation, the gateway calls running_agent.interrupt(text) → parent propagates to children → subagent work destroyed.
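The cascade can be reproduced with a toy model. This is a minimal sketch, not the project's code: `ToyAgent` is a hypothetical stand-in for `AIAgent`, and only the `_active_children` attribute and the `interrupt()` name are taken from the description above.

```python
# Toy model of the unconditional cascade; ToyAgent is hypothetical.
class ToyAgent:
    def __init__(self, name):
        self.name = name
        self.interrupted_with = None
        self._active_children = []

    def interrupt(self, message=None):
        # Mark this agent as interrupted, then propagate to every child
        # without checking what the child is doing.
        self.interrupted_with = message
        for child in list(self._active_children):
            child.interrupt(message)


parent = ToyAgent("parent")
child = ToyAgent("subagent")
parent._active_children.append(child)

# A follow-up message arrives while the parent is delegating.
parent.interrupt("one more thing")
print(child.interrupted_with)  # one more thing
```

The child never saw the message as input; it was simply stopped, which is the loss of subagent work the PR guards against.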

Changes

  • gateway/run.py: new _agent_has_active_subagents() static helper — returns True iff the agent has a non-empty _active_children list (type-defensive, MagicMock-safe, lock-optional)
  • gateway/run.py: warm path (_handle_active_session_busy_message) demotes interrupt → queue when subagents are active
  • gateway/run.py: cold PRIORITY path (_handle_message) — same guard before the running_agent.interrupt() call
  • gateway/run.py: delegation-specific ack: "⏳ Subagent working — your message is queued for when it finishes (use /stop to cancel everything)."
  • tests/gateway/test_subagent_protection_30170.py: 17 regression tests

What is NOT changed

  • /stop and /new still cascade the full interrupt chain (go through _interrupt_and_clear_session, not the busy-message handler)
  • queue and steer modes are unchanged — the guard only fires when effective_mode == "interrupt"
  • Shutdown/drain interrupt propagation is unchanged
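The scope of the guard can be summarised as a small decision function. This is an illustrative sketch only: `resolve_busy_action` is a hypothetical name, and the real handlers inline this logic rather than calling such a function. Only the mode names (`interrupt`, `queue`, `steer`) come from the PR.

```python
def resolve_busy_action(effective_mode: str, has_active_subagents: bool) -> str:
    """Return the action to take for a message that arrives mid-run."""
    # The guard fires only for interrupt mode with live subagents.
    if effective_mode == "interrupt" and has_active_subagents:
        return "queue"
    # queue and steer pass through untouched, as does interrupt
    # when no subagents are running.
    return effective_mode


for mode in ("interrupt", "queue", "steer"):
    print(mode, resolve_busy_action(mode, True), resolve_busy_action(mode, False))
# interrupt queue interrupt
# queue queue queue
# steer steer steer
```

/stop and /new are deliberately absent from this sketch: per the list above, they go through a different code path and never reach the busy-message decision.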

Validation

Suite                                Result
test_subagent_protection_30170.py    17/17 ✅
test_busy_session_ack.py             16/16 ✅
test_busy_session_auth_bypass.py     4/4 ✅

Commits

  1. 5eea4b014 by @xxxigm: gateway-level subagent protection (cherry-picked from PR #30183)
  2. f9804080f by @xxxigm: regression tests (cherry-picked from PR #30183)

xxxigm added 2 commits May 25, 2026 12:47
…30170)

When a user sends a conversational follow-up while delegate_task is
running, gateway/run.py calls running_agent.interrupt(event.text) on
the PARENT agent. AIAgent.interrupt() then cascades synchronously
through self._active_children and calls interrupt() on every child
subagent, aborting in-flight delegate_task work. The user sees the
fallback cascade with no root-cause in the gateway log, and minutes of
subagent progress are destroyed — the exact failure mode reported in

Add GatewayRunner._agent_has_active_subagents(running_agent) — a
static helper that returns True iff the parent is currently driving
subagents via delegate_task. The helper is type-defensive: it ignores
truthy MagicMock auto-attributes (so this doesn't accidentally fire
in every test mock that hits the busy path), the _AGENT_PENDING_SENTINEL
placeholder, and missing locks.
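The MagicMock pitfall mentioned above is easy to demonstrate with the standard library alone. The snippet below is a small illustration rather than code from the PR; it compares a naive truthiness check with a type-defensive one on a bare `MagicMock`.

```python
from unittest.mock import MagicMock

mock_agent = MagicMock()

# Accessing any attribute on a MagicMock creates a child mock on the fly,
# and mocks are truthy by default.
children = mock_agent._active_children

naive = bool(children)
strict = isinstance(children, (list, tuple, set)) and len(children) > 0
print(naive, strict)  # True False
```

A naive check would therefore report active subagents for every mocked agent, which is why the helper insists on a real container type.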

Wire the helper into both interrupt branches:

  1. _handle_active_session_busy_message — the adapter-level busy
     handler. When busy_input_mode == 'interrupt' AND the parent has
     active subagents, demote to 'queue' semantics: skip the
     parent.interrupt() call, merge the message into the pending
     queue, and surface a dedicated ack ("⏳ Subagent working — your
     message is queued for when it finishes (use /stop to cancel
     everything).") so the operator knows the message wasn't lost and
     discovers the explicit escape hatch.

  2. The PRIORITY interrupt branch inside _handle_message — the
     non-command fast path. Same rationale, same demotion. Routes
     through _queue_or_replace_pending_event so the next-turn pickup
     stays unchanged.

Explicit /stop and /new commands take a completely different path
(_interrupt_and_clear_session in the slash-command dispatch at line
~6771) and are NOT affected by this guard — the operator still has a
way to force-cancel everything when they actually mean it. Configured
'queue' and 'steer' modes are also untouched: 'queue' already does the
right thing, and 'steer' goes through running_agent.steer() which does
NOT cascade to children (so subagents survive a steer too).

This is Phase 1 of the fix outlined in #30170 — the minimum viable
change that stops subagent loss. Phase 2 (delegation-aware steer
forwarding to active children) and Phase 3 (async delegation, #11508)
are intentionally out of scope.

Refs #30170.

17 new tests in tests/gateway/test_subagent_protection_30170.py pin
down both the detection helper and the demotion behaviour:

  * TestAgentHasActiveSubagents — 11 cases covering the precision and
    defensiveness of _agent_has_active_subagents:
      - returns False for None, _AGENT_PENDING_SENTINEL, and stub
        agents that lack the _active_children attribute;
      - returns False for an empty list (the steady state of an idle
        AIAgent);
      - returns True for one or many children;
      - works when _active_children_lock is None (test stubs);
      - rejects truthy MagicMock auto-attributes — this is the
        regression-guard for "every MagicMock-based gateway test
        suddenly demotes to queue mode" (which is how this was
        originally found);
      - accepts list/tuple/set as the children container.

  * TestBusyHandlerDemotesInterruptForSubagents — 6 cases driving
    _handle_active_session_busy_message directly:
      - parent.interrupt is NOT called when subagents are active,
        message is still merged into the pending queue;
      - ack copy mentions "Subagent working", "queued", and the
        /stop escape hatch — and does NOT mention "Interrupting";
      - with no subagents, behaviour is byte-identical to the
        pre-#30170 interrupt path (parent.interrupt called with the
        user text, ack says "Interrupting");
      - configured queue mode keeps its vanilla "Queued for the next
        turn" ack (the #30170 demotion-specific copy must NOT fire);
      - configured steer mode still routes to running_agent.steer()
        even when subagents are active (the guard is interrupt-only);
      - _AGENT_PENDING_SENTINEL does not trigger demotion.
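The shape of these handler tests can be sketched against a simplified stand-in. Everything here is hypothetical: `handle_busy_message` is a toy replacement for `_handle_active_session_busy_message`, it returns a short tag instead of the real ack copy, and it does not model whatever the real interrupt path does with the message text. The point is only to show how `MagicMock` assertions pin down the demotion.

```python
from unittest.mock import MagicMock


def handle_busy_message(agent, text, pending, mode="interrupt"):
    """Hypothetical, simplified stand-in for the busy handler."""
    children = getattr(agent, "_active_children", None)
    has_subagents = isinstance(children, (list, tuple, set)) and len(children) > 0
    if mode == "steer":
        # The guard is interrupt-only, so steer is never demoted.
        agent.steer(text)
        return "steered"
    if mode == "interrupt" and not has_subagents:
        agent.interrupt(text)
        return "interrupted"
    # Either configured queue mode, or interrupt demoted to queue.
    pending.append(text)
    return "queued"


def test_interrupt_demoted_when_subagents_active():
    parent = MagicMock()
    parent._active_children = [MagicMock()]
    pending = []
    assert handle_busy_message(parent, "hi", pending) == "queued"
    parent.interrupt.assert_not_called()
    assert pending == ["hi"]


def test_interrupt_unchanged_without_subagents():
    parent = MagicMock()
    parent._active_children = []
    pending = []
    assert handle_busy_message(parent, "hi", pending) == "interrupted"
    parent.interrupt.assert_called_once_with("hi")
    assert pending == []


def test_steer_not_demoted():
    parent = MagicMock()
    parent._active_children = [MagicMock()]
    assert handle_busy_message(parent, "hi", [], mode="steer") == "steered"
    parent.steer.assert_called_once_with("hi")
    parent.interrupt.assert_not_called()


test_interrupt_demoted_when_subagents_active()
test_interrupt_unchanged_without_subagents()
test_steer_not_demoted()
```

The functions follow pytest naming so they could be collected by a test runner; the explicit calls at the end just make the snippet runnable as a plain script.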

Refs #30170.
@github-actions

github-actions Bot commented May 25, 2026

Contributor

🔎 Lint report: fix/30170-delegate-interrupt-guard vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9329 on HEAD, 9325 on base (🆕 +4)

🆕 New issues (4):

Rule                     Count
unresolved-attribute     2
invalid-argument-type    1
unresolved-import        1
First entries
tests/gateway/test_subagent_protection_30170.py:41: [unresolved-attribute] unresolved-attribute: Unresolved attribute `constants` on type `ModuleType`
tests/gateway/test_subagent_protection_30170.py:46: [unresolved-attribute] unresolved-attribute: Unresolved attribute `ChatType` on type `ModuleType`
tests/gateway/test_subagent_protection_30170.py:65: [invalid-argument-type] invalid-argument-type: Argument is incorrect: Expected `Platform`, found `MagicMock`
tests/gateway/test_subagent_protection_30170.py:35: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`

✅ Fixed issues: none

Unchanged: 4934 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@alt-glitch added the type/bug, P1, comp/gateway, and tool/delegate labels on May 25, 2026
@alt-glitch

Collaborator

Salvage of #30183 by @xxxigm (cherry-picked onto current main). Competing PRs #30241 (closed) and #30819 (open) by @BROCCOLO1D and @dieutx respectively address the same issue #30170. This PR is the canonical merge target.

The cherry-pick comment referenced 'line ~6771' for the /stop handler,
but on current main the handler is at a different offset. Remove the
hard-coded line number — the 'above' reference is sufficient.
@daimon-nous daimon-nous Bot merged commit b62af47 into main May 25, 2026
26 checks passed
@daimon-nous daimon-nous Bot deleted the fix/30170-delegate-interrupt-guard branch May 25, 2026 16:23
Labels

  • comp/gateway (Gateway runner, session dispatch, delivery)
  • P1 (High: major feature broken, no workaround)
  • tool/delegate (Subagent delegation)
  • type/bug (Something isn't working)

Development

Successfully merging this pull request may close these issues.

[Bug]: Sending a message while delegate_task is running kills the subagent — interrupt propagates unconditionally to children