Skip to content

[Feature]: delegate_task_stream with mid-flight interrupt for synchronous delegation #9556

@easyvibecoding

Description

@easyvibecoding

Problem

Today delegate_task is a synchronous tool call that returns only the final message from the child. The parent orchestrator has no visibility into what the child is doing mid-flight, and no way to interrupt a clearly-off-track subagent until the child ends or a hard timeout fires.

This shows up in two failure modes:

  1. Child walked off-road — parent asked for "fix the import in src/X.py", child spent 90s refactoring unrelated modules. Parent can only find out after paying the full wall-time and token cost.

  2. Child stuck on rate-limited upstream — during MiniMax peak hour (15:00-17:30 Asia/Shanghai), 529 retries can stall a single claude -p subprocess for 30-90s per call. Parent has no signal the child is throttled vs. genuinely working.

Proposal

Two-part addition, minimal enough to not disturb the existing blocking contract:

1. delegate_task_stream (new)

for event in delegate_task_stream(goal=..., profile=...):
    # event.kind in ("token", "tool_call", "tool_result", "reasoning", "done")
    if event.kind == "token":
        ...
    if event.kind == "tool_call" and looks_wrong(event.payload):
        cancel()
        break

Streams the child's token output, tool calls, and tool results to the parent as they happen. Parent can interrupt by closing the generator.

2. Interrupt contract

On cancellation:

  • Child's in-flight tool call is allowed to finish (file writes shouldn't be half-applied).
  • Child's subprocess receives a structured "stop" message — graceful shutdown, not SIGKILL.
  • Child's partial work + any tool-result side effects are returned to the parent as a final event so the orchestrator can decide what to keep.

Why it matters

  • Wasted-work recovery — a parent that can see the first 200 tokens of reasoning can often tell if delegation is about to fail, and reroute before paying full cost. In the delegation benchmark this was the single clearest signal that could have saved the 234s task10 trial.
  • Rate-limit awareness — streaming naturally exposes "child is retrying on 529" vs. "child is generating tokens," which today look identical from the parent's side.
  • Composability — an async/streaming primitive is what the async_delegation work (feat: non-blocking background agent delegation (async_delegation toolset) #5586) ends up needing for status polling. Building the streaming primitive once covers both.

Non-goals / scope

Evidence

  • During internal peak-hour testing of a skill that delegates through the Anthropic-compat shim, a stuck-but-retrying claude -p call held the orchestrator for ~130s with zero visibility; the parent eventually timed out and lost the partial work. Streaming + cancel would have let the parent bail at 20s and reroute.
  • The delegation benchmark (30 runs) shows strategy B (uniform delegation) losing wall-time mostly because it can't unchoose a bad delegation mid-flight; a streaming + interrupt primitive is what would let a hybrid strategy recover.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P3Low — cosmetic, nice to havetool/delegateSubagent delegationtype/featureNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions