
Gateway should enforce runTimeoutSeconds and emit terminal child.timeout event #87444

@ssdatye

Problem

When sessions_spawn is called with runTimeoutSeconds, the value is accepted but does not appear to be enforced as a hard deadline. Today (2026-05-27), two subagent runs in the same workspace stalled past their declared timeouts:

  • TSK-20260527-0006 tester subagent: 1800s timeout, still showed status='active (waiting on 1 child)' over an hour past the deadline.
  • TSK-20260527-0010 v1 coder subagent: 1800s timeout, no filesystem activity for 90 minutes, and the parent orchestrator never received a completion event.

The parent was parked via sessions_yield expecting a push-based completion event. None arrived. Recovery required a full gateway restart.

Expected behavior

Once runTimeoutSeconds has elapsed since spawn, the gateway should:

  1. Mark the child run as failed (reason=timeout).
  2. Emit a synthetic completion event to the parent session so sessions_yield unparks.
  3. Free the slot in the active subagents list.

This should happen regardless of whether the underlying agent process is still alive — runTimeoutSeconds is a contract with the parent, not a hint to the child.
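The three steps above can be sketched as a deadline sweep. This is a minimal illustration, not the gateway's actual code: `TimeoutRegistry`, `ChildRun`, `sweep`, the `emit` callback, and the event field names are all hypothetical. Only `runTimeoutSeconds`, `reason=timeout`, and the `child.timeout` event name come from this issue. The clock is passed in explicitly so the behavior is deterministic, and the sweep never consults the child process, which matches the "contract with the parent" framing.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ChildRun:
    run_id: str
    parent_id: str
    deadline: float  # absolute time: spawn time + runTimeoutSeconds
    status: str = "active"


@dataclass
class TimeoutRegistry:
    """Hypothetical gateway-side bookkeeping; not the real gateway API."""

    # Assumed hook that delivers an event to a parent session.
    emit: Callable[[str, dict], None]
    active: Dict[str, ChildRun] = field(default_factory=dict)

    def spawn(self, run_id: str, parent_id: str,
              run_timeout_seconds: float, now: float) -> None:
        deadline = now + run_timeout_seconds
        self.active[run_id] = ChildRun(run_id, parent_id, deadline)

    def sweep(self, now: float) -> List[str]:
        """Expire every overdue run, whether or not its process is alive."""
        expired = [r for r in self.active.values() if now >= r.deadline]
        for run in expired:
            run.status = "failed"  # step 1: mark failed (reason=timeout)
            self.emit(run.parent_id, {  # step 2: unpark the parent
                "type": "child.timeout",
                "runId": run.run_id,
                "status": run.status,
                "reason": "timeout",
            })
            del self.active[run.run_id]  # step 3: free the slot
        return [r.run_id for r in expired]
```

Calling `sweep` from a periodic timer would bound the delay between the deadline and the terminal event by the sweep interval. A per-run timer would be tighter, at the cost of more bookkeeping.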

Repro sketch

  1. Spawn a child via sessions_spawn with runTimeoutSeconds: 60.
  2. Have the child intentionally hang (e.g. infinite sleep loop on a tool call) or simulate a model-side drop.
  3. Parent calls sessions_yield.
  4. Observe: parent remains parked indefinitely; subagents action=list still reports the child as active well past 60s.
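The repro can also be modeled in-process without a gateway. The sketch below is a simulation under stated assumptions, not a call into the real API: `hanging_child`, `run_child`, `parent_yield`, and `scenario` are hypothetical stand-ins, an `asyncio.Queue` plays the parent's completion channel, and the timeout is shortened from 60s to 0.05s so it finishes quickly. With `enforce=False` it mirrors the observed behavior (the timeout is accepted but ignored); with `enforce=True` it shows the expected terminal event.

```python
import asyncio


async def hanging_child():
    # Stand-in for a child stuck on a tool call: waits on an event never set.
    await asyncio.Event().wait()


async def run_child(child, run_timeout_seconds, parent_inbox, enforce):
    # enforce=False models the reported bug: the timeout value is ignored.
    timeout = run_timeout_seconds if enforce else None
    try:
        await asyncio.wait_for(child(), timeout=timeout)
        await parent_inbox.put({"type": "child.completed"})
    except asyncio.TimeoutError:
        await parent_inbox.put({"type": "child.timeout", "reason": "timeout"})


async def parent_yield(parent_inbox, observe_for):
    """Model of sessions_yield; returns None if still parked at the end."""
    try:
        return await asyncio.wait_for(parent_inbox.get(), timeout=observe_for)
    except asyncio.TimeoutError:
        return None


async def scenario(enforce):
    inbox = asyncio.Queue()
    task = asyncio.create_task(
        run_child(hanging_child, 0.05, inbox, enforce)
    )
    event = await parent_yield(inbox, observe_for=0.3)
    task.cancel()  # clean up the hung child so the event loop can exit
    try:
        await task
    except asyncio.CancelledError:
        pass
    return event
```

Running `asyncio.run(scenario(enforce=False))` returns `None` because the parent is still parked when the observation window closes, while `asyncio.run(scenario(enforce=True))` returns the synthetic timeout event.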

Workaround

Today the only recovery is a gateway restart. A subagents action='kill' would be the manual escape hatch (filed separately).

Impact

Any multi-lane orchestration is fragile under transient failures. Parent agents cannot make progress decisions because the contract they were given (timeout) is not honored.

Metadata

Labels

  • P1: High-priority user-facing bug, regression, or broken workflow.
  • clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
  • impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
  • impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
  • issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.
