fix: reconcile subagent wait timeouts #82888

Merged
steipete merged 1 commit into main from fix/subagent-wait-timeout-reconcile on May 17, 2026

Conversation

@steipete steipete commented May 17, 2026

Contributor

Summary

  • Refactors subagent session-store terminal reconciliation into a shared helper used by both live wait handling and registry sweeps.
  • Keeps parent subagent runs active when agent.wait returns a plain poll timeout before the child session settles, then retries/reconciles the later completion.
  • Guards session-store reconciliation with run freshness so stale terminal rows from reused/reactivated child sessions cannot complete a newer run.
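The behavior described in these bullets can be sketched as follows. This is a minimal illustration and not the code from this PR: the type names, the `applyWaitResult` and `reconcileFromSessionStore` helpers, and the timestamp-based freshness check are all assumptions made for the example. It shows a plain poll timeout leaving the run active, and a terminal session-store row being accepted only when it is at least as new as the current run.

```typescript
// Hypothetical shapes for illustration only; not OpenClaw's actual types.
type WaitResult =
  | { kind: "completed" }
  | { kind: "poll-timeout" } // the wait call stopped polling; the child may still be running
  | { kind: "terminal-timeout" }; // the child run itself ended in a timeout

interface SubagentRun {
  runId: string;
  startedAt: number; // ms epoch when this run was started or reactivated
  status: "active" | "completed" | "timed-out";
}

interface SessionStoreRow {
  status: "running" | "completed" | "timed-out";
  endedAt?: number; // ms epoch; present only on terminal rows
}

// Assumed freshness rule: a terminal row counts only if it ended after
// the current run started. Older rows belong to a previous use of the session.
function isFreshTerminalRow(run: SubagentRun, row: SessionStoreRow): boolean {
  if (row.status === "running" || row.endedAt === undefined) return false;
  return row.endedAt >= run.startedAt;
}

// Shared helper: callable from a live wait handler and from a periodic sweep.
function reconcileFromSessionStore(
  run: SubagentRun,
  row: SessionStoreRow | undefined,
): SubagentRun {
  if (!row || !isFreshTerminalRow(run, row)) return run; // stay active, retry later
  return { ...run, status: row.status === "completed" ? "completed" : "timed-out" };
}

function applyWaitResult(
  run: SubagentRun,
  result: WaitResult,
  row: SessionStoreRow | undefined,
): SubagentRun {
  switch (result.kind) {
    case "completed":
      return { ...run, status: "completed" };
    case "terminal-timeout":
      return { ...run, status: "timed-out" };
    case "poll-timeout":
      // Not terminal by itself: consult the session store instead of failing the run.
      return reconcileFromSessionStore(run, row);
  }
}
```

Routing both the wait path and the sweep path through one reconcile function keeps the freshness rule in a single place, so a stale row cannot be accepted by one caller and rejected by the other.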

Fixes #82787.

Verification

  • CodexReview: clean, no accepted/actionable findings reported.
  • pnpm test src/agents/subagent-registry.test.ts src/agents/subagent-registry.lifecycle-retry-grace.e2e.test.ts src/agents/openclaw-tools.subagents.sessions-spawn.lifecycle.test.ts -- --reporter=dot
  • git diff --check
  • pnpm check:changed
  • Testbox: provider=blacksmith-testbox id=tbx_01krt1rxpkb7vj53mkaqwfserq exit=0

Real behavior proof

Behavior addressed: parent subagent waiters no longer record a terminal timeout when agent.wait only hit its poll timeout and the child session is still running.
Real environment tested: local OpenClaw checkout plus Blacksmith Testbox changed-check environment for the touched runtime surface.
Exact steps or command run after this patch: focused subagent registry/session-spawn Vitest command, CodexReview, git diff --check, and pnpm check:changed in Testbox tbx_01krt1rxpkb7vj53mkaqwfserq.
Evidence after fix: terminal output from the rebased focused run and remote Testbox run:

[test] passed 2 Vitest shards in 10.42s
blacksmith run summary sync=delegated command=1m52.657s total=1m55.545s exit=0
provider=blacksmith-testbox leaseId=tbx_01krt1rxpkb7vj53mkaqwfserq
Import cycle check: 0 runtime value cycle(s).
Found 0 warnings and 0 errors.

Observed result after fix: the unsettled child-session timeout, terminal wait timeout, stale terminal row rejection, and terminal session-store reconciliation regressions pass; changed-check completed with exit 0 in Testbox.
What was not tested: live Discord/Telegram provider roundtrip with a real delayed subagent; covered with registry/session-spawn e2e simulation and Testbox checks instead.

@openclaw-barnacle openclaw-barnacle Bot added the agents (Agent runtime and tooling), size: L, and maintainer (Maintainer-authored PR) labels May 17, 2026
clawsweeper Bot commented May 17, 2026

Contributor

ClawSweeper status: review started.

I am starting a fresh review of this pull request: "fix: reconcile subagent wait timeouts". This is item 1/1 in the current shard. Shard 0/1.

This placeholder means the worker is alive and reading the current context. I will edit this same comment with the actual review when the claws are done clicking.

Crustacean status: shell secured, claws on keyboard, evidence pebbles being sorted.

@steipete steipete force-pushed the fix/subagent-wait-timeout-reconcile branch from 36d9eff to e32ddd8 May 17, 2026 04:07
@steipete steipete force-pushed the fix/subagent-wait-timeout-reconcile branch from e32ddd8 to 2c32315 May 17, 2026 04:08
@steipete steipete added the proof: override label (Maintainer override for the external PR real behavior proof gate) May 17, 2026
@steipete steipete merged commit 5d81c29 into main May 17, 2026
121 of 126 checks passed
@steipete steipete deleted the fix/subagent-wait-timeout-reconcile branch May 17, 2026 04:16
Labels

  • agents: Agent runtime and tooling
  • maintainer: Maintainer-authored PR
  • proof: override: Maintainer override for the external PR real behavior proof gate.
  • size: L

Development

Successfully merging this pull request may close these issues.

Parent subagent wait can time out before delayed child starts, leaving requester unaware of success