fix: Subagent completion direct announce often fails with no visible reply by galiniliev · Pull Request #82804 · openclaw/openclaw

galiniliev · 2026-05-17T00:21:56Z

Summary

Problem: completed subagent announcements could fail with "completion agent did not produce a visible reply" after the requester wake path hit a stale session id (queue_message_failed reason=no_active_run).
Why it matters: the child run may have completed with usable output, but the requester can still see no visible completion update if both wake routing and the automatic direct handoff dead-end.
What changed: when the initial requester wake fails with no_active_run and the automatic completion-agent handoff returns no visible payload, OpenClaw retries the requester-agent handoff once with sourceReplyDeliveryMode: "message_tool_only" and deliver: false.
What did NOT change (scope boundary): no raw child completion output is sent to external chat; the requester agent still mediates the final visible update, grouped child-result guardrails remain mediated, and generated-media message-tool enforcement keeps its existing contract.
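A minimal sketch of the retry decision described above, for readers who want the condition spelled out. The type and function names (WakeOutcome, HandoffResult, planMessageToolRetry) are hypothetical and not taken from the OpenClaw source; only the no_active_run reason and the two retry parameters come from this PR. Treating an empty payloads array as "no visible payload" is a simplifying assumption.

```typescript
// Hypothetical shapes for illustration; the real OpenClaw types may differ.
type WakeOutcome = { ok: true } | { ok: false; reason: string };
type HandoffResult = { payloads: unknown[] };

interface HandoffRetryParams {
  deliver: false;
  sourceReplyDeliveryMode: "message_tool_only";
}

// Returns the parameters for the one-time retry, or null when no retry applies.
function planMessageToolRetry(
  wake: WakeOutcome,
  firstHandoff: HandoffResult,
  alreadyRetried: boolean,
): HandoffRetryParams | null {
  // The requester wake must have failed specifically because the run was stale.
  const staleWake = !wake.ok && wake.reason === "no_active_run";
  // Assumption: an empty payload list stands in for "no visible payload".
  const noVisiblePayload = firstHandoff.payloads.length === 0;
  if (!staleWake || !noVisiblePayload || alreadyRetried) {
    return null;
  }
  return { deliver: false, sourceReplyDeliveryMode: "message_tool_only" };
}
```

In this sketch a wake that succeeded, a wake that failed for another reason, a handoff that already produced payloads, or a retry that was already attempted all yield null, which matches the "retries once" wording.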

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes [Bug]: Subagent completion direct announce fails with no visible reply #82803
Related [Bug]: Subagent completion silently lost — no retry, no notification, no auto-restart on timeout #44925
Related fix(subagents): add sendMessage fallback + callGateway fallthrough for delivery drops #79059
Related fix: require mediated delivery for subagent announce completions #80223
This PR fixes a bug or regression

Real behavior proof (required for external PRs)

Behavior addressed: Completed subagent direct announces no longer dead-end at the reported no_active_run plus no-visible-payload path. After the stale requester wake fails and the automatic direct handoff has no visible output, the runtime now retries a mediated requester-agent handoff that requires message-tool delivery instead of raw-sending child output.

Real environment tested: Windows local Codex worktree based on origin/main c2e9091, Node v24.15.0. Dependencies were partially installed after pnpm install hit an esbuild postinstall spawn EPERM; the direct Vitest entry was available and used for the focused delivery seam.

Exact steps or command run after this patch: $env:OPENCLAW_VITEST_MAX_WORKERS='1'; node node_modules\vitest\vitest.mjs run src/agents/subagent-announce-delivery.test.ts --reporter=dot

Evidence after fix: Copied terminal capture from the post-review focused delivery run:

RUN  v4.1.6 C:/OpenClaw/worktrees/bug-001-subagent-completion-direct-announce

Test Files  2 passed (2)
Tests  86 passed (86)
Start at  18:50:08
Duration  27.03s (transform 12.67s, setup 967ms, import 15.41s, tests 10.11s, environment 0ms)

The updated assertions simulate queueEmbeddedPiMessageWithOutcome returning reason: "no_active_run", then an automatic direct completion handoff with empty payloads. The runtime performs a second requester-agent handoff with deliver: false, sourceReplyDeliveryMode: "message_tool_only", and a :message-tool idempotency key; the test only marks delivery successful when the second handoff reports committed message-tool evidence. The fallback sendMessage mock is not called.

Observed result after fix: single completed subagent thread/channel cases with stale requester runs now complete through a mediated message-tool-only retry when the first direct handoff is empty. If the retry still lacks message-tool evidence, delivery remains failed and queued for the existing retry/give-up machinery instead of raw-sending child output.
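The outcome handling described above could be sketched as follows. Everything here is illustrative: messageToolIdempotencyKey, RetryEvidence, DeliveryOutcome, and settleRetry are hypothetical names, and appending ":message-tool" as a suffix to an existing key is an assumption about how the key is built.

```typescript
// Assumption: the ":message-tool" marker is appended to a base key so the
// retry cannot be deduplicated against the first, empty handoff.
function messageToolIdempotencyKey(baseKey: string): string {
  return `${baseKey}:message-tool`;
}

// Hypothetical evidence shape reported by the second handoff.
type RetryEvidence = { messageToolCommitted: boolean };

type DeliveryOutcome =
  | { status: "delivered"; via: "message_tool" }
  | { status: "failed"; requeue: true; rawSend: false };

function settleRetry(evidence: RetryEvidence): DeliveryOutcome {
  if (evidence.messageToolCommitted) {
    return { status: "delivered", via: "message_tool" };
  }
  // No committed evidence: stay failed and leave the announcement to the
  // existing retry/give-up machinery. Child output is never raw-sent.
  return { status: "failed", requeue: true, rawSend: false };
}
```

The failed branch deliberately carries rawSend: false to make the scope boundary visible: a missing message-tool commit is reported as a failure rather than worked around.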

What was not tested: no live gateway/provider/channel rerun was performed. The after-fix proof is local delivery-seam execution, not a private live session replay.

Before evidence: raw runtime log excerpt from the affected gateway trace that this patch addresses:

Trace/proof:
- gateway-dev.log:27070
  "Subagent completion direct announce failed for run c73d9446-0a7f-422d-a904-4f0a5e92b556: completion agent did not produce a visible reply"
  traceId=be3befc660e5cba4364d3d60bdbcc9a9 spanId=c0e47fb6cf0e786b
- Neighboring same trace:
  gateway-dev.log:27069 "queue message failed: sessionId=4d1ec534-2295-41cf-b55a-9300cc14f1f1 reason=no_active_run"

Root Cause (if applicable)

Root cause: sendSubagentAnnounceDirectly detected the empty automatic completion-agent handoff but did not distinguish the reported stale requester wake (no_active_run) from ordinary no-visible output, so the path could fail without trying the stricter message-tool-only mediated handoff.
Missing detection / guardrail: coverage previously proved only the already-existing direct-then-steer branch. It did not cover no_active_run followed by an empty automatic direct handoff.
Contributing context (if known): prior fallback scaffolding that raw-sent child output was removed by 92284bc / fix(agents): clean subagent fallback scaffolding #78700; this patch keeps the repair within the requester-agent delivery contract documented for no-output handoffs.

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file: src/agents/subagent-announce-delivery.test.ts
Scenario the test should lock in: a stale requester wake returns no_active_run, the automatic requester-agent completion handoff returns empty payloads, and the runtime retries the same mediated handoff with sourceReplyDeliveryMode: "message_tool_only" without raw-sending child completion text.
Why this is the smallest reliable guardrail: it exercises the delivery decision seam directly without requiring a live provider to intentionally produce an empty final response.
Existing test that already covers this (if any): existing no-visible-output tests covered the failure and the direct-then-steer fallback; this PR adds the stale-run message-tool retry behavior.
If no new test is added, why not: N/A

User-visible / Behavior Changes

Completed subagent announcements that previously dead-ended after a stale requester wake and empty automatic handoff can now be retried through a message-tool-only requester-agent handoff, producing a visible update when the requester agent sends through the message tool.

Diagram (if applicable)

Before:
[subagent completed] -> [wake requester: no_active_run] -> [automatic direct handoff: empty payload] -> [delivery failure]

After:
[subagent completed] -> [wake requester: no_active_run] -> [automatic direct handoff: empty payload] -> [message-tool-only requester handoff] -> [visible requester update]

Security Impact (required)

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: Windows local worktree for the regression test; original gateway log OS not grounded.
Runtime/container: Node v24.15.0 for local Vitest; OpenClaw current main c2e9091 before fix.
Model/provider: NOT_ENOUGH_INFO from the original log evidence.
Integration/channel (if any): regression covers Slack-style thread/channel delivery helpers; original log channel not grounded.
Relevant config (redacted): NOT_ENOUGH_INFO

Steps

Run the focused delivery regression file.
In the single-completion cases, mock the requester wake queue attempt as reason: "no_active_run".
Mock the automatic requester-agent direct handoff as { result: { payloads: [] } }.
Verify the runtime makes a second requester-agent handoff with deliver: false and sourceReplyDeliveryMode: "message_tool_only".
Verify delivery succeeds only when that second handoff reports committed message-tool evidence, and verify the raw sendMessage fallback mock is not called.
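The steps above can be walked through with hand-rolled mocks, independent of Vitest. This is a simplified, synchronous stand-in for the delivery seam, not the code under test: Deps, deliverCompletion, and the messageToolCommitted flag are hypothetical, and the first handoff is called with empty parameters because its real arguments are not shown in this PR.

```typescript
// All names below are hypothetical stand-ins, not the real OpenClaw helpers.
type HandoffParams = {
  deliver?: false;
  sourceReplyDeliveryMode?: "message_tool_only";
};
type HandoffResult = { payloads: string[]; messageToolCommitted: boolean };

interface Deps {
  wakeRequester(): { ok: boolean; reason?: string };
  handoff(params: HandoffParams): HandoffResult;
  sendMessage(text: string): void; // raw fallback; expected to stay unused
}

function deliverCompletion(deps: Deps): boolean {
  const wake = deps.wakeRequester();
  if (wake.ok) return true;
  // Automatic direct handoff with default parameters.
  const first = deps.handoff({});
  if (first.payloads.length > 0) return true;
  // Only a stale requester run qualifies for the stricter retry.
  if (wake.reason !== "no_active_run") return false;
  const retry = deps.handoff({
    deliver: false,
    sourceReplyDeliveryMode: "message_tool_only",
  });
  // Success requires committed message-tool evidence; sendMessage is never used.
  return retry.messageToolCommitted;
}

// Mocks mirroring the listed steps.
const handoffCalls: HandoffParams[] = [];
let rawSends = 0;
const delivered = deliverCompletion({
  wakeRequester: () => ({ ok: false, reason: "no_active_run" }),
  handoff: (params) => {
    handoffCalls.push(params);
    // First call: empty automatic handoff. Second call: message tool committed.
    return handoffCalls.length === 1
      ? { payloads: [], messageToolCommitted: false }
      : { payloads: [], messageToolCommitted: true };
  },
  sendMessage: () => {
    rawSends += 1;
  },
});
console.log({ delivered, handoffCount: handoffCalls.length, rawSends });
```

Flipping the second mock response to messageToolCommitted: false makes delivered false while rawSends stays at zero, which is the failure mode the Risks section describes.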

Expected

Stale requester wake plus empty automatic direct handoff retries through a mediated message-tool-only requester-agent handoff.
Raw child output is not sent directly to external chat.
Grouped and media completion guardrails remain enforced.

Actual

Matches expected after this patch.

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios: focused delivery regression file passed locally with 86 tests across both configured Vitest projects.
Edge cases checked: stale-run no-visible single thread/channel completions retry through message-tool-only handoff; grouped child-result fallback remains mediated and does not raw-send; generated-media message-tool enforcement remains covered by existing tests in the same file.
What you did not verify: live provider/channel behavior requiring private sessions or credentials.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: the message-tool-only retry can still fail if the requester agent does not send through the message tool.
- Mitigation: that failure remains explicit and feeds the existing retry/give-up machinery instead of bypassing the requester-agent contract with raw child output.

clawsweeper · 2026-05-17T00:22:48Z

Codex review: needs real behavior proof before merge.

Summary
Review failed before ClawSweeper could summarize the requested change.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Real behavior proof
Not applicable: Real behavior proof was not assessed because the Codex review failed.

Next step before merge
Review did not complete, so no work-lane recommendation was made.

Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

What I checked:

failure reason: codex execution failed.
codex failure detail: Codex review failed for this PR with exit 1.
codex stdout: Per-item Codex failure; continuing with the rest of the shard.

Likely related people:

unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)

Remaining risk / open question:

No close action taken because the review did not complete.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 9e67f53b913a.

steipete · 2026-05-17T02:20:06Z

Closing as superseded by #82834.

Both PRs target #82803 and the same subagent announce delivery path. #82834 keeps the no-visible-reply fallback, adds the broader mediated/message-tool completion coverage, and updates docs/changelog, so that is the canonical review target.

openclaw-barnacle Bot added the agents (Agent runtime and tooling), size: S, and maintainer (Maintainer-authored PR) labels May 17, 2026

clawsweeper Bot mentioned this pull request May 17, 2026

[Bug]: Subagent completion direct announce fails with no visible reply #82803

Closed

clawsweeper Bot added the mantis: telegram-visible-proof (Mantis should capture Telegram visible proof.) and P1 (High-priority user-facing bug, regression, or broken workflow.) labels May 17, 2026

github-actions Bot mentioned this pull request May 17, 2026

🦞 OpenClaw Ecosystem Daily 2026-05-17 zx0828/big_model_radar#61

Open

galiniliev changed the title from "fix: fallback subagent completion announces" to "fix: Subagent completion direct announce often fails with no visible reply" May 17, 2026

openclaw-barnacle Bot added size: M and removed size: S labels May 17, 2026

steipete mentioned this pull request May 17, 2026

fix(agents): harden subagent completion delivery #82834

Merged

galiniliev added 3 commits May 17, 2026 02:08

fix: fallback subagent completion announces

b6e6286

fix: keep subagent completions mediated

838ec13

fix: retry stale completion handoffs via message tool

97ee119

galiniliev force-pushed the bug-001-subagent-completion-direct-announce branch from 72c68a2 to 97ee119 Compare May 17, 2026 02:11

clawsweeper Bot removed the mantis: telegram-visible-proof (Mantis should capture Telegram visible proof.) label May 17, 2026

steipete closed this May 17, 2026

This was referenced May 18, 2026

[Bug]: Stale subagent completion direct announce still fails with no visible reply #83699

Closed

fix: recover stale subagent completion announces #83700

Merged

galiniliev wants to merge 3 commits into openclaw:main from galiniliev:bug-001-subagent-completion-direct-announce
