fix(agents): skip dormant completion wake probes by galiniliev · Pull Request #82964 · openclaw/openclaw

galiniliev · 2026-05-17T06:10:51Z

Summary

Problem: dormant requester sessions could still be probed through queueEmbeddedPiMessageWithOutcome, producing reason=no_active_run when a stored session id no longer had an active embedded or reply-run handle.
Why it matters: late subagent completion announcements should go straight to the requester-agent/direct handoff path instead of first producing stale-run queue failures before the visible delivery.
What changed: subagent announce delivery now attempts a requester wake/steer only when the requester's activity state reports an active run; dormant completion delivery skips the wake probe and proceeds directly to the persisted handoff.
What did NOT change (scope boundary): active requester steering, active wake failure fallback, delivery routing, and embedded-run queue semantics are unchanged.
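The branch-selection change can be pictured as a small decision function. This is a minimal sketch, not the code in src/agents/subagent-announce-delivery.ts: the `RequesterActivity` shape and the `choosePathBefore`/`choosePathAfter` helpers are hypothetical names used only to contrast the old and new conditions.

```typescript
// Hypothetical shape of the requester activity consulted at delivery time.
interface RequesterActivity {
  sessionId?: string; // stored requester session id; may outlive the run
  isActive: boolean;  // whether an embedded or reply-run handle is still live
}

type CompletionPath = "wake" | "handoff";

// Before: a stored session id alone was enough to attempt a wake/steer,
// so dormant requesters were probed and failed with no_active_run.
function choosePathBefore(activity: RequesterActivity): CompletionPath {
  return activity.sessionId ? "wake" : "handoff";
}

// After: the wake/steer probe also requires an active requester run;
// dormant requesters go straight to the requester-agent/direct handoff.
function choosePathAfter(activity: RequesterActivity): CompletionPath {
  return activity.sessionId && activity.isActive ? "wake" : "handoff";
}

const dormant: RequesterActivity = { sessionId: "session-1", isActive: false };
console.log(choosePathBefore(dormant)); // prints: wake
console.log(choosePathAfter(dormant));  // prints: handoff
```

Keeping the condition a pure function of activity state makes both the dormant and the active case easy to cover in unit tests.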

Linked Issue/PR

Closes [Bug]: Dormant completion delivery probes stale embedded run sessions #82963
Related #
This PR fixes a bug or regression

Real behavior proof (required for external PRs)

External contributors must show after-fix evidence from a real OpenClaw setup. Unit tests, mocks, lint, typechecks, snapshots, and CI are supplemental only. Screenshots are encouraged even for CLI, console, text, or log changes; terminal screenshots and copied live output count. Be mindful of private information like IP addresses, API keys, phone numbers, non-public endpoints, or other private details when providing evidence.

Behavior addressed: dormant subagent completion delivery no longer probes a stale stored requester session id before using requester-agent/direct delivery.
Real environment tested: local Windows checkout/worktree, Node v24.15.0, with Vitest invoked directly because the worktree dependency install populated dependencies but then failed esbuild's postinstall validation.
Exact steps or command run after this patch: node node_modules\vitest\vitest.mjs run src/agents/subagent-announce-delivery.test.ts --reporter=verbose -t "dormant completion requesters|steer fallback"; node node_modules\vitest\vitest.mjs run src/agents/subagent-announce-delivery.test.ts --reporter=dot; git diff --check.
Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): terminal capture:

RUN  v4.1.6 [redacted worktree]

✓ |agents-core| ../../src/agents/subagent-announce-delivery.test.ts > deliverSubagentAnnouncement completion delivery > keeps direct external delivery for dormant completion requesters 159ms
✓ |agents-core| ../../src/agents/subagent-announce-delivery.test.ts > deliverSubagentAnnouncement completion delivery > uses in-process agent dispatch for dormant completion requesters 123ms
✓ |agents-core| ../../src/agents/subagent-announce-delivery.test.ts > deliverSubagentAnnouncement completion delivery > uses steer fallback when a completion handoff has no visible output 124ms
✓ |agents-support| ../../src/agents/subagent-announce-delivery.test.ts > deliverSubagentAnnouncement completion delivery > keeps direct external delivery for dormant completion requesters 161ms
✓ |agents-support| ../../src/agents/subagent-announce-delivery.test.ts > deliverSubagentAnnouncement completion delivery > uses in-process agent dispatch for dormant completion requesters 127ms
✓ |agents-support| ../../src/agents/subagent-announce-delivery.test.ts > deliverSubagentAnnouncement completion delivery > uses steer fallback when a completion handoff has no visible output 126ms

Test Files  2 passed (2)
Tests  6 passed | 86 skipped (92)
Duration  25.79s

RUN  v4.1.6 [redacted worktree]

Test Files  2 passed (2)
Tests  92 passed (92)
Duration  28.49s

git diff --check: passed with no output

Observed result after fix: the dormant completion requester test asserts both the direct external handoff params and that the queue helper is never called (expect(queueEmbeddedPiMessageWithOutcome).not.toHaveBeenCalled()), while the active fallback test confirms that steering fallback remains available for active requester sessions.
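The negative assertion can be illustrated without Vitest by injecting the queue helper as a dependency and counting its calls. This is a sketch under stated assumptions: `deliverCompletion`, `DeliveryDeps`, and the hand-rolled spies are hypothetical stand-ins, while the real test uses Vitest mocks and `expect(queueEmbeddedPiMessageWithOutcome).not.toHaveBeenCalled()`.

```typescript
interface DeliveryDeps {
  // Hypothetical stand-in for queueEmbeddedPiMessageWithOutcome.
  queueWake: (sessionId: string, message: string) => { ok: boolean };
  // Hypothetical stand-in for the persisted requester-agent/direct handoff.
  handoff: (message: string) => void;
}

function deliverCompletion(
  activity: { sessionId?: string; isActive: boolean },
  message: string,
  deps: DeliveryDeps,
): "steered" | "handed_off" {
  // Only an active requester with a stored session id is probed.
  if (activity.sessionId && activity.isActive) {
    const outcome = deps.queueWake(activity.sessionId, message);
    if (outcome.ok) return "steered";
  }
  deps.handoff(message);
  return "handed_off";
}

// Minimal call-counting spies, similar in spirit to vi.fn().
function makeSpies() {
  const calls = { queueWake: 0, handoff: 0 };
  const deps: DeliveryDeps = {
    queueWake: () => { calls.queueWake += 1; return { ok: true }; },
    handoff: () => { calls.handoff += 1; },
  };
  return { calls, deps };
}

const { calls, deps } = makeSpies();
const result = deliverCompletion({ sessionId: "session-1", isActive: false }, "done", deps);
console.log(result, calls); // handed_off { queueWake: 0, handoff: 1 }
```

Injecting the helper keeps the "must not be called" check a plain counter comparison, which is the property the new dormant-requester assertion locks in.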
What was not tested: no live gateway/provider rerun against the original private session, because the evidence folder only included redacted local logs and no reusable credentials/session state.
Before evidence (optional but encouraged): redacted gateway log excerpt from the bug evidence:

Observed count: 192 lines matching "reason=no_active_run".
gateway-dev.log line 26944: "queue message failed: sessionId=[redacted session id] reason=no_active_run"
gateway-dev.log line 30248: "queue message failed: sessionId=[redacted session id] reason=no_active_run"
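The observed count is a line-level substring match over the gateway log. A minimal sketch of the same check, assuming the log is available as a string; `countMatchingLines` is a hypothetical helper, not an OpenClaw utility, and the sample reuses the redacted excerpt format above plus a placeholder line.

```typescript
// Count log lines containing a marker such as "reason=no_active_run".
// Splitting on \r?\n handles both Windows and Unix line endings.
function countMatchingLines(logText: string, marker: string): number {
  return logText
    .split(/\r?\n/)
    .filter((line) => line.includes(marker)).length;
}

const sampleLog = [
  "queue message failed: sessionId=[redacted session id] reason=no_active_run",
  "unrelated log line",
  "queue message failed: sessionId=[redacted session id] reason=no_active_run",
].join("\n");

console.log(countMatchingLines(sampleLog, "reason=no_active_run")); // 2
```

Running the same count on an after-fix log from a dormant completion would be one way to show the churn disappearing.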

Root Cause (if applicable)

For bug fixes or regressions, explain why this happened, not just what changed. Otherwise write N/A. If the cause is unclear, write Unknown.

Root cause: subagent completion delivery treated the presence of a stored requester sessionId as enough to attempt a wake/steer, even when requester activity reported that the embedded/reply run was no longer active.
Missing detection / guardrail: dormant completion coverage verified direct handoff routing but did not assert that the stale-run wake was skipped.
Contributing context (if known): queueEmbeddedPiMessageWithOutcome correctly returns no_active_run when no active embedded or reply-run handle exists; the caller was using it for dormant completions that should not require an active streaming handle.
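The inactive-run contract can be sketched as a lookup that returns a structured outcome instead of throwing. The registry maps, the `QueueOutcome` type, and `queueMessageSketch` are hypothetical; they only mirror the described behavior that `queueEmbeddedPiMessageWithOutcome` reports `no_active_run` when neither an embedded nor a reply-run handle exists for the session id.

```typescript
type QueueOutcome =
  | { ok: true }
  | { ok: false; reason: "no_active_run" };

// Hypothetical handle for a live run that can accept queued messages.
interface RunHandle {
  enqueue(message: string): void;
}

// Hypothetical per-session registries of live handles.
const embeddedRuns = new Map<string, RunHandle>();
const replyRuns = new Map<string, RunHandle>();

function queueMessageSketch(sessionId: string, message: string): QueueOutcome {
  const handle = embeddedRuns.get(sessionId) ?? replyRuns.get(sessionId);
  if (!handle) {
    // A stored session id with no live handle: the dormant requester case.
    return { ok: false, reason: "no_active_run" };
  }
  handle.enqueue(message);
  return { ok: true };
}

const received: string[] = [];
replyRuns.set("live-session", { enqueue: (m) => received.push(m) });

console.log(queueMessageSketch("stale-session", "done")); // { ok: false, reason: 'no_active_run' }
console.log(queueMessageSketch("live-session", "done"));  // { ok: true }
```

Under this contract the helper is behaving correctly for a dormant session id; the fix belongs in the caller, which should not ask it to wake a run that is known to be inactive.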

Regression Test Plan (if applicable)

For bug fixes or regressions, name the smallest reliable test coverage that should catch this. Otherwise write N/A.

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file: src/agents/subagent-announce-delivery.test.ts
Scenario the test should lock in: dormant completion requesters with stored session ids use direct handoff without calling queueEmbeddedPiMessageWithOutcome; active completion fallback can still steer.
Why this is the smallest reliable guardrail: it covers the delivery seam that decides between stale-run wake and requester-agent handoff without needing private gateway sessions.
Existing test that already covers this (if any): existing dormant delivery tests covered handoff params, but not the missing negative assertion on stale queue wake.
If no new test is added, why not: N/A

User-visible / Behavior Changes

Dormant late completion delivery no longer attempts stale embedded-run queue wake before using requester-agent/direct delivery, reducing reason=no_active_run churn for completed requester runs.

Diagram (if applicable)

Before:
[dormant completion + stored session id] -> [wake stale run] -> [no_active_run] -> [direct handoff]

After:
[dormant completion + stored session id] -> [direct handoff]

Security Impact (required)

New permissions/capabilities? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) No
Command/tool execution surface changed? (Yes/No) No
Data access scope changed? (Yes/No) No
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: Windows
Runtime/container: local Node v24.15.0 worktree; Vitest invoked directly as a fallback, because pnpm install repeatedly failed esbuild's postinstall validation after populating dependencies.
Model/provider: NOT_ENOUGH_INFO
Integration/channel (if any): subagent completion delivery seam; original logs did not include a public reusable channel setup.
Relevant config (redacted): NOT_ENOUGH_INFO

Steps

Run the focused completion delivery tests for dormant requesters and active steer fallback.
Run the full touched test file.
Run patch whitespace validation.

Expected

Dormant completion requesters use direct handoff and do not call embedded-run queue wake.
Active requester completion fallback can still steer.

Actual

Focused completion delivery tests passed in both project shards.
Full src/agents/subagent-announce-delivery.test.ts passed in both project shards.
git diff --check passed.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios: dormant completion handoff does not queue stale embedded-run wake; active completion fallback still steers; touched test file passes.
Edge cases checked: a rejected active wake still logs a warning and falls back to requester-agent handoff, because the warning remains inside the active-only branch.
What you did not verify: live replay of the original private gateway session and provider credentials.
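The rejected-wake edge case can be sketched as follows. The warning sits inside the active-only branch, so dormant requesters never emit it, while a rejected active wake still falls back to the handoff. `deliverWithFallback`, `FallbackDeps`, and the warning text are hypothetical names for illustration, not the actual OpenClaw code or log format.

```typescript
type WakeOutcome = { ok: true } | { ok: false; reason: string };

interface FallbackDeps {
  tryWake: (sessionId: string) => WakeOutcome; // hypothetical wake/steer attempt
  handoff: () => void;                          // hypothetical requester-agent handoff
  warn: (msg: string) => void;                  // hypothetical logger
}

function deliverWithFallback(
  activity: { sessionId?: string; isActive: boolean },
  deps: FallbackDeps,
): "steered" | "handed_off" {
  if (activity.sessionId && activity.isActive) {
    const outcome = deps.tryWake(activity.sessionId);
    if (outcome.ok) return "steered";
    // Only an active requester can reach this warning.
    deps.warn(`wake failed (${outcome.reason}); falling back to handoff`);
  }
  deps.handoff();
  return "handed_off";
}

const warnings: string[] = [];
let handoffs = 0;
const rejectingDeps: FallbackDeps = {
  tryWake: () => ({ ok: false, reason: "rejected" }),
  handoff: () => { handoffs += 1; },
  warn: (msg) => { warnings.push(msg); },
};

console.log(deliverWithFallback({ sessionId: "s1", isActive: true }, rejectingDeps));  // handed_off
console.log(deliverWithFallback({ sessionId: "s1", isActive: false }, rejectingDeps)); // handed_off
console.log(warnings.length, handoffs); // 1 2
```

Both calls end in a handoff, but only the active one logs, which is the distinction the edge-case check relies on.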

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Migration needed? (Yes/No) No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: an active requester might be misclassified as inactive and skip steering.
- Mitigation: the active check already uses isEmbeddedPiRunActive, which includes embedded and reply-run activity; existing active steering coverage still passes.
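The mitigation depends on activity covering both kinds of run. A minimal sketch, assuming activity is derived from two hypothetical handle sets; only the name `isEmbeddedPiRunActive` comes from this PR, and its real signature and storage may differ, so the sketch uses the separate hypothetical name `isRunActiveSketch`.

```typescript
// Hypothetical registries of session ids with a live handle.
const activeEmbeddedRuns = new Set<string>();
const activeReplyRuns = new Set<string>();

// A requester counts as active if either kind of run is live. Checking only
// embedded runs would misclassify a live reply run as dormant and skip steering.
function isRunActiveSketch(sessionId: string | undefined): boolean {
  if (!sessionId) return false;
  return activeEmbeddedRuns.has(sessionId) || activeReplyRuns.has(sessionId);
}

activeReplyRuns.add("reply-session");
console.log(isRunActiveSketch("reply-session")); // true
console.log(isRunActiveSketch("stale-session")); // false
console.log(isRunActiveSketch(undefined));       // false
```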

clawsweeper · 2026-05-17T06:11:46Z

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.


Summary
This PR gates subagent completion wake/steer attempts on active requester activity, updates completion-delivery tests, and adds a changelog entry.

Reproducibility: yes, at source level: current main probes the embedded-run queue when a stored requester session id exists, while the docs describe steering only for still-active requester runs. I did not establish a live reproduction for the private gateway logs.

PR rating
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🐚 platinum hermit
Summary: The patch itself is small and plausible, but missing real behavior proof keeps the external PR below merge-ready quality.

Rank-up moves:

Add redacted after-fix real-runtime proof showing dormant completion delivery skips the stale wake and still reaches requester-agent/direct handoff.

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Needs real behavior proof before merge: The PR body contains copied Vitest/seam-test output and redacted before logs, but no after-fix real gateway/provider proof; the contributor should add redacted live output, logs, terminal capture, recording, or a linked artifact and avoid exposing private data. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge
Why this matters: the patch changes completion delivery branch selection based on requester activity; if a live requester is incorrectly reported inactive, an active completion may skip embedded-run steering and rely on requester-agent/direct handoff instead.

The PR still lacks after-fix real-runtime proof that the gateway/provider path stops producing stale wake attempts while preserving visible completion delivery.

Maintainer options:

Require live delivery proof (recommended)
Pause merge until the contributor adds a redacted after-fix gateway/provider log, terminal capture, or linked artifact showing a dormant completion skips wake and reaches requester-agent/direct handoff.
Accept seam-test proof
Maintainers may intentionally merge on source inspection plus the focused regression tests if the original private session cannot be replayed.
Ask an agents delivery owner
Have a recent agents delivery contributor confirm that requester activity is the right authority for wake eligibility before accepting the branch-selection risk.

Next step before merge
Needs maintainer handling for the protected label and external real-behavior proof gate; no narrow automated code repair is indicated.

Security
Cleared: The diff only changes agents delivery branching, focused tests, and changelog text; it adds no dependency, workflow, secret, permission, install, or supply-chain surface.

Review details

Best possible solution:

Land the active-only wake guard after maintainer/area-owner review and either redacted live-runtime proof or an explicit maintainer decision that the seam-test proof is enough for this private-session bug.

Do we have a high-confidence way to reproduce the issue?

Yes at source level: current main probes the embedded-run queue when a stored requester session id exists, while the docs describe steering only for still-active requester runs. I did not establish a live reproduction for the private gateway logs.

Is this the best way to solve the issue?

Yes for the code shape: gating wake and steer attempts on isActive is the narrow maintainable fix for stale dormant requesters. The remaining question is merge readiness, not a different implementation path.

Label justifications:

P2: This is a focused agents delivery bugfix with limited blast radius, not an emergency or broad product change.
merge-risk: 🚨 message-delivery: The PR changes whether completion messages attempt embedded-run steering or fall through to requester-agent/direct delivery.
merge-risk: 🚨 session-state: The new guard depends on requester session activity state, so stale or incorrect activity could change how completions attach to an active requester session.

What I checked:

Current-main stale wake path: Current main attempts completion wake whenever a requester activity session id exists; the active check only controls warning output, so a dormant stored session id still reaches the embedded-run queue helper. (src/agents/subagent-announce-delivery.ts:694, 131577a4dc6d)
Queue helper inactive-run contract: The embedded-run queue helper reports a structured failure when no embedded or reply-run handle is active for a session id, which matches the reported stale wake symptom class. (src/agents/pi-embedded-runner/runs.ts:233, 131577a4dc6d)
Documented active-only steering model: The docs say OpenClaw first tries to wake or steer the requester run only if the requester run is still active, supporting the PR's active-only guard direction. Public docs: docs/tools/subagents.md. (docs/tools/subagents.md:88, 131577a4dc6d)
PR diff scope: The PR diff requires isActive in both the primary/fallback steer path and completion direct wake path, adds a negative queue-helper assertion for dormant completion requesters, and keeps active fallback coverage by making that test activity active. (src/agents/subagent-announce-delivery.ts:457, 0108ebb2b311)
Real behavior proof remains test-only: The PR body provides copied Vitest output and redacted before logs, but explicitly says no live gateway/provider rerun was performed against the original private session. (0108ebb2b311)
Protected review state: The GitHub context shows the PR currently carries the protected maintainer label, so this cleanup review should not close it automatically.

Likely related people:

@Patrick-Erichsen: Current-main blame for the requester activity, dispatch, and embedded queue paths points to the recent inter-session provenance work that carried this delivery code forward. (role: recent area contributor; confidence: high; commits: 721ad1587ae5; files: src/agents/subagent-announce-delivery.ts, src/agents/subagent-announce-dispatch.ts, src/agents/pi-embedded-runner/runs.ts)
@steipete: History shows the subagent announce delivery helper split and earlier unified pipeline refactor, and shortlog shows the most commits in the central touched files. (role: major refactor author and heavy area contributor; confidence: high; commits: b75be0914491, 4258a3307f5a, 8477a67fafc6; files: src/agents/subagent-announce-delivery.ts, src/agents/subagent-announce-dispatch.ts, src/agents/subagent-announce-delivery.test.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 131577a4dc6d.

galiniliev · 2026-05-19T03:07:25Z

Landing proof for head 0108ebb

Behavior addressed: dormant subagent completion delivery skips stale embedded-run wake probes and uses requester-agent/direct handoff.

Local verification:

node scripts/run-vitest.mjs src/agents/subagent-announce-delivery.test.ts passed: 2 files, 102 tests.
git diff --check passed.
Configured Codex autoreview via local Copilot endpoint on HEAD: no actionable regressions found.

CI verification on exact head:

Real behavior proof: success, run 26073530156 / job 76659640855.
build-artifacts: success, run 26073531227 / job 76659670652.
check-prod-types: success, run 26073531227 / job 76659670656.
check-lint: success, run 26073531227 / job 76659670689.
Critical Quality (network-runtime-boundary): success, run 26073531171 / job 76659655720.
PR rollup: mergeable and clean; no pending or failing checks.

Known proof gap: no live replay of the original private gateway session or provider credentials; maintainer is accepting the focused seam regression plus redacted before-log evidence for this narrow delivery-branch fix.

…026.5.20) (#615) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.5.19` → `2026.5.20` | --- > ⚠️ **Warning** > > Some dependencies could not be looked up. Check the [Dependency Dashboard](issues/567) for more information. --- ### Release Notes <details> <summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary> ### [`v2026.5.20`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#2026520) [Compare Source](openclaw/openclaw@v2026.5.19...v2026.5.20) ##### Changes - Exec approvals: remove the old `cat SKILL.md && printf ... && <skill-wrapper>` allowlist compatibility path so skill files must be loaded with the read tool and only the real skill executable is auto-allowed. - Discord: let voice sessions follow configured Discord users into voice channels, with allowed-channel checks, multi-user handoff, bounded reconciliation, and DAVE recovery preservation. ([#84264](openclaw/openclaw#84264)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev). - Discord/voice: include bounded `IDENTITY.md`, `USER.md`, and `SOUL.md` profile context in realtime voice session instructions by default, with `voice.realtime.bootstrapContextFiles: []` available to disable it. ([#84499](openclaw/openclaw#84499)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev). - Dependencies: bump the bundled Codex harness to `@openai/codex` `0.132.0` and refresh the app-server model-list docs for the new catalog. - CLI/policy: add the bundled Policy plugin for policy-backed channel conformance checks, doctor lint findings, and opt-in workspace repair. ([#80407](openclaw/openclaw#80407)) Thanks [@giodl73-repo](https://github.com/giodl73-repo). - Agents/config: allow `agents.list[].experimental.localModelLean` so lean local-model mode can be enabled for one configured agent instead of globally. 
- Providers/xAI: add device-code OAuth login so remote and headless setups can authorize xAI without a localhost browser callback. ([#84005](openclaw/openclaw#84005)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev). - Providers/OpenRouter: honor provider-level `params.provider` routing policy for OpenRouter requests, with model and agent params overriding the defaults. Thanks [@amknight](https://github.com/amknight). ##### Fixes - CLI/tasks: include stale-running task maintenance decisions in `openclaw tasks maintenance --json` so retained and reconcile candidates explain backing-session, cron, CLI, and wedged-subagent state. ([#84691](openclaw/openclaw#84691)) Thanks [@efpiva](https://github.com/efpiva). - Codex app-server: keep system-prompt reports working when bootstrap hooks provide workspace files with only a path and content, so hook-supplied SOUL/IDENTITY/TOOLS/USER context still reports injected characters correctly. ([#84736](openclaw/openclaw#84736)) Thanks [@JARVIS-Glasses](https://github.com/JARVIS-Glasses). - Providers/MiniMax music: stop advertising `durationSeconds` control and remove prompt-injected duration hints, so `music_generate` reports MiniMax duration as an unsupported override instead of suggesting MiniMax can enforce track length. Fixes [#84508](openclaw/openclaw#84508). Thanks [@neeravmakwana](https://github.com/neeravmakwana). - Doctor: warn when sandbox tool policy hides configured MCP server tools before provider requests. ([#84699](openclaw/openclaw#84699)) Thanks [@nxmxbbd](https://github.com/nxmxbbd). - WhatsApp: update Baileys to `7.0.0-rc12`. - Build: suppress per-locale `rolldown-plugin-dts:fake-js` CommonJS dts warnings emitted while bundling the intentionally-inlined `zod/v4/locales/*.d.cts` files, so `pnpm build` output stays readable after the 0.25.1 plugin bump. Thanks [@romneyda](https://github.com/romneyda). 
- CLI/nodes: route lazy plugin-registration logs to stderr for JSON-mode `openclaw nodes` commands so stdout stays parseable. ([#84684](openclaw/openclaw#84684)) Thanks [@TurboTheTurtle](https://github.com/TurboTheTurtle). - Approvals: route manual `/approve` decisions through the trusted approval runtime so active exec and plugin approvals no longer look unknown or expired. - Mac app: update the About settings copyright year to 2026. ([#84385](openclaw/openclaw#84385)) Thanks [@pejmanjohn](https://github.com/pejmanjohn). - Dependencies: update `@openclaw/fs-safe` to `0.2.7` so OpenClaw's default Python-helper-off policy keeps best-effort Node write fallbacks for private stores, secret writes, run logs, and media attachments on Linux/macOS. - Infra/secrets: restore the fail-closed contract for `tryReadSecretFileSync` so credential loaders that pass `rejectSymlink: true` (Telegram, LINE, Zalo, IRC, Nextcloud Talk tokens) refuse symlinked credential files instead of silently accepting them, and the infra-state CI shard's secret-file symlink test passes again. Thanks [@romneyda](https://github.com/romneyda). - Browser: honor the configured image sanitization limit for screenshots and labeled snapshots so browser-captured images follow the same resize policy as other image results. ([#84595](openclaw/openclaw#84595)) - Doctor: remove unrecognized `models.providers.*.models[*].compat.thinkingFormat` values during `doctor --fix` so stale provider model config can validate after upgrade. Fixes [#77803](openclaw/openclaw#77803). - Doctor: warn when `openclaw.json` stores plaintext secret-bearing config fields, including model provider API keys and sensitive provider headers. ([#84718](openclaw/openclaw#84718)) Thanks [@lukaIvanic](https://github.com/lukaIvanic). - Status: show the configured default, session-selected model, reason, clear hint, and docs link when a session remains pinned to a model that differs from `agents.defaults.model.primary`. 
- WebChat: clear stale typing indicators when session change events mark the active chat run complete. - Mac app: keep local packaging signed with a stable app identity for permission testing and fix Control UI production builds under current Vite/Highlight.js exports. - macOS app: update the embedded Peekaboo bridge to 3.2.1 so OpenClaw-hosted UI automation works with current Peekaboo CLI capture flows. - Cron: deliver preferred final assistant output for successful scheduled runs when trailing plain tool warnings remain in diagnostics instead of marking the run failed. - fix(mattermost): fail closed on missing channel type \[AI]. ([#84091](openclaw/openclaw#84091)) Thanks [@pgondhi987](https://github.com/pgondhi987). - Recheck rebuilt system.run argv \[AI]. ([#84090](openclaw/openclaw#84090)) Thanks [@pgondhi987](https://github.com/pgondhi987). - CLI: keep the private QA subcommand out of exported command descriptors unless `OPENCLAW_ENABLE_PRIVATE_QA_CLI=1`, so root help and subcommand markers match runtime registration. ([#84519](openclaw/openclaw#84519)) - CLI/cron: bound `openclaw cron show` job lookup pagination so non-advancing or unbounded `cron.list` responses fail instead of hanging the command. Fixes [#83856](openclaw/openclaw#83856). ([#83989](openclaw/openclaw#83989)) - Agents/messages: stop message-tool-only turns after a successful source-channel `message` send while keeping transcript mirrors under the session write lock. ([#84289](openclaw/openclaw#84289)) - Agents: filter silent heartbeat response-tool transcript artifacts out of embedded context snapshots so later user turns are not polluted by heartbeat no-op messages. ([#83477](openclaw/openclaw#83477)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev). - Agents/OpenAI: log repeated strict tool-schema downgrade diagnostics once per provider/model/tool signature, reducing duplicate debug noise while preserving `strict=false` fallback behavior. 
Fixes [#82930](openclaw/openclaw#82930). ([#82933](openclaw/openclaw#82933)) Thanks [@galiniliev](https://github.com/galiniliev). - Agents/code mode: spell out the `exec` tool's JavaScript/TypeScript, no Node module, and catalog-bridge constraints in model-visible schema text so agents can use enabled tools without trial-and-error. ([#84269](openclaw/openclaw#84269)) Thanks [@Kaspre](https://github.com/Kaspre). - Codex: give `image_generate` dynamic-tool calls a 120s default watchdog when no per-call or configured image timeout is set, so image generation no longer falls back to the generic 30s bridge timeout. ([#84254](openclaw/openclaw#84254)) Thanks [@moritzmmayerhofer](https://github.com/moritzmmayerhofer). - Codex: avoid duplicate dynamic tool terminal diagnostics while large diagnostic backlogs drain without blocking tool responses. ([#82937](openclaw/openclaw#82937)) Thanks [@galiniliev](https://github.com/galiniliev). - CLI/message: include a stable top-level `messageId` in `openclaw message --json` output when channel sends return one. ([#84191](openclaw/openclaw#84191)) Thanks [@100menotu001](https://github.com/100menotu001). - Cron: preserve legacy top-level array `jobs.json` stores when loading or adding scheduled jobs so old cron jobs are no longer treated as an empty store during upgrade. Fixes [#60799](openclaw/openclaw#60799). ([#84433](openclaw/openclaw#84433)) Thanks [@IWhatsskill](https://github.com/IWhatsskill). - Gateway/agents: use an agent's `identity.name` in Gateway agent summaries when `agents.list[].name` is unset, so configured agent labels remain visible in clients. ([#84355](openclaw/openclaw#84355); refs [#57835](openclaw/openclaw#57835)) Thanks [@luoyanglang](https://github.com/luoyanglang). - Channels/replies: keep normal `/verbose` failed-tool progress compact in message-tool replies and prevent late text-only tool output from appearing after the final answer. 
([#84303](openclaw/openclaw#84303)) Thanks [@VACInc](https://github.com/VACInc). - Plugins/hooks: apply a default 30-second timeout to `before_compaction` and `after_compaction` hooks so a hung plugin handler no longer blocks compaction completion. ([#84153](openclaw/openclaw#84153)) - Discord: preserve disabled presentation buttons when adapting and rendering Discord message controls. ([#84188](openclaw/openclaw#84188)) Thanks [@100menotu001](https://github.com/100menotu001). - Twitch: add a test-only client-manager registry reset helper so non-isolated Twitch tests can clear cached managers between cases. Fixes [#83887](openclaw/openclaw#83887). ([#84244](openclaw/openclaw#84244)) Thanks [@hclsys](https://github.com/hclsys). - Cron: run main-session scheduled work on a cron-owned wake lane while preserving reply delivery context, so background cron turns no longer block human main-session chat. Fixes [#82766](openclaw/openclaw#82766). ([#82767](openclaw/openclaw#82767)) Thanks [@galiniliev](https://github.com/galiniliev). - Cron: use structured embedded-run denial metadata for isolated scheduled tasks so blocked exec requests fail the job without treating ordinary assistant prose as a denial. ([#84067](openclaw/openclaw#84067)) Thanks [@abnershang](https://github.com/abnershang). - Cron: keep recovered tool warnings diagnostic for successful scheduled runs so final cron output is delivered instead of being replaced by a post-processing warning. ([#84045](openclaw/openclaw#84045)) Thanks [@abnershang](https://github.com/abnershang). - Plugins/perf: thread explicit plugin discovery results through `loadBundledCapabilityRuntimeRegistry`, `resolveBundledPluginSources`, and `listChannelCatalogEntries` so callers that already hold a discovery result skip redundant filesystem walks. Thanks [@SebTardif](https://github.com/SebTardif). - harden update restart script creation \[AI]. ([#84088](openclaw/openclaw#84088)) Thanks [@pgondhi987](https://github.com/pgondhi987). 
- Docker: keep the bundled Codex plugin in official release image keep lists so the default OpenAI agent harness remains available after Docker pruning. Fixes [#83613](openclaw/openclaw#83613). ([#83626](openclaw/openclaw#83626)) Thanks [@YuanHanzhong](https://github.com/YuanHanzhong). - CLI/channels: preserve the first line of `openclaw channels logs` output when the rolling tail window starts exactly on a line boundary, mirroring the already-fixed `readLogSlice` behavior in `src/logging/log-tail.ts`. - Control UI: treat terminal session status as authoritative over stale active-run flags so completed terminal runs stop showing abort/live UI. ([#84057](openclaw/openclaw#84057)) - CLI: preserve embedded equals signs in inline root option values instead of truncating after the second separator. ([#83995](openclaw/openclaw#83995)) Thanks [@ThiagoCAltoe](https://github.com/ThiagoCAltoe). - Matrix/config: accept `messages.queue.byChannel.matrix` queue overrides and keep queue provider schema/type keys aligned for Matrix, Google Chat, and Mattermost. Thanks [@bdjben](https://github.com/bdjben). - CLI: format `openclaw acp client` failures through the shared error formatter so object-shaped errors stay readable instead of printing `[object Object]`. Fixes [#83904](openclaw/openclaw#83904). ([#84080](openclaw/openclaw#84080)) - Providers/Ollama: default unknown-capabilities models to tool-capable so discovered native Ollama models can use tools when `/api/show` omits capabilities. ([#84055](openclaw/openclaw#84055)) Thanks [@dutifulbob](https://github.com/dutifulbob). - Installer/Windows: launch `install.ps1` onboarding as an attached child process so fresh native Windows installs do not freeze visibly at `Starting setup...` or corrupt the wizard's terminal rendering. 
- CLI/update: keep restart health checks working across one-version CLI/Gateway protocol skew and use the managed Gateway service Node for all follow-up commands even when the package root is unchanged, so `openclaw update` no longer silently switches the gateway to a different Node binary when multiple Node installations are present. Thanks [@amknight](https://github.com/amknight).
- CLI/gateway: include the running Gateway version in `gateway status` JSON output, preserving existing server metadata while falling back to status RPC data for read probes. Fixes [#56222](openclaw/openclaw#56222). Thanks [@galiniliev](https://github.com/galiniliev).
- Memory/search: close local embedding providers when active-memory searches time out so pending local model loads and embedding contexts are aborted and released. ([#83858](openclaw/openclaw#83858)) Thanks [@brokemac79](https://github.com/brokemac79).
- CLI/nodes: request pending node surface approval scopes before `openclaw nodes approve` so exec-capable node approval can use admin-scoped Gateway credentials instead of failing with `missing scope: operator.admin`. ([#84392](openclaw/openclaw#84392)) Thanks [@joshavant](https://github.com/joshavant).
- Gateway: reject slow node event sends before outbound buffers grow unbounded and log the rejected payload diagnostic. ([#84387](openclaw/openclaw#84387)) Thanks [@samzong](https://github.com/samzong).
- Agents: include bounded trajectory queued-writer diagnostics in `pi-trajectory-flush` timeout warnings so flush stalls show pending writes, queued bytes, and append state. Fixes [#82961](openclaw/openclaw#82961). ([#82962](openclaw/openclaw#82962)) Thanks [@galiniliev](https://github.com/galiniliev).
- Agents/subagents: recover stale completion announces by retrying unsupported transcript-wait wakes without transcript waiting and forcing a message-tool handoff when the requester run is already stale. Fixes [#83699](openclaw/openclaw#83699). ([#83700](openclaw/openclaw#83700)) Thanks [@galiniliev](https://github.com/galiniliev).
- Agents/subagents: constrain wildcard subagent target allowlists to configured agents while preserving explicitly listed compatibility targets. Fixes [#84040](openclaw/openclaw#84040). ([#84357](openclaw/openclaw#84357)) Thanks [@joshavant](https://github.com/joshavant).
- Providers/Anthropic: route Anthropic model refs selected with Claude CLI auth through the Claude CLI runtime so shorthand refs such as `anthropic/opus-4.7` no longer fall back to embedded Anthropic billing. Fixes [#84222](openclaw/openclaw#84222). ([#84374](openclaw/openclaw#84374)) Thanks [@joshavant](https://github.com/joshavant).
- Agents: honor explicit `models.providers.<id>.timeoutSeconds` values above the default idle watchdog for cloud and self-hosted providers, so long first-token waits no longer fall back at \~120s when the provider timeout is higher. ([#83979](openclaw/openclaw#83979)) Thanks [@yujiawei](https://github.com/yujiawei).
- Agents/Codex: keep encrypted Responses reasoning replay provenance-bound so stale mirrored Codex transcripts drop invalid encrypted content before request assembly while preserving matching same-session replay. Fixes [#83836](openclaw/openclaw#83836). ([#84367](openclaw/openclaw#84367)) Thanks [@joshavant](https://github.com/joshavant).
- Agents/subagents: skip stale embedded-run wake probes for dormant completion requesters, so late subagent completions go straight to requester-agent/direct handoff instead of producing `reason=no_active_run` queue noise. ([#82964](openclaw/openclaw#82964)) Thanks [@galiniliev](https://github.com/galiniliev).
- CLI: retry config snapshot reads after a transient failure so one rejected read no longer poisons later commands in the same process. ([#83931](openclaw/openclaw#83931)) Thanks [@honor2030](https://github.com/honor2030).
- Media: decode URL path basenames before using them as remote media fallback filenames, so files like `My%20Report.pdf` are surfaced as `My Report.pdf`. Fixes [#84050](openclaw/openclaw#84050). ([#84052](openclaw/openclaw#84052)) Thanks [@jbetala7](https://github.com/jbetala7).
- WhatsApp: clarify inbound group diagnostics so observed but unregistered groups point to `channels.whatsapp.groups` without changing routing or sender authorization. ([#83846](openclaw/openclaw#83846)) Thanks [@neeravmakwana](https://github.com/neeravmakwana).
- WhatsApp: drain pending outbound deliveries on a 30s periodic timer in addition to the reconnect handler, so messages enqueued while the provider is already connected no longer wait for the next reconnect to send. ([#79083](openclaw/openclaw#79083)) Thanks [@Oviemudiaga](https://github.com/Oviemudiaga).
- CLI/TUI: include gateway plugin slash commands in TUI autocomplete, so connected sessions can suggest plugin-owned commands exposed by the running Gateway. ([#83640](openclaw/openclaw#83640)) Thanks [@se7en-agent](https://github.com/se7en-agent).
- Gateway/mobile: restore QR setup-code handoff of bounded operator tokens for iOS and Android onboarding while keeping admin and pairing scopes out of bootstrap. ([#83684](openclaw/openclaw#83684)) Thanks [@ngutman](https://github.com/ngutman).
- iOS: repair Release archive compilation for the TestFlight build. ([#84255](openclaw/openclaw#84255)) Thanks [@ngutman](https://github.com/ngutman).
- Agents/compaction: bound plugin-owned CLI transcript compaction with the host safety timeout so a hung context engine can no longer stall post-turn cleanup. ([#84083](openclaw/openclaw#84083)) Thanks [@100yenadmin](https://github.com/100yenadmin).
- Control UI/usage: truncate long context skill, tool, and file names in the usage panel while keeping the full name available on hover. ([#42197](openclaw/openclaw#42197)) Thanks [@Rain120](https://github.com/Rain120).
- Codex: respect explicit `models auth order set` and `config.auth.order` precedence over stale `lastGood` in `/codex account`, and show `no working credential` when every explicit-order profile is ineligible instead of marking a lower-ranked profile as active. Fixes [#84386](openclaw/openclaw#84386). ([#84412](openclaw/openclaw#84412)) Thanks [@openperf](https://github.com/openperf).
- Agents: honor `messages.suppressToolErrors` for mutating tool failures so configured chat surfaces do not receive separate warning payloads. ([#81561](openclaw/openclaw#81561)) Thanks [@moeedahmed](https://github.com/moeedahmed).
- Agents/fallback: surface billing guidance for mixed rate-limit plus billing fallback exhaustion instead of generic failure copy. Fixes [#79396](openclaw/openclaw#79396). ([#79489](openclaw/openclaw#79489)) Thanks [@aayushprsingh](https://github.com/aayushprsingh).

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

- [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/615
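The #83931 CLI entry in the changelog above (retry config snapshot reads after a transient failure) names a common caching pitfall: if a process memoizes the promise for a config read and that read rejects once, every later caller sees the same rejection. A minimal TypeScript sketch of the usual remedy, evicting a rejected promise so the next call retries; `createRetryingSnapshot` and the loader are hypothetical names, and OpenClaw's actual implementation may differ:

```typescript
// Hypothetical loader type; OpenClaw's real snapshot reader is not shown here.
type Loader<T> = () => Promise<T>;

function createRetryingSnapshot<T>(load: Loader<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    // A successful (or still pending) read is shared by every caller.
    if (cached) return cached;
    const attempt = load();
    cached = attempt;
    // Evict a rejected read so the next caller retries instead of
    // receiving the same poisoned promise for the rest of the process.
    attempt.catch(() => {
      if (cached === attempt) cached = undefined;
    });
    return attempt;
  };
}
```

The identity check (`cached === attempt`) keeps a late rejection from evicting a newer read that has already replaced it in the cache.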

openclaw-barnacle Bot added the agents (Agent runtime and tooling), size: XS, and maintainer (Maintainer-authored PR) labels May 17, 2026

clawsweeper Bot added the P2 (Normal backlog priority with limited blast radius), impact:message-loss (Channel message delivery can be lost, duplicated, or misrouted), and impact:session-state (Session, memory, transcript, context, or agent state can drift or corrupt) labels May 17, 2026

galiniliev assigned steipete May 18, 2026

clawsweeper Bot mentioned this pull request May 18, 2026

[Bug]: Subagent completion silently lost — no retry, no notification, no auto-restart on timeout #44925

Open

fix(agents): skip dormant completion wake probes

c7fe3fa

galiniliev force-pushed the bug-002-embedded-run-completion-delivery branch from 3b94f91 to c7fe3fa May 19, 2026 02:52

docs(changelog): note dormant completion wake fix

0108ebb

galiniliev merged commit 57ec361 into openclaw:main May 19, 2026
98 checks passed
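Per the PR summary and its #82964 changelog entry, the merged fix attempts a requester wake/steer only when the requester is active; a dormant requester skips the wake probe and its completion goes straight to the persisted requester-agent/direct handoff. A minimal TypeScript sketch of that gating, assuming hypothetical names (`deliverCompletion`, `RequesterActivity`, `wakeOrSteer`, `persistedHandoff`) rather than the actual OpenClaw code, which this page does not show:

```typescript
// Hypothetical types standing in for the real announce-delivery internals.
type RequesterActivity = "active" | "dormant";
type DeliveryPath = "steered" | "handoff";

interface AnnounceDeps {
  // Tries to wake/steer the requester's running session; resolves false on failure.
  wakeOrSteer: () => Promise<boolean>;
  // Persists the completion and hands it to requester-agent/direct delivery.
  persistedHandoff: () => Promise<void>;
}

async function deliverCompletion(
  activity: RequesterActivity,
  deps: AnnounceDeps,
): Promise<DeliveryPath> {
  if (activity === "active") {
    // Active requesters keep the existing wake/steer attempt.
    const woke = await deps.wakeOrSteer();
    if (woke) return "steered";
    // An active wake failure still falls through to the handoff.
  }
  // Dormant requesters never probe a stale stored session id, so no
  // reason=no_active_run queue failure is produced before delivery.
  await deps.persistedHandoff();
  return "handoff";
}
```

Leaving the active branch untouched mirrors the PR's stated scope boundary: active requester steering and the active wake failure fallback behave as before.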

galiniliev mentioned this pull request May 19, 2026

[Bug]: Dormant completion delivery probes stale embedded run sessions #82963

Closed

BryceMurray mentioned this pull request May 24, 2026

[Bug]: Newly-created Telegram group-topic sessions wedge indefinitely on first inbound — claude-cli --resume hangs against UUID with no prior project transcript (2026.5.20 regression) #86095

Closed


fix(agents): skip dormant completion wake probes #82964

galiniliev merged 2 commits into openclaw:main from galiniliev:bug-002-embedded-run-completion-delivery



2 participants


Conversation

galiniliev commented May 17, 2026

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Real behavior proof (required for external PRs)

Root Cause (if applicable)

Regression Test Plan (if applicable)

User-visible / Behavior Changes

Diagram (if applicable)

Security Impact (required)

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Human Verification (required)

Review Conversations

Compatibility / Migration

Risks and Mitigations


clawsweeper Bot commented May 17, 2026 (edited)


galiniliev commented May 19, 2026
