[Fix] Reject slow node event sends #84387

Merged
frankekn merged 3 commits into openclaw:main from samzong:fix/node-fanout-backpressure
May 21, 2026

Conversation

samzong commented May 20, 2026

Contributor

Summary

  • Problem: node outbound fanout accepted event writes even when the node WebSocket send buffer was already beyond the Gateway slow-consumer threshold.
  • Solution: gate node event sends on MAX_BUFFERED_BYTES, close saturated node sockets with 1008 slow consumer, and return false without writing another frame.
  • Changed surface: NodeRegistry normal/raw event sends plus a regression test for raw node fanout.
  • Scope boundary: no node.invoke pending-limit changes, no subscription/runtime fanout rewrite, and no protocol or chat delta changes.
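
The gating described in the Solution bullet can be sketched roughly as follows. This is a minimal illustration, not the merged code: the `NodeSocketLike` interface and the body of `sendEventRaw` are assumptions, `rejectSlowNodeSocket` borrows the helper name mentioned in the review but its body here is guessed, and the 50 MiB value is taken from the proof output further down.

```typescript
// Hypothetical minimal socket shape; the real code works with a `ws` WebSocket.
interface NodeSocketLike {
  bufferedAmount: number;
  send(data: string): void;
  close(code: number, reason: string): void;
}

// 50 MiB, matching the maxBufferedBytes value (52428800) in the proof output.
const MAX_BUFFERED_BYTES = 50 * 1024 * 1024;

// Returns true when the socket was rejected (and closed) as a slow consumer.
function rejectSlowNodeSocket(socket: NodeSocketLike): boolean {
  if (socket.bufferedAmount <= MAX_BUFFERED_BYTES) {
    return false;
  }
  socket.close(1008, "slow consumer");
  return true;
}

// Assumed shape of a raw event send: check the buffer first, then write.
function sendEventRaw(socket: NodeSocketLike, frame: string): boolean {
  if (rejectSlowNodeSocket(socket)) {
    return false; // no additional frame is written to a saturated socket
  }
  try {
    socket.send(frame);
    return true;
  } catch {
    return false; // a throwing send is also reported as a failed send
  }
}
```

The check sits in front of `send` so that a saturated socket is closed before any new bytes are queued, which is the behavior the regression test is described as locking in.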

Motivation / Context

Slow node sockets should not be allowed to accumulate unbounded outbound event frames during fanout. The operator broadcast path already applies the Gateway slow-consumer threshold; node event sends now use the same bounded behavior at the send boundary.

Change Type

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Tests only
  • Chore / CI / build
  • Security

Scope

  • CLI
  • Gateway / orchestration
  • Agent loop / tools
  • Channels / messaging integrations
  • Plugin runtime / SDK
  • UI
  • Docs
  • Tests / infra

Linked Issue / PR

N/A - no linked issue.

  • This PR fixes a bug or regression
  • This PR is follow-up / cleanup
  • This PR is exploratory and should not land as-is

Real behavior proof

Behavior addressed: a real node WebSocket whose server-side send buffer is already over the Gateway threshold is closed as a slow consumer, and the rejected node event is not written.

Real environment tested: local OpenClaw checkout on macOS, using an actual ws WebSocketServer plus WebSocket client, the Gateway createGatewayNodeSessionRuntime, and the patched NodeRegistry. The client TCP read side was paused with _socket.pause() so the server socket accumulated real WebSocket backpressure instead of using a fake bufferedAmount.

Exact steps or command run after this patch: node --input-type=module --import tsx - with a one-off proof script. The script starts a local WebSocket server/client pair, pauses the client TCP reader, registers the server socket as a node through createGatewayNodeSessionRuntime, subscribes that node to agent:main:main, sends subscribed chat frames until serverSocket.bufferedAmount > MAX_BUFFERED_BYTES, and then calls NodeRegistry.sendEventRaw(...) with a final chat payload.

Evidence after fix:

{
  "transport": "real ws WebSocketServer + WebSocket client; client TCP read paused",
  "acceptedFramesBeforeThreshold": 13,
  "maxBufferedBytes": 52428800,
  "bufferedAmountBeforeRejectedSend": 54527815,
  "rejectedSendReturned": false,
  "serverSocketReadyStateAfterRejectedSend": 2,
  "sendCallsDeltaOnRejectedSend": 0,
  "bufferedAmountDeltaOnRejectedSend": 17,
  "closeCalledWith": {
    "code": 1008,
    "reason": "slow consumer"
  },
  "clientMessagesObservedBeforeResume": 0,
  "clientSawRejectedFrameBeforeResume": false
}

Observed result after fix: the socket crossed the 50 MiB threshold at 54527815 bytes, the rejected sendEventRaw() returned false, no additional WebSocket send call happened for that rejected event, and the actual server socket was moved to CLOSING with close(1008, "slow consumer").
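
One possible reading of the 17-byte bufferedAmountDeltaOnRejectedSend, offered as an assumption rather than something stated in this PR: it equals the size of an unmasked close frame carrying code 1008 and the reason "slow consumer" under RFC 6455 framing. The helper below is hypothetical and only does the arithmetic.

```typescript
// Size in bytes of an unmasked (server-to-client) WebSocket close frame:
// 2-byte header + 2-byte status code + UTF-8 encoded reason.
function closeFrameSize(reason: string): number {
  const reasonBytes = new TextEncoder().encode(reason).length;
  const payload = 2 + reasonBytes; // status code + reason text
  if (payload > 125) {
    throw new RangeError("control frame payload must be at most 125 bytes");
  }
  const header = 2; // FIN/opcode byte + length byte (no mask, short length)
  return header + payload;
}

console.log(closeFrameSize("slow consumer")); // 17
```

If that reading is right, the delta comes from the close frame itself and not from the rejected event, which agrees with sendCallsDeltaOnRejectedSend being 0.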

What was not tested: a full managed Gateway process with a real paired node app on a remote slow network. This proof uses real local WebSocket transport and the Gateway node-session runtime path, but not the production pairing/client process.

Before evidence: before the implementation, the added regression test failed with expected true to be false: the send returned true and wrote one frame even with bufferedAmount = Number.MAX_SAFE_INTEGER.

Root Cause

The node event send path in NodeRegistry did not consult WebSocket bufferedAmount before writing frames, while the operator broadcast path already rejected slow consumers. Subscription fanout preserved one-time payload serialization, but the final node socket send boundary had no slow-consumer guardrail.

Missing detection: there was no node-registry regression covering a saturated node socket.

Regression Test Plan

  • Unit test added or updated
  • Integration / E2E test added or updated
  • Existing test coverage is sufficient
  • Not testable in automation

Target: src/gateway/node-registry.test.ts

Scenario covered: a fake node socket with bufferedAmount = MAX_BUFFERED_BYTES + 1 rejects a raw event send, avoids socket.send, and closes the socket with 1008 slow consumer.

Existing guardrail: the same file still verifies the raw event envelope shape for successful sends.

User-visible / Behavior Changes

Slow node connections whose outbound WebSocket buffer exceeds the Gateway threshold are closed instead of accepting more node event frames.

Diagram / Data Flow

Before:

Gateway fanout -> saturated node socket -> additional frame queued

After:

Gateway fanout -> NodeRegistry buffer check -> close slow node socket -> no new frame
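
The "After" flow, combined with the one-time payload serialization mentioned under Root Cause, might look like the sketch below. All names (`FanoutSocket`, `fanoutToNodes`) and the envelope shape are hypothetical and chosen for illustration; the real fanout lives in the Gateway subscription runtime and NodeRegistry.

```typescript
// Hypothetical minimal socket shape for illustration.
interface FanoutSocket {
  bufferedAmount: number;
  send(data: string): void;
  close(code: number, reason: string): void;
}

const MAX_BUFFERED_BYTES = 50 * 1024 * 1024; // 50 MiB, as in the proof output

function fanoutToNodes(
  sockets: Map<string, FanoutSocket>,
  event: string,
  payload: unknown,
): { delivered: string[]; rejected: string[] } {
  // Serialize once; every healthy node receives the same frame string.
  // The envelope shape here is an assumption, not the Gateway protocol.
  const frame = JSON.stringify({ type: "event", event, payload });
  const delivered: string[] = [];
  const rejected: string[] = [];
  for (const [nodeId, socket] of sockets) {
    if (socket.bufferedAmount > MAX_BUFFERED_BYTES) {
      // Saturated node: close it and skip the write entirely.
      socket.close(1008, "slow consumer");
      rejected.push(nodeId);
      continue;
    }
    socket.send(frame);
    delivered.push(nodeId);
  }
  return { delivered, rejected };
}
```

A slow node only affects itself in this sketch: the loop continues, so healthy subscribers still receive the frame.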

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

N/A - all answers are No.

Repro + Verification

Environment: local macOS checkout, Node through the repo wrapper for tests, plus a real local ws server/client pair for transport proof.

Steps:

  1. Run the real WebSocket proof script described in the Real behavior proof section.
  2. Run node scripts/run-vitest.mjs src/gateway/node-registry.test.ts --reporter=verbose.
  3. Run node scripts/run-vitest.mjs src/gateway/gateway-misc.test.ts --testNamePattern "runtime forwards subscribed node payload json|routes events" --reporter=verbose.
  4. Run git diff --check.

Expected: the real saturated node socket rejects the final event without a send call and closes with 1008 slow consumer; focused regression tests pass.

Actual: expected slow-consumer behavior observed in the real WebSocket proof output above; focused test runs passed.

Evidence

  • Failing test / log before fix
  • Passing test / log after fix
  • Screenshot / recording
  • CI run
  • Manual QA notes

Additional checks run:

  • node scripts/run-vitest.mjs src/gateway/node-registry.test.ts --reporter=verbose (Test Files 3 passed (3), Tests 51 passed (51))
  • node scripts/run-vitest.mjs src/gateway/gateway-misc.test.ts --testNamePattern "runtime forwards subscribed node payload json|routes events" --reporter=verbose
  • git diff --check
  • codex review --uncommitted
  • pre-ship review after commit: no must-fix issues

Human Verification

Verified scenarios:

  • Real local WebSocket backpressure with paused client TCP reads and server-side bufferedAmount > MAX_BUFFERED_BYTES.
  • Rejected raw node event returned false, made zero additional send calls, and closed the real server socket with 1008 slow consumer.
  • Red/green regression for saturated raw node event sends.
  • Full src/gateway/node-registry.test.ts focused suite.
  • Subscription fanout target behavior in src/gateway/gateway-misc.test.ts.
  • codex-review clean pass.
  • Pre-ship review found no must-fix issues.

Edge cases considered:

  • Invalid raw payload still returns false before the slow-buffer check matters.
  • Missing node still returns false.
  • Normal raw event envelope behavior is unchanged.
  • Saturated node socket is closed instead of accepting another event frame.

Not verified:

  • Full managed Gateway with a real paired remote node app under OS/network backpressure.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

N/A - no existing review conversations were addressed while preparing this PR.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

N/A - no migration is required.

Risks / Mitigations

Risk: slow nodes are disconnected sooner when their outbound WebSocket buffer remains over the Gateway threshold.

Mitigation: this uses the existing Gateway slow-consumer threshold and close policy, and the regression test locks the no-send behavior.

openclaw-barnacle Bot added the following labels May 20, 2026:

  • gateway (Gateway runtime)
  • size: XS
  • proof: supplied (External PR includes structured after-fix real behavior proof.)
@clawsweeper

clawsweeper Bot commented May 20, 2026

Contributor

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR adds a MAX_BUFFERED_BYTES guard to NodeRegistry event sends, closes saturated node WebSockets with 1008, emits a large-payload diagnostic, and adds a raw-send regression test.

Reproducibility: yes. Current main clearly sends NodeRegistry events without a bufferedAmount guard, and the PR body supplies a real local WebSocket backpressure proof plus a red/green raw-send regression scenario.

PR rating
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Summary: Strong real transport proof, focused implementation, and no blocking findings make this above-average, pending maintainer acceptance of the availability tradeoff.

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (live_output): The PR body includes copied live output from a real local ws server/client backpressure setup showing the saturated node socket closed with 1008 and no rejected frame send.

Risk before merge

  • Merging intentionally changes saturated node sockets from accepting another queued event frame to closing with 1008 once bufferedAmount exceeds MAX_BUFFERED_BYTES.
  • Because node.invoke.request uses the same NodeRegistry send boundary, saturated node invokes will now fail fast with UNAVAILABLE and close the socket instead of enqueueing and later timing out.
  • The supplied proof exercises real local ws transport and the gateway node-session runtime, but not a full managed Gateway with a remote slow node app under real network backpressure.
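
The invoke-side consequence in the second bullet can be illustrated with a small sketch. It assumes the invoke path reports a failed send as an UNAVAILABLE error; the `InvokeResult` type, the `invokeNode` function, the error message, and the frame shape are hypothetical, and the real NodeRegistry.invoke also awaits the node's reply, which is omitted here.

```typescript
type InvokeResult =
  | { ok: true; requestId: string }
  | { ok: false; error: { code: "UNAVAILABLE"; message: string } };

// `send` stands in for the guarded send boundary: it returns false when the
// node socket was rejected as a slow consumer (or the write failed).
function invokeNode(
  send: (frame: string) => boolean,
  requestId: string,
  command: string,
): InvokeResult {
  const frame = JSON.stringify({
    event: "node.invoke.request",
    payload: { id: requestId, command }, // assumed payload shape
  });
  if (!send(frame)) {
    // Fail fast instead of registering a pending request that would time out.
    return {
      ok: false,
      error: { code: "UNAVAILABLE", message: "failed to send invoke to node" },
    };
  }
  return { ok: true, requestId };
}
```

The tradeoff the review flags is visible here: callers get an immediate error for a saturated node, where previously the request would have been queued behind the backlog.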

Maintainer options:

  1. Accept NodeRegistry threshold parity (recommended)
    Land if maintainers agree saturated node sends should use the same MAX_BUFFERED_BYTES close policy already used by gateway broadcasts.
  2. Ask for managed-node proof
    Request a redacted managed Gateway plus paired node run if local ws transport proof is not enough for this availability path.
  3. Pause for node backpressure policy
    Pause this PR if maintainers want node event sends, node.invoke sends, drops, and disconnects decided as one explicit backpressure policy.

Next step before merge
No automated repair is indicated; maintainers should decide whether to accept the slow-node disconnect semantics and whether the supplied local ws proof is enough before merge.

Security
Cleared: The diff only adds bounded WebSocket send checks and a focused regression test; it does not change dependencies, CI, secrets, permissions, package execution, or credential handling.

Review details

Best possible solution:

Land the NodeRegistry guard if maintainers accept applying the existing Gateway slow-consumer close policy to node event and invoke sends, with normal CI/focused checks green.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main clearly sends NodeRegistry events without a bufferedAmount guard, and the PR body supplies a real local WebSocket backpressure proof plus a red/green raw-send regression scenario.

Is this the best way to solve the issue?

Yes. Reusing the existing Gateway MAX_BUFFERED_BYTES policy at the NodeRegistry send boundary is the narrow maintainable fix; the remaining question is maintainer acceptance of the disconnect semantics for slow nodes.

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes copied live output from a real local ws server/client backpressure setup showing the saturated node socket closed with 1008 and no rejected frame send.

Label justifications:

  • P2: This is a focused gateway bug fix with limited blast radius, not an emergency or broad feature program.
  • merge-risk: 🚨 availability: The PR deliberately closes slow node WebSockets and can make saturated node event or invoke workflows fail fast instead of continuing to queue frames.
  • rating: 🦞 diamond lobster: Current PR rating is 🦞 diamond lobster because proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster. Strong real transport proof, focused implementation, and no blocking findings make this above-average, pending maintainer acceptance of the availability tradeoff.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes copied live output from a real local ws server/client backpressure setup showing the saturated node socket closed with 1008 and no rejected frame send.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes copied live output from a real local ws server/client backpressure setup showing the saturated node socket closed with 1008 and no rejected frame send.

What I checked:

  • Current main lacks the node send-buffer guard: On current main, NodeRegistry.sendEventInternal and sendEventRawInternal write directly to the node socket and only return false if send throws or the raw payload marker is invalid; there is no bufferedAmount check before writing. (src/gateway/node-registry.ts:659, ec7495c993a6)
  • Existing gateway broadcast policy already closes slow consumers: The operator broadcast path already compares socket.bufferedAmount to MAX_BUFFERED_BYTES, logs a gateway.ws.outbound_buffer diagnostic, and closes non-droppable slow sockets with code 1008 and reason slow consumer. (src/gateway/server-broadcast.ts:152, ec7495c993a6)
  • Shared threshold contract: MAX_BUFFERED_BYTES is the existing per-connection WebSocket send-buffer limit at 50 MiB, so the PR reuses an existing gateway constant rather than inventing a separate node threshold. (src/gateway/server-constants.ts:4, ec7495c993a6)
  • PR guard implementation: Commit 664e05e adds rejectSlowNodeSocket before both normal and raw NodeRegistry event sends and adds the raw saturated-socket regression test; commit 3d5401b adds the diagnostic emission asserted by the test. (src/gateway/node-registry.ts:659, 664e05e50d76)
  • Availability surface includes invoke sends: NodeRegistry.invoke sends node.invoke.request through sendEventToSession, and the PR inserts the same slow-socket rejection in that private send boundary, so saturated invokes will fail fast and close the node instead of enqueueing or timing out. (src/gateway/node-registry.ts:422, ec7495c993a6)
  • Real behavior proof supplied: The PR body includes copied live output from a real ws WebSocketServer plus WebSocket client with the client TCP read side paused; the final raw send returned false, made zero additional send calls, and closed with 1008 slow consumer after bufferedAmount exceeded the 50 MiB threshold. (3d5401b765df)

Likely related people:

  • steipete: Peter Steinberger authored the recent current-main NodeRegistry and broadcast lines in blame, and his history includes gateway node identity, WebSocket, and broadcast/runtime work across the central files. (role: feature-history owner and recent area contributor; confidence: high; commits: e57fa51412cc, f16c176a4cc1, 586176730cfc; files: src/gateway/node-registry.ts, src/gateway/server-broadcast.ts, src/gateway/server-node-session-runtime.ts)
  • vincentkoc: Vincent Koc recently changed the gateway broadcast seam in the same slow-consumer broadcast path that this PR mirrors for nodes. (role: recent adjacent contributor; confidence: medium; commits: 7308e72fac98; files: src/gateway/server-broadcast.ts)
  • gumadeiras: Gustavo Madeira Santana recently split gateway startup/runtime seams, including server-node-session-runtime, which is the runtime wrapper around NodeRegistry raw node fanout. (role: adjacent gateway runtime contributor; confidence: medium; commits: 8de63ca26825; files: src/gateway/server-node-session-runtime.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against ec7495c993a6.

clawsweeper Bot added the following labels May 20, 2026:

  • rating: 🦪 silver shellfish (Thin PR readiness signal; proof, validation, or implementation needs work.)
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)
  • P2 (Normal backlog priority with limited blast radius.)
  • merge-risk: 🚨 availability (🚨 May cause crashes, hangs, restart loops, stalls, or process outages.)
@clawsweeper

clawsweeper Bot commented May 20, 2026

Contributor

ClawSweeper PR egg

✨ Hatched: 💎 rare Gilded Review Wisp

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 💎 rare.
Trait: watches the merge queue.
Image traits: location flaky test forest; accessory review stamp; palette cobalt, lime, and pearl; mood determined; pose nestled inside a glowing shell; shell soft velvet shell; lighting moonlit rim light; background delicate sparkle particles.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

clawsweeper Bot changed labels May 20, 2026:

  • added proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • added rating: 🦞 diamond lobster (Very strong PR readiness with only minor maintainer review expected.)
  • added status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.)
  • removed rating: 🦪 silver shellfish (Thin PR readiness signal; proof, validation, or implementation needs work.)
  • removed status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)
@frankekn frankekn self-assigned this May 21, 2026
openclaw-barnacle Bot added the size: S label and removed the size: XS and proof: sufficient (ClawSweeper judged the real behavior proof convincing.) labels May 21, 2026
@frankekn

Contributor

@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. What shall we delve into next?


clawsweeper Bot added the proof: sufficient (ClawSweeper judged the real behavior proof convincing.) label May 21, 2026
openclaw-barnacle Bot removed the proof: sufficient (ClawSweeper judged the real behavior proof convincing.) label May 21, 2026
frankekn force-pushed the fix/node-fanout-backpressure branch from e922eb1 to b459f9e May 21, 2026 08:21
@frankekn frankekn merged commit 88fe39b into openclaw:main May 21, 2026
97 checks passed
@frankekn

Contributor

Merged via squash.

Thanks @samzong!

eleboucher pushed a commit to eleboucher/homelab that referenced this pull request May 21, 2026
…026.5.20) (#615)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.5.19` → `2026.5.20` |

---

> ⚠️ **Warning**
>
> Some dependencies could not be looked up. Check the [Dependency Dashboard](issues/567) for more information.

---

### Release Notes

<details>
<summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary>

### [`v2026.5.20`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#2026520)

[Compare Source](openclaw/openclaw@v2026.5.19...v2026.5.20)

##### Changes

- Exec approvals: remove the old `cat SKILL.md && printf ... && <skill-wrapper>` allowlist compatibility path so skill files must be loaded with the read tool and only the real skill executable is auto-allowed.
- Discord: let voice sessions follow configured Discord users into voice channels, with allowed-channel checks, multi-user handoff, bounded reconciliation, and DAVE recovery preservation. ([#84264](openclaw/openclaw#84264)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev).
- Discord/voice: include bounded `IDENTITY.md`, `USER.md`, and `SOUL.md` profile context in realtime voice session instructions by default, with `voice.realtime.bootstrapContextFiles: []` available to disable it. ([#84499](openclaw/openclaw#84499)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev).
- Dependencies: bump the bundled Codex harness to `@openai/codex` `0.132.0` and refresh the app-server model-list docs for the new catalog.
- CLI/policy: add the bundled Policy plugin for policy-backed channel conformance checks, doctor lint findings, and opt-in workspace repair. ([#80407](openclaw/openclaw#80407)) Thanks [@giodl73-repo](https://github.com/giodl73-repo).
- Agents/config: allow `agents.list[].experimental.localModelLean` so lean local-model mode can be enabled for one configured agent instead of globally.
- Providers/xAI: add device-code OAuth login so remote and headless setups can authorize xAI without a localhost browser callback. ([#84005](openclaw/openclaw#84005)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev).
- Providers/OpenRouter: honor provider-level `params.provider` routing policy for OpenRouter requests, with model and agent params overriding the defaults. Thanks [@amknight](https://github.com/amknight).

##### Fixes

- CLI/tasks: include stale-running task maintenance decisions in `openclaw tasks maintenance --json` so retained and reconcile candidates explain backing-session, cron, CLI, and wedged-subagent state. ([#84691](openclaw/openclaw#84691)) Thanks [@efpiva](https://github.com/efpiva).
- Codex app-server: keep system-prompt reports working when bootstrap hooks provide workspace files with only a path and content, so hook-supplied SOUL/IDENTITY/TOOLS/USER context still reports injected characters correctly. ([#84736](openclaw/openclaw#84736)) Thanks [@JARVIS-Glasses](https://github.com/JARVIS-Glasses).
- Providers/MiniMax music: stop advertising `durationSeconds` control and remove prompt-injected duration hints, so `music_generate` reports MiniMax duration as an unsupported override instead of suggesting MiniMax can enforce track length. Fixes [#84508](openclaw/openclaw#84508). Thanks [@neeravmakwana](https://github.com/neeravmakwana).
- Doctor: warn when sandbox tool policy hides configured MCP server tools before provider requests. ([#84699](openclaw/openclaw#84699)) Thanks [@nxmxbbd](https://github.com/nxmxbbd).
- WhatsApp: update Baileys to `7.0.0-rc12`.
- Build: suppress per-locale `rolldown-plugin-dts:fake-js` CommonJS dts warnings emitted while bundling the intentionally-inlined `zod/v4/locales/*.d.cts` files, so `pnpm build` output stays readable after the 0.25.1 plugin bump. Thanks [@romneyda](https://github.com/romneyda).
- CLI/nodes: route lazy plugin-registration logs to stderr for JSON-mode `openclaw nodes` commands so stdout stays parseable. ([#84684](openclaw/openclaw#84684)) Thanks [@TurboTheTurtle](https://github.com/TurboTheTurtle).
- Approvals: route manual `/approve` decisions through the trusted approval runtime so active exec and plugin approvals no longer look unknown or expired.
- Mac app: update the About settings copyright year to 2026. ([#84385](openclaw/openclaw#84385)) Thanks [@pejmanjohn](https://github.com/pejmanjohn).
- Dependencies: update `@openclaw/fs-safe` to `0.2.7` so OpenClaw's default Python-helper-off policy keeps best-effort Node write fallbacks for private stores, secret writes, run logs, and media attachments on Linux/macOS.
- Infra/secrets: restore the fail-closed contract for `tryReadSecretFileSync` so credential loaders that pass `rejectSymlink: true` (Telegram, LINE, Zalo, IRC, Nextcloud Talk tokens) refuse symlinked credential files instead of silently accepting them, and the infra-state CI shard's secret-file symlink test passes again. Thanks [@romneyda](https://github.com/romneyda).
- Browser: honor the configured image sanitization limit for screenshots and labeled snapshots so browser-captured images follow the same resize policy as other image results. ([#84595](openclaw/openclaw#84595))
- Doctor: remove unrecognized `models.providers.*.models[*].compat.thinkingFormat` values during `doctor --fix` so stale provider model config can validate after upgrade. Fixes [#77803](openclaw/openclaw#77803).
- Doctor: warn when `openclaw.json` stores plaintext secret-bearing config fields, including model provider API keys and sensitive provider headers. ([#84718](openclaw/openclaw#84718)) Thanks [@lukaIvanic](https://github.com/lukaIvanic).
- Status: show the configured default, session-selected model, reason, clear hint, and docs link when a session remains pinned to a model that differs from `agents.defaults.model.primary`.
- WebChat: clear stale typing indicators when session change events mark the active chat run complete.
- Mac app: keep local packaging signed with a stable app identity for permission testing and fix Control UI production builds under current Vite/Highlight.js exports.
- macOS app: update the embedded Peekaboo bridge to 3.2.1 so OpenClaw-hosted UI automation works with current Peekaboo CLI capture flows.
- Cron: deliver preferred final assistant output for successful scheduled runs when trailing plain tool warnings remain in diagnostics instead of marking the run failed.
- fix(mattermost): fail closed on missing channel type \[AI]. ([#84091](openclaw/openclaw#84091)) Thanks [@pgondhi987](https://github.com/pgondhi987).
- Recheck rebuilt system.run argv \[AI]. ([#84090](openclaw/openclaw#84090)) Thanks [@pgondhi987](https://github.com/pgondhi987).
- CLI: keep the private QA subcommand out of exported command descriptors unless `OPENCLAW_ENABLE_PRIVATE_QA_CLI=1`, so root help and subcommand markers match runtime registration. ([#84519](openclaw/openclaw#84519))
- CLI/cron: bound `openclaw cron show` job lookup pagination so non-advancing or unbounded `cron.list` responses fail instead of hanging the command. Fixes [#83856](openclaw/openclaw#83856). ([#83989](openclaw/openclaw#83989))
- Agents/messages: stop message-tool-only turns after a successful source-channel `message` send while keeping transcript mirrors under the session write lock. ([#84289](openclaw/openclaw#84289))
- Agents: filter silent heartbeat response-tool transcript artifacts out of embedded context snapshots so later user turns are not polluted by heartbeat no-op messages. ([#83477](openclaw/openclaw#83477)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev).
- Agents/OpenAI: log repeated strict tool-schema downgrade diagnostics once per provider/model/tool signature, reducing duplicate debug noise while preserving `strict=false` fallback behavior. Fixes [#82930](openclaw/openclaw#82930). ([#82933](openclaw/openclaw#82933)) Thanks [@galiniliev](https://github.com/galiniliev).
- Agents/code mode: spell out the `exec` tool's JavaScript/TypeScript, no Node module, and catalog-bridge constraints in model-visible schema text so agents can use enabled tools without trial-and-error. ([#84269](openclaw/openclaw#84269)) Thanks [@Kaspre](https://github.com/Kaspre).
- Codex: give `image_generate` dynamic-tool calls a 120s default watchdog when no per-call or configured image timeout is set, so image generation no longer falls back to the generic 30s bridge timeout. ([#84254](openclaw/openclaw#84254)) Thanks [@moritzmmayerhofer](https://github.com/moritzmmayerhofer).
- Codex: avoid duplicate dynamic tool terminal diagnostics while large diagnostic backlogs drain without blocking tool responses. ([#82937](openclaw/openclaw#82937)) Thanks [@galiniliev](https://github.com/galiniliev).
- CLI/message: include a stable top-level `messageId` in `openclaw message --json` output when channel sends return one. ([#84191](openclaw/openclaw#84191)) Thanks [@100menotu001](https://github.com/100menotu001).
- Cron: preserve legacy top-level array `jobs.json` stores when loading or adding scheduled jobs so old cron jobs are no longer treated as an empty store during upgrade. Fixes [#60799](openclaw/openclaw#60799). ([#84433](openclaw/openclaw#84433)) Thanks [@IWhatsskill](https://github.com/IWhatsskill).
- Gateway/agents: use an agent's `identity.name` in Gateway agent summaries when `agents.list[].name` is unset, so configured agent labels remain visible in clients. ([#84355](openclaw/openclaw#84355); refs [#57835](openclaw/openclaw#57835)) Thanks [@luoyanglang](https://github.com/luoyanglang).
- Channels/replies: keep normal `/verbose` failed-tool progress compact in message-tool replies and prevent late text-only tool output from appearing after the final answer. ([#84303](openclaw/openclaw#84303)) Thanks [@VACInc](https://github.com/VACInc).
- Plugins/hooks: apply a default 30-second timeout to `before_compaction` and `after_compaction` hooks so a hung plugin handler no longer blocks compaction completion. ([#84153](openclaw/openclaw#84153))
- Discord: preserve disabled presentation buttons when adapting and rendering Discord message controls. ([#84188](openclaw/openclaw#84188)) Thanks [@100menotu001](https://github.com/100menotu001).
- Twitch: add a test-only client-manager registry reset helper so non-isolated Twitch tests can clear cached managers between cases. Fixes [#83887](openclaw/openclaw#83887). ([#84244](openclaw/openclaw#84244)) Thanks [@hclsys](https://github.com/hclsys).
- Cron: run main-session scheduled work on a cron-owned wake lane while preserving reply delivery context, so background cron turns no longer block human main-session chat. Fixes [#82766](openclaw/openclaw#82766). ([#82767](openclaw/openclaw#82767)) Thanks [@galiniliev](https://github.com/galiniliev).
- Cron: use structured embedded-run denial metadata for isolated scheduled tasks so blocked exec requests fail the job without treating ordinary assistant prose as a denial. ([#84067](openclaw/openclaw#84067)) Thanks [@abnershang](https://github.com/abnershang).
- Cron: keep recovered tool warnings diagnostic for successful scheduled runs so final cron output is delivered instead of being replaced by a post-processing warning. ([#&#8203;84045](openclaw/openclaw#84045)) Thanks [@&#8203;abnershang](https://github.com/abnershang).
- Plugins/perf: thread explicit plugin discovery results through `loadBundledCapabilityRuntimeRegistry`, `resolveBundledPluginSources`, and `listChannelCatalogEntries` so callers that already hold a discovery result skip redundant filesystem walks. Thanks [@&#8203;SebTardif](https://github.com/SebTardif).
- Harden update restart script creation \[AI]. ([#&#8203;84088](openclaw/openclaw#84088)) Thanks [@&#8203;pgondhi987](https://github.com/pgondhi987).
- Docker: keep the bundled Codex plugin in official release image keep lists so the default OpenAI agent harness remains available after Docker pruning. Fixes [#&#8203;83613](openclaw/openclaw#83613). ([#&#8203;83626](openclaw/openclaw#83626)) Thanks [@&#8203;YuanHanzhong](https://github.com/YuanHanzhong).
- CLI/channels: preserve the first line of `openclaw channels logs` output when the rolling tail window starts exactly on a line boundary, mirroring the already-fixed `readLogSlice` behavior in `src/logging/log-tail.ts`.
- Control UI: treat terminal session status as authoritative over stale active-run flags so completed terminal runs stop showing abort/live UI. ([#&#8203;84057](openclaw/openclaw#84057))
- CLI: preserve embedded equals signs in inline root option values instead of truncating after the second separator. ([#&#8203;83995](openclaw/openclaw#83995)) Thanks [@&#8203;ThiagoCAltoe](https://github.com/ThiagoCAltoe).
- Matrix/config: accept `messages.queue.byChannel.matrix` queue overrides and keep queue provider schema/type keys aligned for Matrix, Google Chat, and Mattermost. Thanks [@&#8203;bdjben](https://github.com/bdjben).
- CLI: format `openclaw acp client` failures through the shared error formatter so object-shaped errors stay readable instead of printing `[object Object]`. Fixes [#&#8203;83904](openclaw/openclaw#83904). ([#&#8203;84080](openclaw/openclaw#84080))
- Providers/Ollama: default unknown-capabilities models to tool-capable so discovered native Ollama models can use tools when `/api/show` omits capabilities. ([#&#8203;84055](openclaw/openclaw#84055)) Thanks [@&#8203;dutifulbob](https://github.com/dutifulbob).
- Installer/Windows: launch `install.ps1` onboarding as an attached child process so fresh native Windows installs do not freeze visibly at `Starting setup...` or corrupt the wizard's terminal rendering.
- CLI/update: keep restart health checks working across one-version CLI/Gateway protocol skew and use the managed Gateway service Node for all follow-up commands even when the package root is unchanged, so `openclaw update` no longer silently switches the gateway to a different Node binary when multiple Node installations are present. Thanks [@&#8203;amknight](https://github.com/amknight).
- CLI/gateway: include the running Gateway version in `gateway status` JSON output, preserving existing server metadata while falling back to status RPC data for read probes. Fixes [#&#8203;56222](openclaw/openclaw#56222). Thanks [@&#8203;galiniliev](https://github.com/galiniliev).
- Memory/search: close local embedding providers when active-memory searches time out so pending local model loads and embedding contexts are aborted and released. ([#&#8203;83858](openclaw/openclaw#83858)) Thanks [@&#8203;brokemac79](https://github.com/brokemac79).
- CLI/nodes: request pending node surface approval scopes before `openclaw nodes approve` so exec-capable node approval can use admin-scoped Gateway credentials instead of failing with `missing scope: operator.admin`. ([#&#8203;84392](openclaw/openclaw#84392)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Gateway: reject slow node event sends before outbound buffers grow unbounded and log the rejected payload diagnostic. ([#&#8203;84387](openclaw/openclaw#84387)) Thanks [@&#8203;samzong](https://github.com/samzong).
- Agents: include bounded trajectory queued-writer diagnostics in `pi-trajectory-flush` timeout warnings so flush stalls show pending writes, queued bytes, and append state. Fixes [#&#8203;82961](openclaw/openclaw#82961). ([#&#8203;82962](openclaw/openclaw#82962)) Thanks [@&#8203;galiniliev](https://github.com/galiniliev).
- Agents/subagents: recover stale completion announces by retrying unsupported transcript-wait wakes without transcript waiting and forcing a message-tool handoff when the requester run is already stale. Fixes [#&#8203;83699](openclaw/openclaw#83699). ([#&#8203;83700](openclaw/openclaw#83700)) Thanks [@&#8203;galiniliev](https://github.com/galiniliev).
- Agents/subagents: constrain wildcard subagent target allowlists to configured agents while preserving explicitly listed compatibility targets. Fixes [#&#8203;84040](openclaw/openclaw#84040). ([#&#8203;84357](openclaw/openclaw#84357)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Providers/Anthropic: route Anthropic model refs selected with Claude CLI auth through the Claude CLI runtime so shorthand refs such as `anthropic/opus-4.7` no longer fall back to embedded Anthropic billing. Fixes [#&#8203;84222](openclaw/openclaw#84222). ([#&#8203;84374](openclaw/openclaw#84374)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Agents: honor explicit `models.providers.<id>.timeoutSeconds` values above the default idle watchdog for cloud and self-hosted providers, so long first-token waits no longer fall back at \~120s when the provider timeout is higher. ([#&#8203;83979](openclaw/openclaw#83979)) Thanks [@&#8203;yujiawei](https://github.com/yujiawei).
- Agents/Codex: keep encrypted Responses reasoning replay provenance-bound so stale mirrored Codex transcripts drop invalid encrypted content before request assembly while preserving matching same-session replay. Fixes [#&#8203;83836](openclaw/openclaw#83836). ([#&#8203;84367](openclaw/openclaw#84367)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Agents/subagents: skip stale embedded-run wake probes for dormant completion requesters, so late subagent completions go straight to requester-agent/direct handoff instead of producing `reason=no_active_run` queue noise. ([#&#8203;82964](openclaw/openclaw#82964)) Thanks [@&#8203;galiniliev](https://github.com/galiniliev).
- CLI: retry config snapshot reads after a transient failure so one rejected read no longer poisons later commands in the same process. ([#&#8203;83931](openclaw/openclaw#83931)) Thanks [@&#8203;honor2030](https://github.com/honor2030).
- Media: decode URL path basenames before using them as remote media fallback filenames, so files like `My%20Report.pdf` are surfaced as `My Report.pdf`. Fixes [#&#8203;84050](openclaw/openclaw#84050). ([#&#8203;84052](openclaw/openclaw#84052)) Thanks [@&#8203;jbetala7](https://github.com/jbetala7).
- WhatsApp: clarify inbound group diagnostics so observed but unregistered groups point to `channels.whatsapp.groups` without changing routing or sender authorization. ([#&#8203;83846](openclaw/openclaw#83846)) Thanks [@&#8203;neeravmakwana](https://github.com/neeravmakwana).
- WhatsApp: drain pending outbound deliveries on a 30s periodic timer in addition to the reconnect handler, so messages enqueued while the provider is already connected no longer wait for the next reconnect to send. ([#&#8203;79083](openclaw/openclaw#79083)) Thanks [@&#8203;Oviemudiaga](https://github.com/Oviemudiaga).
- CLI/TUI: include gateway plugin slash commands in TUI autocomplete, so connected sessions can suggest plugin-owned commands exposed by the running Gateway. ([#&#8203;83640](openclaw/openclaw#83640)) Thanks [@&#8203;se7en-agent](https://github.com/se7en-agent).
- Gateway/mobile: restore QR setup-code handoff of bounded operator tokens for iOS and Android onboarding while keeping admin and pairing scopes out of bootstrap. ([#&#8203;83684](openclaw/openclaw#83684)) Thanks [@&#8203;ngutman](https://github.com/ngutman).
- iOS: repair Release archive compilation for the TestFlight build. ([#&#8203;84255](openclaw/openclaw#84255)) Thanks [@&#8203;ngutman](https://github.com/ngutman).
- Agents/compaction: bound plugin-owned CLI transcript compaction with the host safety timeout so a hung context engine can no longer stall post-turn cleanup. ([#&#8203;84083](openclaw/openclaw#84083)) Thanks [@&#8203;100yenadmin](https://github.com/100yenadmin).
- Control UI/usage: truncate long context skill, tool, and file names in the usage panel while keeping the full name available on hover. ([#&#8203;42197](openclaw/openclaw#42197)) Thanks [@&#8203;Rain120](https://github.com/Rain120).
- Codex: respect explicit `models auth order set` and `config.auth.order` precedence over stale `lastGood` in `/codex account`, and show `no working credential` when every explicit-order profile is ineligible instead of marking a lower-ranked profile as active. Fixes [#&#8203;84386](openclaw/openclaw#84386). ([#&#8203;84412](openclaw/openclaw#84412)) Thanks [@&#8203;openperf](https://github.com/openperf).
- Agents: honor `messages.suppressToolErrors` for mutating tool failures so configured chat surfaces do not receive separate warning payloads. ([#&#8203;81561](openclaw/openclaw#81561)) Thanks [@&#8203;moeedahmed](https://github.com/moeedahmed).
- Agents/fallback: surface billing guidance for mixed rate-limit plus billing fallback exhaustion instead of generic failure copy. Fixes [#&#8203;79396](openclaw/openclaw#79396). ([#&#8203;79489](openclaw/openclaw#79489)) Thanks [@&#8203;aayushprsingh](https://github.com/aayushprsingh).

</details>
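
The Gateway entry for #84387 in the list above (reject slow node event sends) can be illustrated with a minimal sketch. It assumes a socket that exposes `bufferedAmount`, `send`, and `close`, as the `ws` library's `WebSocket` does. The `SocketLike` type, the `sendNodeEvent` helper, the frame shape, and the threshold value are hypothetical names and numbers chosen for illustration; they are not taken from the OpenClaw source.

```typescript
// Minimal structural type covering the parts of a WebSocket this sketch uses.
interface SocketLike {
  bufferedAmount: number;
  send(data: string): void;
  close(code?: number, reason?: string): void;
}

// Hypothetical threshold for illustration only; the real Gateway value is not stated here.
const MAX_BUFFERED_BYTES = 1024 * 1024;

// Returns true when the frame was handed to the socket, false when it was rejected.
function sendNodeEvent(
  socket: SocketLike,
  event: string,
  payload: unknown,
  maxBufferedBytes: number = MAX_BUFFERED_BYTES,
): boolean {
  // Check backpressure before writing, so a saturated socket never receives another frame.
  if (socket.bufferedAmount > maxBufferedBytes) {
    // 1008 is the WebSocket "policy violation" close code.
    socket.close(1008, "slow consumer");
    return false;
  }
  // Assumed frame shape; the actual wire format may differ.
  socket.send(JSON.stringify({ type: "event", event, payload }));
  return true;
}
```

Returning a boolean rather than throwing lets a fanout loop skip the saturated node and continue with the remaining ones.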

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/615
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
Merged via squash.

Prepared head SHA: b459f9e
Co-authored-by: samzong <13782141+samzong@users.noreply.github.com>
Co-authored-by: frankekn <4488090+frankekn@users.noreply.github.com>
Reviewed-by: @frankekn
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
Merged via squash.

Prepared head SHA: b459f9e
Co-authored-by: samzong <13782141+samzong@users.noreply.github.com>
Co-authored-by: frankekn <4488090+frankekn@users.noreply.github.com>
Reviewed-by: @frankekn
galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 25, 2026
Merged via squash.

Prepared head SHA: b459f9e
Co-authored-by: samzong <13782141+samzong@users.noreply.github.com>
Co-authored-by: frankekn <4488090+frankekn@users.noreply.github.com>
Reviewed-by: @frankekn
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
Merged via squash.

Prepared head SHA: b459f9e
Co-authored-by: samzong <13782141+samzong@users.noreply.github.com>
Co-authored-by: frankekn <4488090+frankekn@users.noreply.github.com>
Reviewed-by: @frankekn
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
Merged via squash.

Prepared head SHA: b459f9e
Co-authored-by: samzong <13782141+samzong@users.noreply.github.com>
Co-authored-by: frankekn <4488090+frankekn@users.noreply.github.com>
Reviewed-by: @frankekn
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
Merged via squash.

Prepared head SHA: b459f9e
Co-authored-by: samzong <13782141+samzong@users.noreply.github.com>
Co-authored-by: frankekn <4488090+frankekn@users.noreply.github.com>
Reviewed-by: @frankekn
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
Merged via squash.

Prepared head SHA: b459f9e
Co-authored-by: samzong <13782141+samzong@users.noreply.github.com>
Co-authored-by: frankekn <4488090+frankekn@users.noreply.github.com>
Reviewed-by: @frankekn
Labels

- `gateway`: Gateway runtime
- `merge-risk: 🚨 availability 🚨`: May cause crashes, hangs, restart loops, stalls, or process outages.
- `P2`: Normal backlog priority with limited blast radius.
- `proof: supplied`: External PR includes structured after-fix real behavior proof.
- `rating: 🦞 diamond lobster`: Very strong PR readiness with only minor maintainer review expected.
- `size: S`
- `status: 👀 ready for maintainer look`: ClawSweeper has no concrete contributor-facing blocker left for this PR.
