fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition #85764
Codex review: found issues before merge. Latest ClawSweeper review: 2026-05-23 23:17 UTC / May 23, 2026, 7:17 PM ET. Workflow note: Future ClawSweeper reviews update this same comment in place.
Summary
Reproducibility: yes, at source level: current main's acquisition-time inspection ignores […]
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
Review details

Best possible solution: Land the narrow fix after maintainers accept holder-recorded max-hold as a cross-process reclaim policy, update docs/help, and keep the unchanged-snapshot removal and regression coverage.

Do we have a high-confidence way to reproduce the issue? Yes, at source level: current main's acquisition-time inspection ignores […]

Is this the best way to solve the issue? Mostly yes, if maintainers accept the existing max-hold policy as authoritative across processes. The safer merge shape is this narrow code path plus aligned docs/help and explicit maintainer acceptance of live-holder reclaim semantics.
Overall correctness: patch is correct.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5c4a733912a8.
Force-pushed from be527e1 to b8b849e.
@clawsweeper[bot] re-review. Proof updated with full evidence (before/after journal logs, PID alive confirmation, session file size 400KB, zero SessionWriteLockTimeoutError after fix, unit test added by @steipete, all 27 CI checks green).
🦞👀 Command router queued. I will update this comment with the next step.
Force-pushed from b8b849e to c064bcf.
Force-pushed from ee6e0b3 to 5d18976.
Force-pushed from 1886a70 to 3d3ea93.
…uisition

- Adds optional maxHoldMs parameter to inspectLockPayload
- Inspect now marks locks as stale when held longer than maxHoldMs
- Passes maxHoldMs through inspectLockPayloadForSession
- acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

This ensures that when a live process holds a lock for longer than maxHoldMs (default 5min), other processes can reclaim it during acquisition, matching the watchdog's existing enforcement.

Previously shouldReclaim only used staleMs (30min default), meaning a lock held for 10+ minutes by a live PID would never be reclaimable, causing 60s timeout failures and gateway freezes.

Closes openclaw#85762
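The inspection rule this commit describes can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the PR: `LockPayload`, `InspectOptions`, `LockVerdict`, and `inspectLock` are hypothetical names and shapes, and only `maxHoldMs`, `staleMs`, and the 5-minute and 30-minute defaults are taken from the commit message. Later commits in the PR (holder-recorded policy, self-held locks) add conditions that are not modeled here.

```typescript
// Hypothetical payload shape; the real lock file format may differ.
interface LockPayload {
  pid: number;
  createdAtMs: number; // epoch ms at which the holder acquired the lock
}

// Hypothetical options bag. maxHoldMs is optional, mirroring the commit message.
interface InspectOptions {
  nowMs: number;
  staleMs: number; // age after which a lock is treated as abandoned
  maxHoldMs?: number; // cap on how long a live holder may keep the lock
  isPidAlive: (pid: number) => boolean;
}

type LockVerdict =
  | { stale: false }
  | { stale: true; reason: "dead-pid" | "too-old" | "max-hold-exceeded" };

function inspectLock(payload: LockPayload, opts: InspectOptions): LockVerdict {
  const heldMs = opts.nowMs - payload.createdAtMs;
  if (!opts.isPidAlive(payload.pid)) return { stale: true, reason: "dead-pid" };
  if (heldMs > opts.staleMs) return { stale: true, reason: "too-old" };
  // The added check: a live holder past maxHoldMs is also reclaimable.
  if (opts.maxHoldMs !== undefined && heldMs > opts.maxHoldMs) {
    return { stale: true, reason: "max-hold-exceeded" };
  }
  return { stale: false };
}

// A lock held for 10 minutes by a live PID, with the defaults from the commit message.
const MIN = 60_000;
const tenMinuteLock: LockPayload = { pid: 4242, createdAtMs: 0 };
const base = { nowMs: 10 * MIN, staleMs: 30 * MIN, isPidAlive: () => true };

const withoutCap = inspectLock(tenMinuteLock, base);
const withCap = inspectLock(tenMinuteLock, { ...base, maxHoldMs: 5 * MIN });
console.log(withoutCap); // { stale: false }
console.log(withCap); // { stale: true, reason: 'max-hold-exceeded' }
```

Without the cap, the 10-minute lock stays unreclaimable until the 30-minute stale window, which is the failure mode the commit message describes.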
Force-pushed from 3d3ea93 to a7876bf.
…uisition (#85764)

* fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition

  - Adds optional maxHoldMs parameter to inspectLockPayload
  - Inspect now marks locks as stale when held longer than maxHoldMs
  - Passes maxHoldMs through inspectLockPayloadForSession
  - acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

  This ensures that when a live process holds a lock for longer than maxHoldMs (default 5min), other processes can reclaim it during acquisition, matching the watchdog's existing enforcement.

  Previously shouldReclaim only used staleMs (30min default), meaning a lock held for 10+ minutes by a live PID would never be reclaimable, causing 60s timeout failures and gateway freezes.

  Closes #85762

* fix(session-lock): add dead-PID fast-path before retry loop

  Adds a fast-path check at the top of acquireSessionWriteLock: if the lock file's owner PID is dead, remove it immediately before entering the retry loop. This saves up to timeoutMs (60s) of futile waiting when the previous lock holder has died.

  The shouldReclaim callback already handles this case, but only iteratively through the retry loop. The fast-path eliminates that unnecessary delay.

* fix(session-lock): enforce max hold during acquisition
* fix(session-lock): revalidate max hold safely
* fix(session-lock): honor holder max-hold policy
* fix(session-lock): keep cleanup from reclaiming live holders
* fix(session-lock): remove stale locks only when unchanged
* fix(session-lock): skip self-held max-hold reclaim
* fix(ci): refresh gateway protocol checks

---------

Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
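Two ideas named in the commit list above, the dead-PID check and removing a stale lock only when it is unchanged, might look roughly like this. This is a sketch under assumptions rather than the PR's implementation: `isPidAlive` and `removeIfUnchanged` are hypothetical helper names, and the lock is modeled as a plain text file whose content is compared against the snapshot that was inspected.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Signal 0 only tests whether the process exists; no signal is delivered.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but is owned by another user, so it is alive.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Remove the lock file only if it still holds the content that was judged stale.
// If another process replaced the lock in the meantime, leave it alone.
function removeIfUnchanged(lockPath: string, snapshot: string): boolean {
  let current: string;
  try {
    current = fs.readFileSync(lockPath, "utf8");
  } catch {
    return false; // lock already gone; nothing to remove
  }
  if (current !== snapshot) return false;
  fs.rmSync(lockPath, { force: true });
  return true;
}

// Demo against a throwaway lock file in the OS temp directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "lock-demo-"));
const demoLock = path.join(demoDir, "session.lock");
fs.writeFileSync(demoLock, JSON.stringify({ pid: process.pid }));
const demoSnapshot = fs.readFileSync(demoLock, "utf8");

console.log(isPidAlive(process.pid)); // true
console.log(removeIfUnchanged(demoLock, demoSnapshot)); // true
fs.rmSync(demoDir, { recursive: true, force: true });
```

The re-read narrows the race between inspecting a lock and deleting it but does not close it: a new holder could still write the file between the comparison and the removal, so a production version would need a stronger primitive than this sketch shows.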
…026.5.22) (#645)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.5.20` → `2026.5.22` |

---

### Release Notes

<details>
<summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary>

### [`v2026.5.22`](https://github.com/openclaw/openclaw/releases/tag/v2026.5.22): openclaw 2026.5.22

[Compare Source](https://github.com/openclaw/openclaw/compare/v2026.5.20...v2026.5.22)

##### 2026.5.22

##### Changes

- Gateway/perf: reuse process-stable channel catalog reads, avoid repeated bundled-channel boundary checks, and rotate gateway watch CPU profiles so benchmark runs do not accumulate unbounded artifacts.
- Gateway/perf: reuse immutable plugin metadata snapshots across startup, config, model, channel, setup, and secret metadata readers so hot paths avoid repeated plugin file stats and manifest registry reloads.
- Gateway/perf: lazy-load startup-idle plugin work, core gateway method handlers, and the embedded ACPX runtime so Gateway health and ready signals no longer wait on unused handler trees or ACPX probes.
- Gateway/perf: cache plugin SDK public-surface alias maps and skip irrelevant macOS Linuxbrew PATH probes so Gateway startup avoids repeated filesystem walks and slow missing-directory stats.
- Meeting Notes: add a source-only external meeting-notes plugin and SDK source-provider contract outside the core npm package, with auto-start capture config, manual transcript imports, read-only `openclaw meeting-notes` CLI access, and Discord voice as the first live source.
- Docs/channels/config: add Signal `configPath`, Telegram wildcard topic defaults, local-time backup archive names, Termux home fallback, include-path validation, secret-scanner-safe placeholder guidance, Gemini CLI/Antigravity media guidance, and macOS VM auto-login guidance.
Thanks [@​NorseGaud](https://github.com/NorseGaud), [@​yudistiraashadi](https://github.com/yudistiraashadi), [@​huangqian8](https://github.com/huangqian8), [@​VibhorGautam](https://github.com/VibhorGautam), [@​maweibin](https://github.com/maweibin), [@​tianxingleo](https://github.com/tianxingleo), [@​IgnacioPro](https://github.com/IgnacioPro), and [@​xzcxzcyy-claw](https://github.com/xzcxzcyy-claw). - Docs: clarify model-usage portability, Codex migration prerequisites, status bootstrap wording, thread-bound subagent limits, hook ownership, and config-preserving safety guidance. Thanks [@​aniruddhaadak80](https://github.com/aniruddhaadak80), [@​leno23](https://github.com/leno23), [@​TomDjerry](https://github.com/TomDjerry), [@​matthewxmurphy](https://github.com/matthewxmurphy), [@​vincentkoc](https://github.com/vincentkoc), and [@​stablegenius49](https://github.com/stablegenius49). - Docs: clarify README onboarding and Gateway startup paths, WhatsApp QR/408 recovery, cron output language prompts, skill advanced features, gateway upstream 403 troubleshooting, and plugin fallback override guidance. Thanks [@​deepujain](https://github.com/deepujain), [@​Zacxxx](https://github.com/Zacxxx), [@​Jah-yee](https://github.com/Jah-yee), [@​neyric](https://github.com/neyric), [@​usimic](https://github.com/usimic), [@​Renu-Cybe](https://github.com/Renu-Cybe), [@​BigUncle](https://github.com/BigUncle), and [@​SeashoreShi](https://github.com/SeashoreShi). - Docs: clarify context-pruning ratio bounds, local dashboard recovery, CLI env markers, remote onboarding token behavior, and Peekaboo Bridge permissions for subprocess agents. Thanks [@​ayesha-aziz123](https://github.com/ayesha-aziz123), [@​dishraters](https://github.com/dishraters), [@​hougangdev](https://github.com/hougangdev), and [@​brandonlipman](https://github.com/brandonlipman). 
- Docs: clarify browser CDP diagnostics, Plugin SDK allowlist imports, status-reaction timing defaults, queue steering behavior, limited-tool troubleshooting, cron HEARTBEAT handling, Telegram multi-agent groups, Bitwarden SecretRef setup, and EasyRunner deployments. Thanks [@​Quratulain-bilal](https://github.com/Quratulain-bilal), [@​mbelinky](https://github.com/mbelinky), [@​Mickey-](https://github.com/Mickey-), [@​vancece](https://github.com/vancece), [@​xenouzik](https://github.com/xenouzik), [@​posigit](https://github.com/posigit), [@​surlymochan](https://github.com/surlymochan), [@​janaka](https://github.com/janaka), and [@​choiking](https://github.com/choiking). - Crabbox/Testbox: run clean sparse-checkout Testbox syncs from a temporary full checkout and route remote changed gates through Corepack pnpm. - Docs: clarify IPv4-only Gateway BYOH binding, trusted-proxy scope clearing, Android pairing approval, macOS Accessibility grants, Zalo profile env vars, password-store SecretRef setup, and Chinese memory navigation. Thanks [@​itskai-dev](https://github.com/itskai-dev), [@​gwh7078](https://github.com/gwh7078), [@​longstoryscott](https://github.com/longstoryscott), [@​MoeJaberr](https://github.com/MoeJaberr), and [@​yuaiccc](https://github.com/yuaiccc). - Docs: consolidate GLM under Z.AI, add the Upstash Box install guide and Gateway exposure runbook, clarify MEDIA directives, Copilot and Voyage setup, config path quoting, real behavior proof, and memory-file write guidance. Thanks [@​BobDu](https://github.com/BobDu), [@​alitariksahin](https://github.com/alitariksahin), [@​Jefsky](https://github.com/Jefsky), [@​musaabhasan](https://github.com/musaabhasan), [@​OmerZeyveli](https://github.com/OmerZeyveli), [@​leno23](https://github.com/leno23), [@​WuKongAI-CMU](https://github.com/WuKongAI-CMU), [@​luoyanglang](https://github.com/luoyanglang), and [@​majin1102](https://github.com/majin1102). 
- Docs: clarify media provider credentials, Codex/OpenClaw code-mode boundaries, Slack and Telegram ack reactions, Feishu dynamic agents, secrets plaintext boundaries, memory guidance, and Chinese glossary terms. Thanks [@​nielskaspers](https://github.com/nielskaspers), [@​cosmopolitan033](https://github.com/cosmopolitan033), [@​drclaw-iq](https://github.com/drclaw-iq), [@​alexgduarte](https://github.com/alexgduarte), [@​zccyman](https://github.com/zccyman), [@​chengoak](https://github.com/chengoak), and [@​cassthebandit](https://github.com/cassthebandit). - Packaging: exclude documentation images and assets from the npm tarball, reducing published package size without affecting runtime docs search or CLI behavior. Thanks [@​SebTardif](https://github.com/SebTardif). - Media understanding: stop auto-probing Gemini CLI and use Antigravity CLI only as a lower-priority image/video fallback after configured provider APIs. - Agents/subagents: limit default sub-agent bootstrap context to `AGENTS.md` and `TOOLS.md`, keeping persona, identity, user, memory, heartbeat, and setup files out of delegated workers by default. ([#​85283](https://github.com/openclaw/openclaw/issues/85283)) Thanks [@​100yenadmin](https://github.com/100yenadmin). - Maintainer skills: exclude plugin SDK/API boundary work from `openclaw-landable-bug-sweep` so bugbash sweeps stay focused on small paper-cut fixes. - QA-Lab/diagnostics: extend the OpenTelemetry smoke harness to prove trace, metric, and log export, and add first-class Prometheus and observability smoke aliases. - Plugin SDK: add a generic channel-message poll sender so channel plugins can expose poll delivery without depending on channel-specific SDK facades. - Crabbox: keep the local wrapper's provider validation synced with the installed Crabbox binary while preserving supported aliases such as `docker` and `blacksmith`. ([#​85302](https://github.com/openclaw/openclaw/issues/85302)) Thanks [@​hxy91819](https://github.com/hxy91819). 
- Maintainer skills: add `openclaw-landable-bug-sweep` for producing five small, reviewed, CI-green OpenClaw bugfix PRs from issue/PR sweeps. - Control UI/chat: add search and Load More pagination to the chat session picker, keeping initial session loads bounded while making older conversations reachable. ([#​85237](https://github.com/openclaw/openclaw/issues/85237)) Thanks [@​amknight](https://github.com/amknight). - CLI/onboarding: start classic onboarding when bare `openclaw` runs before an authored config exists, while keeping configured installs on Crestodian. ([#​72343](https://github.com/openclaw/openclaw/issues/72343)) Thanks [@​fuller-stack-dev](https://github.com/fuller-stack-dev). - Discord: allow configuring a bounded `agentComponents.ttlMs` callback registry lifetime for long-running component workflows, with per-account overrides and a 24-hour cap. ([#​84189](https://github.com/openclaw/openclaw/issues/84189)) Thanks [@​100menotu001](https://github.com/100menotu001). - xAI/Grok: reuse xAI OAuth auth profiles for Grok `web_search`, thread active-agent auth through web search, add Grok model aliases, and let media providers declare default operation timeouts. ([#​85182](https://github.com/openclaw/openclaw/issues/85182)) Thanks [@​fuller-stack-dev](https://github.com/fuller-stack-dev). - Plugin SDK: add row-level session workflow helpers and deprecate `loadSessionStore` so plugins can read and patch sessions without depending on the legacy whole-store shape. ([#​84693](https://github.com/openclaw/openclaw/issues/84693)) Thanks [@​efpiva](https://github.com/efpiva). - Gateway/plugins: reuse a compatible Gateway startup plugin registry during dispatch so safe plugin dispatches avoid redundant registry loading. ([#​84324](https://github.com/openclaw/openclaw/issues/84324)) Thanks [@​ai-hpc](https://github.com/ai-hpc). 
- Plugins/SDK: add a general `embeddingProviders` capability contract and registration API so embeddings can become a reusable provider surface outside memory-specific adapters. - Dependencies: refresh provider, plugin, UI, and tooling packages, update `protobufjs` to 8.4.0 to clear the current npm advisory, and carry the Claude ACP completion patch forward to `@agentclientprotocol/claude-agent-acp` 0.36.1. - Agents/tools: remove the old sender-owner tool gating path so configured tools stay visible for trusted sessions while command and channel-action auth still carry real sender identity. - QA-Lab: add curated mock JSONL replay fixtures and first-drift reporting for runtime-parity audits. ([#​80323](https://github.com/openclaw/openclaw/issues/80323), refs [#​80176](https://github.com/openclaw/openclaw/issues/80176)) Thanks [@​100yenadmin](https://github.com/100yenadmin). - QA-Lab: add a QA bus tool-trace visibility scenario for sanitized tool-call assertions. - QA-Lab: replace generic evidence framing in seeded scenario prompts with concrete observed QA behavior. - QA-Lab: list named scenario packs in the coverage report so personal-agent privacy coverage stays visible in audits. - QA-Lab: list live transport lane membership in the coverage report so real transport checks stay separate from seeded qa-channel scenarios. - Release/package: run package integrity checks before package acceptance lanes so public install/update validation fails before private QA assets can leak into the package. - QA-Lab: include the optional 100-turn runtime parity soak in release-soak artifacts so long-run Codex/Pi transcript drift stays visible outside the default gate. ([#​80395](https://github.com/openclaw/openclaw/issues/80395)) Thanks [@​100yenadmin](https://github.com/100yenadmin). - QA-Lab: add a live-only long-context progress watchdog scenario for Codex app-server timeout and stalled-run sentinels. 
([#​80323](https://github.com/openclaw/openclaw/issues/80323)) Thanks [@​100yenadmin](https://github.com/100yenadmin). - QA-Lab: tag gateway restart recovery and streaming final-integrity scenarios as live-only runtime parity lanes. ([#​80323](https://github.com/openclaw/openclaw/issues/80323)) Thanks [@​100yenadmin](https://github.com/100yenadmin). - QA-Lab: add a personal-agent failure recovery scenario that checks honest partial status, retry boundaries, and local recovery artifacts. ([#​83872](https://github.com/openclaw/openclaw/issues/83872)) Thanks [@​iFiras-Max1](https://github.com/iFiras-Max1). - QA-Lab: include an opt-in `update.run` package self-upgrade sentinel for destructive latest-package recovery checks. - QA-Lab: add Codex plugin lifecycle and auth-profile fixture coverage for missing installs, pinned-version drift, first-turn install ordering, and doctor migration safety. ([#​80323](https://github.com/openclaw/openclaw/issues/80323), refs [#​80174](https://github.com/openclaw/openclaw/issues/80174)) Thanks [@​100yenadmin](https://github.com/100yenadmin). - Models/perf: pre-warm the provider auth-state map at gateway startup so `/models` and every model-listing call short-circuits the per-provider plugin / external-CLI discovery on the hot path. Per-call cost drops from \~20 s to \~5 ms (\~4,100×); the one-time startup warm resets and re-warms after hot reloads. ([#​84816](https://github.com/openclaw/openclaw/issues/84816)) Thanks [@​sjf](https://github.com/sjf). - Release/security: ship the root npm package and OpenClaw-owned npm plugins with generated shrinkwrap, support bundled plugin runtime dependencies for suitable plugin tarballs, and require review for lockfile/shrinkwrap changes so published installs use locked dependency graphs. - Tests/perf: isolate doctor core health check unit coverage from real skills/workspace discovery so `doctor-core-checks` no longer dominates unit perf while keeping one real skills-readiness smoke. 
([#84493](https://github.com/openclaw/openclaw/issues/84493)) Thanks [@frankekn](https://github.com/frankekn).

##### Fixes

- WebChat: summarize internal message-tool source replies so tool cards no longer duplicate the visible reply body. ([#84773](https://github.com/openclaw/openclaw/issues/84773)) Thanks [@jason-allen-oneal](https://github.com/jason-allen-oneal).
- Gateway: preserve deferred lifecycle-error cleanup across later non-terminal events so provider timeouts can persist failed session state instead of leaving sessions stuck running. ([#85256](https://github.com/openclaw/openclaw/issues/85256), fixes [#63819](https://github.com/openclaw/openclaw/issues/63819)) Thanks [@samzong](https://github.com/samzong).
- Agents/subagents: report tool-only child progress during timeout summaries instead of showing no visible output.
- Telegram/ACP: preserve explicit `:topic:` conversation suffixes when inbound ACP targets do not carry a separate thread id.
- Browser/proxy: bypass the managed proxy for the exact local managed Chrome CDP readiness and DevTools WebSocket endpoints, so `openclaw browser start` works when the operator proxy blocks loopback egress. ([#83255](https://github.com/openclaw/openclaw/issues/83255)) Thanks [@lightcap](https://github.com/lightcap).
- Ollama: bypass the managed proxy for configured local embedding origins while keeping SSRF guardrails on unconfigured targets. Thanks [@Kaspre](https://github.com/Kaspre).
- OpenAI/images: route Codex API-key image generation through the native OpenAI Images API instead of the Codex OAuth streaming backend, avoiding 401s from valid API keys.
- Agents/OpenAI completions: omit empty tool payload fields for proxy-like OpenAI-compatible endpoints so strict vLLM-style servers accept tool-free turns. ([#85835](https://github.com/openclaw/openclaw/issues/85835)) Thanks [@rendrag-git](https://github.com/rendrag-git).
- Checks/Windows: route full `pnpm check` stage commands through the managed child runner so Windows avoids Node shell-argv deprecation warnings there too.
- Checks/Windows: run managed child commands through explicit `cmd.exe` wrapping instead of Node shell mode with argv, avoiding Node 24 subprocess deprecation warnings during changed checks.
- Gateway: omit internal stream-error placeholder entries from agent prompt history so failed assistant turns are not replayed as model-authored text. ([#85652](https://github.com/openclaw/openclaw/issues/85652)) Thanks [@anyech](https://github.com/anyech).
- Sessions: enforce the session write-lock max-hold policy during lock acquisition so long-held locks can be reclaimed before the stale-lock window. ([#85764](https://github.com/openclaw/openclaw/issues/85764)) Thanks [@njuboy11](https://github.com/njuboy11).
- Models: prune retired Groq, GitHub Copilot, OpenAI, xAI, and old Claude catalog entries, with doctor migration to upgrade existing configs to current provider refs.
- Doctor/update: recognize junction-backed source checkouts as git installs by comparing canonical paths before showing package-manager update guidance. Fixes [#82215](https://github.com/openclaw/openclaw/issues/82215). Thanks [@igormf](https://github.com/igormf).
- Channels: honor `/verbose on` for tool/progress summaries across direct chats, groups, channels, and forum topics while preserving quiet default behavior. ([#85488](https://github.com/openclaw/openclaw/issues/85488)) Thanks [@kurplunkin](https://github.com/kurplunkin).
- CLI/skills: show an all-ready note with next-step commands when skill setup has no missing dependencies to install. ([#85032](https://github.com/openclaw/openclaw/issues/85032)) Thanks [@aniruddhaadak80](https://github.com/aniruddhaadak80).
- Microsoft Foundry: route DeepSeek V4 Pro and Flash models through the Foundry Responses API while keeping older DeepSeek models on their existing path.
([#​85549](https://github.com/openclaw/openclaw/issues/85549)) Thanks [@​roslinmahmud](https://github.com/roslinmahmud). - Status/usage: show configured cost estimates for AWS SDK models in full usage output while keeping token-only usage replies cost-free. ([#​85619](https://github.com/openclaw/openclaw/issues/85619)) Thanks [@​ItsOtherMauridian](https://github.com/ItsOtherMauridian). - Agents/OpenAI Responses: retry non-visible reasoning-only turns for OpenAI Responses API families instead of treating them as empty failed turns. ([#​85603](https://github.com/openclaw/openclaw/issues/85603)) Thanks [@​SebTardif](https://github.com/SebTardif). - Directive tags: preserve message and content-part object identity when display stripping makes no directive-tag changes. ([#​85682](https://github.com/openclaw/openclaw/issues/85682)) Thanks [@​willamhou](https://github.com/willamhou). - Telegram: send local `path`/`filePath` and structured attachment media from `sendMessage` actions instead of dropping them or sending text-only messages. ([#​85219](https://github.com/openclaw/openclaw/issues/85219)) Thanks [@​keshavbotagent](https://github.com/keshavbotagent). - Sessions/status: show the estimated context budget when fresh provider usage is unavailable and clear stale estimates across session resets and compaction boundaries. ([#​84830](https://github.com/openclaw/openclaw/issues/84830)) Thanks [@​giodl73-repo](https://github.com/giodl73-repo). - Gateway/config: pin relative `OPENCLAW_STATE_DIR` overrides to an absolute path at startup so later working-directory changes cannot retarget gateway state. ([#​52264](https://github.com/openclaw/openclaw/issues/52264)) Thanks [@​PerfectPan](https://github.com/PerfectPan). - Release/package: run npm release, prepublish, and postpublish verification through Windows-safe npm command shims so native Windows checks can execute `npm.cmd` instead of treating it as a binary. 
- Agents/harness: pass CLI runtime aliases through harness selection so provider-owned CLI aliases no longer get rejected before reaching the right runtime. ([#​85631](https://github.com/openclaw/openclaw/issues/85631)) Thanks [@​potterdigital](https://github.com/potterdigital). - Secrets: show the irreversible apply warning after interactive `secrets configure` confirmation so confirmed migrations still get the final safety prompt. ([#​85638](https://github.com/openclaw/openclaw/issues/85638)) Thanks [@​alkor2000](https://github.com/alkor2000). - Agents/CLI output: ignore cumulative Claude `stream-json` result usage when assistant usage events are present, preventing inflated cache-read accounting. ([#​85625](https://github.com/openclaw/openclaw/issues/85625)) Thanks [@​zhouhe-xydt](https://github.com/zhouhe-xydt). - CLI: keep `waitForever()` alive by leaving its keep-alive interval ref'd so the public helper no longer exits immediately with Node's unsettled-await code. ([#​85694](https://github.com/openclaw/openclaw/issues/85694)) Thanks [@​m1qaweb](https://github.com/m1qaweb). - Agents/bootstrap: guard bootstrap name checks against missing file names so malformed bootstrap entries warn and truncate instead of crashing. Fixes [#​85523](https://github.com/openclaw/openclaw/issues/85523). ([#​85615](https://github.com/openclaw/openclaw/issues/85615)) Thanks [@​zhouhe-xydt](https://github.com/zhouhe-xydt). - CLI/tasks: reject partially numeric `openclaw tasks audit --limit` values so audit limits must be real positive integers instead of accepting strings like `5abc`. ([#​84901](https://github.com/openclaw/openclaw/issues/84901)) Thanks [@​jbetala7](https://github.com/jbetala7). - Status/diagnostics: bound deep Docker audit probes so `openclaw status --deep` reports slow container checks instead of hanging behind unbounded inspection. ([#​85476](https://github.com/openclaw/openclaw/issues/85476)) Thanks [@​giodl73-repo](https://github.com/giodl73-repo). 
- Providers/Anthropic: migrate 1M context handling to GA-capable Claude 4.x models by sizing eligible models at 1M without the retired `context-1m-2025-08-07` beta, ignoring that retired beta in older configs, and preserving OAuth-required Anthropic beta headers. ([#​45613](https://github.com/openclaw/openclaw/issues/45613)) Thanks [@​haoyu-haoyu](https://github.com/haoyu-haoyu). - Cron/Telegram: parse forum-topic delivery targets through the Telegram plugin instead of cron core, including `:topic:` and `:topicId` forms for announce delivery. Thanks [@​etticat](https://github.com/etticat). - Twitch: keep stale message-handler cleanup callbacks from removing newer handler registrations for the same account, preserving inbound message delivery after reconnects. Fixes [#​83888](https://github.com/openclaw/openclaw/issues/83888). ([#​85425](https://github.com/openclaw/openclaw/issues/85425)) Thanks [@​alkor2000](https://github.com/alkor2000). - Memory/LanceDB: expose public memory artifacts through the active memory provider bridge so memory-wiki imports durable memory files, daily notes, dream reports, and event logs without depending on memory-core internals. Fixes [#​83604](https://github.com/openclaw/openclaw/issues/83604). ([#​85060](https://github.com/openclaw/openclaw/issues/85060)) Thanks [@​brokemac79](https://github.com/brokemac79). - Crabbox: keep AWS hydration compatible with local Actions replay by inlining the hydrate workflow's Node/pnpm setup instead of invoking repo-local composite actions. - Agents/subagents: simplify native sub-agent completion handoff so children report their latest visible assistant result to the requester without using `message`, while keeping parent-owned message-tool delivery policy intact. Fixes [#​85070](https://github.com/openclaw/openclaw/issues/85070). ([#​85089](https://github.com/openclaw/openclaw/issues/85089)) Thanks [@​brokemac79](https://github.com/brokemac79). 
- Docker setup: stop printing the Gateway bearer token in setup logs and printed follow-up commands. - Agents: let embedded compaction fallback retries proceed when PI-compatible candidates do not need agent harness plugin preparation. - Agents/tools: honor configured custom provider API keys when deciding whether media, image-generation, video-generation, music-generation, and PDF tools are available. ([#​85570](https://github.com/openclaw/openclaw/issues/85570)) - StepFun: stop advertising stale generic API key auth choices so onboarding only offers runtime-backed Standard and Step Plan choices. - Diagnostics: keep OpenTelemetry log bodies behind explicit content capture and scrub scoped agent-session keys from OpenTelemetry and Prometheus labels while preserving bounded queue-lane prefixes. - Windows installer: fail Git checkout installs when `pnpm install` or `pnpm build` fails instead of writing a wrapper to a missing CLI build. - Sessions: surface previous-transcript archive failures during `/new` rotation so disk rename errors are logged instead of silently hiding stranded transcript files. Fixes [#​81984](https://github.com/openclaw/openclaw/issues/81984). ([#​85586](https://github.com/openclaw/openclaw/issues/85586), from [#​82081](https://github.com/openclaw/openclaw/issues/82081)) Thanks [@​0xghost42](https://github.com/0xghost42). - TUI/agents: mirror internal-ui message-tool replies into final chat output so message-tool-only agents remain visible in `openclaw tui`. Fixes [#​85538](https://github.com/openclaw/openclaw/issues/85538). Thanks [@​danpolasek](https://github.com/danpolasek). - Agents: keep parallel OpenAI-compatible tool-call deltas in separate argument buffers so interleaved tool calls no longer corrupt streamed arguments. ([#​82263](https://github.com/openclaw/openclaw/issues/82263)) Thanks [@​luna-system](https://github.com/luna-system). 
- Memory/doctor: report missing or unusable QMD workspace directories as workspace failures instead of generic binary failures. ([#​63167](https://github.com/openclaw/openclaw/issues/63167)) Thanks [@​sercada](https://github.com/sercada). - Debug proxy: record CONNECT client-socket errors and destroy the paired upstream socket so abrupt client disconnects no longer leak tunnel resources. ([#​82444](https://github.com/openclaw/openclaw/issues/82444)) Thanks [@​SebTardif](https://github.com/SebTardif). - Diffs: continue hydrating later diff cards when one card fails so a single broken card no longer blanks the whole diff viewer. ([#​84775](https://github.com/openclaw/openclaw/issues/84775)) Thanks [@​cosmopolitan033](https://github.com/cosmopolitan033). - Mac app: use the native settings sidebar window chrome so the sidebar toggle stays on the left and content no longer clips under oversized titlebar padding. - QA-Lab/Codex: bundle auth/plugin fixture imports for flow scenarios and let terminal async media tools end Codex app-server turns without timing out. ([#​80397](https://github.com/openclaw/openclaw/issues/80397), refs [#​80323](https://github.com/openclaw/openclaw/issues/80323)) Thanks [@​100yenadmin](https://github.com/100yenadmin). - Gateway/agents: preserve fresh session overrides and metadata when stale cached agent-session entries race with store updates, so subagent model/provider overrides and routing policy survive concurrent writes. ([#​19328](https://github.com/openclaw/openclaw/issues/19328)) Thanks [@​CodeReclaimers](https://github.com/CodeReclaimers). - Control UI/chat: keep chat session search inline with the session selector so the header no longer shows a duplicate standalone search row. - Control UI/chat: collapse focused-mode header chrome and suppress hidden-header scroll updates so focus mode no longer jumps while scrolling. Thanks [@​amknight](https://github.com/amknight). 
- Codex app-server: restart the native app-server and retry once when server-side compaction times out, so preflight compaction stalls recover instead of failing every dispatch. ([#​85500](https://github.com/openclaw/openclaw/issues/85500)) - Restore Control UI gateway token pairing \[AI]. ([#​85459](https://github.com/openclaw/openclaw/issues/85459)) Thanks [@​pgondhi987](https://github.com/pgondhi987). - OpenAI video: honor configured provider request private-network opt-in for local/custom video endpoints so explicitly trusted mock and self-hosted providers are not blocked. Thanks [@​shakkernerd](https://github.com/shakkernerd). - OpenAI video: send uploaded video edit requests to the documented `/videos/edits` endpoint with a `video` file instead of posting MP4 references to `/videos`. Thanks [@​shakkernerd](https://github.com/shakkernerd). - Agents/channels: preserve message-tool delivery evidence through gateway agent completion handoffs so successful generated media sends are not followed by false failure messages. Thanks [@​shakkernerd](https://github.com/shakkernerd). - CLI/update: repair managed npm plugin `openclaw` peer links during post-core convergence and reject stale or wrong-target peer links before restart. ([#​83794](https://github.com/openclaw/openclaw/issues/83794)) Thanks [@​fuller-stack-dev](https://github.com/fuller-stack-dev). - CLI/agents: default new omitted-account bindings to all accounts when the channel has multiple configured accounts, and clarify account-scope docs. ([#​49769](https://github.com/openclaw/openclaw/issues/49769)) Thanks [@​Gcaufy](https://github.com/Gcaufy). - Codex app-server: let authorized `/codex` control commands such as `/codex detach` escape plugin-owned conversation bindings while keeping unknown or unauthorized slash text routed to the bound plugin. Fixes [#​85157](https://github.com/openclaw/openclaw/issues/85157). 
([#​85188](https://github.com/openclaw/openclaw/issues/85188)) Thanks [@​TurboTheTurtle](https://github.com/TurboTheTurtle). - Auto-reply/models: keep `/models` browse replies fast by sharing the bounded read-only catalog path with Gateway model listing. ([#​84735](https://github.com/openclaw/openclaw/issues/84735)) Thanks [@​safrano9999](https://github.com/safrano9999). - Codex app-server: disable native Code Mode when the effective exec host is `node` and keep OpenClaw `exec`/`process` available, so `/exec host=node` routes shell commands through the selected node instead of the gateway. Fixes [#​85012](https://github.com/openclaw/openclaw/issues/85012). ([#​85090](https://github.com/openclaw/openclaw/issues/85090)) Thanks [@​sahilsatralkar](https://github.com/sahilsatralkar). - Agents: bound embedded auto-compaction session write-lock watchdogs to the compaction timeout instead of the full run timeout, so stuck compaction cannot hold the live session lock for the whole run window. ([#​84949](https://github.com/openclaw/openclaw/issues/84949)) Thanks [@​luoyanglang](https://github.com/luoyanglang). - Gateway/agents: return phase-aware `agent.wait` timeout attribution and only cool auth profiles on provider-started timeouts. Refs [#​65504](https://github.com/openclaw/openclaw/issues/65504). Thanks [@​100yenadmin](https://github.com/100yenadmin). - Gateway: defer provider auth-state prewarm until after startup readiness so early gateway tool/session requests are not blocked by provider auth discovery. ([#​85272](https://github.com/openclaw/openclaw/issues/85272)) Thanks [@​dutifulbob](https://github.com/dutifulbob). - Gateway/models: coalesce provider auth-state rewarms after auth-profile failures and log event-loop delay for warm/rewarm work, so provider auth bursts no longer stack full auth sweeps behind channel replies. 
- Gateway/models: stop cancelled provider auth-state prewarms from continuing full provider sweeps, so reload and auth-failure bursts no longer keep startup busy. - Agents/Codex: show the first plan update as a transient chat status notice without counting it as final assistant content. - CLI/update: walk the macOS process ancestry and honor the inherited Gateway runtime PID before package updates stop the managed Gateway service, so nested in-band updater children can refuse instead of killing the LaunchAgent-supervised Gateway that owns them. Fixes [#​85120](https://github.com/openclaw/openclaw/issues/85120). - Gateway/LaunchAgent: wait for launchd reload bootout to finish and fall back to kickstart when bootstrap races, so reload handoff does not leave the service deregistered. Fixes [#​84630](https://github.com/openclaw/openclaw/issues/84630). ([#​84641](https://github.com/openclaw/openclaw/issues/84641)) Thanks [@​NianJiuZst](https://github.com/NianJiuZst). - Gateway/LaunchAgent: treat a concurrent launchd bootstrap as a successful restart when the service is already loaded, avoiding false macOS Gateway restart failures. Fixes [#​84721](https://github.com/openclaw/openclaw/issues/84721). ([#​84722](https://github.com/openclaw/openclaw/issues/84722)) Thanks [@​googlerest](https://github.com/googlerest). - Gateway/service: include the active `openclaw` command bin directory in managed service PATH generation and doctor audit expectations for npm-global macOS installs. Fixes [#​84201](https://github.com/openclaw/openclaw/issues/84201). ([#​84475](https://github.com/openclaw/openclaw/issues/84475)) Thanks [@​jbetala7](https://github.com/jbetala7). - Control UI/chat: disable the thinking selector for known non-reasoning models instead of showing duplicate Off choices. Fixes [#​84069](https://github.com/openclaw/openclaw/issues/84069). Thanks [@​DrippingMellow](https://github.com/DrippingMellow). 
- Memory: expand `~` in configured extra memory paths before resolving them, so home-relative folders are not treated as workspace-relative. Fixes [#​58026](https://github.com/openclaw/openclaw/issues/58026). Thanks [@​stadman](https://github.com/stadman). - Skills: treat `openclaw.os: macos` as Darwin when checking skill requirements, so macOS-only skills no longer report as missing on macOS hosts. Fixes [#​61338](https://github.com/openclaw/openclaw/issues/61338). Thanks [@​Jessecq1995](https://github.com/Jessecq1995). - Control UI/logs: strip ANSI escape sequences from displayed Gateway log messages so color codes no longer appear as raw text. Fixes [#​64399](https://github.com/openclaw/openclaw/issues/64399). Thanks [@​guguangxin-eng](https://github.com/guguangxin-eng). - Docker: pre-create the workspace and auth-profile config mount points with `node` ownership so first-run named volumes do not start root-owned. Fixes [#​85076](https://github.com/openclaw/openclaw/issues/85076). Thanks [@​Noerr](https://github.com/Noerr). - Telegram: pass configured markdown table mode through outbound markdown chunking so chunked sends render tables consistently. Fixes [#​85085](https://github.com/openclaw/openclaw/issues/85085). Thanks [@​ShuaiHui](https://github.com/ShuaiHui). - CLI/update: preserve managed Gateway service environment during package cutovers so macOS LaunchAgent repair/restart reads the pre-update service state instead of caller shell state. ([#​83026](https://github.com/openclaw/openclaw/issues/83026)) - Agents/providers: honor per-model `api` and `baseUrl` overrides in custom provider auth hooks and transport selection. Fixes [#​80487](https://github.com/openclaw/openclaw/issues/80487). ([#​80488](https://github.com/openclaw/openclaw/issues/80488)) Thanks [@​huveewomg](https://github.com/huveewomg). - Gateway/restart: eager-load the lifecycle runtime before in-place upgrade signal handling so package replacement does not deadlock restart imports. 
([#​84890](https://github.com/openclaw/openclaw/issues/84890)) Thanks [@​myps6415](https://github.com/myps6415). - CLI/update: start managed Gateway update handoff helpers from a stable existing directory and tolerate deleted cwd/package roots during macOS LaunchAgent handoff. Fixes [#​83808](https://github.com/openclaw/openclaw/issues/83808). ([#​83875](https://github.com/openclaw/openclaw/issues/83875)) Thanks [@​jason-allen-oneal](https://github.com/jason-allen-oneal). - Skills: watch each shared skill directory once across agent workspaces instead of once per agent, preventing file-descriptor exhaustion (`EMFILE`) that disposed bundle-mcp processes and stalled sessions on multi-agent gateways. Fixes [#​84968](https://github.com/openclaw/openclaw/issues/84968). ([#​85130](https://github.com/openclaw/openclaw/issues/85130)) Thanks [@​openperf](https://github.com/openperf). - Release/security: keep generated npm shrinkwrap package versions inside the pnpm lock graph so published package locks cannot bypass pnpm dependency age and override policy. - Cron: honor `cron.retry.retryOn: ["network"]` for common network error codes such as `EAI_AGAIN`, `EHOSTUNREACH`, and `ENETUNREACH`. - Gateway chat: broadcast returned agent-run error payloads after an agent starts so ACP/WebChat clients receive terminal idle-timeout errors. Fixes [#​84945](https://github.com/openclaw/openclaw/issues/84945). - Gateway chat display: preserve OpenAI-compatible `prompt_tokens`, `completion_tokens`, and `total_tokens` usage fields in sanitized chat history so llama.cpp sessions keep context counts. Fixes [#​77992](https://github.com/openclaw/openclaw/issues/77992). Thanks [@​MarTT79](https://github.com/MarTT79). - Dashboard/CLI: allow macOS browser launching through `open` even when SSH environment variables are present, while preserving Linux SSH no-display protection. Fixes [#​67088](https://github.com/openclaw/openclaw/issues/67088). Thanks [@​theglove44](https://github.com/theglove44). 
- Codex app-server: keep native web search observations out of mirrored chat transcripts while preserving tool progress telemetry. Fixes [#​85109](https://github.com/openclaw/openclaw/issues/85109). Thanks [@​ugitmebaby](https://github.com/ugitmebaby). - OpenCode Go: strip unsupported Kimi reasoning replay fields before provider requests so repeated `kimi-k2.6` turns do not fail schema validation. Fixes [#​83812](https://github.com/openclaw/openclaw/issues/83812). Thanks [@​Sleeck](https://github.com/Sleeck). - Browser/CDP: add a WSL2 portproxy self-loop hint when Chrome DevTools endpoints accept connections but return an empty HTTP reply. Fixes [#​59209](https://github.com/openclaw/openclaw/issues/59209). Thanks [@​Owlock](https://github.com/Owlock). - Agents/OpenAI: preserve structured provider error code, type, and redacted body metadata on boundary-aware transport failures. - Doctor/Codex: point native Codex asset warnings at the canonical `openclaw migrate plan codex` preview command. Fixes [#​84948](https://github.com/openclaw/openclaw/issues/84948). Thanks [@​markoa](https://github.com/markoa). - CLI/models: make `capability model auth logout --agent` remove auth profiles from the selected non-default agent store. Fixes [#​85092](https://github.com/openclaw/openclaw/issues/85092). Thanks [@​islandpreneur007](https://github.com/islandpreneur007). - Gateway/models: reuse prepared provider auth metadata during model-listing auth checks so repeated lookups avoid broad plugin discovery while preserving synthetic local auth. - CLI/status: suppress systemd user-service setup hints when `openclaw status --deep` can already reach a running Gateway RPC service. Fixes [#​85094](https://github.com/openclaw/openclaw/issues/85094). Thanks [@​islandpreneur007](https://github.com/islandpreneur007). - CLI/devices: recover local approval when a same-device repair request replaces the request ID being approved. 
- CLI/agents: retry transient normal-close Gateway handshakes before falling back to embedded `openclaw agent` execution. - CLI/update: keep managed Gateway service stop/restart status lines out of `openclaw update --json` stdout so package-update automation can parse the JSON payload. - Plugins: resolve OpenClaw plugin SDK subpaths for native external plugin runtimes without mutating package installs or broadening process-wide module resolution. - Agents/OpenAI: preserve Responses and Chat Completions `reasoning_tokens` usage metadata without double-counting it in aggregate output tokens. ([#​85319](https://github.com/openclaw/openclaw/issues/85319)) - Control UI/chat: convert pasted `data:image/...;base64,...` clipboard text into an image attachment instead of dumping the payload into the composer. Fixes [#​62604](https://github.com/openclaw/openclaw/issues/62604). Thanks [@​cpwilhelmi](https://github.com/cpwilhelmi). - Providers/Gemini: strip fractional seconds from web-search time range filters so Gemini accepts freshness-bound search requests. ([#​85071](https://github.com/openclaw/openclaw/issues/85071)) Thanks [@​Noerr](https://github.com/Noerr). - OpenAI Codex: preserve image input support for sparse `openai-codex/gpt-5.5` catalog rows. ([#​85095](https://github.com/openclaw/openclaw/issues/85095)) Thanks [@​sercada](https://github.com/sercada). - CLI/models: add a piped or pasted API-key path for OpenAI Codex auth and warn when API keys are pasted into token-mode auth. ([#​85533](https://github.com/openclaw/openclaw/issues/85533)) Thanks [@​joshavant](https://github.com/joshavant). - Telegram: dead-letter missing-harness isolated ingress failures so a poisoned spooled update no longer blocks later same-lane messages. Fixes [#​85470](https://github.com/openclaw/openclaw/issues/85470). ([#​85605](https://github.com/openclaw/openclaw/issues/85605)) Thanks [@​joshavant](https://github.com/joshavant). 
- Plugins/discovery: strip `-plugin` package suffixes when deriving plugin id hints so package names line up with manifest ids. ([#​85170](https://github.com/openclaw/openclaw/issues/85170)) Thanks [@​JulyanXu](https://github.com/JulyanXu). - Tlon: stop advertising a non-existent agent tool contract in the plugin manifest. - Telegram: preserve fenced code block languages through Markdown rendering so Telegram receives `language-*` code classes. ([#​85209](https://github.com/openclaw/openclaw/issues/85209)) Thanks [@​leno23](https://github.com/leno23). - Windows installer: run npm and Corepack command shims from a Windows-local directory so installs launched from WSL2 UNC paths do not fail before OpenClaw is installed. - Windows updates: roll back git-backed updates to the previous checkout when dependency install, build, UI build, or doctor repair fails. - Windows installer: persist user-local portable Git on PATH and activate the repo-pinned pnpm version for git-backed installs and updates. - Windows installer: bootstrap a user-local portable Node.js when native Windows has no Node and no winget, Chocolatey, or Scoop, so first-run installs can continue on raw hosts. - Windows installer: extract the downloaded portable Node.js directory with native `tar` before falling back to .NET zip extraction, avoiding PowerShell 5.1 archive and path-length failures. - fix(integrations): enforce channel read target allowlists \[AI]. ([#​84982](https://github.com/openclaw/openclaw/issues/84982)) Thanks [@​pgondhi987](https://github.com/pgondhi987). - Agents/heartbeat: route single-owner `session.dmScope=main` direct-message exec and cron event wakes back to the agent main session so async completions no longer strand context in orphan direct-DM queues. Fixes [#​71581](https://github.com/openclaw/openclaw/issues/71581). ([#​83743](https://github.com/openclaw/openclaw/issues/83743)) Thanks [@​Kaspre](https://github.com/Kaspre). 
- Agents/code-mode: expose outer code-mode `exec` source through the `command` hook alias with `toolKind`/`toolInputKind` discriminators so exec-shaped policies can distinguish code-mode cells. ([#​83483](https://github.com/openclaw/openclaw/issues/83483)) Thanks [@​Kaspre](https://github.com/Kaspre). - Agents/code mode: return structured timeout and runtime-unavailable error codes for known worker failures. Fixes [#​83389](https://github.com/openclaw/openclaw/issues/83389). ([#​83444](https://github.com/openclaw/openclaw/issues/83444)) Thanks [@​Kaspre](https://github.com/Kaspre). - QA-Lab: isolate multi-scenario suite workers when scenarios need startup config patches, preventing message-routing config from leaking into unrelated scenarios. - QA-Lab: make the commitments heartbeat-target-none scenario request an immediate heartbeat instead of waiting for the next scheduled heartbeat. - Codex/Plugin SDK: deliver Codex-native subagent completions through a generic harness task runtime so harness-backed plugins can mirror durable task lifecycle and completion delivery without Codex-specific SDK imports. ([#​83445](https://github.com/openclaw/openclaw/issues/83445)) Thanks [@​bryanpearson](https://github.com/bryanpearson). - Gateway CLI: surface local post-challenge connect assembly failures immediately instead of waiting for the wrapper timeout. Fixes [#​68944](https://github.com/openclaw/openclaw/issues/68944). ([#​85253](https://github.com/openclaw/openclaw/issues/85253)) Thanks [@​samzong](https://github.com/samzong). - Messages: strip unsupported web-search citation control markers from outbound replies before they reach WebChat or external channels. Fixes [#​85193](https://github.com/openclaw/openclaw/issues/85193). ([#​85204](https://github.com/openclaw/openclaw/issues/85204)) Thanks [@​neeravmakwana](https://github.com/neeravmakwana). 
- Agents/exec: treat denied exec approvals as terminal instead of feeding them back into agent follow-up work, and recognize Chinese stop phrases in abort handling. Fixes [#​69386](https://github.com/openclaw/openclaw/issues/69386). ([#​85194](https://github.com/openclaw/openclaw/issues/85194)) Thanks [@​samzong](https://github.com/samzong). - CLI/agents: abort accepted Gateway-backed `openclaw agent` runs on SIGINT/SIGTERM so cron and supervisor timeouts do not leave remote agent work alive. Fixes [#​71710](https://github.com/openclaw/openclaw/issues/71710). ([#​84381](https://github.com/openclaw/openclaw/issues/84381)) Thanks [@​Kaspre](https://github.com/Kaspre). - Codex app-server: retry replay-safe stdio client-close turns once using structured failure metadata, while surfacing idle `turn/completed` timeouts instead of blindly replaying active shared-server turns. Thanks [@​VACInc](https://github.com/VACInc). - Codex app-server: reject command overrides that embed Node or package-manager arguments and point users to `appServer.args`, so Windows startup avoids shell parsing failures. ([#​84417](https://github.com/openclaw/openclaw/issues/84417)) Thanks [@​TurboTheTurtle](https://github.com/TurboTheTurtle). - Agents/Copilot: drop unsafe GitHub Copilot Responses reasoning replay items before send so Telegram direct sessions no longer fail on overlong replay IDs. Fixes [#​85197](https://github.com/openclaw/openclaw/issues/85197). ([#​85198](https://github.com/openclaw/openclaw/issues/85198)) Thanks [@​galiniliev](https://github.com/galiniliev). - UI: add accessible tooltips to the topbar color-mode buttons so System, Light, and Dark choices are labeled on hover and focus. ([#​85227](https://github.com/openclaw/openclaw/issues/85227)) Thanks [@​amknight](https://github.com/amknight). - fix: constrain Windows task script names \[AI]. ([#​85064](https://github.com/openclaw/openclaw/issues/85064)) Thanks [@​pgondhi987](https://github.com/pgondhi987). 
- Control UI: keep the chat session picker from hiding older or cross-agent configured conversations while preserving the bounded configured-agent refresh. ([#​85211](https://github.com/openclaw/openclaw/issues/85211)) Thanks [@​amknight](https://github.com/amknight). - Agents/Anthropic: preserve unsafe integer tool-call input values in streamed Anthropic tool-use JSON, preventing Discord-style IDs from being rounded before dispatch. Fixes [#​47229](https://github.com/openclaw/openclaw/issues/47229). ([#​83063](https://github.com/openclaw/openclaw/issues/83063)) Thanks [@​leno23](https://github.com/leno23). - Agents/Codex: estimate tool-heavy prompt pressure at the LLM boundary before provider submission, so persistent sessions compact before overflowing context windows. ([#​85541](https://github.com/openclaw/openclaw/issues/85541)) Thanks [@​fuller-stack-dev](https://github.com/fuller-stack-dev) and [@​joshavant](https://github.com/joshavant). - Agents/hooks: wait for local one-shot CLI and Codex `agent_end` plugin hooks before process cleanup so terminal observability flushes reliably. ([#​85007](https://github.com/openclaw/openclaw/issues/85007)) - Providers/Google: preserve Gemini 3 cron `thinkingDefault: "low"` when stale catalog metadata says `reasoning:false`, so scheduled runs keep provider-supported thinking instead of downgrading to off. ([#​85185](https://github.com/openclaw/openclaw/issues/85185)) Thanks [@​neeravmakwana](https://github.com/neeravmakwana). - CLI/agents: allow `openclaw agent --session-key` to target explicit session keys, including agent-scoped legacy keys. ([#​85121](https://github.com/openclaw/openclaw/issues/85121)) Thanks [@​Kaspre](https://github.com/Kaspre). - Auto-reply/ACP: wait for same-channel block reply delivery before starting tool work, while still honoring ACP dispatch aborts so stopped turns do not wait on slow channel sends. 
([#​83722](https://github.com/openclaw/openclaw/issues/83722)) Thanks [@​IWhatsskill](https://github.com/IWhatsskill). - Codex/ACP: mark required child-run completions that only report progress, omit a final deliverable, or fail requester delivery as blocked while preserving real final reports. ([#​85110](https://github.com/openclaw/openclaw/issues/85110)) Thanks [@​IWhatsskill](https://github.com/IWhatsskill). - Channels: treat bare abort messages such as `stop`, `abort`, and `wait` as immediate control commands in inbound debounce paths so stop requests are not delayed behind pending message coalescing. ([#​83348](https://github.com/openclaw/openclaw/issues/83348)) Thanks [@​IWhatsskill](https://github.com/IWhatsskill). - Channels/message tool: resolve configured external channel plugins during in-agent channel selection, so `openclaw agent --local` message-tool sends no longer report an available channel as unavailable. ([#​85022](https://github.com/openclaw/openclaw/issues/85022)) Thanks [@​Kaspre](https://github.com/Kaspre). - Agents/heartbeat: honor group/channel `message_tool` visible-reply policy and model-specific Codex runtime config for scheduled heartbeat runs, so failed internal tool output stays private. Fixes [#​85310](https://github.com/openclaw/openclaw/issues/85310). ([#​85357](https://github.com/openclaw/openclaw/issues/85357)) Thanks [@​neeravmakwana](https://github.com/neeravmakwana). - Gateway/ACP: close child ACP sessions spawned via `sessions_spawn` when their parent session is reset or deleted, instead of leaving orphaned `claude-agent-acp` processes that accumulate and exhaust memory. Fixes [#​68916](https://github.com/openclaw/openclaw/issues/68916). ([#​85190](https://github.com/openclaw/openclaw/issues/85190)) Thanks [@​openperf](https://github.com/openperf). - Codex app-server: block native execution paths when OpenClaw exec resolves to a node host while preserving the first-party CLI node binding path. 
Fixes [#​85012](https://github.com/openclaw/openclaw/issues/85012). ([#​85534](https://github.com/openclaw/openclaw/issues/85534)) Thanks [@​joshavant](https://github.com/joshavant). - Diagnostics: bound cleanup timeout detail logs, emit drop summaries when async diagnostic bursts exceed the queue cap, and surface async queue drops through diagnostic telemetry. - Agents/subagents: surface blocked child-run completions as errors instead of successful subagent finishes. ([#​80886](https://github.com/openclaw/openclaw/issues/80886)) Thanks [@​TurboTheTurtle](https://github.com/TurboTheTurtle). - Context engines: fail closed with a descriptive error when the selected agent runtime cannot satisfy declared context-engine host requirements. - Agents/Pi: treat accepted embedded `sessions_spawn` child-session handoffs as terminal progress so parent turns no longer report false non-deliverable failures. ([#​85054](https://github.com/openclaw/openclaw/issues/85054)) Thanks [@​samzong](https://github.com/samzong). - CLI/models: resolve `openclaw models set` aliases from the runtime config while keeping authored aliases ahead of runtime-only defaults. ([#​83262](https://github.com/openclaw/openclaw/issues/83262)) Thanks [@​IWhatsskill](https://github.com/IWhatsskill). - Doctor: show personal Codex CLI asset notices as info instead of warnings. Fixes [#​84859](https://github.com/openclaw/openclaw/issues/84859). - WhatsApp: update Baileys to `7.0.0-rc13` and drop the obsolete logger type patch. - CLI/update: pre-pack GitHub/git package update targets before the staged npm install, restoring `openclaw update --tag main` for one-off package updates. ([#​81296](https://github.com/openclaw/openclaw/issues/81296)) Thanks [@​fuller-stack-dev](https://github.com/fuller-stack-dev). - Gateway: mirror successful same-source message-tool sends into session transcripts so delivered replies stay in later history/context. 
([#​84837](https://github.com/openclaw/openclaw/issues/84837)) Thanks [@​iFiras-Max1](https://github.com/iFiras-Max1). - Media generation: keep image, music, and video completion delivery from duplicating or losing task ownership when generated media finishes through active session replies. ([#​84006](https://github.com/openclaw/openclaw/issues/84006)) Thanks [@​fuller-stack-dev](https://github.com/fuller-stack-dev). - Infra/json: retry transient `File changed during read` races while loading JSON state so config and state reads recover instead of failing the turn. ([#​84285](https://github.com/openclaw/openclaw/issues/84285)) - Plugins/providers: fail closed for workspace provider plugins during setup-mode discovery unless explicitly trusted, preventing untrusted workspace plugin code from running during provider setup. ([#​81069](https://github.com/openclaw/openclaw/issues/81069)) Thanks [@​mmaps](https://github.com/mmaps). - Providers/Ollama: resolve configured Ollama Cloud `OLLAMA_API_KEY` markers to the real discovery key so cloud provider entries keep authenticated model catalog access. ([#​85037](https://github.com/openclaw/openclaw/issues/85037)) - Discord: keep persistent component registry fallback warnings actionable by forwarding structured error and cause metadata through the runtime logger. Fixes [#​84185](https://github.com/openclaw/openclaw/issues/84185). ([#​84190](https://github.com/openclaw/openclaw/issues/84190)) Thanks [@​100menotu001](https://github.com/100menotu001). - Gateway/sessions: preserve compatible session auth profile overrides when switching models within the same provider, including provider-auth aliases. Fixes [#​81837](https://github.com/openclaw/openclaw/issues/81837). ([#​81886](https://github.com/openclaw/openclaw/issues/81886)) Thanks [@​TurboTheTurtle](https://github.com/TurboTheTurtle). - Gateway/status: surface inbound delivery telemetry counters and transport-liveness warnings in `openclaw status --all`. 
Fixes [#​49577](https://github.com/openclaw/openclaw/issues/49577). ([#​72724](https://github.com/openclaw/openclaw/issues/72724)) - Docker: prune package-excluded plugin source workspaces and dependency closures so runtime images do not keep packages for plugins that were not opted in. - Providers/Ollama: treat Docker/OrbStack host aliases as local Ollama endpoints so `ollama-local` marker auth works when OpenClaw runs inside a VM/container and Ollama runs on the host. Fixes [#​84875](https://github.com/openclaw/openclaw/issues/84875). - QA-Lab: keep explicitly searchable/deferred OpenClaw dynamic tool rows report-only by default so tool-coverage gates do not treat mock discovery gaps as hard product failures. ([#​80319](https://github.com/openclaw/openclaw/issues/80319)) Thanks [@​100yenadmin](https://github.com/100yenadmin). - Agents/config: keep non-Google provider model refs from being rewritten by Google Gemini preview-id normalization. ([#​84762](https://github.com/openclaw/openclaw/issues/84762)) Thanks [@​zhangguiping-xydt](https://github.com/zhangguiping-xydt). - Installer: require a real controlling terminal before launching onboarding so headless `curl | bash` installs finish cleanly after installing the CLI. - Agents/Codex: promote a completed final assistant response when a prompt timeout races Codex app-server completion instead of returning an empty timeout envelope. Refs [#​84516](https://github.com/openclaw/openclaw/issues/84516). - Codex app-server: keep interrupted turn statuses from being treated as OpenClaw aborts by themselves, so tool-only turns remain eligible for no-visible-answer recovery. Fixes [#​84492](https://github.com/openclaw/openclaw/issues/84492). - Agents: cap heartbeat model bleed context hints by the stored session window when runtime model metadata is unavailable, so overflow recovery advice does not suggest a larger window than the active session actually has. 
- Control UI/Web Push: use `https://openclaw.ai` as the generated default VAPID subject instead of the old localhost mailbox so iOS PWA push setup uses an Apple-acceptable subject when `OPENCLAW_VAPID_SUBJECT` is unset. Fixes [#​83134](https://github.com/openclaw/openclaw/issues/83134). ([#​83317](https://github.com/openclaw/openclaw/issues/83317)) Thanks [@​IWhatsskill](https://github.com/IWhatsskill). - Control UI: distinguish inherited thinking-off settings from explicit Off selections so the thinking selector no longer shows two identical Off rows. ([#​85223](https://github.com/openclaw/openclaw/issues/85223)) Thanks [@​amknight](https://github.com/amknight). - Agents/Pi: keep embedded session transcript writes from tripping false takeover detection after packaged npm onboarding agent turns. - Codex/TUI: surface Codex-native post-turn compaction failures instead of continuing uncompacted, and keep successful native compaction serialized before local idle/next-turn handling. Fixes [#​84305](https://github.com/openclaw/openclaw/issues/84305). ([#​85160](https://github.com/openclaw/openclaw/issues/85160)) Thanks [@​joshavant](https://github.com/joshavant). - Memory/search: stop recall tracking from writing dreaming side-effect artifacts when `dreaming.enabled=false`, while preserving normal search results. Fixes [#​84436](https://github.com/openclaw/openclaw/issues/84436). ([#​84444](https://github.com/openclaw/openclaw/issues/84444)) Thanks [@​NianJiuZst](https://github.com/NianJiuZst). - Diffs: render viewer toolbar icons from a closed icon-name map instead of HTML strings, removing the toolbar icon XSS sink. ([#​83955](https://github.com/openclaw/openclaw/issues/83955)) Thanks [@​tanshanshan](https://github.com/tanshanshan). - QA: keep `pnpm qa:e2e` self-check runs inside the private QA runtime envelope even when inherited shell env disables bundled plugins. - fix(config): validate browser sandbox bind sources \[AI]. 
([#​84799](https://github.com/openclaw/openclaw/issues/84799)) Thanks [@​pgondhi987](https://github.com/pgondhi987). - doctor: constrain legacy plugin cleanup paths \[AI]. ([#​84801](https://github.com/openclaw/openclaw/issues/84801)) Thanks [@​pgondhi987](https://github.com/pgondhi987). - Update/doctor: prune stale local bundled plugin install records that point at old compiled bundled output so current bundled plugin schemas win after upgrade. ([#​84863](https://github.com/openclaw/openclaw/issues/84863)) Thanks [@​fuller-stack-dev](https://github.com/fuller-stack-dev). - Providers/Ollama: preserve native Ollama tool-call IDs across assistant replay so Gemini over Ollama Cloud can keep its hidden function-call thought-signature handle. - Discord: keep session recovery and `/stop` abort ownership on the source dispatch lane while bound ACP turns continue routing to their target session, so stalled pre-run work and late replies are cleared instead of leaking after stop. Fixes [#​84477](https://github.com/openclaw/openclaw/issues/84477). ([#​85100](https://github.com/openclaw/openclaw/issues/85100)) Thanks [@​joshavant](https://github.com/joshavant). - Codex app-server: mark missing turn completion after observed execution as replay-unsafe and release the session so follow-up turns can run. Fixes [#​84076](https://github.com/openclaw/openclaw/issues/84076). ([#​85107](https://github.com/openclaw/openclaw/issues/85107)) Thanks [@​joshavant](https://github.com/joshavant). - Codex app-server: give visible `message` dynamic tool sends a longer timeout budget so slow channel delivery can return its own result or error instead of hitting the 30-second Codex wrapper. ([#​85216](https://github.com/openclaw/openclaw/issues/85216)) Thanks [@​amknight](https://github.com/amknight). - Codex app-server: add a dedicated post-tool raw assistant completion idle timeout config so trusted heavy turns can wait longer after tool handoff without weakening final assistant release. 
- Matrix: keep explicitly configured two-person rooms on the room route before stale `m.direct` or strict two-member DM fallback can bypass mention gating. Fixes [#​85017](https://github.com/openclaw/openclaw/issues/85017). ([#​85137](https://github.com/openclaw/openclaw/issues/85137)) Thanks [@​joshavant](https://github.com/joshavant). - Agents/subagents: require explicit subagent allowlist targets to be configured agents so stale deleted-agent ids are omitted from `agents_list` and rejected by `sessions_spawn`. Fixes [#​84811](https://github.com/openclaw/openclaw/issues/84811). ([#​85154](https://github.com/openclaw/openclaw/issues/85154)) Thanks [@​joshavant](https://github.com/joshavant). - PDF tool: time out idle remote PDF body reads after 120 seconds so stalled remote documents return an error instead of wedging the session. Fixes [#​68649](https://github.com/openclaw/openclaw/issues/68649). ([#​84768](https://github.com/openclaw/openclaw/issues/84768)) Thanks [@​luoyanglang](https://github.com/luoyanglang). - Diagnostics/OpenTelemetry plugin: suppress handled OTLP exporter promise rejections so collector shutdowns no longer crash the Gateway. ([#​81085](https://github.com/openclaw/openclaw/issues/81085)) Thanks [@​luoyanglang](https://github.com/luoyanglang). - Agents/exec: omit raw command text and env values from denied exec failure logs while keeping safe correlation metadata. Fixes [#​85049](https://github.com/openclaw/openclaw/issues/85049). ([#​85140](https://github.com/openclaw/openclaw/issues/85140)) Thanks [@​joshavant](https://github.com/joshavant). - Media/audio: skip empty structured sherpa-onnx transcripts instead of treating the raw JSON payload as spoken text. ([#​84667](https://github.com/openclaw/openclaw/issues/84667)) Thanks [@​TurboTheTurtle](https://github.com/TurboTheTurtle). - Agents/exec: preserve inherited XDG base-directory environment values for subprocesses while still rejecting agent-supplied XDG overrides. 
Fixes [#​84854](https://github.com/openclaw/openclaw/issues/84854). ([#​85139](https://github.com/openclaw/openclaw/issues/85139)) Thanks [@​joshavant](https://github.com/joshavant). - Node/Linux: keep `OPENCLAW_GATEWAY_TOKEN` out of generated systemd unit files by writing node service token values to a node-specific env file. ([#​84408](https://github.com/openclaw/openclaw/issues/84408)) - Memory-core/dreaming: reuse stable narrative subagent session keys per workspace and phase while keeping per-run idempotency and bounded cleanup, so stale `dreaming-narrative-*` sessions do not accumulate. Fixes [#​68252](https://github.com/openclaw/openclaw/issues/68252), [#​69187](https://github.com/openclaw/openclaw/issues/69187), and [#​70402](https://github.com/openclaw/openclaw/issues/70402). ([#​70464](https://github.com/openclaw/openclaw/issues/70464)) Thanks [@​chiyouYCH](https://github.com/chiyouYCH). - Trajectory/support: tol…
…uisition (openclaw#85764)

* fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition

  - Adds optional maxHoldMs parameter to inspectLockPayload
  - Inspect now marks locks as stale when held longer than maxHoldMs
  - Passes maxHoldMs through inspectLockPayloadForSession
  - acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

  This ensures that when a live process holds a lock for longer than maxHoldMs (default 5min), other processes can reclaim it during acquisition, matching the watchdog's existing enforcement. Previously shouldReclaim only used staleMs (30min default), meaning a lock held for 10+ minutes by a live PID would never be reclaimable, causing 60s timeout failures and gateway freezes.

  Closes openclaw#85762

* fix(session-lock): add dead-PID fast-path before retry loop

  Adds a fast-path check at the top of acquireSessionWriteLock: if the lock file's owner PID is dead, remove it immediately before entering the retry loop. This saves up to timeoutMs (60s) of futile waiting when the previous lock holder has died. The shouldReclaim callback already handles this case, but only iteratively through the retry loop. The fast-path eliminates that unnecessary delay.

* fix(session-lock): enforce max hold during acquisition
* fix(session-lock): revalidate max hold safely
* fix(session-lock): honor holder max-hold policy
* fix(session-lock): keep cleanup from reclaiming live holders
* fix(session-lock): remove stale locks only when unchanged
* fix(session-lock): skip self-held max-hold reclaim
* fix(ci): refresh gateway protocol checks

---------

Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
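The reclaim rule described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `LockPayload` and `InspectOptions` shapes, the `inspectLock` helper, the verdict names, and the order of the checks are all assumptions made for the example. Only the ideas come from the commit text: a dead owner PID or a lock older than `staleMs` is reclaimable, a live holder that exceeds `maxHoldMs` is now reclaimable during acquisition, and a self-held lock is skipped for max-hold reclaim.

```typescript
// Sketch only: field names, option names, and verdicts are hypothetical,
// not the actual openclaw types.
interface LockPayload {
  pid: number;
  acquiredAtMs: number;
  maxHoldMs?: number; // holder-recorded max-hold policy, if the holder wrote one
}

interface InspectOptions {
  nowMs: number;
  staleMs: number;
  maxHoldMs?: number; // fallback when the payload records no policy
  selfPid: number;
  isPidAlive: (pid: number) => boolean;
}

type LockVerdict =
  | "dead-owner"
  | "stale-age"
  | "max-hold-exceeded"
  | "self-held"
  | "live";

function inspectLock(payload: LockPayload, opts: InspectOptions): LockVerdict {
  // A dead owner can never release the lock, so it is always reclaimable.
  if (!opts.isPidAlive(payload.pid)) return "dead-owner";

  const heldMs = opts.nowMs - payload.acquiredAtMs;
  if (heldMs > opts.staleMs) return "stale-age";

  // Prefer the policy the holder recorded; fall back to the caller's value.
  const limit = payload.maxHoldMs ?? opts.maxHoldMs;
  if (limit !== undefined && heldMs > limit) {
    // Assumption: a process does not reclaim its own lock on max-hold grounds.
    return payload.pid === opts.selfPid ? "self-held" : "max-hold-exceeded";
  }
  return "live";
}

function shouldReclaim(payload: LockPayload, opts: InspectOptions): boolean {
  const verdict = inspectLock(payload, opts);
  return (
    verdict === "dead-owner" ||
    verdict === "stale-age" ||
    verdict === "max-hold-exceeded"
  );
}
```

With a 30-minute `staleMs` and no `maxHoldMs`, a live holder that has kept the lock for 10 minutes is not reclaimable in this sketch, which mirrors the pre-fix behavior the commit describes. Passing a 5-minute `maxHoldMs` flips the result to reclaimable.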
…026.5.26) (#682)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.5.22` → `2026.5.26` |

---

> ⚠️ **Warning**
>
> Some dependencies could not be looked up. Check the [Dependency Dashboard](issues/567) for more information.

---

### Release Notes

<details>
<summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary>

### [`v2026.5.26`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#2026526)

[Compare Source](https://github.com/openclaw/openclaw/compare/v2026.5.22...v2026.5.26)

##### Highlights

- Faster Gateway and replies: startup avoids repeated plugin, channel, session, usage-cost, warning, scheduled-service, and filesystem scans; visible replies separate user-facing sends from slower follow-up work; Gateway runtime/session caches churn less under load.
- Transcripts are core: transcript-backed meeting summaries, source-provider chunks, cleaned user turns, media provenance, Codex mirrors, WebChat replies, and CLI/TUI replay now use one more reliable transcript path.
- More channels are production-ready: Telegram keeps typing/progress context and forum topics, iMessage handles attachment roots, remote media staging, and duplicate local Messages sources, WhatsApp restores group/media behavior, Discord improves voice playback and model picking, and Signal/iMessage/WhatsApp get reaction approvals.
- Better voice and Talk: realtime Talk runs can be inspected, steered, cancelled, or followed up from Web UI and Discord voice; wake-name handling is more tolerant without letting ambient speech trigger agents.
- Safer content boundaries: Browser snapshot reads honor SSRF policy, system-event text cannot spoof nested prompt markers, fetched file text is wrapped as external content, ClickClack inbound sender allowlists run before agent dispatch, stale device tokens are rejected, and serialized tool-call text is scrubbed from replies. - Providers, Codex, and local models are steadier: named auth profiles, OpenAI sampling params, Codex app-server resume/timeout/usage-limit recovery, dynamic tool-schema guards, xAI usage-limit surfacing, Ollama top-p normalization, and local approval resolution reduce provider-specific dead ends. - More reliable install/update/release paths: Alpine installs, trusted runtime fallback roots, stable update channels, Docker/package timeouts, Windows Scheduled Tasks, Windows/macOS proof lanes, Testbox/Crabbox delegation, plugin publish checks, and macOS runner bootstraps all got hardened. - Better observability: Activity tab, gateway secret-prep traces, tool/model stream progress, explicit fast-mode status, systemd Gateway hygiene, OpenTelemetry LLM spans, release performance evidence, and richer telemetry signals make failures easier to inspect. ##### Changes - Transcripts: add core transcript capture and source-provider support for transcript-backed meeting summaries, including the renamed Transcripts docs, CLI surface, source-provider chunks, and cleaned user-turn persistence. - Auth: add named model login profiles and supported credential migration for Hermes, OpenCode, and Codex auth profiles, with explicit opt-out and non-interactive controls. ([#​85667](https://github.com/openclaw/openclaw/issues/85667)) Thanks [@​fuller-stack-dev](https://github.com/fuller-stack-dev). - Diagnostics: trace gateway secret preparation, classify skill/tool usage, surface model stream progress, add OpenTelemetry LLM content spans, and expose alertable telemetry for blocked tools, failover, stale sessions, liveness, oversized payloads, and webhook ingress. 
([#​83019](https://github.com/openclaw/openclaw/issues/83019), [#​80370](https://github.com/openclaw/openclaw/issues/80370), [#​86191](https://github.com/openclaw/openclaw/issues/86191)) - Channels: add Signal reaction approvals, iMessage thumb approval reactions, and WhatsApp thumb approval reaction support so mobile approval flows work without textual `/approve` commands. ([#​85894](https://github.com/openclaw/openclaw/issues/85894), [#​85952](https://github.com/openclaw/openclaw/issues/85952), [#​85477](https://github.com/openclaw/openclaw/issues/85477)) - Agents/API: forward OpenAI sampling params through the Gateway and expose estimated context-budget status for active agent runs. ([#​84094](https://github.com/openclaw/openclaw/issues/84094)) - TUI/status: queue prompts submitted while an agent is busy and show explicit fast-mode state plus richer systemd Gateway hygiene in status output. ([#​86722](https://github.com/openclaw/openclaw/issues/86722), [#​87115](https://github.com/openclaw/openclaw/issues/87115), [#​86976](https://github.com/openclaw/openclaw/issues/86976)) - Exec approvals: hide durable approval actions that are unavailable for the current prompt and keep approval runtime tokens local-only so stale prompts cannot offer misleading controls. ([#​86270](https://github.com/openclaw/openclaw/issues/86270), [#​86359](https://github.com/openclaw/openclaw/issues/86359)) - Plugin SDK: add reaction approval helpers and keep diagnostic event root exports discoverable across function-name and alias-bound module graphs. ([#​86735](https://github.com/openclaw/openclaw/issues/86735), [#​87084](https://github.com/openclaw/openclaw/issues/87084)) - Android/iOS: add the Android pair-new-gateway action and improve mobile Talk mode surfaces, including iOS realtime Talk mode and Android offline voice/gateway recovery. 
([#​86798](https://github.com/openclaw/openclaw/issues/86798), [#​86355](https://github.com/openclaw/openclaw/issues/86355)) Thanks [@​ngutman](https://github.com/ngutman). - Performance: cache plugin metadata snapshots, package realpaths, stable gateway metadata, model cost indexes, channel resolution, usage-cost indexes, and session/auth hot-path facts so common Gateway and reply paths do less rediscovery. ([#​84649](https://github.com/openclaw/openclaw/issues/84649), [#​85843](https://github.com/openclaw/openclaw/issues/85843), [#​86517](https://github.com/openclaw/openclaw/issues/86517), [#​86678](https://github.com/openclaw/openclaw/issues/86678)) - Voice: expose shared realtime turn-context tracking through the realtime voice SDK and reuse it for Discord speaker attribution and wake-name context recovery. - Voice: reuse shared realtime output activity tracking in Google Meet command and node audio bridges, including recent-output checks for local barge-in detection. - Voice: expose shared realtime output activity tracking through the realtime voice SDK and reuse it for Discord playback activity and barge-in decisions. - Voice: expose shared realtime consult question matching, speakable-result extraction, and alias-aware forced-consult coordination through the realtime voice SDK, then reuse it in Gateway Talk, Voice Call, and Discord voice paths. - Voice: share activation-name matching and consult-transcript screening through the realtime voice SDK so Discord, browser voice, and meeting surfaces can reuse one implementation. - Cron: default `cron.maxConcurrentRuns` to 8 so scheduled automations and their isolated agent turns can make progress in parallel without explicit configuration. - QA-Lab: add `qa coverage --match <query>` so focused proof selection can discover matching scenarios from existing metadata before running live or remote lanes. - Discord/model picker: surface an alpha-bucket select (e.g. 
`A–G (12) · H–N (18) · O–Z (5)`) when the provider list or a provider's model list exceeds 25 items, so configs with `provider/*` wildcards stay one click from the right page instead of paginating through prev/next; falls back to numeric chunks when every item shares the same first letter. - Control UI: add an ephemeral Activity tab for sanitized live tool activity summaries without persisting raw telemetry. Fixes [#​12831](https://github.com/openclaw/openclaw/issues/12831). Thanks [@​BunsDev](https://github.com/BunsDev). - Build: include `ui:build` in the `full` and `ciArtifacts` profiles of `scripts/build-all.mjs` so `pnpm build` always rebuilds `dist/control-ui` after `tsdown` cleans `dist`, removing the second-command requirement and the missing-asset failure mode for source/runtime installs and CI artifact uploads. ([#​85206](https://github.com/openclaw/openclaw/issues/85206)) - iOS: improve Talk mode with direct realtime voice sessions, compact toolbar status, and responsive voice waveform feedback. ([#​86355](https://github.com/openclaw/openclaw/issues/86355)) Thanks [@​ngutman](https://github.com/ngutman). - Media: replace the Sharp image backend with Rastermill for metadata, resizing, EXIF orientation, and PNG alpha-preserving optimization so OpenClaw no longer installs Sharp or the WhatsApp Jimp fallback for image processing. ([#​86437](https://github.com/openclaw/openclaw/issues/86437)) - Codex: update the bundled Codex CLI to 0.134.0 and keep native compaction disabled for budget-triggered app-server turns so OpenClaw owns the recovery boundary. ([#​86772](https://github.com/openclaw/openclaw/issues/86772)) ##### Fixes - Memory/security: reject prompt-like text submitted through the explicit `memory_store` tool before embedding or storage, matching the existing auto-capture prompt-injection filter. 
([#​87142](https://github.com/openclaw/openclaw/issues/87142)) - Gateway/security: enable the default auth rate limiter for remote non-browser and HTTP gateway auth failures when `gateway.auth.rateLimit` is unset, while preserving the loopback exemption. ([#​87148](https://github.com/openclaw/openclaw/issues/87148)) - Prompt hardening: route untrusted group prompt metadata through sanitized untrusted structured context while preserving trusted operator-configured group system prompts and aligning the plugin SDK docs/test helpers. ([#​87144](https://github.com/openclaw/openclaw/issues/87144)) - Security/content boundaries: validate Browser snapshot tab URLs against SSRF policy before ChromeMCP or direct CDP reads, sanitize queued system-event text so untrusted plugin/channel labels cannot spoof nested prompt markers, wrap fetched file text and metadata as external content, apply ClickClack `allowFrom` sender allowlists before agent dispatch, reject RPCs from invalidated device-token clients during rotation, require staged sandbox media refs, and scrub serialized tool-call text from replies. ([#​78526](https://github.com/openclaw/openclaw/issues/78526), [#​87094](https://github.com/openclaw/openclaw/issues/87094), [#​87062](https://github.com/openclaw/openclaw/issues/87062), [#​83741](https://github.com/openclaw/openclaw/issues/83741), [#​70707](https://github.com/openclaw/openclaw/issues/70707), [#​86924](https://github.com/openclaw/openclaw/issues/86924)) Thanks [@​zsxsoft](https://github.com/zsxsoft), [@​ttzero25](https://github.com/ttzero25), and [@​mmaps](https://github.com/mmaps). - Transcripts/user turns: persist CLI, WebChat, media, follow-up, hook, and Codex-mirror user turns to the admitted session target; keep cleaned transcript text, inline image routing, provenance metadata, replay hooks, and fallback paths idempotent when runtimes fail or restart. 
- TUI/status/onboarding/UI: queue busy TUI prompts instead of dropping them, preserve the configured default model during onboarding, show failed tool results as errors, show config-open failures in Control UI, keep status JSON plugin scans healthy, preserve xAI usage-limit errors locally, and expose explicit fast-mode/systemd state. ([#​86722](https://github.com/openclaw/openclaw/issues/86722), [#​87000](https://github.com/openclaw/openclaw/issues/87000), [#​85786](https://github.com/openclaw/openclaw/issues/85786), [#​87108](https://github.com/openclaw/openclaw/issues/87108), [#​87001](https://github.com/openclaw/openclaw/issues/87001), [#​86614](https://github.com/openclaw/openclaw/issues/86614), [#​87115](https://github.com/openclaw/openclaw/issues/87115), [#​86976](https://github.com/openclaw/openclaw/issues/86976)) - Plugin commands/SDK: preserve plugin LLM command auth, bind native plugin command dispatch to the host agent's LLM auth, keep `onDiagnosticEvent` exports discoverable through `Function.name`, stabilize diagnostic event root aliases, correlate pathless read diagnostics, suppress transient runner failures in channel command paths, and repair local approval resolution. ([#​85936](https://github.com/openclaw/openclaw/issues/85936), [#​87084](https://github.com/openclaw/openclaw/issues/87084), [#​86977](https://github.com/openclaw/openclaw/issues/86977), [#​87069](https://github.com/openclaw/openclaw/issues/87069), [#​86771](https://github.com/openclaw/openclaw/issues/86771)) - Codex/providers: keep WebChat delivery hints out of user prompts, avoid false queued-terminal idle timeouts, share the native hook relay registry, quarantine unsupported dynamic tool schemas, preserve Claude resumed-session system prompts, normalize greedy Ollama `top_p`, preserve per-agent thinking defaults for ingress runs, and avoid native compaction takeover on budget-triggered Codex turns. 
([#​87096](https://github.com/openclaw/openclaw/issues/87096), [#​73950](https://github.com/openclaw/openclaw/issues/73950), [#​87049](https://github.com/openclaw/openclaw/issues/87049), [#​86689](https://github.com/openclaw/openclaw/issues/86689), [#​86772](https://github.com/openclaw/openclaw/issues/86772)) - Gateway/perf/release: reuse startup-warning metadata and prepared auth stores, avoid cloning live-switch and lifecycle session caches on read paths, defer warning and scheduled-service fallback imports, trim Gateway session/startup/runtime CPU churn, skip duplicate turn session touches, stop chat timeout fallback cascades, drop stale subagent announce history, bound benchmark/watch/kitchen-sink teardown waits, bound macOS/package/onboarding/plugin smoke commands, bound install finalization probes, resolve Parallels npm-update commands from guest `PATH`, and bootstrap raw AWS macOS Node/pnpm commands through `/usr/bin/env`. ([#​86997](https://github.com/openclaw/openclaw/issues/86997)) - Reply/perf: reduce visible reply delivery latency by preserving Telegram typing/progress context, lazy-loading slash-command startup metadata, avoiding hot-path model hydration, flag-gating Codex profiler timing, deferring context compaction maintenance, and tracking delivery timing. ([#​86989](https://github.com/openclaw/openclaw/issues/86989), [#​86990](https://github.com/openclaw/openclaw/issues/86990), [#​86991](https://github.com/openclaw/openclaw/issues/86991), [#​86992](https://github.com/openclaw/openclaw/issues/86992), [#​86993](https://github.com/openclaw/openclaw/issues/86993), [#​86994](https://github.com/openclaw/openclaw/issues/86994)) Thanks [@​keshavbotagent](https://github.com/keshavbotagent). - Reply/source delivery: keep TUI, Control UI, media, TTS, transcript, and Codex source-reply finals live without duplicate terminal events or stale replay artifacts. 
- Agents/replay: repair legacy tool results before replay, preserve `sessions_spawn` transcript payloads, restore current guard checks, stage sandboxed workspace media, and keep duplicate transcripts tool display metadata from reappearing. ([#​82203](https://github.com/openclaw/openclaw/issues/82203), [#​86934](https://github.com/openclaw/openclaw/issues/86934), [#​87025](https://github.com/openclaw/openclaw/issues/87025)) Thanks [@​martingarramon](https://github.com/martingarramon), [@​vincentkoc](https://github.com/vincentkoc), and [@​joshavant](https://github.com/joshavant). - Agents/sessions: handle active-fallback failures in `sessions_send` so fallback routing reports the real failure and does not leave callers with an ambiguous dropped send. ([#​86638](https://github.com/openclaw/openclaw/issues/86638)) - Agents/hooks/subagents: enforce default hook agent allowlists, recover failed subagent lifecycle completions, and keep node task lifecycle cleanup from closing the Gateway listener. ([#​86101](https://github.com/openclaw/openclaw/issues/86101)) - Codex: project newer OpenClaw chat history into resumed app-server threads and keep Codex turn timeouts inside the Codex runtime boundary so timeouts do not poison shared app-server clients or fall through to unrelated provider fallback. ([#​86677](https://github.com/openclaw/openclaw/issues/86677), [#​86476](https://github.com/openclaw/openclaw/issues/86476)) Thanks [@​TurboTheTurtle](https://github.com/TurboTheTurtle) and [@​pashpashpash](https://github.com/pashpashpash). - Config/doctor/update: narrow profiled tool-section doctor repair, keep runtime-injected legacy web-search provider config out of user-authored config validation, and keep prerelease tags excluded from stable updater resolution. 
([#​87030](https://github.com/openclaw/openclaw/issues/87030), [#​86818](https://github.com/openclaw/openclaw/issues/86818), [#​86559](https://github.com/openclaw/openclaw/issues/86559)) Thanks [@​joshavant](https://github.com/joshavant), [@​luoyanglang](https://github.com/luoyanglang), and [@​stevenepalmer](https://github.com/stevenepalmer). - Doctor/runtime: validate active bundled MCP tool schemas through the same runtime projection path so unsupported MCP input schemas are reported and quarantined instead of poisoning assistant startup. - CLI/Windows: add a Windows-only stack-size respawn for stack-heavy startup paths, default CLI logs to local timestamps, and validate timeout/banner TTY state more strictly. ([#​87031](https://github.com/openclaw/openclaw/issues/87031), [#​85387](https://github.com/openclaw/openclaw/issues/85387)) Thanks [@​giodl73-repo](https://github.com/giodl73-repo) and [@​vincentkoc](https://github.com/vincentkoc). - Locking/security: require owner identity proof before stale plugin lock removal, memoize session lock owner arguments, and avoid writing default exec approval stores unless policy state actually changed. ([#​86814](https://github.com/openclaw/openclaw/issues/86814), [#​86964](https://github.com/openclaw/openclaw/issues/86964)) Thanks [@​Alix-007](https://github.com/Alix-007) and [@​vincentkoc](https://github.com/vincentkoc). - Install/release: bound Docker package build, inventory, pack, and tarball preparation with process-group timeouts; pin shrinkwrap patch drift to the pnpm lock; harden macOS restart and dSYM packaging; and run release Docker/live timeout wrappers in the foreground so child processes cannot wedge gates. - QA/Telegram: bound Telegram user credential tar and broker calls so live proof setup fails with a timeout instead of waiting for the outer Crabbox job deadline. 
- QA/Tool Search: bound gateway E2E HTTP probes, run only the fixture plugin, and clean up temporary fixture trees after the compact tool-catalog proof completes. - Telegram/network: treat `ENETDOWN` as a transient pre-connect network failure so Telegram sends, gateway unhandled-rejection handling, and cron network retries follow the same recovery path as sibling network outages. ([#​86762](https://github.com/openclaw/openclaw/issues/86762)) Thanks [@​TurboTheTurtle](https://github.com/TurboTheTurtle). - Telegram: preserve inbound text entities, overlapping DM replies, account topic cache sidecars, outbound reply context, targeted bot-command mentions, durable group retry targets, forum topic names, and native progress callbacks. ([#​83873](https://github.com/openclaw/openclaw/issues/83873), [#​85361](https://github.com/openclaw/openclaw/issues/85361), [#​85555](https://github.com/openclaw/openclaw/issues/85555), [#​85656](https://github.com/openclaw/openclaw/issues/85656), [#​85709](https://github.com/openclaw/openclaw/issues/85709), [#​86299](https://github.com/openclaw/openclaw/issues/86299), [#​86553](https://github.com/openclaw/openclaw/issues/86553)) Thanks [@​SebTardif](https://github.com/SebTardif), [@​luoyanglang](https://github.com/luoyanglang), and [@​neeravmakwana](https://github.com/neeravmakwana). - iMessage: read image attachments from local Messages attachment roots, dedupe duplicate local Messages-source accounts, seed direct DM history, fix image/group media attachment commands, advance catchup cursors after live handling, and keep slash-command acknowledgements in the source conversation. 
([#​82642](https://github.com/openclaw/openclaw/issues/82642), [#​85475](https://github.com/openclaw/openclaw/issues/85475), [#​86569](https://github.com/openclaw/openclaw/issues/86569), [#​86705](https://github.com/openclaw/openclaw/issues/86705), [#​86706](https://github.com/openclaw/openclaw/issues/86706), [#​86770](https://github.com/openclaw/openclaw/issues/86770)) Thanks [@​homer-byte](https://github.com/homer-byte), [@​TurboTheTurtle](https://github.com/TurboTheTurtle), [@​swang430](https://github.com/swang430), and [@​OmarShahine](https://github.com/OmarShahine). - WhatsApp/QQ/Twitch/IRC/Slack: restore WhatsApp ack identity and group-drop warnings, make QQ Bot media respect `OPENCLAW_HOME`, serialize Twitch auth disconnects, store IRC channel routes canonically, and keep Slack downloaded files out of reply media. ([#​83833](https://github.com/openclaw/openclaw/issues/83833), [#​85309](https://github.com/openclaw/openclaw/issues/85309), [#​85777](https://github.com/openclaw/openclaw/issues/85777), [#​85794](https://github.com/openclaw/openclaw/issues/85794), [#​85906](https://github.com/openclaw/openclaw/issues/85906), [#​86318](https://github.com/openclaw/openclaw/issues/86318), [#​86697](https://github.com/openclaw/openclaw/issues/86697)) Thanks [@​sliverp](https://github.com/sliverp), [@​neeravmakwana](https://github.com/neeravmakwana), and [@​Kailigithub](https://github.com/Kailigithub). - Discord/voice: improve voice playback and wake replies, bucket large model picker menus, merge media captions into one message, route metadata through configured proxies, restore numeric channel sends, suppress self-reply echoes, and tighten wake matching without breaking fuzzy wake phrases. 
([#​80227](https://github.com/openclaw/openclaw/issues/80227), [#​86238](https://github.com/openclaw/openclaw/issues/86238), [#​86487](https://github.com/openclaw/openclaw/issues/86487), [#​86571](https://github.com/openclaw/openclaw/issues/86571), [#​86595](https://github.com/openclaw/openclaw/issues/86595), [#​86601](https://github.com/openclaw/openclaw/issues/86601)) - Codex: preserve native web-search metadata, keep oversized native thread reuse, bridge CLI API-key auth into the app server, preserve sandbox bootstrap path style, recover context-window prompt errors, honor yolo approval policy, disable native thread personality, and route compaction through Codex auth. ([#​85378](https://github.com/openclaw/openclaw/issues/85378), [#​85542](https://github.com/openclaw/openclaw/issues/85542), [#​85891](https://github.com/openclaw/openclaw/issues/85891), [#​85909](https://github.com/openclaw/openclaw/issues/85909), [#​86408](https://github.com/openclaw/openclaw/issues/86408)) - Agents/runtime: enforce session lock max-hold reclaim, release embedded-attempt locks on all exits, treat aborted subagent runs as terminal, avoid runtime model hydration on hot paths, disclose scoped session list counts, derive overflow budgets from provider errors, and keep fallback errors scoped to the active model candidate. ([#​70473](https://github.com/openclaw/openclaw/issues/70473), [#​85764](https://github.com/openclaw/openclaw/issues/85764), [#​86014](https://github.com/openclaw/openclaw/issues/86014), [#​86134](https://github.com/openclaw/openclaw/issues/86134), [#​86427](https://github.com/openclaw/openclaw/issues/86427), [#​86944](https://github.com/openclaw/openclaw/issues/86944)) Thanks [@​openperf](https://github.com/openperf), [@​fuller-stack-dev](https://github.com/fuller-stack-dev), [@​zhangguiping-xydt](https://github.com/zhangguiping-xydt), and [@​ferminquant](https://github.com/ferminquant). 
- Config/update/doctor: retry config recovery after failed backup restore, skip shell env fallback on Windows, exclude prerelease tags from the stable git channel, support deep config edits, warn instead of aborting on unreadable cron stores, prune stale bundled plugin paths, and avoid duplicate restart prompts when the Gateway is already healthy. ([#​85739](https://github.com/openclaw/openclaw/issues/85739), [#​85787](https://github.com/openclaw/openclaw/issues/85787), [#​86060](https://github.com/openclaw/openclaw/issues/86060), [#​86260](https://github.com/openclaw/openclaw/issues/86260), [#​86384](https://github.com/openclaw/openclaw/issues/86384), [#​86533](https://github.com/openclaw/openclaw/issues/86533)) Thanks [@​liaoyl830](https://github.com/liaoyl830). - Install/release: support Alpine CLI installs and runtime floors, prefer trusted startup argv runtime fallback roots, reject stale CLI node runtimes, avoid npm `min-release-age` installer failures, bound npm/package/Docker install phases, restore config parent ownership in Docker, seed Docker lockfile package tarballs before prune, make release/plugin prerelease checks fail closed instead of hanging or false-greening, and use host-visible Crabbox local work roots for Docker-backed proof. ([#​85491](https://github.com/openclaw/openclaw/issues/85491)) - Windows daemon: keep Scheduled Task gateway launches running on battery power and avoid workgroup-machine prompts for a domain user during task installation. ([#​59299](https://github.com/openclaw/openclaw/issues/59299)) - Security: avoid printing Gateway tokens in Docker, validate plugin model-pattern regexes safely, escape transcript metadata field names, harden session allowlist glob matching, audit Claude permission overrides under YOLO, and require explicit allow for ACP auto approvals. 
([#​85849](https://github.com/openclaw/openclaw/issues/85849), [#​85934](https://github.com/openclaw/openclaw/issues/85934), [#​86046](https://github.com/openclaw/openclaw/issues/86046), [#​86557](https://github.com/openclaw/openclaw/issues/86557)) - Media/images: replace Sharp with Rastermill, keep EXIF normalization best-effort, normalize HEIC/HEIF before image descriptions, route Codex image API keys through OpenAI, preserve image compression metadata, and auto-scale live tool result caps. ([#​85776](https://github.com/openclaw/openclaw/issues/85776), [#​86037](https://github.com/openclaw/openclaw/issues/86037), [#​86437](https://github.com/openclaw/openclaw/issues/86437), [#​86857](https://github.com/openclaw/openclaw/issues/86857), [#​86923](https://github.com/openclaw/openclaw/issues/86923)) - Memory: prevent semantic vector indexes from silently degrading when embeddings are unavailable, stop doctor OOMs on large session stores, preserve sidecar hooks/artifacts, write fallback dream diaries, use CJK-aware dreaming dedupe, and avoid per-file watcher FD fan-out. ([#​80613](https://github.com/openclaw/openclaw/issues/80613), [#​82928](https://github.com/openclaw/openclaw/issues/82928), [#​85060](https://github.com/openclaw/openclaw/issues/85060), [#​85704](https://github.com/openclaw/openclaw/issues/85704), [#​85967](https://github.com/openclaw/openclaw/issues/85967), [#​86701](https://github.com/openclaw/openclaw/issues/86701)) Thanks [@​brokemac79](https://github.com/brokemac79), [@​openperf](https://github.com/openperf), and [@​yaaboo-gif](https://github.com/yaaboo-gif). - Agents/sessions: include visibility metadata on restricted `sessions_list` results so scoped counts are clearly reported without widening access or exposing hidden-session counts. ([#​86944](https://github.com/openclaw/openclaw/issues/86944)) Thanks [@​ferminquant](https://github.com/ferminquant). 
- Gateway/DNS: validate wide-area discovery domains before deriving zone paths or writing zone files, so invalid `discovery.wideArea.domain` and `dns setup --domain` values fail with a DNS-name diagnostic instead of falling through to unrelated configuration errors. Thanks [@mmaps](https://github.com/mmaps).
- Agents/BTW: route fallback side-question streams through the embedded stream resolver so Anthropic-compatible MiniMax requests use the same capped transport as normal chat. ([#86312](https://github.com/openclaw/openclaw/issues/86312)) Thanks [@neeravmakwana](https://github.com/neeravmakwana).
- Telegram: treat `/command@TargetBot` bot-command entities as explicit mentions for the addressed bot so `requireMention` groups no longer drop targeted commands or captions. Fixes [#84462](https://github.com/openclaw/openclaw/issues/84462). ([#86553](https://github.com/openclaw/openclaw/issues/86553)) Thanks [@luoyanglang](https://github.com/luoyanglang).
- CI: bound Docker/Bash E2E tarball npm installs with `OPENCLAW_E2E_NPM_INSTALL_TIMEOUT` so package, onboarding, plugin, and upgrade lanes fail instead of hanging on a stuck npm install.
- CI: fail Parallels npm-update smoke jobs after the guest command timeout and cleanup backstop instead of only logging a timeout line.
- CI: bound kitchen-sink RPC HTTP probes so stalled gateway readiness or response bodies fail and retry instead of wedging the walker.
- CI: bound Telegram user Crabbox proof Bot API calls so stalled Telegram responses fail instead of wedging credential and desktop proof cleanup.
- CI: bound MCP channel stdio client initialization so Docker channel proof fails and closes the bridge transport instead of waiting for the outer job timeout.
- CI: keep `OPENCLAW_TESTBOX=1 pnpm check:changed` delegating to Blacksmith Testbox through Crabbox without forwarding local Testbox or worker env into the remote command.
- CI: send KILL after the TERM grace period for manual checkout fetch timeouts so stuck Testbox and workflow checkout retries cannot hang behind a wedged `git fetch`.
- CI: send KILL after the TERM grace period for Bun global install smoke command timeouts so trapped `openclaw` child processes cannot wedge the scheduled install smoke.
- iMessage: thread current channel/account inbound attachment roots into the image tool so iMessage-saved attachments under `~/Library/Messages/Attachments` (including the wildcard `/Users/*/Library/Messages/Attachments` root) are read through the existing inbound path policy instead of being rejected as `path-not-allowed`. Literal `localRoots` stays workspace-scoped. Fixes [#30170](https://github.com/openclaw/openclaw/issues/30170). ([#86569](https://github.com/openclaw/openclaw/issues/86569))
- QQ Bot: respect `OPENCLAW_HOME` for outbound media path resolution so `<qqmedia>` sends no longer silently fail when `HOME` and `OPENCLAW_HOME` differ (Docker / multi-user hosts). Persisted QQ Bot data (sessions, known users, refs) stays anchored on the OS home for upgrade compatibility. Fixes [#83562](https://github.com/openclaw/openclaw/issues/83562). Thanks [@sliverp](https://github.com/sliverp).
- Update: report the primary malformed `openclaw.extensions` payload error without adding a duplicate missing-main diagnostic. ([#86596](https://github.com/openclaw/openclaw/issues/86596)) Thanks [@ferminquant](https://github.com/ferminquant).
- Control UI: keep host-local Markdown file paths inert while preserving app-relative links. ([#86620](https://github.com/openclaw/openclaw/issues/86620)) Thanks [@BryanTegomoh](https://github.com/BryanTegomoh).
- Gateway: dampen repeated unauthenticated device-required probes per URL while preserving explicit-auth and paired recovery paths. ([#86575](https://github.com/openclaw/openclaw/issues/86575)) Thanks [@ferminquant](https://github.com/ferminquant).
- IRC: store inbound channel routes with the canonical `channel:#name` target and join transient channel sends before writing. ([#​85906](https://github.com/openclaw/openclaw/issues/85906)) Thanks [@​Kailigithub](https://github.com/Kailigithub). - Usage: surface unknown all-zero model pricing as missing cost entries instead of a confident `$0` total. ([#​85882](https://github.com/openclaw/openclaw/issues/85882)) Thanks [@​MichaelZelbel](https://github.com/MichaelZelbel). - Agents/Codex: honor yolo app-server approval policy only for the full `never` plus `danger-full-access` case. ([#​85909](https://github.com/openclaw/openclaw/issues/85909)) Thanks [@​earlvanze](https://github.com/earlvanze). - Gateway/Gmail: clear Gmail watcher renewal intervals on re-entry so hot reloads do not leak lifecycle timers. ([#​82947](https://github.com/openclaw/openclaw/issues/82947)) Thanks [@​SebTardif](https://github.com/SebTardif). - Logging: exit cleanly on broken stdout/stderr pipes without masking existing failure exit codes. ([#​80059](https://github.com/openclaw/openclaw/issues/80059)) Thanks [@​pavelzak](https://github.com/pavelzak). - Gateway/security: escape transcript metadata field names while extracting oversized session line prefixes. ([#​85934](https://github.com/openclaw/openclaw/issues/85934)) Thanks [@​SebTardif](https://github.com/SebTardif). - Plugins/security: validate manifest model pattern regexes with the safe-regex compiler so unsafe patterns are ignored before matching. ([#​86046](https://github.com/openclaw/openclaw/issues/86046)) Thanks [@​SebTardif](https://github.com/SebTardif). - Discord: route gateway metadata REST lookups through the configured Discord proxy so proxied accounts do not fall back to direct `discord.com` connections before opening the WebSocket. Fixes [#​80227](https://github.com/openclaw/openclaw/issues/80227). Thanks [@​Clivilwalker](https://github.com/Clivilwalker). 
- Agents/media: hydrate current-turn image attachments from filename-derived MIME types so active vision can see generated or forwarded images whose source omitted an image content type. ([#​84812](https://github.com/openclaw/openclaw/issues/84812)) Thanks [@​marchpure](https://github.com/marchpure). - Agents/fs: point workspace-only scratch-path guidance at in-workspace temp directories while keeping host-root writes rejected by the tool guard. ([#​86501](https://github.com/openclaw/openclaw/issues/86501)) Thanks [@​tianxiaochannel-oss88](https://github.com/tianxiaochannel-oss88). - Agents/media: keep async cron media completions scoped to their run session while preserving direct delivery for stale generated-media success and failure notifications. ([#​86529](https://github.com/openclaw/openclaw/issues/86529)) Thanks [@​ai-hpc](https://github.com/ai-hpc). - Gateway: emit plugin `session_end`/`session_start` hooks when `agent.send` rotates or replaces a session id, keeping hook lifecycle state aligned with `sessions.changed` notifications. Fixes [#​83507](https://github.com/openclaw/openclaw/issues/83507). ([#​85875](https://github.com/openclaw/openclaw/issues/85875)) Thanks [@​brokemac79](https://github.com/brokemac79). - OpenShell/SSH: reject malformed generated exec commands before sandbox/session setup so unresolved workflow placeholders fail fast instead of reaching the remote shell. Fixes [#​72373](https://github.com/openclaw/openclaw/issues/72373). Thanks [@​brokemac79](https://github.com/brokemac79). - Google: stop normalizing `gemini-3.1-flash-lite` to the retired preview endpoint and update Flash Lite alias guidance to the GA model id. Fixes [#​86151](https://github.com/openclaw/openclaw/issues/86151). ([#​86240](https://github.com/openclaw/openclaw/issues/86240)) Thanks [@​SebTardif](https://github.com/SebTardif). 
- Installer: make Alpine apk installs cover Git, verify the Node runtime floor, try `nodejs-current`, and report Alpine version guidance when repositories only provide older Node packages. - Agents/status: prefer the active Claude CLI OAuth auth label over an unused Anthropic env API-key label for equivalent runtime aliases. Fixes [#​80184](https://github.com/openclaw/openclaw/issues/80184). ([#​86570](https://github.com/openclaw/openclaw/issues/86570)) Thanks [@​brokemac79](https://github.com/brokemac79). - Agents/media: send direct fallback for generated media still missing after an active requester wake fails. ([#​85489](https://github.com/openclaw/openclaw/issues/85489)) Thanks [@​fuller-stack-dev](https://github.com/fuller-stack-dev). - Agents: derive overflow compaction budgets from provider-reported and synthetic over-budget token counts so confirmed context overflows compact before retrying. ([#​70473](https://github.com/openclaw/openclaw/issues/70473)) Thanks [@​fuller-stack-dev](https://github.com/fuller-stack-dev). - Agents/Codex: recover Codex context-window prompt errors through overflow compaction and surface reset guidance when recovery is exhausted. ([#​85542](https://github.com/openclaw/openclaw/issues/85542)) Thanks [@​fuller-stack-dev](https://github.com/fuller-stack-dev). - Agents/Codex: allow Codex app-server runs to bootstrap from `CODEX_API_KEY` or `OPENAI_API_KEY` when no Codex auth profile is configured. - Agents/Codex: keep selected Codex runtime routing on OpenAI-Codex while preserving direct OpenAI API-key compaction fallback. ([#​86408](https://github.com/openclaw/openclaw/issues/86408)) Thanks [@​funmerlin](https://github.com/funmerlin) and [@​VACInc](https://github.com/VACInc). - Agent transcript: include OpenClaw agent session logs when finding local transcript candidates. - Crabbox: bootstrap raw AWS macOS shell commands wrapped in absolute `time` paths so RSS probes can run Node and pnpm on fresh macOS runners. 
- Crabbox: bootstrap raw AWS macOS shell commands even when setup statements precede Node or pnpm usage.
- TUI/local: skip unnecessary secret resolution, gateway model catalog loading, bootstrap, and skill scans in explicit local-model runs so startup reaches the model request faster.
- Sessions/doctor: load large session stores without clone amplification during read-only doctor checks and reclaim stale `sessions.json.*.tmp` sidecars. Fixes [#56827](https://github.com/openclaw/openclaw/issues/56827). Thanks [@openperf](https://github.com/openperf).
- Tests: clean successful plugin gateway gauntlet isolated temp roots while keeping an explicit preservation switch for failed/debug runs.
- Plugins/perf: reuse derived plugin metadata snapshots for the lifetime of the process so reply-time skill setup no longer rescans plugin metadata on every turn.
- Discord/OpenAI voice: keep wake-name master consults using the current speaker context after ignored ambient transcripts and shorten the default capture silence grace.
- Doctor: skip redundant Gateway restart prompts when a recent supervisor restart leaves the Gateway healthy. Fixes [#86518](https://github.com/openclaw/openclaw/issues/86518). ([#86533](https://github.com/openclaw/openclaw/issues/86533)) Thanks [@liaoyl830](https://github.com/liaoyl830).
- Cron: restore suspended cron lanes to the configured/default concurrency instead of falling back to one after quota or circuit-breaker auto-resume.
- Gateway: keep session-only Control UI tool-start mirrors flowing during diagnostic queue pressure instead of silently dropping non-terminal tool updates.
- Agents/memory: return optional not-found context for missing date-only daily memory reads instead of logging benign first-run `ENOENT` failures. Fixes [#82928](https://github.com/openclaw/openclaw/issues/82928). Thanks [@galiniliev](https://github.com/galiniliev).
- Discord: merge streamed text captions into following media block replies so captions and attachments send as one message. ([#​86487](https://github.com/openclaw/openclaw/issues/86487)) Thanks [@​neeravmakwana](https://github.com/neeravmakwana). - Gateway: avoid sending duplicate tool-event frames to Control UI connections that are subscribed by both run and session. - Discord/OpenAI voice: accept broader edge-position fuzzy wake-name transcripts while keeping ambient speech gated. - Discord/OpenAI voice: accept longer leading wake-name mistranscripts such as "Open Club" for OpenClaw. - Agents/OpenAI-compatible: stop ModelStudio-compatible chat requests before sending system/tool-only payloads that have no usable user or assistant turn. ([#​86177](https://github.com/openclaw/openclaw/issues/86177)) Thanks [@​TurboTheTurtle](https://github.com/TurboTheTurtle). - Gateway/plugins: reuse plugin package realpath checks while building installed plugin indexes so startup avoids repeated filesystem resolution work. - Kilo Gateway: send string `stop` sequences as arrays so Kilo accepts OpenAI-compatible chat completions. ([#​86461](https://github.com/openclaw/openclaw/issues/86461)) Thanks [@​SebTardif](https://github.com/SebTardif). - Discord/OpenAI voice: accept leading fuzzy wake-name transcripts such as "Monty" or "Moti" for a Molty agent while keeping ambient speech gated. - Media understanding: convert HEIC and HEIF images to JPEG before image description providers run so iPhone photos work in direct and configured image-description flows. ([#​86037](https://github.com/openclaw/openclaw/issues/86037)) - Agents: release embedded-attempt session locks from outer teardown so post-prompt exceptions cannot wedge later requests behind `SessionWriteLockTimeoutError`. Fixes [#​86014](https://github.com/openclaw/openclaw/issues/86014). Thanks [@​openperf](https://github.com/openperf). 
- Discord/OpenAI voice: rotate Realtime sessions at provider max duration without logging the expected session-expiry event as an error. - Sessions: skip metadata-only entries during QMD-slugified session lookup so one incomplete row does not block transcript hit resolution. ([#​86327](https://github.com/openclaw/openclaw/issues/86327)) Thanks [@​abnershang](https://github.com/abnershang). - Agents/media: derive bundled plugin local-media trust from plugin tool metadata instead of importing the full plugin registry on subscription paths. ([#​84409](https://github.com/openclaw/openclaw/issues/84409)) Thanks [@​samzong](https://github.com/samzong). - Image tool: keep config-backed custom-provider API keys usable for auto-discovered vision models, including deferred image-tool execution without env keys or auth profiles. ([#​85733](https://github.com/openclaw/openclaw/issues/85733)) - Memory/local embeddings: run local GGUF embeddings in an isolated worker sidecar and degrade to configured fallback or keyword search on worker failure so native embedding crashes do not take down the Gateway. ([#​85348](https://github.com/openclaw/openclaw/issues/85348)) Thanks [@​osolmaz](https://github.com/osolmaz). - Gateway: clear the runtime config snapshot before `SIGUSR1` in-process restarts so config changes survive the next gateway loop. ([#​86388](https://github.com/openclaw/openclaw/issues/86388)) Thanks [@​XuZehan-iCenter](https://github.com/XuZehan-iCenter). - Models: show OAuth delegation markers as configured `models.json` auth while keeping runtime route usability checks strict. ([#​86378](https://github.com/openclaw/openclaw/issues/86378)) Thanks [@​rohitjavvadi](https://github.com/rohitjavvadi). - Cron: seed active scheduled and manual cron task rows with a progress summary so status surfaces do not look blank while jobs run. ([#​86313](https://github.com/openclaw/openclaw/issues/86313)) Thanks [@​ferminquant](https://github.com/ferminquant). 
- Cron: preserve unsupported persisted cron payload rows during routine store writes while keeping those rows non-runnable. Fixes [#​84922](https://github.com/openclaw/openclaw/issues/84922). ([#​86415](https://github.com/openclaw/openclaw/issues/86415)) Thanks [@​IWhatsskill](https://github.com/IWhatsskill). - Updater: exclude prerelease git tags from stable channel resolution so source updates do not check out newer alpha/rc/preview/canary tags. ([#​86260](https://github.com/openclaw/openclaw/issues/86260)) Thanks [@​stevenepalmer](https://github.com/stevenepalmer). - Security/Audit: flag webhook `hooks.token` reuse of active Gateway password auth in `openclaw security audit` while keeping password-mode startup compatibility. ([#​84338](https://github.com/openclaw/openclaw/issues/84338)) Thanks [@​coygeek](https://github.com/coygeek). - QQBot: derive the outbound reply watchdog from configured agent and provider timeouts so slow local model replies are not cut off at five minutes. Fixes [#​85267](https://github.com/openclaw/openclaw/issues/85267). ([#​85271](https://github.com/openclaw/openclaw/issues/85271)) Thanks [@​SymbolStar](https://github.com/SymbolStar). - Agents/heartbeat: stop heartbeat turns after the first valid `heartbeat_respond` so repeated response loops do not burn tokens. ([#​86357](https://github.com/openclaw/openclaw/issues/86357)) Thanks [@​udaymanish6](https://github.com/udaymanish6). - Tasks: keep retained lost tasks out of default status health counts, explain their cleanup window during maintenance, and prune lost task records after 24 hours instead of the general 7-day terminal retention. - Memory-core: keep REM dreaming focused on live light-staged memories and mark staged entries as considered so old recall history no longer dominates fresh candidates. ([#​86302](https://github.com/openclaw/openclaw/issues/86302)) Thanks [@​SebTardif](https://github.com/SebTardif). 
- Memory: abort sync instead of downgrading an existing semantic vector index to FTS-only when the configured embedding provider is temporarily unavailable. ([#​85704](https://github.com/openclaw/openclaw/issues/85704)) Thanks [@​yaaboo-gif](https://github.com/yaaboo-gif). - Telegram: propagate forum topic names through the account-scoped topic cache for native command context and topic create/edit actions. ([#​86299](https://github.com/openclaw/openclaw/issues/86299)) Thanks [@​SebTardif](https://github.com/SebTardif). - Slack: keep downloaded read-only files out of reply media so Slack file reads do not echo files back to the conversation. ([#​86318](https://github.com/openclaw/openclaw/issues/86318)) Thanks [@​neeravmakwana](https://github.com/neeravmakwana). - Cron: accept leading-plus relative durations such as `+5m` for one-shot `--at` schedules. ([#​86341](https://github.com/openclaw/openclaw/issues/86341)) Thanks [@​mushuiyu886](https://github.com/mushuiyu886). - Agents/media: preserve async-started media tool metadata so background generation starts no longer surface generic incomplete-turn warnings while replay stays unsafe. ([#​85933](https://github.com/openclaw/openclaw/issues/85933)) Thanks [@​fuller-stack-dev](https://github.com/fuller-stack-dev). - Docker E2E: dedupe scheduler lane resources so npm/service package lanes are not over-counted and serialized unnecessarily. - QA/diagnostics: add a collector-backed OpenTelemetry smoke lane, make the OTLP payload leak check scenario-aware, and keep source QA builds from failing on optional dependency imports resolved through pnpm's temp module path. - Crabbox: bootstrap Git metadata for sparse remote changed gates so raw synced workspaces can run `pnpm check:changed` from the intended diff. - xAI/LM Studio: avoid buffering ordinary bracketed or `final` prose until stream completion while watching for plain-text tool-call fallbacks. 
- Doctor: warn and continue when the cron job store exists but cannot be read so later health checks still run. Fixes [#​86102](https://github.com/openclaw/openclaw/issues/86102). ([#​86384](https://github.com/openclaw/openclaw/issues/86384)) Thanks [@​1052326311](https://github.com/1052326311). - Discord: suppress a bot's previous reply body and referenced media from prompt context when a user replies to that bot message, while keeping reply metadata for routing. ([#​86238](https://github.com/openclaw/openclaw/issues/86238)) Thanks [@​fuller-stack-dev](https://github.com/fuller-stack-dev). - Discord: restore bare numeric channel IDs for outbound message-tool sends while keeping explicit DM targets unambiguous. ([#​86571](https://github.com/openclaw/openclaw/issues/86571)) Thanks [@​joshavant](https://github.com/joshavant). - Docker E2E: avoid rebuilding the Control UI twice while preparing the shared OpenClaw package tarball for package-backed scenario runs. - Tests: avoid rebuilding the Control UI twice during the installer Docker smoke now that `pnpm build` includes `ui:build`. - Tests: give QA config mutation RPCs enough native Windows budget to finish gateway config writes and restart settle after hot scenario runs. - Tests: keep the gateway restart-inflight QA scenario focused on restart recovery on native Windows by allowing expected embedded prompt handoff errors and using the Windows-safe timeout budget. - QA-Lab: make the synthetic OpenAI provider honor generic `reply exactly:` directives after required kickoff reads so restart-recovery scenarios do not fall through to generic repo-summary prose. - Gateway: abort active `agent` RPC runs during forced restart shutdown so stale in-process turns cannot keep writing a session after the Gateway lifecycle restarts. - Crabbox: sync clean sparse worktrees through a temporary full checkout even when reusing an existing lease so tracked build-time files are not omitted. 
- Build: route `scripts/ui.js` through the shared pnpm runner and keep Control UI chunking helpers in sparse-included source so native Windows Corepack builds can produce `dist/control-ui`. - Tests: give the memory fallback QA scenario enough turn budget to exercise native Windows gateway runs instead of failing on the client timeout while the mock agent is still dispatching. - Tests: collect QA gateway CPU/RSS metrics on native Windows and give the channel baseline enough turn budget to report slow gateway runs instead of timing out before proof. - Install/update: bypass npm `min-release-age` policies with `--min-release-age=0` instead of `--before` so hosted installers keep working on npm versions that reject the combined config. ([#​84749](https://github.com/openclaw/openclaw/issues/84749)) Thanks [@​TeodoroRodrigo](https://github.com/TeodoroRodrigo). - Diagnostics: reclaim wedged session lanes when stale active-run bookkeeping blocks queued work despite no forward progress. Fixes [#​85639](https://github.com/openclaw/openclaw/issues/85639). Thanks [@​openperf](https://github.com/openperf). - WebChat: keep message-tool replies visible in the chat while still summarizing internal tool results for the model. Fixes [#​86347](https://github.com/openclaw/openclaw/issues/86347). Thanks [@​shakkernerd](https://github.com/shakkernerd). - Gateway/perf: fail startup benchmark samples when the Gateway process exits before benchmark teardown, including signal deaths after readiness probes. - Gateway/perf: fail restart benchmark samples when the Gateway exits before benchmark teardown, including clean exits and signal deaths after successful restart probes. - Agents/tests: keep model catalog visibility on static selection helpers so catalog visibility checks avoid the broad model-selection barrel import. - Agents/commitments: serialize commitment store load-modify-save writes so concurrent heartbeat and CLI updates no longer lose dismissal, sent, or attempt state. 
([#​81153](https://github.com/openclaw/openclaw/issues/81153)) Thanks [@​ai-hpc](https://github.com/ai-hpc). - xAI/LM Studio: promote plain-text tool-call fallbacks into structured tool calls and strip leaked internal tool syntax before user-facing delivery. ([#​86222](https://github.com/openclaw/openclaw/issues/86222)) Thanks [@​fuller-stack-dev](https://github.com/fuller-stack-dev). - CLI: suppress benign self-update version-skew warnings during package post-update finalization. - Gateway/perf: tighten restart and startup benchmark failure handling so long profiling runs, failed probes, and fresh Linux runners no longer produce false passing or `n/a` results. - Checks: keep intentional Knip unused-file findings optional so full CI and sparse proof workspaces stay aligned. - Docker: restore writable `~/.config` in runtime images. Fixes [#​85968](https://github.com/openclaw/openclaw/issues/85968). Thanks [@​hkoessler](https://github.com/hkoessler) and [@​Bartok9](https://github.com/Bartok9). - Plugin SDK: keep legacy root diagnostic subscriptions connected when built plugin SDK aliases resolve diagnostic helpers through a separate module graph. - Diagnostics: export alertable OTel and Prometheus signals for blocked tools, model failover, stale sessions, liveness warnings, oversized payloads, and webhook ingress while fixing shared OTLP endpoints with query strings. - Tests: normalize macOS canonical temp paths in exec allowlists, fs-safe trash assertions, installed plugin matching, Telegram topic-name stores, and built ACPX MCP server expectations so native macOS proof runners cover the intended behavior. - Codex/app-server: preserve message-tool-only source reply delivery mode on active runs so sub-agent completion wakeups can steer the active Codex turn instead of being rejected. ([#​86287](https://github.com/openclaw/openclaw/issues/86287)) Thanks [@​ferminquant](https://github.com/ferminquant). 
- Tests: sample the Windows kitchen-sink RPC gateway directly and serialize RSS probes so native runs keep the memory guard active. - Tests: normalize bundled plugin lifecycle probe paths and state-root lookup so native Windows release sweeps accept valid packaged plugin installs. - Agents/Claude CLI: route live native Bash permission requests through OpenClaw exec policy so Claude turns no longer stall on `control_request`, and document that OpenClaw exec policy is authoritative. Fixes [#​80819](https://github.com/openclaw/openclaw/issues/80819). ([#​86330](https://github.com/openclaw/openclaw/issues/86330), from [#​81971](https://github.com/openclaw/openclaw/issues/81971)) Thanks [@​guthirry](https://github.com/guthirry) and [@​sallyom](https://github.com/sallyom). - Security audit: warn when YOLO OpenClaw exec policy overrides a restrictive raw Claude `--permission-mode` for managed live sessions. ([#​86557](https://github.com/openclaw/openclaw/issues/86557)) Thanks [@​sallyom](https://github.com/sallyom). - Config: keep benign legacy metadata write anomalies out of default doctor and config command output while preserving explicit anomaly logging for diagnostics. - Codex: log when implicit app-server `never` approvals are promoted for OpenClaw tool policy, including whether the trigger was a `before_tool_call` hook or trusted tool policy. - Codex harness: make subscription usage-limit errors without reset times explain that OpenClaw cannot determine the reset and point users to wait until Codex is available, use another Codex account, or switch to another configured model/provider. Thanks [@​amknight](https://github.com/amknight). - Google Vertex: support production ADC modes such as Workload Identity Federation, service-account credentials, and metadata-server ADC for the native Vertex transport. ([#​83971](https://github.com/openclaw/openclaw/issues/83971)) Thanks [@​damianFelixPago](https://github.com/damianFelixPago). 
- Telegram: route normal `[telegram][diag]` polling diagnostics through `runtime.log` while keeping non-diag warnings and persistence failures on `runtime.error`, so healthy polling startup no longer looks like an error. Fixes [#​82957](https://github.com/openclaw/openclaw/issues/82957). ([#​82958](https://github.com/openclaw/openclaw/issues/82958)) Thanks [@​galiniliev](https://github.com/galiniliev). - Providers/Ollama: strip inline Kimi cloud reasoning prefixes from streamed and final visible replies while keeping ordinary Kimi answers append-only. ([#​86286](https://github.com/openclaw/openclaw/issues/86286)) Thanks [@​jason-allen-oneal](https://github.com/jason-allen-oneal). - Gateway: require Talk secret authority before setup-code handoff can include Talk secrets. ([#​85690](https://github.com/openclaw/openclaw/issues/85690)) Thanks [@​ngutman](https://github.com/ngutman). - Agents: keep fallback error reporting scoped to the active model candidate so stale prior-provider quota/auth text is not reported for later fallback attempts. ([#​86134](https://github.com/openclaw/openclaw/issues/86134)) Thanks [@​zhangguiping-xydt](https://github.com/zhangguiping-xydt). - iMessage: dedupe watcher startup when `channels.imessage.accounts` lists both `default` and a named account that point at the same local Messages source, so the gateway no longer spawns two `imsg rpc` processes or doubles inbound replies; the dedupe is scoped to watcher startup, leaving duplicate accounts addressable for outbound sends, status, and capability listings, and `openclaw doctor` flags the redundant account with a rebinding hint. Fixes [#​65141](https://github.com/openclaw/openclaw/issues/65141). ([#​86705](https://github.com/openclaw/openclaw/issues/86705)) Thanks [@​swang430](https://github.com/swang430). </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. 
…col/test churn - wrappedStreamFn: restructure provider-error-preservation without a throw inside finally (oxlint no-unsafe-finally). Same semantics: always reacquire; prefer the original stream error over a reacquire takeover error; surface reacquire error only when the stream succeeded. - Revert src/gateway/server-methods/agent.test.ts + GatewayModels.swift to the 5.18 baseline: the openclaw#85764 cherry-pick conflict-resolution had pulled in openclaw#85256-era internal-session-effect tests + protocol fields whose implementation isn't in this backport, breaking checks-node-agentic-gateway-methods + checks-fast-bundled-protocol. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agents): skip fallback for session coordination errors Preserve provider fallback metadata when session coordination errors are nested under provider failures. Co-authored-by: luyao618 <364939526@qq.com> (cherry picked from commit 6a5a135) * fix(agents): tolerate in-process session writes during prompt release (openclaw#84250) Merged via squash. Prepared head SHA: 33f88fe Co-authored-by: tianxiaochannel-oss88 <272340815+tianxiaochannel-oss88@users.noreply.github.com> Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com> Reviewed-by: @jalehman (cherry picked from commit 1b77145) * fix(agents): bound embedded compaction write locks Fixes the embedded attempt session write-lock watchdog so the fallback max hold time follows the resolved compaction timeout plus the existing lock grace window, instead of inheriting the full run timeout. Adds regression coverage for the helper and settled-compaction lock lifecycle, plus a changelog entry thanking @luoyanglang. Verification: - `pnpm test src/agents/session-write-lock.test.ts src/agents/pi-embedded-runner/run/attempt.test.ts src/agents/pi-embedded-runner/run/attempt.session-lock.test.ts` - `pnpm check:changed` via Blacksmith Testbox `tbx_01ks8b6vn8se5cg1dfn3te3g47` / https://github.com/openclaw/openclaw/actions/runs/26301988670 - Autoreview clean: `/Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode branch --base origin/main` - PR CI green on `79e8c5f1a637981d263c0268bf5666967ff4e778`: https://github.com/openclaw/openclaw/actions/runs/26302152844 and https://github.com/openclaw/openclaw/actions/runs/26302152798 Co-authored-by: luoyanglang <hanwanlonga@gmail.com> (cherry picked from commit 46de078) * fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition (openclaw#85764) * fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition - Adds optional maxHoldMs parameter to inspectLockPayload - Inspect now marks locks as stale when held 
longer than maxHoldMs - Passes maxHoldMs through inspectLockPayloadForSession - acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs This ensures that when a live process holds a lock for longer than maxHoldMs (default 5min), other processes can reclaim it during acquisition — matching the watchdog's existing enforcement. Previously shouldReclaim only used staleMs (30min default), meaning a lock held for 10+ minutes by a live PID would never be reclaimable, causing 60s timeout failures and gateway freezes. Closes openclaw#85762 * fix(session-lock): add dead-PID fast-path before retry loop Adds a fast-path check at the top of acquireSessionWriteLock: if the lock file's owner PID is dead, remove it immediately before entering the retry loop. This saves up to timeoutMs (60s) of futile waiting when the previous lock holder has died. The shouldReclaim callback already handles this case, but only iteratively through the retry loop. The fast-path eliminates that unnecessary delay. * fix(session-lock): enforce max hold during acquisition * fix(session-lock): revalidate max hold safely * fix(session-lock): honor holder max-hold policy * fix(session-lock): keep cleanup from reclaiming live holders * fix(session-lock): remove stale locks only when unchanged * fix(session-lock): skip self-held max-hold reclaim * fix(ci): refresh gateway protocol checks --------- Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com> Co-authored-by: Peter Steinberger <steipete@gmail.com> (cherry picked from commit a1eb765) * fix(embedded-runner): preserve provider errors on cleanup takeover (openclaw#84321) Summary: - The PR preserves provider-facing embedded-runner prompt errors when cleanup detects session takeover, keeps the takeover signal fatal for fallback, and adds focused regressions. - PR surface: Source +52, Tests +92. Total +144 across 5 files. - Reproducibility: yes. Source inspection shows current main can let cleanup takeover replace a prior prompt/p ... 
rror and can normalize a provider-looking takeover wrapper before fallback sees it as coordination failure. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(embedded-runner): preserve takeover during fallback - PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8405… Validation: - ClawSweeper review passed for head 050c779. - Required merge gates passed before the squash merge. Prepared head SHA: 050c779 Review: openclaw#84321 (comment) Co-authored-by: abnershang <abner.shang@gmail.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com> (cherry picked from commit 7fbca96) * fix(agents): release embedded-attempt session lock on every exit path (openclaw#86427) * fix(agents): release embedded-attempt session lock on every exit path The embedded run controller acquires its session write lock eagerly at creation and released it only inside the post-run cleanup block. An exception thrown in post-prompt processing skipped that block, so the lock leaked to the live gateway process until the watchdog reclaimed it and later requests to the session failed with SessionWriteLockTimeoutError. Add an idempotent dispose() to the lock controller and call it from the run's outer finally so the eagerly-held lock is released on every exit path. Normal/aborted/timed-out runs still hand the lock to acquireForCleanup first, so dispose() is a no-op then (no double release). 
Fixes openclaw#86014 * fix: keep session lock teardown comment lean * docs(changelog): note embedded session lock fix --------- Co-authored-by: Peter Steinberger <steipete@gmail.com> (cherry picked from commit 32ddfc2) * fix(agents): fence yield abort lock release (cherry picked from commit 0fe7479) * fix(agents): memoize session lock owner args Memoize owner process argv lookups per PID during `cleanStaleLockFiles`, and yield between lock entries so startup cleanup does not monopolize the event loop while inspecting many session locks. This keeps lock classification semantics unchanged while avoiding repeated synchronous process-args reads for lock clusters owned by the same PID, especially the Windows PowerShell path. Fixes openclaw#86509. Verification: - `git diff --check origin/main...HEAD` - focused TSX harness against the current-main merge result: `session-lock memo regression harness passed` Thanks @openperf. Co-authored-by: openperf <16864032@qq.com> (cherry picked from commit c430fcd) * fix(diagnostics): recover orphaned session activity Recover idle queued sessions whose diagnostic activity retained stale ownerless model or tool calls by classifying them as recoverable session.stuck after the usual recovery gates. Yield the event loop before stale session-lock process inspection so sync process lookup cannot monopolize lock contention paths. Docs now describe the widened session.stuck telemetry contract for recoverable stale bookkeeping, including ownerless activity. Thanks @samuelsoaress. Refs openclaw#84903. Co-authored-by: samuelsoaress <samuelsoares177778@gmail.com> (cherry picked from commit 286964c) * [FORK][openclaw#86584] gate owned-write publish on pre-append fingerprint (fixes openclaw#86572) Carries unmerged upstream PR openclaw#86584 (HEAD d79a3b4) onto the boon 5.18 base as the same-lane EmbeddedAttemptSessionTakeoverError fence fix for long cron turns. 
Fails closed: an external mutation before pi's append fails the trust gate and still trips the fence (verified by the PR's 303-line test suite incl. the mixed-interleave negative test). Backfills base symbols openclaw#86584 assumes (introduced upstream between 5.18 and the PR base, not carried by the 9 merged race-fix picks): - session-lock.ts: MAX_BENIGN_SESSION_FENCE_{ADVANCE,REWRITE,REWRITE_RESULT}_BYTES, MAX_SAFE_FILE_OFFSET, TRANSCRIPT_ONLY_OPENCLAW_ASSISTANT_MODELS, SessionFileFenceSnapshot type, fenceSnapshot state var, ActiveWriteLockState type + activeWriteLock store fix (reuse nested writes via {active:true}), node:util + string-normalization imports. - transcript-append.ts: wrap appendSessionTranscriptMessage in runWithOwnedSessionTranscriptWriteLock so low-level appends acquire the owned-context lock. - test import fixes (appendSessionTranscriptMessage, withOwned/bindOwned, __testing). Drop when upstream merges openclaw#86584. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * [FORK][openclaw#86584] wire owned-transcript-write context + typecheck cleanup CRITICAL: wrap promptActiveSession in withOwnedSessionTranscriptWrites and bind onBlockReply/onBlockReplyFlush to the owned context in attempt.ts. Without this, pi's own transcript appends during a prompt are NOT recorded as owned, so the fence trips on them (the exact takeover the backport is meant to prevent). This wiring is an intermediate-base feature (between 5.18 and openclaw#84250's base) the merged picks didn't carry. Tests passed before only because they set the context manually. 
Also: add releaseHeldLockForAbort to the controller type; drop incidental non-fence suppressAssistantErrorPersistence passes; remove dead async benign-rewrite cluster (sessionFence{Advance,Rewrite}IsBenign + readAppendedSessionFileText + lineMatchesLinearTranscriptMigration + helpers) — our openclaw#84250-based assertSessionFileFence uses the sync owned-write path, so the async benign-detection variants are unreachable. tsgo core: 0 errors. 384 tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * [FORK][openclaw#86584] address codex review: prefix-validate benign advance + preserve provider error Finding 2 (masking gap, P2): sessionFenceAdvanceIsBenignSync only validated the APPENDED bytes, so a writer that rewrote the existing prefix AND appended a benign delivery-mirror/gateway-injected line could be laundered as an owned advance — masking a genuine external takeover (silent message loss). Now fail closed unless the current prefix is byte-identical to the trusted readSessionFileFenceSnapshot text (readSessionFilePrefixSync); absent snapshot text => not benign. Finding 1 (provider-error masking, P2): wrappedStreamFn's finally let a reacquireAfterPrompt() takeover error mask the original provider error when the stream itself threw. Now only surface the reacquire error when the stream succeeded; otherwise preserve the original failure. tsgo core: 0 errors. 384 tests pass (benign-advance acceptance + external-mutation rejection both green). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore(release): 2026.5.18-boon.1 — session-takeover hardening (boon fleet build) Version bump + CHANGELOG for the fork build. Also fixes a backport test-import gap: attempt.test.ts referenced `attemptTesting` (the __testing export) without importing it. Full project typecheck (tsgo -b tsconfig.projects.json): 0 errors. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(ci): no-unsafe-finally in wrappedStreamFn + drop collateral protocol/test churn - wrappedStreamFn: restructure provider-error-preservation without a throw inside finally (oxlint no-unsafe-finally). Same semantics: always reacquire; prefer the original stream error over a reacquire takeover error; surface reacquire error only when the stream succeeded. - Revert src/gateway/server-methods/agent.test.ts + GatewayModels.swift to the 5.18 baseline: the openclaw#85764 cherry-pick conflict-resolution had pulled in openclaw#85256-era internal-session-effect tests + protocol fields whose implementation isn't in this backport, breaking checks-node-agentic-gateway-methods + checks-fast-bundled-protocol. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix: remove vestigial onAssistantErrorMessagePersisted option decls Address cubic P2 review (PR #2): the option was declared on the guard and guard-wrapper option types but never forwarded or invoked, so any provided callback was silently ignored. The companion error-suppression feature (suppressAssistantErrorPersistence + the agent-runner/followup caller chain) is deliberately scoped OUT of this 5.18 backport, so the decls were dead plumbing left over from a cherry-pick. Remove them to keep the option surface honest; the load-bearing beforeMessagePersist fence checkpoint (openclaw#86572) is retained. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Yao <364939526@qq.com> Co-authored-by: xiaotian <tianxiaochannel@gmail.com> Co-authored-by: 狼哥 <hanwanlonga@gmail.com> Co-authored-by: njuboy <njuboy11@gmail.com> Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com> Co-authored-by: Peter Steinberger <steipete@gmail.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Co-authored-by: abnershang <abner.shang@gmail.com> Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com> Co-authored-by: Chunyue Wang <80630709+openperf@users.noreply.github.com> Co-authored-by: openperf <16864032@qq.com> Co-authored-by: Samuel Soares da Silva <samuelsoares177778@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…uisition (openclaw#85764)

* fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition

- Adds optional maxHoldMs parameter to inspectLockPayload
- Inspect now marks locks as stale when held longer than maxHoldMs
- Passes maxHoldMs through inspectLockPayloadForSession
- acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

This ensures that when a live process holds a lock for longer than maxHoldMs (default 5min), other processes can reclaim it during acquisition, matching the watchdog's existing enforcement.

Previously shouldReclaim only used staleMs (30min default), meaning a lock held for 10+ minutes by a live PID would never be reclaimable, causing 60s timeout failures and gateway freezes.

Closes openclaw#85762

* fix(session-lock): add dead-PID fast-path before retry loop

Adds a fast-path check at the top of acquireSessionWriteLock: if the lock file's owner PID is dead, remove it immediately before entering the retry loop. This saves up to timeoutMs (60s) of futile waiting when the previous lock holder has died.

The shouldReclaim callback already handles this case, but only iteratively through the retry loop. The fast-path eliminates that unnecessary delay.

* fix(session-lock): enforce max hold during acquisition
* fix(session-lock): revalidate max hold safely
* fix(session-lock): honor holder max-hold policy
* fix(session-lock): keep cleanup from reclaiming live holders
* fix(session-lock): remove stale locks only when unchanged
* fix(session-lock): skip self-held max-hold reclaim
* fix(ci): refresh gateway protocol checks

---------

Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
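The dead-PID fast-path described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `LockRecord`, `AcquireDeps`, and `acquireWithFastPath` are hypothetical names, the lock lives in memory rather than in a lock file, and the liveness probe is injected so the example stays deterministic.

```typescript
// Hypothetical in-memory model; the real implementation works on lock files on disk.
interface LockRecord {
  pid: number;
  createdAtMs: number;
}

// All dependencies are injected (assumption made for this sketch only).
interface AcquireDeps {
  readLock: () => LockRecord | null;
  removeLock: () => void;
  tryCreateLock: () => boolean; // true when the caller now owns the lock
  isPidAlive: (pid: number) => boolean;
  shouldReclaim: (lock: LockRecord) => boolean;
}

function acquireWithFastPath(
  deps: AcquireDeps,
  maxAttempts: number,
): { acquired: boolean; attempts: number } {
  // Fast path: a dead owner can never release, so clear its lock before waiting.
  const existing = deps.readLock();
  if (existing !== null && !deps.isPidAlive(existing.pid)) {
    deps.removeLock();
  }

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (deps.tryCreateLock()) {
      return { acquired: true, attempts: attempt };
    }
    const current = deps.readLock();
    if (current !== null && deps.shouldReclaim(current)) {
      deps.removeLock();
    }
    // A real retry loop would sleep here until the acquire timeout elapses.
  }
  return { acquired: false, attempts: maxAttempts };
}

// Usage: the previous holder (pid 999) has died, so the first attempt succeeds.
const state: { lock: LockRecord | null } = { lock: { pid: 999, createdAtMs: 0 } };
const deps: AcquireDeps = {
  readLock: () => state.lock,
  removeLock: () => {
    state.lock = null;
  },
  tryCreateLock: () => {
    if (state.lock !== null) return false;
    state.lock = { pid: 1, createdAtMs: 0 };
    return true;
  },
  isPidAlive: (pid) => pid !== 999,
  shouldReclaim: () => false,
};
const result = acquireWithFastPath(deps, 3);
console.log(result.acquired, result.attempts);
```

Without the fast path, the same dead-owner lock would only be cleared by `shouldReclaim` inside the loop, after at least one failed attempt and whatever backoff the real loop applies.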
Summary
- `shouldReclaim` callback only uses `staleMs` (30min default), never checks `maxHoldMs` (5min default). A live process holding a lock for 10+ minutes blocks all other writers, causing `SessionWriteLockTimeoutError` after 60s and freezing the gateway.
- `inspectLockPayload`: added optional `maxHoldMs` parameter. When `ageMs > maxHoldMs`, adds `"hold-exceeded"` stale reason.
- `inspectLockPayloadForSession`: passes `maxHoldMs` through.
- `removeReportedStaleLockIfStillStale`: passes `maxHoldMs` through.
- `acquireSessionWriteLock`'s `shouldReclaim`: passes `maxHoldMs` so lock acquisition independently enforces the max-hold policy.
- `testing.inspectLockPayloadForTest`: exported for the new unit test.
- `cleanStaleLockFiles` and the watchdog timer continue their independent `maxHoldMs` enforcement. This is an additive change with no breaking behavior.

Motivation
Observed on a real OpenClaw setup (v2026.5.22-beta.1, Linux): long debugging session with many tool calls, session file grew to ~400KB. Multiple gateway processes tried writing to the same session file. The lock holder took >60s to write, other processes timed out, gateway became unresponsive, required manual restart.
The watchdog timer already enforces `maxHoldMs` for held locks, but it runs every 60s and is separate from acquisition. The acquisition path should independently enforce `maxHoldMs` as well.

Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Real behavior proof (required for external PRs)
Behavior or issue addressed: Session write lock timeout freezes the gateway when large session files cause slow writes. The lock holder (live PID) holds for >60s, and other processes time out at 60s with `SessionWriteLockTimeoutError`. The `shouldReclaim` callback never checks `maxHoldMs`, only `staleMs` (30min), so a live-PID lock can block for up to 30 minutes despite the 5-minute max-hold policy.

Real environment tested: OpenClaw v2026.5.22-beta.1 on Linux VM (Ubuntu 24.04, x64, Node v22.22.2). Gateway running as `openclaw-gateway` systemd user service. Session file ~400KB after 40+ turns with multiple file read/write/commit operations.

Exact steps or command run after this patch:

- `OPENCLAW_SESSION_WRITE_LOCK_ACQUIRE_TIMEOUT_MS=120000` and `OPENCLAW_SESSION_WRITE_LOCK_STALE_MS=300000` in systemd service
- `systemctl --user daemon-reload && systemctl --user restart openclaw-gateway`
- `journalctl --user -u openclaw-gateway`

Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output):
Before fix (system journal, 2026-05-23T23:27-23:30 UTC+8):
Lock file and PID inspection confirming dual-process contention on a LIVE (not dead) PID:
Session file size at time of failure:
After fix (same session, same file size, no timeouts):
Observed result after fix: Zero `SessionWriteLockTimeoutError` events in a subsequent session of equivalent length (40+ turns). The lock acquisition path now independently enforces `maxHoldMs` via `shouldReclaim`, preventing the 60s timeout deadlock when lock holders exceed the 5-minute max-hold window.

What was not tested: Direct `maxHoldMs` enforcement on a live-PID lock during acquisition (the config workaround increases the timeout to 120s, which prevents hitting the path in production, but the code change is mechanical: one optional parameter and one additional stale-reason check that follows the same pattern as the existing `staleMs` enforcement). Full integration test in a multi-process session with `maxHoldMs` explicitly configured below 120s.
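The untested path can at least be illustrated in isolation. The following is a minimal sketch under stated assumptions, not the PR's implementation: the payload shape, the options object, the injected `isPidAlive`, and the `"dead-pid"` and `"too-old"` reason names are hypothetical. Only the `"hold-exceeded"` reason, the optional `maxHoldMs`, and the `ageMs` null check come from the description.

```typescript
// Hypothetical shapes: the real lock payload and return types are not shown in this PR text.
type StaleReason = "dead-pid" | "too-old" | "hold-exceeded";

interface LockPayload {
  pid: number;
  createdAtMs: number | null; // null when the timestamp is missing or unparsable
}

interface InspectOptions {
  nowMs: number;
  staleMs: number;
  maxHoldMs?: number; // optional, so existing callers keep their behavior
  isPidAlive: (pid: number) => boolean; // injected here for determinism
}

function inspectLockPayloadSketch(payload: LockPayload, opts: InspectOptions) {
  const staleReasons: StaleReason[] = [];
  const ageMs = payload.createdAtMs === null ? null : opts.nowMs - payload.createdAtMs;

  if (!opts.isPidAlive(payload.pid)) {
    staleReasons.push("dead-pid");
  }
  if (ageMs !== null && ageMs > opts.staleMs) {
    staleReasons.push("too-old");
  }
  // New check: only applied when the caller passes maxHoldMs and the age is known.
  if (opts.maxHoldMs !== undefined && ageMs !== null && ageMs > opts.maxHoldMs) {
    staleReasons.push("hold-exceeded");
  }
  return { ageMs, staleReasons, stale: staleReasons.length > 0 };
}

// A live PID that has held the lock for 10 minutes, using the defaults quoted above.
const MINUTE = 60_000;
const held10Min: LockPayload = { pid: 4242, createdAtMs: 0 };
const base = { nowMs: 10 * MINUTE, staleMs: 30 * MINUTE, isPidAlive: () => true };

const before = inspectLockPayloadSketch(held10Min, base);
const after = inspectLockPayloadSketch(held10Min, { ...base, maxHoldMs: 5 * MINUTE });
console.log(before.stale, after.staleReasons);
```

With the 30-minute `staleMs` alone, the 10-minute-old lock is not stale; once the 5-minute `maxHoldMs` is passed, the same lock reports `"hold-exceeded"` and becomes reclaimable.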
Before evidence (optional but encouraged):
The `shouldReclaim` callback in `acquireSessionWriteLock` only checked `staleMs` (30min default), never `maxHoldMs` (5min default).

Root Cause (if applicable)
- `inspectLockPayload` was designed to only check `staleMs` (dead/recycled PID, age > staleMs), but never checked `maxHoldMs`. The `maxHoldMs` value was stored as lock metadata but only consumed by the watchdog timer, not by the acquisition path.
- `shouldReclaim` should independently enforce `maxHoldMs` in addition to `staleMs`, just as the watchdog does.

Regression Test Plan (if applicable)
- `src/agents/session-write-lock.test.ts`
- `maxHoldMs` is passed to `inspectLockPayload` and age exceeds it, "hold-exceeded" appears in staleReasons
- `inspectLockPayload` behavior: the new parameter is optional, so existing tests should pass unchanged

New unit test added by @steipete in `src/agents/session-write-lock.test.ts`.

All CI checks passing (27 check runs): CodeQL ✅, Opengrep OSS ✅, all agentic/control-plane/runtime checks ✅.
User-visible / Behavior Changes
Session write lock acquisition now considers locks held longer than `maxHoldMs` (default 5 minutes) as reclaimable. Previously they were only considered stale after `staleMs` (default 30 minutes). This means sub-30-minute lock contention now resolves within ~5 minutes instead of timing out at 60 seconds.

Security Impact (required)

- No
- No
- No
- No
- No

Repro + Verification
Environment
- `maxHoldMs=300000` (5min), default `acquireTimeoutMs=60000` (1min)

Steps
Expected
- `acquireTimeoutMs` (60s)
- `maxHoldMs` (5min), `shouldReclaim` marks it stale

Actual (before fix)

- `shouldReclaim` returns false because PID is alive and age < 30min
- `SessionWriteLockTimeoutError` thrown, gateway unresponsive

Evidence
Log before:
[x4 times in 3 minutes, requiring manual gateway restart]
Log after: No `SessionWriteLockTimeoutError` observed.

Human Verification (required)

- `inspectLockPayload` and `inspectLockPayloadForSession` calls continue to work without `maxHoldMs`.
- `maxHoldMs` is optional and defaults to undefined; `ageMs` null check prevents NaN comparison; existing `staleMs` check is unchanged.

Review Conversations
Compatibility / Migration
- Yes
- No
- No

Risks and Mitigations

- `maxHoldMs` while the holder is still writing could cause partial writes.
- `runLockWatchdogCheck`), which already force-releases locks exceeding `maxHoldMs`. This PR simply extends the same policy to the acquisition path, making it consistent.
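One way to narrow the partial-write risk is the "remove stale locks only when unchanged" idea named in the commit list: re-read the lock right before removal and skip it if the contents differ from what was inspected. The sketch below is an assumption-laden illustration only. `LockStore`, `removeLockIfUnchanged`, the lock path, and the payload fields are hypothetical, and an in-memory `Map` stands in for the lock directory.

```typescript
// Hypothetical in-memory stand-in for the lock directory: path -> raw lock file contents.
type LockStore = Map<string, string>;

// Removes the lock only if its raw contents still match the inspected snapshot.
function removeLockIfUnchanged(store: LockStore, path: string, inspectedRaw: string): boolean {
  const currentRaw = store.get(path);
  // The lock was released or replaced since inspection: leave it alone.
  if (currentRaw === undefined || currentRaw !== inspectedRaw) {
    return false;
  }
  store.delete(path);
  return true;
}

// Usage: the holder rewrites the lock between inspection and removal.
const lockPath = "session.jsonl.lock"; // illustrative name only
const inspected = '{"pid":10,"acquiredAtMs":0}';
const store: LockStore = new Map<string, string>([[lockPath, inspected]]);

const rewritten = '{"pid":10,"acquiredAtMs":600000}';
store.set(lockPath, rewritten);

const removedStale = removeLockIfUnchanged(store, lockPath, inspected); // snapshot is outdated
const removedFresh = removeLockIfUnchanged(store, lockPath, rewritten); // snapshot matches
console.log(removedStale, removedFresh);
```

On a real filesystem the compare and the delete are two separate operations, so a check like this shrinks the race window rather than closing it.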