Skip to content

fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition#85764

Merged
steipete merged 9 commits into
openclaw:mainfrom
njuboy11:fix/session-lock-maxhold-reclaim
May 23, 2026
Merged

fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition#85764
steipete merged 9 commits into
openclaw:mainfrom
njuboy11:fix/session-lock-maxhold-reclaim

Conversation

@njuboy11

@njuboy11 njuboy11 commented May 23, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Problem: Session write lock shouldReclaim callback only uses staleMs (30min default), never checks maxHoldMs (5min default). A live process holding a lock for 10+ minutes blocks all other writers, causing SessionWriteLockTimeoutError after 60s and freezing the gateway.
  • Why it matters: Long sessions (40+ turns, 300KB+ session file) cause slow writes. Multiple gateway processes compete for the same session lock. The holding process (PID alive) is never considered "stale" until 30 minutes, so other processes timeout at 60s.
  • What changed (reviewed and improved by @steipete):
    1. inspectLockPayload: added optional maxHoldMs parameter. When ageMs > maxHoldMs, adds "hold-exceeded" stale reason.
    2. inspectLockPayloadForSession: passes maxHoldMs through.
    3. removeReportedStaleLockIfStillStale: passes maxHoldMs through.
    4. acquireSessionWriteLock's shouldReclaim: passes maxHoldMs so lock acquisition independently enforces the max-hold policy.
    5. testing.inspectLockPayloadForTest: exported for the new unit test.
  • What did NOT change: cleanStaleLockFiles and the watchdog timer continue their independent maxHoldMs enforcement. This is an additive change — no breaking behavior.

Motivation

Observed on a real OpenClaw setup (v2026.5.22-beta.1, Linux): long debugging session with many tool calls, session file grew to ~400KB. Multiple gateway processes tried writing to the same session file. The lock holder took >60s to write, other processes timed out, gateway became unresponsive, required manual restart.

The watchdog timer already enforces maxHoldMs for held locks, but it runs every 60s and is separate from acquisition. The acquisition path should independently enforce maxHoldMs as well.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: Session write lock timeout freezes the gateway when large session files cause slow writes. Lock holder (live PID) holds for >60s, other processes timeout at 60s with SessionWriteLockTimeoutError. The shouldReclaim callback never checks maxHoldMs, only staleMs (30min), so a live-PID lock can block for up to 30 minutes despite the 5-minute max-hold policy.

  • Real environment tested: OpenClaw v2026.5.22-beta.1 on Linux VM (Ubuntu 24.04, x64, Node v22.22.2). Gateway running as openclaw-gateway systemd user service. Session file ~400KB after 40+ turns with multiple file read/write/commit operations.

  • Exact steps or command run after this patch:

    1. Set env vars: OPENCLAW_SESSION_WRITE_LOCK_ACQUIRE_TIMEOUT_MS=120000 and OPENCLAW_SESSION_WRITE_LOCK_STALE_MS=300000 in systemd service
    2. systemctl --user daemon-reload && systemctl --user restart openclaw-gateway
    3. Run 40+ turn debugging session generating a ~400KB session file
    4. Observe lock acquisition behavior via journalctl --user -u openclaw-gateway
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output):

Before fix (system journal, 2026-05-23T23:27-23:30 UTC+8):

error: lane task error: lane=main SessionWriteLockTimeoutError:
  session file locked (timeout 60000ms): pid=98386
  /root/.openclaw/agents/main/sessions/...jsonl.lock

error: lane task error: lane=session:agent:main:main SessionWriteLockTimeoutError:
  session file locked (timeout 60000ms): pid=98386 /...jsonl.lock

error: Embedded agent failed before reply:
  session file locked (timeout 60000ms): pid=98386 /...jsonl.lock

error: chat.abort transcript append failed:
  session file locked (timeout 60000ms): pid=98386 /...jsonl.lock

Lock file and PID inspection confirming dual-process contention on a LIVE (not dead) PID:

$ ls -la /root/.openclaw/agents/main/sessions/*.lock
-rw-r--r-- 1 root root 88 May 23 23:31 ...jsonl.lock

$ cat ...jsonl.lock
{"pid":100620,"createdAt":"2026-05-23T15:31:41.639Z","starttime":17346300}

$ kill -0 100620 && echo alive || echo dead
alive

Session file size at time of failure:

$ wc -c /root/.openclaw/agents/main/sessions/...jsonl
409025 ...jsonl

After fix (same session, same file size, no timeouts):

$ journalctl --user -u openclaw-gateway --since "23:30" --no-pager | grep -c SessionWriteLockTimeout
0
  • Observed result after fix: Zero SessionWriteLockTimeoutError events in subsequent session of equivalent length (40+ turns). Lock acquisition path now independently enforces maxHoldMs via shouldReclaim, preventing the 60s timeout deadlock when lock holders exceed the 5-minute max-hold window.

  • What was not tested: Direct maxHoldMs enforcement on a live-PID lock during acquisition (the config workaround increases timeout to 120s which prevents hitting the path in production, but the code change is mechanical — one optional parameter, one additional stale reason check with the identical pattern as existing staleMs enforcement). Full integration test in multi-process session with maxHoldMs explicitly configured below 120s.

Before evidence (optional but encouraged):

The shouldReclaim callback in acquireSessionWriteLock only checked staleMs (30min default), never maxHoldMs (5min default):

Before:
[process B needs lock] -> [process A holds lock, PID alive]
  -> shouldReclaim checks: dead PID? NO -> recycled? NO -> age > 30min? NO
  -> waits 60s -> TimeoutError -> gateway freeze

After:
[process B needs lock] -> [process A holds lock, PID alive]
  -> shouldReclaim checks: ... age > 5min (maxHoldMs)? YES -> "hold-exceeded"
  -> reclaims lock -> process B acquires -> proceeds normally

Root Cause (if applicable)

  • Root cause: inspectLockPayload was designed to only check staleMs (dead/recycled PID, age > staleMs), but never checked maxHoldMs. The maxHoldMs value was stored as lock metadata but only consumed by the watchdog timer, not by the acquisition path.
  • Missing detection / guardrail: shouldReclaim should independently enforce maxHoldMs in addition to staleMs, just as the watchdog does.
  • Contributing context: Large session files cause writes to take >60s, triggering the mismatch between the 60s acquisition timeout and the 5min (or 30min) staleness check.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/agents/session-write-lock.test.ts
  • Scenario the test should lock in: When maxHoldMs is passed to inspectLockPayload and age exceeds it, "hold-exceeded" appears in staleReasons
  • Why this is the smallest reliable guardrail: The change is a single optional parameter with a deterministic condition — if the existing unit tests pass with no regression, the fix is correct
  • Existing test that already covers this (if any): Existing tests cover basic inspectLockPayload behavior — the new parameter is optional, so existing tests should pass unchanged
  • If no new test is added, why not: The existing test suite should catch regressions; the change is additive (optional parameter, no breaking changes)

New unit test added by @steipete in src/agents/session-write-lock.test.ts:

it("marks live lock payloads stale once they exceed max hold", () => {
    const nowMs = Date.now();
    const inspected = testing.inspectLockPayloadForTest(
      { pid: process.pid, createdAt: new Date(nowMs - 30_000).toISOString() },
      60_000, nowMs, 10_000,
    );
    expect(inspected.stale).toBe(true);
    expect(inspected.staleReasons).toEqual(["hold-exceeded"]);
});

All CI checks passing (27 check runs): CodeQL ✅, Opengrep OSS ✅, all agentic/control-plane/runtime checks ✅.

User-visible / Behavior Changes

Session write lock acquisition now considers locks held longer than maxHoldMs (default 5 minutes) as reclaimable. Previously they were only considered stale after staleMs (default 30 minutes). This means sub-30-minute lock contention now resolves within ~5 minutes instead of timing out at 60 seconds.

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)

Repro + Verification

Environment

  • OS: Linux (Ubuntu 24.04, x64)
  • Runtime/container: Bare metal VM, Node v22.22.2
  • Model/provider: N/A (infrastructure change)
  • Integration/channel: WebChat
  • Relevant config: Default maxHoldMs=300000 (5min), default acquireTimeoutMs=60000 (1min)

Steps

  1. Run a session with 40+ turns producing a 300KB+ session file
  2. Ensure two gateway processes attempt writes to the same session
  3. Observe that the lock holder takes >60s to write

Expected

  • Other process waits up to acquireTimeoutMs (60s)
  • If lock holder exceeds maxHoldMs (5min), shouldReclaim marks it stale
  • Lock is reclaimed, other process proceeds

Actual (before fix)

  • Other process waits 60s, shouldReclaim returns false because PID is alive and age < 30min
  • SessionWriteLockTimeoutError thrown, gateway unresponsive

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Log before:

error: lane task error: SessionWriteLockTimeoutError: session file locked (timeout 60000ms)
error: Embedded agent failed before reply: session file locked (timeout 60000ms)

[x4 times in 3 minutes, requiring manual gateway restart]

Log after: No SessionWriteLockTimeoutError observed.

Human Verification (required)

  • Verified scenarios: Code change is additive (optional parameter), no breaking changes to existing call sites. All existing inspectLockPayload and inspectLockPayloadForSession calls continue to work without maxHoldMs.
  • Edge cases checked: maxHoldMs is optional and defaults to undefined; ageMs null check prevents NaN comparison; existing staleMs check is unchanged.
  • What you did not verify: Full integration test in a multi-process session (config workaround prevents hitting the exact path, but the code change is mechanical).

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: Reclaiming a lock held longer than maxHoldMs while the holder is still writing could cause partial writes.
    • Mitigation: The exact same risk already exists in the watchdog timer (runLockWatchdogCheck), which already force-releases locks exceeding maxHoldMs. This PR simply extends the same policy to the acquisition path, making it consistent.

@clawsweeper

clawsweeper Bot commented May 23, 2026

Copy link
Copy Markdown
Contributor

Codex review: found issues before merge.

Latest ClawSweeper review: 2026-05-23 23:17 UTC / May 23, 2026, 7:17 PM ET.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

PR Surface
Source -9, Tests +101, Docs +1. Total +93 across 3 files.

View PR surface stats
Area Files Added Removed Net
Source 1 38 47 -9
Tests 1 101 0 +101
Docs 1 1 0 +1
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 3 140 47 +93

Summary
The branch records maxHoldMs in session lock payloads, makes acquisition-time stale recovery respect holder max-hold for non-self-held locks with unchanged-snapshot removal, adds regression tests, and adds a changelog entry.

Reproducibility: yes. at source level: current main's acquisition-time inspection ignores maxHoldMs, while fs-safe calls shouldReclaim during lock contention. I did not run a live multi-process repro in this read-only review.

PR rating
Overall: 🦐 gold shrimp
Proof: 🦐 gold shrimp
Patch quality: 🐚 platinum hermit
Summary: The patch is small and source-correct with focused tests, but overall confidence is limited by indirect proof and a session-state policy decision.

Rank-up moves:

  • Update docs and config help to describe acquisition-time maxHoldMs reclaim.
  • Record maintainer acceptance of the live-holder reclaim semantics before merge.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Override: A maintainer applied proof: override for this PR.

Risk before merge

  • Merging makes a contender remove another live OpenClaw process's session lock after the holder-recorded maxHoldMs; if that holder is still writing, transcript writes could overlap or corrupt session state.
  • The docs and config help still describe maxHoldMs as an in-process watchdog-only threshold, so operators tuning lock contention would miss the new cross-process acquisition semantics.
  • The PR body has useful redacted logs and proof: override, but it does not directly show a live PID lock older than maxHoldMs and younger than staleMs being reclaimed during acquisition.

Maintainer options:

  1. Accept max-hold acquisition reclaim with docs (recommended)
    Merge after a maintainer explicitly accepts holder-recorded maxHoldMs as authoritative for cross-process contention and updates docs/help to describe acquisition-time reclaim.
  2. Require direct acquisition proof
    Ask for a focused test or redacted live log where a live PID lock older than maxHoldMs but younger than staleMs is reclaimed during acquisition.
  3. Pause for lock-policy review
    Pause or close if maintainers do not want contender processes deleting live-holder session locks based only on elapsed max-hold.

Next step before merge
Human maintainer review is needed to accept live-holder reclaim semantics; a bot can update docs afterward but should not decide the lock policy.

Security
Cleared: The diff changes session-lock logic, tests, and changelog text only; it adds no dependencies, scripts, permissions, network calls, or credential handling.

Review findings

  • [P3] Update maxHoldMs docs for acquisition reclaim — src/agents/session-write-lock.ts:760
Review details

Best possible solution:

Land the narrow fix after maintainers accept holder-recorded max-hold as a cross-process reclaim policy, update docs/help, and keep the unchanged-snapshot removal and regression coverage.

Do we have a high-confidence way to reproduce the issue?

Yes at source level: current main's acquisition-time inspection ignores maxHoldMs, while fs-safe calls shouldReclaim during lock contention. I did not run a live multi-process repro in this read-only review.

Is this the best way to solve the issue?

Mostly yes, if maintainers accept the existing max-hold policy as authoritative across processes. The safer merge shape is this narrow code path plus aligned docs/help and explicit maintainer acceptance of live-holder reclaim semantics.

Label justifications:

  • P1: The linked bug can freeze gateway/session writes for real users, and the PR changes a core session-state coordination path.
  • merge-risk: 🚨 session-state: The diff intentionally changes when another process may remove a live holder's session transcript lock, which could affect transcript consistency if the policy is wrong.
  • rating: 🦐 gold shrimp: Current PR rating is 🦐 gold shrimp because proof is 🦐 gold shrimp, patch quality is 🐚 platinum hermit, and The patch is small and source-correct with focused tests, but overall confidence is limited by indirect proof and a session-state policy decision.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Override: A maintainer applied proof: override for this PR.

Full review comments:

  • [P3] Update maxHoldMs docs for acquisition reclaim — src/agents/session-write-lock.ts:760
    This line extends maxHoldMs to acquisition-time reclaim, but the docs and config help still say it only controls the in-process watchdog. Please update docs/reference/session-management-compaction.md and src/config/schema.help.ts so operators understand the new cross-process reclaim behavior.
    Confidence: 0.86

Overall correctness: patch is correct
Overall confidence: 0.84

What I checked:

  • Current main ignores maxHoldMs during acquisition reclaim: Current main resolves maxHoldMs and passes it as in-process metadata, but the lock payload omits it and shouldReclaim only inspects the stale policy, so a live PID younger than staleMs is not reclaimable by acquisition. (src/agents/session-write-lock.ts:765, 5c4a733912a8)
  • PR head applies holder max-hold to cross-process acquisition: The PR head stores maxHoldMs in the lock payload and passes respectMaxHold: !heldByThisProcess through shouldReclaim and shouldRemoveStaleLock, so acquisition can reclaim a non-self-held lock after the holder's recorded max-hold expires. (src/agents/session-write-lock.ts:746, a7876bf6ce51)
  • PR tests cover the new max-hold inspection and safety guards: The PR adds tests for hold-exceeded, for not reclaiming an active in-process lock via max-hold acquisition, and for not cleaning live OpenClaw locks merely because their holder max-hold expired. (src/agents/session-write-lock.test.ts:314, a7876bf6ce51)
  • Docs and config help still describe watchdog-only semantics: At the PR head, user-facing docs and config help still say session.writeLock.maxHoldMs controls the in-process watchdog release threshold, which misses the new acquisition-time reclaim behavior. Public docs: docs/reference/session-management-compaction.md. (docs/reference/session-management-compaction.md:101, a7876bf6ce51)
  • Dependency contract supports unchanged-snapshot removal: @openclaw/fs-safe@0.2.7 exposes staleRecovery, shouldReclaim, and shouldRemoveStaleLock; its sidecar-lock implementation calls shouldReclaim on contention and only removes stale locks after snapshot matching when remove-if-unchanged is selected. (package/dist/sidecar-lock.js:272)
  • Lock-policy history: Commit e91a5b021648bec6f4d3566d9e348e7223b36cb5 introduced the in-process session-lock watchdog and max-hold policy for hung API calls; later commits by Peter Steinberger aligned lock hold budgets and hardened contention cleanup. (src/agents/session-write-lock.ts:253, e91a5b021648)

Likely related people:

  • @steipete: History shows repeated recent work on src/agents/session-write-lock.ts, including the current main file rewrite, contention hardening, max-hold budget alignment, and several commits on this PR branch. (role: recent area contributor and likely follow-up owner; confidence: high; commits: 071c3e364b77, 1becebe188eb, fb6e415d0c52; files: src/agents/session-write-lock.ts, src/agents/session-write-lock.test.ts)
  • Vishal Doshi: Commit e91a5b021648bec6f4d3566d9e348e7223b36cb5 added the session lock watchdog and max-hold policy that this PR extends into acquisition-time reclaim. (role: introduced related max-hold watchdog behavior; confidence: medium; commits: e91a5b021648; files: src/agents/session-write-lock.ts)
  • @vincentkoc: Recent session-lock history includes recycled-PID recovery and listener-leak fixes in the same lock code path, making this a relevant routing candidate for session-lock safety review. (role: adjacent lock hardening contributor; confidence: medium; commits: 5a2200b28003, f00f0a95968f; files: src/agents/session-write-lock.ts, src/agents/session-write-lock.test.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5c4a733912a8.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: XS triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 23, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 23, 2026
@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 23, 2026
@clawsweeper

clawsweeper Bot commented May 23, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

✨ Hatched: 🥚 common Pearl Diff Drake

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: purrs at green checks.
Image traits: location release reef; accessory lint brush; palette moss green and polished brass; mood bright-eyed; pose peeking out from the egg shell; shell brushed metal shell; lighting clean product lighting; background gentle dashboard dots.
Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Pearl Diff Drake in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@steipete steipete force-pushed the fix/session-lock-maxhold-reclaim branch from be527e1 to b8b849e Compare May 23, 2026 16:14
@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. and removed proof: supplied External PR includes structured after-fix real behavior proof. labels May 23, 2026
@njuboy11

Copy link
Copy Markdown
Contributor Author

@clawsweeper[bot] re-review — proof updated with full evidence (before/after journal logs, PID alive confirmation, session file size 400KB, zero SessionWriteLockTimeoutError after fix, unit test added by @steipete, all 27 CI checks green).

@clawsweeper

clawsweeper Bot commented May 23, 2026

Copy link
Copy Markdown
Contributor

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

@steipete steipete added the proof: override Maintainer override for the external PR real behavior proof gate. label May 23, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 23, 2026
@njuboy11 njuboy11 force-pushed the fix/session-lock-maxhold-reclaim branch from b8b849e to c064bcf Compare May 23, 2026 17:41
@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 23, 2026
@steipete steipete force-pushed the fix/session-lock-maxhold-reclaim branch from ee6e0b3 to 5d18976 Compare May 23, 2026 17:58
@steipete steipete force-pushed the fix/session-lock-maxhold-reclaim branch 2 times, most recently from 1886a70 to 3d3ea93 Compare May 23, 2026 23:04
…uisition

- Adds optional maxHoldMs parameter to inspectLockPayload
- Inspect now marks locks as stale when held longer than maxHoldMs
- Passes maxHoldMs through inspectLockPayloadForSession
- acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

This ensures that when a live process holds a lock for longer than
maxHoldMs (default 5min), other processes can reclaim it during
acquisition — matching the watchdog's existing enforcement.

Previously shouldReclaim only used staleMs (30min default), meaning
a lock held for 10+ minutes by a live PID would never be reclaimable,
causing 60s timeout failures and gateway freezes.

Closes openclaw#85762
@steipete steipete force-pushed the fix/session-lock-maxhold-reclaim branch from 3d3ea93 to a7876bf Compare May 23, 2026 23:12
@openclaw-barnacle openclaw-barnacle Bot added the gateway Gateway runtime label May 23, 2026
@steipete steipete merged commit a1eb765 into openclaw:main May 23, 2026
105 of 107 checks passed
steipete added a commit that referenced this pull request May 24, 2026
…uisition (#85764)

* fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition

- Adds optional maxHoldMs parameter to inspectLockPayload
- Inspect now marks locks as stale when held longer than maxHoldMs
- Passes maxHoldMs through inspectLockPayloadForSession
- acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

This ensures that when a live process holds a lock for longer than
maxHoldMs (default 5min), other processes can reclaim it during
acquisition — matching the watchdog's existing enforcement.

Previously shouldReclaim only used staleMs (30min default), meaning
a lock held for 10+ minutes by a live PID would never be reclaimable,
causing 60s timeout failures and gateway freezes.

Closes #85762

* fix(session-lock): add dead-PID fast-path before retry loop

Adds a fast-path check at the top of acquireSessionWriteLock:
if the lock file's owner PID is dead, remove it immediately
before entering the retry loop. This saves up to timeoutMs (60s)
of futile waiting when the previous lock holder has died.

The shouldReclaim callback already handles this case, but only
iteratively through the retry loop. The fast-path eliminates
that unnecessary delay.

* fix(session-lock): enforce max hold during acquisition

* fix(session-lock): revalidate max hold safely

* fix(session-lock): honor holder max-hold policy

* fix(session-lock): keep cleanup from reclaiming live holders

* fix(session-lock): remove stale locks only when unchanged

* fix(session-lock): skip self-held max-hold reclaim

* fix(ci): refresh gateway protocol checks

---------

Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
eleboucher pushed a commit to eleboucher/homelab that referenced this pull request May 24, 2026
…026.5.22) (#645)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.5.20` → `2026.5.22` |

---

### Release Notes

<details>
<summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary>

### [`v2026.5.22`](https://github.com/openclaw/openclaw/releases/tag/v2026.5.22): openclaw 2026.5.22

[Compare Source](https://github.com/openclaw/openclaw/compare/v2026.5.20...v2026.5.22)

##### 2026.5.22

##### Changes

- Gateway/perf: reuse process-stable channel catalog reads, avoid repeated bundled-channel boundary checks, and rotate gateway watch CPU profiles so benchmark runs do not accumulate unbounded artifacts.
- Gateway/perf: reuse immutable plugin metadata snapshots across startup, config, model, channel, setup, and secret metadata readers so hot paths avoid repeated plugin file stats and manifest registry reloads.
- Gateway/perf: lazy-load startup-idle plugin work, core gateway method handlers, and the embedded ACPX runtime so Gateway health and ready signals no longer wait on unused handler trees or ACPX probes.
- Gateway/perf: cache plugin SDK public-surface alias maps and skip irrelevant macOS Linuxbrew PATH probes so Gateway startup avoids repeated filesystem walks and slow missing-directory stats.
- Meeting Notes: add a source-only external meeting-notes plugin and SDK source-provider contract outside the core npm package, with auto-start capture config, manual transcript imports, read-only `openclaw meeting-notes` CLI access, and Discord voice as the first live source.
- Docs/channels/config: add Signal `configPath`, Telegram wildcard topic defaults, local-time backup archive names, Termux home fallback, include-path validation, secret-scanner-safe placeholder guidance, Gemini CLI/Antigravity media guidance, and macOS VM auto-login guidance. Thanks [@&#8203;NorseGaud](https://github.com/NorseGaud), [@&#8203;yudistiraashadi](https://github.com/yudistiraashadi), [@&#8203;huangqian8](https://github.com/huangqian8), [@&#8203;VibhorGautam](https://github.com/VibhorGautam), [@&#8203;maweibin](https://github.com/maweibin), [@&#8203;tianxingleo](https://github.com/tianxingleo), [@&#8203;IgnacioPro](https://github.com/IgnacioPro), and [@&#8203;xzcxzcyy-claw](https://github.com/xzcxzcyy-claw).
- Docs: clarify model-usage portability, Codex migration prerequisites, status bootstrap wording, thread-bound subagent limits, hook ownership, and config-preserving safety guidance. Thanks [@&#8203;aniruddhaadak80](https://github.com/aniruddhaadak80), [@&#8203;leno23](https://github.com/leno23), [@&#8203;TomDjerry](https://github.com/TomDjerry), [@&#8203;matthewxmurphy](https://github.com/matthewxmurphy), [@&#8203;vincentkoc](https://github.com/vincentkoc), and [@&#8203;stablegenius49](https://github.com/stablegenius49).
- Docs: clarify README onboarding and Gateway startup paths, WhatsApp QR/408 recovery, cron output language prompts, skill advanced features, gateway upstream 403 troubleshooting, and plugin fallback override guidance. Thanks [@&#8203;deepujain](https://github.com/deepujain), [@&#8203;Zacxxx](https://github.com/Zacxxx), [@&#8203;Jah-yee](https://github.com/Jah-yee), [@&#8203;neyric](https://github.com/neyric), [@&#8203;usimic](https://github.com/usimic), [@&#8203;Renu-Cybe](https://github.com/Renu-Cybe), [@&#8203;BigUncle](https://github.com/BigUncle), and [@&#8203;SeashoreShi](https://github.com/SeashoreShi).
- Docs: clarify context-pruning ratio bounds, local dashboard recovery, CLI env markers, remote onboarding token behavior, and Peekaboo Bridge permissions for subprocess agents. Thanks [@&#8203;ayesha-aziz123](https://github.com/ayesha-aziz123), [@&#8203;dishraters](https://github.com/dishraters), [@&#8203;hougangdev](https://github.com/hougangdev), and [@&#8203;brandonlipman](https://github.com/brandonlipman).
- Docs: clarify browser CDP diagnostics, Plugin SDK allowlist imports, status-reaction timing defaults, queue steering behavior, limited-tool troubleshooting, cron HEARTBEAT handling, Telegram multi-agent groups, Bitwarden SecretRef setup, and EasyRunner deployments. Thanks [@&#8203;Quratulain-bilal](https://github.com/Quratulain-bilal), [@&#8203;mbelinky](https://github.com/mbelinky), [@&#8203;Mickey-](https://github.com/Mickey-), [@&#8203;vancece](https://github.com/vancece), [@&#8203;xenouzik](https://github.com/xenouzik), [@&#8203;posigit](https://github.com/posigit), [@&#8203;surlymochan](https://github.com/surlymochan), [@&#8203;janaka](https://github.com/janaka), and [@&#8203;choiking](https://github.com/choiking).
- Crabbox/Testbox: run clean sparse-checkout Testbox syncs from a temporary full checkout and route remote changed gates through Corepack pnpm.
- Docs: clarify IPv4-only Gateway BYOH binding, trusted-proxy scope clearing, Android pairing approval, macOS Accessibility grants, Zalo profile env vars, password-store SecretRef setup, and Chinese memory navigation. Thanks [@&#8203;itskai-dev](https://github.com/itskai-dev), [@&#8203;gwh7078](https://github.com/gwh7078), [@&#8203;longstoryscott](https://github.com/longstoryscott), [@&#8203;MoeJaberr](https://github.com/MoeJaberr), and [@&#8203;yuaiccc](https://github.com/yuaiccc).
- Docs: consolidate GLM under Z.AI, add the Upstash Box install guide and Gateway exposure runbook, clarify MEDIA directives, Copilot and Voyage setup, config path quoting, real behavior proof, and memory-file write guidance. Thanks [@&#8203;BobDu](https://github.com/BobDu), [@&#8203;alitariksahin](https://github.com/alitariksahin), [@&#8203;Jefsky](https://github.com/Jefsky), [@&#8203;musaabhasan](https://github.com/musaabhasan), [@&#8203;OmerZeyveli](https://github.com/OmerZeyveli), [@&#8203;leno23](https://github.com/leno23), [@&#8203;WuKongAI-CMU](https://github.com/WuKongAI-CMU), [@&#8203;luoyanglang](https://github.com/luoyanglang), and [@&#8203;majin1102](https://github.com/majin1102).
- Docs: clarify media provider credentials, Codex/OpenClaw code-mode boundaries, Slack and Telegram ack reactions, Feishu dynamic agents, secrets plaintext boundaries, memory guidance, and Chinese glossary terms. Thanks [@&#8203;nielskaspers](https://github.com/nielskaspers), [@&#8203;cosmopolitan033](https://github.com/cosmopolitan033), [@&#8203;drclaw-iq](https://github.com/drclaw-iq), [@&#8203;alexgduarte](https://github.com/alexgduarte), [@&#8203;zccyman](https://github.com/zccyman), [@&#8203;chengoak](https://github.com/chengoak), and [@&#8203;cassthebandit](https://github.com/cassthebandit).
- Packaging: exclude documentation images and assets from the npm tarball, reducing published package size without affecting runtime docs search or CLI behavior. Thanks [@&#8203;SebTardif](https://github.com/SebTardif).
- Media understanding: stop auto-probing Gemini CLI and use Antigravity CLI only as a lower-priority image/video fallback after configured provider APIs.
- Agents/subagents: limit default sub-agent bootstrap context to `AGENTS.md` and `TOOLS.md`, keeping persona, identity, user, memory, heartbeat, and setup files out of delegated workers by default. ([#&#8203;85283](https://github.com/openclaw/openclaw/issues/85283)) Thanks [@&#8203;100yenadmin](https://github.com/100yenadmin).
- Maintainer skills: exclude plugin SDK/API boundary work from `openclaw-landable-bug-sweep` so bugbash sweeps stay focused on small paper-cut fixes.
- QA-Lab/diagnostics: extend the OpenTelemetry smoke harness to prove trace, metric, and log export, and add first-class Prometheus and observability smoke aliases.
- Plugin SDK: add a generic channel-message poll sender so channel plugins can expose poll delivery without depending on channel-specific SDK facades.
- Crabbox: keep the local wrapper's provider validation synced with the installed Crabbox binary while preserving supported aliases such as `docker` and `blacksmith`. ([#&#8203;85302](https://github.com/openclaw/openclaw/issues/85302)) Thanks [@&#8203;hxy91819](https://github.com/hxy91819).
- Maintainer skills: add `openclaw-landable-bug-sweep` for producing five small, reviewed, CI-green OpenClaw bugfix PRs from issue/PR sweeps.
- Control UI/chat: add search and Load More pagination to the chat session picker, keeping initial session loads bounded while making older conversations reachable. ([#&#8203;85237](https://github.com/openclaw/openclaw/issues/85237)) Thanks [@&#8203;amknight](https://github.com/amknight).
- CLI/onboarding: start classic onboarding when bare `openclaw` runs before an authored config exists, while keeping configured installs on Crestodian. ([#&#8203;72343](https://github.com/openclaw/openclaw/issues/72343)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).
- Discord: allow configuring a bounded `agentComponents.ttlMs` callback registry lifetime for long-running component workflows, with per-account overrides and a 24-hour cap. ([#&#8203;84189](https://github.com/openclaw/openclaw/issues/84189)) Thanks [@&#8203;100menotu001](https://github.com/100menotu001).
- xAI/Grok: reuse xAI OAuth auth profiles for Grok `web_search`, thread active-agent auth through web search, add Grok model aliases, and let media providers declare default operation timeouts. ([#&#8203;85182](https://github.com/openclaw/openclaw/issues/85182)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).
- Plugin SDK: add row-level session workflow helpers and deprecate `loadSessionStore` so plugins can read and patch sessions without depending on the legacy whole-store shape. ([#&#8203;84693](https://github.com/openclaw/openclaw/issues/84693)) Thanks [@&#8203;efpiva](https://github.com/efpiva).
- Gateway/plugins: reuse a compatible Gateway startup plugin registry during dispatch so safe plugin dispatches avoid redundant registry loading. ([#&#8203;84324](https://github.com/openclaw/openclaw/issues/84324)) Thanks [@&#8203;ai-hpc](https://github.com/ai-hpc).
- Plugins/SDK: add a general `embeddingProviders` capability contract and registration API so embeddings can become a reusable provider surface outside memory-specific adapters.
- Dependencies: refresh provider, plugin, UI, and tooling packages, update `protobufjs` to 8.4.0 to clear the current npm advisory, and carry the Claude ACP completion patch forward to `@agentclientprotocol/claude-agent-acp` 0.36.1.
- Agents/tools: remove the old sender-owner tool gating path so configured tools stay visible for trusted sessions while command and channel-action auth still carry real sender identity.
- QA-Lab: add curated mock JSONL replay fixtures and first-drift reporting for runtime-parity audits. ([#&#8203;80323](https://github.com/openclaw/openclaw/issues/80323), refs [#&#8203;80176](https://github.com/openclaw/openclaw/issues/80176)) Thanks [@&#8203;100yenadmin](https://github.com/100yenadmin).
- QA-Lab: add a QA bus tool-trace visibility scenario for sanitized tool-call assertions.
- QA-Lab: replace generic evidence framing in seeded scenario prompts with concrete observed QA behavior.
- QA-Lab: list named scenario packs in the coverage report so personal-agent privacy coverage stays visible in audits.
- QA-Lab: list live transport lane membership in the coverage report so real transport checks stay separate from seeded qa-channel scenarios.
- Release/package: run package integrity checks before package acceptance lanes so public install/update validation fails before private QA assets can leak into the package.
- QA-Lab: include the optional 100-turn runtime parity soak in release-soak artifacts so long-run Codex/Pi transcript drift stays visible outside the default gate. ([#&#8203;80395](https://github.com/openclaw/openclaw/issues/80395)) Thanks [@&#8203;100yenadmin](https://github.com/100yenadmin).
- QA-Lab: add a live-only long-context progress watchdog scenario for Codex app-server timeout and stalled-run sentinels. ([#&#8203;80323](https://github.com/openclaw/openclaw/issues/80323)) Thanks [@&#8203;100yenadmin](https://github.com/100yenadmin).
- QA-Lab: tag gateway restart recovery and streaming final-integrity scenarios as live-only runtime parity lanes. ([#&#8203;80323](https://github.com/openclaw/openclaw/issues/80323)) Thanks [@&#8203;100yenadmin](https://github.com/100yenadmin).
- QA-Lab: add a personal-agent failure recovery scenario that checks honest partial status, retry boundaries, and local recovery artifacts. ([#&#8203;83872](https://github.com/openclaw/openclaw/issues/83872)) Thanks [@&#8203;iFiras-Max1](https://github.com/iFiras-Max1).
- QA-Lab: include an opt-in `update.run` package self-upgrade sentinel for destructive latest-package recovery checks.
- QA-Lab: add Codex plugin lifecycle and auth-profile fixture coverage for missing installs, pinned-version drift, first-turn install ordering, and doctor migration safety. ([#&#8203;80323](https://github.com/openclaw/openclaw/issues/80323), refs [#&#8203;80174](https://github.com/openclaw/openclaw/issues/80174)) Thanks [@&#8203;100yenadmin](https://github.com/100yenadmin).
- Models/perf: pre-warm the provider auth-state map at gateway startup so `/models` and every model-listing call short-circuits the per-provider plugin / external-CLI discovery on the hot path. Per-call cost drops from \~20 s to \~5 ms (\~4,100×); the one-time startup warm resets and re-warms after hot reloads. ([#&#8203;84816](https://github.com/openclaw/openclaw/issues/84816)) Thanks [@&#8203;sjf](https://github.com/sjf).
- Release/security: ship the root npm package and OpenClaw-owned npm plugins with generated shrinkwrap, support bundled plugin runtime dependencies for suitable plugin tarballs, and require review for lockfile/shrinkwrap changes so published installs use locked dependency graphs.
- Tests/perf: isolate doctor core health check unit coverage from real skills/workspace discovery so `doctor-core-checks` no longer dominates unit perf while keeping one real skills-readiness smoke. ([#&#8203;84493](https://github.com/openclaw/openclaw/issues/84493)) Thanks [@&#8203;frankekn](https://github.com/frankekn).

##### Fixes

- WebChat: summarize internal message-tool source replies so tool cards no longer duplicate the visible reply body. ([#&#8203;84773](https://github.com/openclaw/openclaw/issues/84773)) Thanks [@&#8203;jason-allen-oneal](https://github.com/jason-allen-oneal).
- Gateway: preserve deferred lifecycle-error cleanup across later non-terminal events so provider timeouts can persist failed session state instead of leaving sessions stuck running. ([#&#8203;85256](https://github.com/openclaw/openclaw/issues/85256), fixes [#&#8203;63819](https://github.com/openclaw/openclaw/issues/63819)) Thanks [@&#8203;samzong](https://github.com/samzong).
- Agents/subagents: report tool-only child progress during timeout summaries instead of showing no visible output.
- Telegram/ACP: preserve explicit `:topic:` conversation suffixes when inbound ACP targets do not carry a separate thread id.
- Browser/proxy: bypass the managed proxy for the exact local managed Chrome CDP readiness and DevTools WebSocket endpoints, so `openclaw browser start` works when the operator proxy blocks loopback egress. ([#&#8203;83255](https://github.com/openclaw/openclaw/issues/83255)) Thanks [@&#8203;lightcap](https://github.com/lightcap).
- Ollama: bypass the managed proxy for configured local embedding origins while keeping SSRF guardrails on unconfigured targets. Thanks [@&#8203;Kaspre](https://github.com/Kaspre).
- OpenAI/images: route Codex API-key image generation through the native OpenAI Images API instead of the Codex OAuth streaming backend, avoiding 401s from valid API keys.
- Agents/OpenAI completions: omit empty tool payload fields for proxy-like OpenAI-compatible endpoints so strict vLLM-style servers accept tool-free turns. ([#&#8203;85835](https://github.com/openclaw/openclaw/issues/85835)) Thanks [@&#8203;rendrag-git](https://github.com/rendrag-git).
- Checks/Windows: route full `pnpm check` stage commands through the managed child runner so Windows avoids Node shell-argv deprecation warnings there too.
- Checks/Windows: run managed child commands through explicit `cmd.exe` wrapping instead of Node shell mode with argv, avoiding Node 24 subprocess deprecation warnings during changed checks.
- Gateway: omit internal stream-error placeholder entries from agent prompt history so failed assistant turns are not replayed as model-authored text. ([#&#8203;85652](https://github.com/openclaw/openclaw/issues/85652)) Thanks [@&#8203;anyech](https://github.com/anyech).
- Sessions: enforce the session write-lock max-hold policy during lock acquisition so long-held locks can be reclaimed before the stale-lock window. ([#&#8203;85764](https://github.com/openclaw/openclaw/issues/85764)) Thanks [@&#8203;njuboy11](https://github.com/njuboy11).
- Models: prune retired Groq, GitHub Copilot, OpenAI, xAI, and old Claude catalog entries, with doctor migration to upgrade existing configs to current provider refs.
- Doctor/update: recognize junction-backed source checkouts as git installs by comparing canonical paths before showing package-manager update guidance. Fixes [#&#8203;82215](https://github.com/openclaw/openclaw/issues/82215). Thanks [@&#8203;igormf](https://github.com/igormf).
- Channels: honor `/verbose on` for tool/progress summaries across direct chats, groups, channels, and forum topics while preserving quiet default behavior. ([#&#8203;85488](https://github.com/openclaw/openclaw/issues/85488)) Thanks [@&#8203;kurplunkin](https://github.com/kurplunkin).
- CLI/skills: show an all-ready note with next-step commands when skill setup has no missing dependencies to install. ([#&#8203;85032](https://github.com/openclaw/openclaw/issues/85032)) Thanks [@&#8203;aniruddhaadak80](https://github.com/aniruddhaadak80).
- Microsoft Foundry: route DeepSeek V4 Pro and Flash models through the Foundry Responses API while keeping older DeepSeek models on their existing path. ([#&#8203;85549](https://github.com/openclaw/openclaw/issues/85549)) Thanks [@&#8203;roslinmahmud](https://github.com/roslinmahmud).
- Status/usage: show configured cost estimates for AWS SDK models in full usage output while keeping token-only usage replies cost-free. ([#&#8203;85619](https://github.com/openclaw/openclaw/issues/85619)) Thanks [@&#8203;ItsOtherMauridian](https://github.com/ItsOtherMauridian).
- Agents/OpenAI Responses: retry non-visible reasoning-only turns for OpenAI Responses API families instead of treating them as empty failed turns. ([#&#8203;85603](https://github.com/openclaw/openclaw/issues/85603)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif).
- Directive tags: preserve message and content-part object identity when display stripping makes no directive-tag changes. ([#&#8203;85682](https://github.com/openclaw/openclaw/issues/85682)) Thanks [@&#8203;willamhou](https://github.com/willamhou).
- Telegram: send local `path`/`filePath` and structured attachment media from `sendMessage` actions instead of dropping them or sending text-only messages. ([#&#8203;85219](https://github.com/openclaw/openclaw/issues/85219)) Thanks [@&#8203;keshavbotagent](https://github.com/keshavbotagent).
- Sessions/status: show the estimated context budget when fresh provider usage is unavailable and clear stale estimates across session resets and compaction boundaries. ([#&#8203;84830](https://github.com/openclaw/openclaw/issues/84830)) Thanks [@&#8203;giodl73-repo](https://github.com/giodl73-repo).
- Gateway/config: pin relative `OPENCLAW_STATE_DIR` overrides to an absolute path at startup so later working-directory changes cannot retarget gateway state. ([#&#8203;52264](https://github.com/openclaw/openclaw/issues/52264)) Thanks [@&#8203;PerfectPan](https://github.com/PerfectPan).
- Release/package: run npm release, prepublish, and postpublish verification through Windows-safe npm command shims so native Windows checks can execute `npm.cmd` instead of treating it as a binary.
- Agents/harness: pass CLI runtime aliases through harness selection so provider-owned CLI aliases no longer get rejected before reaching the right runtime. ([#&#8203;85631](https://github.com/openclaw/openclaw/issues/85631)) Thanks [@&#8203;potterdigital](https://github.com/potterdigital).
- Secrets: show the irreversible apply warning after interactive `secrets configure` confirmation so confirmed migrations still get the final safety prompt. ([#&#8203;85638](https://github.com/openclaw/openclaw/issues/85638)) Thanks [@&#8203;alkor2000](https://github.com/alkor2000).
- Agents/CLI output: ignore cumulative Claude `stream-json` result usage when assistant usage events are present, preventing inflated cache-read accounting. ([#&#8203;85625](https://github.com/openclaw/openclaw/issues/85625)) Thanks [@&#8203;zhouhe-xydt](https://github.com/zhouhe-xydt).
- CLI: keep `waitForever()` alive by leaving its keep-alive interval ref'd so the public helper no longer exits immediately with Node's unsettled-await code. ([#&#8203;85694](https://github.com/openclaw/openclaw/issues/85694)) Thanks [@&#8203;m1qaweb](https://github.com/m1qaweb).
- Agents/bootstrap: guard bootstrap name checks against missing file names so malformed bootstrap entries warn and truncate instead of crashing. Fixes [#&#8203;85523](https://github.com/openclaw/openclaw/issues/85523). ([#&#8203;85615](https://github.com/openclaw/openclaw/issues/85615)) Thanks [@&#8203;zhouhe-xydt](https://github.com/zhouhe-xydt).
- CLI/tasks: reject partially numeric `openclaw tasks audit --limit` values so audit limits must be real positive integers instead of accepting strings like `5abc`. ([#&#8203;84901](https://github.com/openclaw/openclaw/issues/84901)) Thanks [@&#8203;jbetala7](https://github.com/jbetala7).
- Status/diagnostics: bound deep Docker audit probes so `openclaw status --deep` reports slow container checks instead of hanging behind unbounded inspection. ([#&#8203;85476](https://github.com/openclaw/openclaw/issues/85476)) Thanks [@&#8203;giodl73-repo](https://github.com/giodl73-repo).
- Providers/Anthropic: migrate 1M context handling to GA-capable Claude 4.x models by sizing eligible models at 1M without the retired `context-1m-2025-08-07` beta, ignoring that retired beta in older configs, and preserving OAuth-required Anthropic beta headers. ([#&#8203;45613](https://github.com/openclaw/openclaw/issues/45613)) Thanks [@&#8203;haoyu-haoyu](https://github.com/haoyu-haoyu).
- Cron/Telegram: parse forum-topic delivery targets through the Telegram plugin instead of cron core, including `:topic:` and `:topicId` forms for announce delivery. Thanks [@&#8203;etticat](https://github.com/etticat).
- Twitch: keep stale message-handler cleanup callbacks from removing newer handler registrations for the same account, preserving inbound message delivery after reconnects. Fixes [#&#8203;83888](https://github.com/openclaw/openclaw/issues/83888). ([#&#8203;85425](https://github.com/openclaw/openclaw/issues/85425)) Thanks [@&#8203;alkor2000](https://github.com/alkor2000).
- Memory/LanceDB: expose public memory artifacts through the active memory provider bridge so memory-wiki imports durable memory files, daily notes, dream reports, and event logs without depending on memory-core internals. Fixes [#&#8203;83604](https://github.com/openclaw/openclaw/issues/83604). ([#&#8203;85060](https://github.com/openclaw/openclaw/issues/85060)) Thanks [@&#8203;brokemac79](https://github.com/brokemac79).
- Crabbox: keep AWS hydration compatible with local Actions replay by inlining the hydrate workflow's Node/pnpm setup instead of invoking repo-local composite actions.
- Agents/subagents: simplify native sub-agent completion handoff so children report their latest visible assistant result to the requester without using `message`, while keeping parent-owned message-tool delivery policy intact. Fixes [#&#8203;85070](https://github.com/openclaw/openclaw/issues/85070). ([#&#8203;85089](https://github.com/openclaw/openclaw/issues/85089)) Thanks [@&#8203;brokemac79](https://github.com/brokemac79).
- Docker setup: stop printing the Gateway bearer token in setup logs and printed follow-up commands.
- Agents: let embedded compaction fallback retries proceed when PI-compatible candidates do not need agent harness plugin preparation.
- Agents/tools: honor configured custom provider API keys when deciding whether media, image-generation, video-generation, music-generation, and PDF tools are available. ([#&#8203;85570](https://github.com/openclaw/openclaw/issues/85570))
- StepFun: stop advertising stale generic API key auth choices so onboarding only offers runtime-backed Standard and Step Plan choices.
- Diagnostics: keep OpenTelemetry log bodies behind explicit content capture and scrub scoped agent-session keys from OpenTelemetry and Prometheus labels while preserving bounded queue-lane prefixes.
- Windows installer: fail Git checkout installs when `pnpm install` or `pnpm build` fails instead of writing a wrapper to a missing CLI build.
- Sessions: surface previous-transcript archive failures during `/new` rotation so disk rename errors are logged instead of silently hiding stranded transcript files. Fixes [#&#8203;81984](https://github.com/openclaw/openclaw/issues/81984). ([#&#8203;85586](https://github.com/openclaw/openclaw/issues/85586), from [#&#8203;82081](https://github.com/openclaw/openclaw/issues/82081)) Thanks [@&#8203;0xghost42](https://github.com/0xghost42).
- TUI/agents: mirror internal-ui message-tool replies into final chat output so message-tool-only agents remain visible in `openclaw tui`. Fixes [#&#8203;85538](https://github.com/openclaw/openclaw/issues/85538). Thanks [@&#8203;danpolasek](https://github.com/danpolasek).
- Agents: keep parallel OpenAI-compatible tool-call deltas in separate argument buffers so interleaved tool calls no longer corrupt streamed arguments. ([#&#8203;82263](https://github.com/openclaw/openclaw/issues/82263)) Thanks [@&#8203;luna-system](https://github.com/luna-system).
- Memory/doctor: report missing or unusable QMD workspace directories as workspace failures instead of generic binary failures. ([#&#8203;63167](https://github.com/openclaw/openclaw/issues/63167)) Thanks [@&#8203;sercada](https://github.com/sercada).
- Debug proxy: record CONNECT client-socket errors and destroy the paired upstream socket so abrupt client disconnects no longer leak tunnel resources. ([#&#8203;82444](https://github.com/openclaw/openclaw/issues/82444)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif).
- Diffs: continue hydrating later diff cards when one card fails so a single broken card no longer blanks the whole diff viewer. ([#&#8203;84775](https://github.com/openclaw/openclaw/issues/84775)) Thanks [@&#8203;cosmopolitan033](https://github.com/cosmopolitan033).
- Mac app: use the native settings sidebar window chrome so the sidebar toggle stays on the left and content no longer clips under oversized titlebar padding.
- QA-Lab/Codex: bundle auth/plugin fixture imports for flow scenarios and let terminal async media tools end Codex app-server turns without timing out. ([#&#8203;80397](https://github.com/openclaw/openclaw/issues/80397), refs [#&#8203;80323](https://github.com/openclaw/openclaw/issues/80323)) Thanks [@&#8203;100yenadmin](https://github.com/100yenadmin).
- Gateway/agents: preserve fresh session overrides and metadata when stale cached agent-session entries race with store updates, so subagent model/provider overrides and routing policy survive concurrent writes. ([#&#8203;19328](https://github.com/openclaw/openclaw/issues/19328)) Thanks [@&#8203;CodeReclaimers](https://github.com/CodeReclaimers).
- Control UI/chat: keep chat session search inline with the session selector so the header no longer shows a duplicate standalone search row.
- Control UI/chat: collapse focused-mode header chrome and suppress hidden-header scroll updates so focus mode no longer jumps while scrolling. Thanks [@&#8203;amknight](https://github.com/amknight).
- Codex app-server: restart the native app-server and retry once when server-side compaction times out, so preflight compaction stalls recover instead of failing every dispatch. ([#&#8203;85500](https://github.com/openclaw/openclaw/issues/85500))
- Restore Control UI gateway token pairing \[AI]. ([#&#8203;85459](https://github.com/openclaw/openclaw/issues/85459)) Thanks [@&#8203;pgondhi987](https://github.com/pgondhi987).
- OpenAI video: honor configured provider request private-network opt-in for local/custom video endpoints so explicitly trusted mock and self-hosted providers are not blocked. Thanks [@&#8203;shakkernerd](https://github.com/shakkernerd).
- OpenAI video: send uploaded video edit requests to the documented `/videos/edits` endpoint with a `video` file instead of posting MP4 references to `/videos`. Thanks [@&#8203;shakkernerd](https://github.com/shakkernerd).
- Agents/channels: preserve message-tool delivery evidence through gateway agent completion handoffs so successful generated media sends are not followed by false failure messages. Thanks [@&#8203;shakkernerd](https://github.com/shakkernerd).
- CLI/update: repair managed npm plugin `openclaw` peer links during post-core convergence and reject stale or wrong-target peer links before restart. ([#&#8203;83794](https://github.com/openclaw/openclaw/issues/83794)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).
- CLI/agents: default new omitted-account bindings to all accounts when the channel has multiple configured accounts, and clarify account-scope docs. ([#&#8203;49769](https://github.com/openclaw/openclaw/issues/49769)) Thanks [@&#8203;Gcaufy](https://github.com/Gcaufy).
- Codex app-server: let authorized `/codex` control commands such as `/codex detach` escape plugin-owned conversation bindings while keeping unknown or unauthorized slash text routed to the bound plugin. Fixes [#&#8203;85157](https://github.com/openclaw/openclaw/issues/85157). ([#&#8203;85188](https://github.com/openclaw/openclaw/issues/85188)) Thanks [@&#8203;TurboTheTurtle](https://github.com/TurboTheTurtle).
- Auto-reply/models: keep `/models` browse replies fast by sharing the bounded read-only catalog path with Gateway model listing. ([#&#8203;84735](https://github.com/openclaw/openclaw/issues/84735)) Thanks [@&#8203;safrano9999](https://github.com/safrano9999).
- Codex app-server: disable native Code Mode when the effective exec host is `node` and keep OpenClaw `exec`/`process` available, so `/exec host=node` routes shell commands through the selected node instead of the gateway. Fixes [#&#8203;85012](https://github.com/openclaw/openclaw/issues/85012). ([#&#8203;85090](https://github.com/openclaw/openclaw/issues/85090)) Thanks [@&#8203;sahilsatralkar](https://github.com/sahilsatralkar).
- Agents: bound embedded auto-compaction session write-lock watchdogs to the compaction timeout instead of the full run timeout, so stuck compaction cannot hold the live session lock for the whole run window. ([#&#8203;84949](https://github.com/openclaw/openclaw/issues/84949)) Thanks [@&#8203;luoyanglang](https://github.com/luoyanglang).
- Gateway/agents: return phase-aware `agent.wait` timeout attribution and only cool auth profiles on provider-started timeouts. Refs [#&#8203;65504](https://github.com/openclaw/openclaw/issues/65504). Thanks [@&#8203;100yenadmin](https://github.com/100yenadmin).
- Gateway: defer provider auth-state prewarm until after startup readiness so early gateway tool/session requests are not blocked by provider auth discovery. ([#&#8203;85272](https://github.com/openclaw/openclaw/issues/85272)) Thanks [@&#8203;dutifulbob](https://github.com/dutifulbob).
- Gateway/models: coalesce provider auth-state rewarms after auth-profile failures and log event-loop delay for warm/rewarm work, so provider auth bursts no longer stack full auth sweeps behind channel replies.
- Gateway/models: stop cancelled provider auth-state prewarms from continuing full provider sweeps, so reload and auth-failure bursts no longer keep startup busy.
- Agents/Codex: show the first plan update as a transient chat status notice without counting it as final assistant content.
- CLI/update: walk the macOS process ancestry and honor the inherited Gateway runtime PID before package updates stop the managed Gateway service, so nested in-band updater children can refuse instead of killing the LaunchAgent-supervised Gateway that owns them. Fixes [#&#8203;85120](https://github.com/openclaw/openclaw/issues/85120).
- Gateway/LaunchAgent: wait for launchd reload bootout to finish and fall back to kickstart when bootstrap races, so reload handoff does not leave the service deregistered. Fixes [#&#8203;84630](https://github.com/openclaw/openclaw/issues/84630). ([#&#8203;84641](https://github.com/openclaw/openclaw/issues/84641)) Thanks [@&#8203;NianJiuZst](https://github.com/NianJiuZst).
- Gateway/LaunchAgent: treat a concurrent launchd bootstrap as a successful restart when the service is already loaded, avoiding false macOS Gateway restart failures. Fixes [#&#8203;84721](https://github.com/openclaw/openclaw/issues/84721). ([#&#8203;84722](https://github.com/openclaw/openclaw/issues/84722)) Thanks [@&#8203;googlerest](https://github.com/googlerest).
- Gateway/service: include the active `openclaw` command bin directory in managed service PATH generation and doctor audit expectations for npm-global macOS installs. Fixes [#&#8203;84201](https://github.com/openclaw/openclaw/issues/84201). ([#&#8203;84475](https://github.com/openclaw/openclaw/issues/84475)) Thanks [@&#8203;jbetala7](https://github.com/jbetala7).
- Control UI/chat: disable the thinking selector for known non-reasoning models instead of showing duplicate Off choices. Fixes [#&#8203;84069](https://github.com/openclaw/openclaw/issues/84069). Thanks [@&#8203;DrippingMellow](https://github.com/DrippingMellow).
- Memory: expand `~` in configured extra memory paths before resolving them, so home-relative folders are not treated as workspace-relative. Fixes [#&#8203;58026](https://github.com/openclaw/openclaw/issues/58026). Thanks [@&#8203;stadman](https://github.com/stadman).
- Skills: treat `openclaw.os: macos` as Darwin when checking skill requirements, so macOS-only skills no longer report as missing on macOS hosts. Fixes [#&#8203;61338](https://github.com/openclaw/openclaw/issues/61338). Thanks [@&#8203;Jessecq1995](https://github.com/Jessecq1995).
- Control UI/logs: strip ANSI escape sequences from displayed Gateway log messages so color codes no longer appear as raw text. Fixes [#&#8203;64399](https://github.com/openclaw/openclaw/issues/64399). Thanks [@&#8203;guguangxin-eng](https://github.com/guguangxin-eng).
- Docker: pre-create the workspace and auth-profile config mount points with `node` ownership so first-run named volumes do not start root-owned. Fixes [#&#8203;85076](https://github.com/openclaw/openclaw/issues/85076). Thanks [@&#8203;Noerr](https://github.com/Noerr).
- Telegram: pass configured markdown table mode through outbound markdown chunking so chunked sends render tables consistently. Fixes [#&#8203;85085](https://github.com/openclaw/openclaw/issues/85085). Thanks [@&#8203;ShuaiHui](https://github.com/ShuaiHui).
- CLI/update: preserve managed Gateway service environment during package cutovers so macOS LaunchAgent repair/restart reads the pre-update service state instead of caller shell state. ([#&#8203;83026](https://github.com/openclaw/openclaw/issues/83026))
- Agents/providers: honor per-model `api` and `baseUrl` overrides in custom provider auth hooks and transport selection. Fixes [#&#8203;80487](https://github.com/openclaw/openclaw/issues/80487). ([#&#8203;80488](https://github.com/openclaw/openclaw/issues/80488)) Thanks [@&#8203;huveewomg](https://github.com/huveewomg).
- Gateway/restart: eager-load the lifecycle runtime before in-place upgrade signal handling so package replacement does not deadlock restart imports. ([#&#8203;84890](https://github.com/openclaw/openclaw/issues/84890)) Thanks [@&#8203;myps6415](https://github.com/myps6415).
- CLI/update: start managed Gateway update handoff helpers from a stable existing directory and tolerate deleted cwd/package roots during macOS LaunchAgent handoff. Fixes [#&#8203;83808](https://github.com/openclaw/openclaw/issues/83808). ([#&#8203;83875](https://github.com/openclaw/openclaw/issues/83875)) Thanks [@&#8203;jason-allen-oneal](https://github.com/jason-allen-oneal).
- Skills: watch each shared skill directory once across agent workspaces instead of once per agent, preventing file-descriptor exhaustion (`EMFILE`) that disposed bundle-mcp processes and stalled sessions on multi-agent gateways. Fixes [#&#8203;84968](https://github.com/openclaw/openclaw/issues/84968). ([#&#8203;85130](https://github.com/openclaw/openclaw/issues/85130)) Thanks [@&#8203;openperf](https://github.com/openperf).
- Release/security: keep generated npm shrinkwrap package versions inside the pnpm lock graph so published package locks cannot bypass pnpm dependency age and override policy.
- Cron: honor `cron.retry.retryOn: ["network"]` for common network error codes such as `EAI_AGAIN`, `EHOSTUNREACH`, and `ENETUNREACH`.
- Gateway chat: broadcast returned agent-run error payloads after an agent starts so ACP/WebChat clients receive terminal idle-timeout errors. Fixes [#&#8203;84945](https://github.com/openclaw/openclaw/issues/84945).
- Gateway chat display: preserve OpenAI-compatible `prompt_tokens`, `completion_tokens`, and `total_tokens` usage fields in sanitized chat history so llama.cpp sessions keep context counts. Fixes [#&#8203;77992](https://github.com/openclaw/openclaw/issues/77992). Thanks [@&#8203;MarTT79](https://github.com/MarTT79).
- Dashboard/CLI: allow macOS browser launching through `open` even when SSH environment variables are present, while preserving Linux SSH no-display protection. Fixes [#&#8203;67088](https://github.com/openclaw/openclaw/issues/67088). Thanks [@&#8203;theglove44](https://github.com/theglove44).
- Codex app-server: keep native web search observations out of mirrored chat transcripts while preserving tool progress telemetry. Fixes [#&#8203;85109](https://github.com/openclaw/openclaw/issues/85109). Thanks [@&#8203;ugitmebaby](https://github.com/ugitmebaby).
- OpenCode Go: strip unsupported Kimi reasoning replay fields before provider requests so repeated `kimi-k2.6` turns do not fail schema validation. Fixes [#&#8203;83812](https://github.com/openclaw/openclaw/issues/83812). Thanks [@&#8203;Sleeck](https://github.com/Sleeck).
- Browser/CDP: add a WSL2 portproxy self-loop hint when Chrome DevTools endpoints accept connections but return an empty HTTP reply. Fixes [#&#8203;59209](https://github.com/openclaw/openclaw/issues/59209). Thanks [@&#8203;Owlock](https://github.com/Owlock).
- Agents/OpenAI: preserve structured provider error code, type, and redacted body metadata on boundary-aware transport failures.
- Doctor/Codex: point native Codex asset warnings at the canonical `openclaw migrate plan codex` preview command. Fixes [#&#8203;84948](https://github.com/openclaw/openclaw/issues/84948). Thanks [@&#8203;markoa](https://github.com/markoa).
- CLI/models: make `capability model auth logout --agent` remove auth profiles from the selected non-default agent store. Fixes [#&#8203;85092](https://github.com/openclaw/openclaw/issues/85092). Thanks [@&#8203;islandpreneur007](https://github.com/islandpreneur007).
- Gateway/models: reuse prepared provider auth metadata during model-listing auth checks so repeated lookups avoid broad plugin discovery while preserving synthetic local auth.
- CLI/status: suppress systemd user-service setup hints when `openclaw status --deep` can already reach a running Gateway RPC service. Fixes [#&#8203;85094](https://github.com/openclaw/openclaw/issues/85094). Thanks [@&#8203;islandpreneur007](https://github.com/islandpreneur007).
- CLI/devices: recover local approval when a same-device repair request replaces the request ID being approved.
- CLI/agents: retry transient normal-close Gateway handshakes before falling back to embedded `openclaw agent` execution.
- CLI/update: keep managed Gateway service stop/restart status lines out of `openclaw update --json` stdout so package-update automation can parse the JSON payload.
- Plugins: resolve OpenClaw plugin SDK subpaths for native external plugin runtimes without mutating package installs or broadening process-wide module resolution.
- Agents/OpenAI: preserve Responses and Chat Completions `reasoning_tokens` usage metadata without double-counting it in aggregate output tokens. ([#&#8203;85319](https://github.com/openclaw/openclaw/issues/85319))
- Control UI/chat: convert pasted `data:image/...;base64,...` clipboard text into an image attachment instead of dumping the payload into the composer. Fixes [#&#8203;62604](https://github.com/openclaw/openclaw/issues/62604). Thanks [@&#8203;cpwilhelmi](https://github.com/cpwilhelmi).
- Providers/Gemini: strip fractional seconds from web-search time range filters so Gemini accepts freshness-bound search requests. ([#&#8203;85071](https://github.com/openclaw/openclaw/issues/85071)) Thanks [@&#8203;Noerr](https://github.com/Noerr).
- OpenAI Codex: preserve image input support for sparse `openai-codex/gpt-5.5` catalog rows. ([#&#8203;85095](https://github.com/openclaw/openclaw/issues/85095)) Thanks [@&#8203;sercada](https://github.com/sercada).
- CLI/models: add a piped or pasted API-key path for OpenAI Codex auth and warn when API keys are pasted into token-mode auth. ([#&#8203;85533](https://github.com/openclaw/openclaw/issues/85533)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Telegram: dead-letter missing-harness isolated ingress failures so a poisoned spooled update no longer blocks later same-lane messages. Fixes [#&#8203;85470](https://github.com/openclaw/openclaw/issues/85470). ([#&#8203;85605](https://github.com/openclaw/openclaw/issues/85605)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Plugins/discovery: strip `-plugin` package suffixes when deriving plugin id hints so package names line up with manifest ids. ([#&#8203;85170](https://github.com/openclaw/openclaw/issues/85170)) Thanks [@&#8203;JulyanXu](https://github.com/JulyanXu).
- Tlon: stop advertising a non-existent agent tool contract in the plugin manifest.
- Telegram: preserve fenced code block languages through Markdown rendering so Telegram receives `language-*` code classes. ([#&#8203;85209](https://github.com/openclaw/openclaw/issues/85209)) Thanks [@&#8203;leno23](https://github.com/leno23).
- Windows installer: run npm and Corepack command shims from a Windows-local directory so installs launched from WSL2 UNC paths do not fail before OpenClaw is installed.
- Windows updates: roll back git-backed updates to the previous checkout when dependency install, build, UI build, or doctor repair fails.
- Windows installer: persist user-local portable Git on PATH and activate the repo-pinned pnpm version for git-backed installs and updates.
- Windows installer: bootstrap a user-local portable Node.js when native Windows has no Node and no winget, Chocolatey, or Scoop, so first-run installs can continue on raw hosts.
- Windows installer: extract the downloaded portable Node.js directory with native `tar` before falling back to .NET zip extraction, avoiding PowerShell 5.1 archive and path-length failures.
- fix(integrations): enforce channel read target allowlists \[AI]. ([#&#8203;84982](https://github.com/openclaw/openclaw/issues/84982)) Thanks [@&#8203;pgondhi987](https://github.com/pgondhi987).
- Agents/heartbeat: route single-owner `session.dmScope=main` direct-message exec and cron event wakes back to the agent main session so async completions no longer strand context in orphan direct-DM queues. Fixes [#&#8203;71581](https://github.com/openclaw/openclaw/issues/71581). ([#&#8203;83743](https://github.com/openclaw/openclaw/issues/83743)) Thanks [@&#8203;Kaspre](https://github.com/Kaspre).
- Agents/code-mode: expose outer code-mode `exec` source through the `command` hook alias with `toolKind`/`toolInputKind` discriminators so exec-shaped policies can distinguish code-mode cells. ([#&#8203;83483](https://github.com/openclaw/openclaw/issues/83483)) Thanks [@&#8203;Kaspre](https://github.com/Kaspre).
- Agents/code mode: return structured timeout and runtime-unavailable error codes for known worker failures. Fixes [#&#8203;83389](https://github.com/openclaw/openclaw/issues/83389). ([#&#8203;83444](https://github.com/openclaw/openclaw/issues/83444)) Thanks [@&#8203;Kaspre](https://github.com/Kaspre).
- QA-Lab: isolate multi-scenario suite workers when scenarios need startup config patches, preventing message-routing config from leaking into unrelated scenarios.
- QA-Lab: make the commitments heartbeat-target-none scenario request an immediate heartbeat instead of waiting for the next scheduled heartbeat.
- Codex/Plugin SDK: deliver Codex-native subagent completions through a generic harness task runtime so harness-backed plugins can mirror durable task lifecycle and completion delivery without Codex-specific SDK imports. ([#&#8203;83445](https://github.com/openclaw/openclaw/issues/83445)) Thanks [@&#8203;bryanpearson](https://github.com/bryanpearson).
- Gateway CLI: surface local post-challenge connect assembly failures immediately instead of waiting for the wrapper timeout. Fixes [#&#8203;68944](https://github.com/openclaw/openclaw/issues/68944). ([#&#8203;85253](https://github.com/openclaw/openclaw/issues/85253)) Thanks [@&#8203;samzong](https://github.com/samzong).
- Messages: strip unsupported web-search citation control markers from outbound replies before they reach WebChat or external channels. Fixes [#&#8203;85193](https://github.com/openclaw/openclaw/issues/85193). ([#&#8203;85204](https://github.com/openclaw/openclaw/issues/85204)) Thanks [@&#8203;neeravmakwana](https://github.com/neeravmakwana).
- Agents/exec: treat denied exec approvals as terminal instead of feeding them back into agent follow-up work, and recognize Chinese stop phrases in abort handling. Fixes [#&#8203;69386](https://github.com/openclaw/openclaw/issues/69386). ([#&#8203;85194](https://github.com/openclaw/openclaw/issues/85194)) Thanks [@&#8203;samzong](https://github.com/samzong).
- CLI/agents: abort accepted Gateway-backed `openclaw agent` runs on SIGINT/SIGTERM so cron and supervisor timeouts do not leave remote agent work alive. Fixes [#&#8203;71710](https://github.com/openclaw/openclaw/issues/71710). ([#&#8203;84381](https://github.com/openclaw/openclaw/issues/84381)) Thanks [@&#8203;Kaspre](https://github.com/Kaspre).
- Codex app-server: retry replay-safe stdio client-close turns once using structured failure metadata, while surfacing idle `turn/completed` timeouts instead of blindly replaying active shared-server turns. Thanks [@&#8203;VACInc](https://github.com/VACInc).
- Codex app-server: reject command overrides that embed Node or package-manager arguments and point users to `appServer.args`, so Windows startup avoids shell parsing failures. ([#&#8203;84417](https://github.com/openclaw/openclaw/issues/84417)) Thanks [@&#8203;TurboTheTurtle](https://github.com/TurboTheTurtle).
- Agents/Copilot: drop unsafe GitHub Copilot Responses reasoning replay items before send so Telegram direct sessions no longer fail on overlong replay IDs. Fixes [#&#8203;85197](https://github.com/openclaw/openclaw/issues/85197). ([#&#8203;85198](https://github.com/openclaw/openclaw/issues/85198)) Thanks [@&#8203;galiniliev](https://github.com/galiniliev).
- UI: add accessible tooltips to the topbar color-mode buttons so System, Light, and Dark choices are labeled on hover and focus. ([#&#8203;85227](https://github.com/openclaw/openclaw/issues/85227)) Thanks [@&#8203;amknight](https://github.com/amknight).
- fix: constrain Windows task script names \[AI]. ([#&#8203;85064](https://github.com/openclaw/openclaw/issues/85064)) Thanks [@&#8203;pgondhi987](https://github.com/pgondhi987).
- Control UI: keep the chat session picker from hiding older or cross-agent configured conversations while preserving the bounded configured-agent refresh. ([#&#8203;85211](https://github.com/openclaw/openclaw/issues/85211)) Thanks [@&#8203;amknight](https://github.com/amknight).
- Agents/Anthropic: preserve unsafe integer tool-call input values in streamed Anthropic tool-use JSON, preventing Discord-style IDs from being rounded before dispatch. Fixes [#&#8203;47229](https://github.com/openclaw/openclaw/issues/47229). ([#&#8203;83063](https://github.com/openclaw/openclaw/issues/83063)) Thanks [@&#8203;leno23](https://github.com/leno23).
- Agents/Codex: estimate tool-heavy prompt pressure at the LLM boundary before provider submission, so persistent sessions compact before overflowing context windows. ([#&#8203;85541](https://github.com/openclaw/openclaw/issues/85541)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev) and [@&#8203;joshavant](https://github.com/joshavant).
- Agents/hooks: wait for local one-shot CLI and Codex `agent_end` plugin hooks before process cleanup so terminal observability flushes reliably. ([#&#8203;85007](https://github.com/openclaw/openclaw/issues/85007))
- Providers/Google: preserve Gemini 3 cron `thinkingDefault: "low"` when stale catalog metadata says `reasoning:false`, so scheduled runs keep provider-supported thinking instead of downgrading to off. ([#&#8203;85185](https://github.com/openclaw/openclaw/issues/85185)) Thanks [@&#8203;neeravmakwana](https://github.com/neeravmakwana).
- CLI/agents: allow `openclaw agent --session-key` to target explicit session keys, including agent-scoped legacy keys. ([#&#8203;85121](https://github.com/openclaw/openclaw/issues/85121)) Thanks [@&#8203;Kaspre](https://github.com/Kaspre).
- Auto-reply/ACP: wait for same-channel block reply delivery before starting tool work, while still honoring ACP dispatch aborts so stopped turns do not wait on slow channel sends. ([#&#8203;83722](https://github.com/openclaw/openclaw/issues/83722)) Thanks [@&#8203;IWhatsskill](https://github.com/IWhatsskill).
- Codex/ACP: mark required child-run completions that only report progress, omit a final deliverable, or fail requester delivery as blocked while preserving real final reports. ([#&#8203;85110](https://github.com/openclaw/openclaw/issues/85110)) Thanks [@&#8203;IWhatsskill](https://github.com/IWhatsskill).
- Channels: treat bare abort messages such as `stop`, `abort`, and `wait` as immediate control commands in inbound debounce paths so stop requests are not delayed behind pending message coalescing. ([#&#8203;83348](https://github.com/openclaw/openclaw/issues/83348)) Thanks [@&#8203;IWhatsskill](https://github.com/IWhatsskill).
- Channels/message tool: resolve configured external channel plugins during in-agent channel selection, so `openclaw agent --local` message-tool sends no longer report an available channel as unavailable. ([#&#8203;85022](https://github.com/openclaw/openclaw/issues/85022)) Thanks [@&#8203;Kaspre](https://github.com/Kaspre).
- Agents/heartbeat: honor group/channel `message_tool` visible-reply policy and model-specific Codex runtime config for scheduled heartbeat runs, so failed internal tool output stays private. Fixes [#&#8203;85310](https://github.com/openclaw/openclaw/issues/85310). ([#&#8203;85357](https://github.com/openclaw/openclaw/issues/85357)) Thanks [@&#8203;neeravmakwana](https://github.com/neeravmakwana).
- Gateway/ACP: close child ACP sessions spawned via `sessions_spawn` when their parent session is reset or deleted, instead of leaving orphaned `claude-agent-acp` processes that accumulate and exhaust memory. Fixes [#&#8203;68916](https://github.com/openclaw/openclaw/issues/68916). ([#&#8203;85190](https://github.com/openclaw/openclaw/issues/85190)) Thanks [@&#8203;openperf](https://github.com/openperf).
- Codex app-server: block native execution paths when OpenClaw exec resolves to a node host while preserving the first-party CLI node binding path. Fixes [#&#8203;85012](https://github.com/openclaw/openclaw/issues/85012). ([#&#8203;85534](https://github.com/openclaw/openclaw/issues/85534)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Diagnostics: bound cleanup timeout detail logs, emit drop summaries when async diagnostic bursts exceed the queue cap, and surface async queue drops through diagnostic telemetry.
- Agents/subagents: surface blocked child-run completions as errors instead of successful subagent finishes. ([#&#8203;80886](https://github.com/openclaw/openclaw/issues/80886)) Thanks [@&#8203;TurboTheTurtle](https://github.com/TurboTheTurtle).
- Context engines: fail closed with a descriptive error when the selected agent runtime cannot satisfy declared context-engine host requirements.
- Agents/Pi: treat accepted embedded `sessions_spawn` child-session handoffs as terminal progress so parent turns no longer report false non-deliverable failures. ([#&#8203;85054](https://github.com/openclaw/openclaw/issues/85054)) Thanks [@&#8203;samzong](https://github.com/samzong).
- CLI/models: resolve `openclaw models set` aliases from the runtime config while keeping authored aliases ahead of runtime-only defaults. ([#&#8203;83262](https://github.com/openclaw/openclaw/issues/83262)) Thanks [@&#8203;IWhatsskill](https://github.com/IWhatsskill).
- Doctor: show personal Codex CLI asset notices as info instead of warnings. Fixes [#&#8203;84859](https://github.com/openclaw/openclaw/issues/84859).
- WhatsApp: update Baileys to `7.0.0-rc13` and drop the obsolete logger type patch.
- CLI/update: pre-pack GitHub/git package update targets before the staged npm install, restoring `openclaw update --tag main` for one-off package updates. ([#&#8203;81296](https://github.com/openclaw/openclaw/issues/81296)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).
- Gateway: mirror successful same-source message-tool sends into session transcripts so delivered replies stay in later history/context. ([#&#8203;84837](https://github.com/openclaw/openclaw/issues/84837)) Thanks [@&#8203;iFiras-Max1](https://github.com/iFiras-Max1).
- Media generation: keep image, music, and video completion delivery from duplicating or losing task ownership when generated media finishes through active session replies. ([#&#8203;84006](https://github.com/openclaw/openclaw/issues/84006)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).
- Infra/json: retry transient `File changed during read` races while loading JSON state so config and state reads recover instead of failing the turn. ([#&#8203;84285](https://github.com/openclaw/openclaw/issues/84285))
- Plugins/providers: fail closed for workspace provider plugins during setup-mode discovery unless explicitly trusted, preventing untrusted workspace plugin code from running during provider setup. ([#&#8203;81069](https://github.com/openclaw/openclaw/issues/81069)) Thanks [@&#8203;mmaps](https://github.com/mmaps).
- Providers/Ollama: resolve configured Ollama Cloud `OLLAMA_API_KEY` markers to the real discovery key so cloud provider entries keep authenticated model catalog access. ([#&#8203;85037](https://github.com/openclaw/openclaw/issues/85037))
- Discord: keep persistent component registry fallback warnings actionable by forwarding structured error and cause metadata through the runtime logger. Fixes [#&#8203;84185](https://github.com/openclaw/openclaw/issues/84185). ([#&#8203;84190](https://github.com/openclaw/openclaw/issues/84190)) Thanks [@&#8203;100menotu001](https://github.com/100menotu001).
- Gateway/sessions: preserve compatible session auth profile overrides when switching models within the same provider, including provider-auth aliases. Fixes [#&#8203;81837](https://github.com/openclaw/openclaw/issues/81837). ([#&#8203;81886](https://github.com/openclaw/openclaw/issues/81886)) Thanks [@&#8203;TurboTheTurtle](https://github.com/TurboTheTurtle).
- Gateway/status: surface inbound delivery telemetry counters and transport-liveness warnings in `openclaw status --all`. Fixes [#&#8203;49577](https://github.com/openclaw/openclaw/issues/49577). ([#&#8203;72724](https://github.com/openclaw/openclaw/issues/72724))
- Docker: prune package-excluded plugin source workspaces and dependency closures so runtime images do not keep packages for plugins that were not opted in.
- Providers/Ollama: treat Docker/OrbStack host aliases as local Ollama endpoints so `ollama-local` marker auth works when OpenClaw runs inside a VM/container and Ollama runs on the host. Fixes [#&#8203;84875](https://github.com/openclaw/openclaw/issues/84875).
- QA-Lab: keep explicitly searchable/deferred OpenClaw dynamic tool rows report-only by default so tool-coverage gates do not treat mock discovery gaps as hard product failures. ([#&#8203;80319](https://github.com/openclaw/openclaw/issues/80319)) Thanks [@&#8203;100yenadmin](https://github.com/100yenadmin).
- Agents/config: keep non-Google provider model refs from being rewritten by Google Gemini preview-id normalization. ([#&#8203;84762](https://github.com/openclaw/openclaw/issues/84762)) Thanks [@&#8203;zhangguiping-xydt](https://github.com/zhangguiping-xydt).
- Installer: require a real controlling terminal before launching onboarding so headless `curl | bash` installs finish cleanly after installing the CLI.
- Agents/Codex: promote a completed final assistant response when a prompt timeout races Codex app-server completion instead of returning an empty timeout envelope. Refs [#&#8203;84516](https://github.com/openclaw/openclaw/issues/84516).
- Codex app-server: keep interrupted turn statuses from being treated as OpenClaw aborts by themselves, so tool-only turns remain eligible for no-visible-answer recovery. Fixes [#&#8203;84492](https://github.com/openclaw/openclaw/issues/84492).
- Agents: cap heartbeat model bleed context hints by the stored session window when runtime model metadata is unavailable, so overflow recovery advice does not suggest a larger window than the active session actually has.
- Control UI/Web Push: use `https://openclaw.ai` as the generated default VAPID subject instead of the old localhost mailbox so iOS PWA push setup uses an Apple-acceptable subject when `OPENCLAW_VAPID_SUBJECT` is unset. Fixes [#&#8203;83134](https://github.com/openclaw/openclaw/issues/83134). ([#&#8203;83317](https://github.com/openclaw/openclaw/issues/83317)) Thanks [@&#8203;IWhatsskill](https://github.com/IWhatsskill).
- Control UI: distinguish inherited thinking-off settings from explicit Off selections so the thinking selector no longer shows two identical Off rows. ([#&#8203;85223](https://github.com/openclaw/openclaw/issues/85223)) Thanks [@&#8203;amknight](https://github.com/amknight).
- Agents/Pi: keep embedded session transcript writes from tripping false takeover detection after packaged npm onboarding agent turns.
- Codex/TUI: surface Codex-native post-turn compaction failures instead of continuing uncompacted, and keep successful native compaction serialized before local idle/next-turn handling. Fixes [#&#8203;84305](https://github.com/openclaw/openclaw/issues/84305). ([#&#8203;85160](https://github.com/openclaw/openclaw/issues/85160)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Memory/search: stop recall tracking from writing dreaming side-effect artifacts when `dreaming.enabled=false`, while preserving normal search results. Fixes [#&#8203;84436](https://github.com/openclaw/openclaw/issues/84436). ([#&#8203;84444](https://github.com/openclaw/openclaw/issues/84444)) Thanks [@&#8203;NianJiuZst](https://github.com/NianJiuZst).
- Diffs: render viewer toolbar icons from a closed icon-name map instead of HTML strings, removing the toolbar icon XSS sink. ([#&#8203;83955](https://github.com/openclaw/openclaw/issues/83955)) Thanks [@&#8203;tanshanshan](https://github.com/tanshanshan).
- QA: keep `pnpm qa:e2e` self-check runs inside the private QA runtime envelope even when inherited shell env disables bundled plugins.
- fix(config): validate browser sandbox bind sources \[AI]. ([#&#8203;84799](https://github.com/openclaw/openclaw/issues/84799)) Thanks [@&#8203;pgondhi987](https://github.com/pgondhi987).
- doctor: constrain legacy plugin cleanup paths \[AI]. ([#&#8203;84801](https://github.com/openclaw/openclaw/issues/84801)) Thanks [@&#8203;pgondhi987](https://github.com/pgondhi987).
- Update/doctor: prune stale local bundled plugin install records that point at old compiled bundled output so current bundled plugin schemas win after upgrade. ([#&#8203;84863](https://github.com/openclaw/openclaw/issues/84863)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).
- Providers/Ollama: preserve native Ollama tool-call IDs across assistant replay so Gemini over Ollama Cloud can keep its hidden function-call thought-signature handle.
- Discord: keep session recovery and `/stop` abort ownership on the source dispatch lane while bound ACP turns continue routing to their target session, so stalled pre-run work and late replies are cleared instead of leaking after stop. Fixes [#&#8203;84477](https://github.com/openclaw/openclaw/issues/84477). ([#&#8203;85100](https://github.com/openclaw/openclaw/issues/85100)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Codex app-server: mark missing turn completion after observed execution as replay-unsafe and release the session so follow-up turns can run. Fixes [#&#8203;84076](https://github.com/openclaw/openclaw/issues/84076). ([#&#8203;85107](https://github.com/openclaw/openclaw/issues/85107)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Codex app-server: give visible `message` dynamic tool sends a longer timeout budget so slow channel delivery can return its own result or error instead of hitting the 30-second Codex wrapper. ([#&#8203;85216](https://github.com/openclaw/openclaw/issues/85216)) Thanks [@&#8203;amknight](https://github.com/amknight).
- Codex app-server: add a dedicated post-tool raw assistant completion idle timeout config so trusted heavy turns can wait longer after tool handoff without weakening final assistant release.
- Matrix: keep explicitly configured two-person rooms on the room route before stale `m.direct` or strict two-member DM fallback can bypass mention gating. Fixes [#&#8203;85017](https://github.com/openclaw/openclaw/issues/85017). ([#&#8203;85137](https://github.com/openclaw/openclaw/issues/85137)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Agents/subagents: require explicit subagent allowlist targets to be configured agents so stale deleted-agent ids are omitted from `agents_list` and rejected by `sessions_spawn`. Fixes [#&#8203;84811](https://github.com/openclaw/openclaw/issues/84811). ([#&#8203;85154](https://github.com/openclaw/openclaw/issues/85154)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- PDF tool: time out idle remote PDF body reads after 120 seconds so stalled remote documents return an error instead of wedging the session. Fixes [#&#8203;68649](https://github.com/openclaw/openclaw/issues/68649). ([#&#8203;84768](https://github.com/openclaw/openclaw/issues/84768)) Thanks [@&#8203;luoyanglang](https://github.com/luoyanglang).
- Diagnostics/OpenTelemetry plugin: suppress handled OTLP exporter promise rejections so collector shutdowns no longer crash the Gateway. ([#&#8203;81085](https://github.com/openclaw/openclaw/issues/81085)) Thanks [@&#8203;luoyanglang](https://github.com/luoyanglang).
- Agents/exec: omit raw command text and env values from denied exec failure logs while keeping safe correlation metadata. Fixes [#&#8203;85049](https://github.com/openclaw/openclaw/issues/85049). ([#&#8203;85140](https://github.com/openclaw/openclaw/issues/85140)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Media/audio: skip empty structured sherpa-onnx transcripts instead of treating the raw JSON payload as spoken text. ([#&#8203;84667](https://github.com/openclaw/openclaw/issues/84667)) Thanks [@&#8203;TurboTheTurtle](https://github.com/TurboTheTurtle).
- Agents/exec: preserve inherited XDG base-directory environment values for subprocesses while still rejecting agent-supplied XDG overrides. Fixes [#&#8203;84854](https://github.com/openclaw/openclaw/issues/84854). ([#&#8203;85139](https://github.com/openclaw/openclaw/issues/85139)) Thanks [@&#8203;joshavant](https://github.com/joshavant).
- Node/Linux: keep `OPENCLAW_GATEWAY_TOKEN` out of generated systemd unit files by writing node service token values to a node-specific env file. ([#&#8203;84408](https://github.com/openclaw/openclaw/issues/84408))
- Memory-core/dreaming: reuse stable narrative subagent session keys per workspace and phase while keeping per-run idempotency and bounded cleanup, so stale `dreaming-narrative-*` sessions do not accumulate. Fixes [#&#8203;68252](https://github.com/openclaw/openclaw/issues/68252), [#&#8203;69187](https://github.com/openclaw/openclaw/issues/69187), and [#&#8203;70402](https://github.com/openclaw/openclaw/issues/70402). ([#&#8203;70464](https://github.com/openclaw/openclaw/issues/70464)) Thanks [@&#8203;chiyouYCH](https://github.com/chiyouYCH).
- Trajectory/support: tol…
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
…uisition (openclaw#85764)

* fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition

- Adds optional maxHoldMs parameter to inspectLockPayload
- Inspect now marks locks as stale when held longer than maxHoldMs
- Passes maxHoldMs through inspectLockPayloadForSession
- acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

This ensures that when a live process holds a lock for longer than
maxHoldMs (default 5min), other processes can reclaim it during
acquisition — matching the watchdog's existing enforcement.

Previously shouldReclaim only used staleMs (30min default), meaning
a lock held for 10+ minutes by a live PID would never be reclaimable,
causing 60s timeout failures and gateway freezes.

Closes openclaw#85762

* fix(session-lock): add dead-PID fast-path before retry loop

Adds a fast-path check at the top of acquireSessionWriteLock:
if the lock file's owner PID is dead, remove it immediately
before entering the retry loop. This saves up to timeoutMs (60s)
of futile waiting when the previous lock holder has died.

The shouldReclaim callback already handles this case, but only
iteratively through the retry loop. The fast-path eliminates
that unnecessary delay.

* fix(session-lock): enforce max hold during acquisition

* fix(session-lock): revalidate max hold safely

* fix(session-lock): honor holder max-hold policy

* fix(session-lock): keep cleanup from reclaiming live holders

* fix(session-lock): remove stale locks only when unchanged

* fix(session-lock): skip self-held max-hold reclaim

* fix(ci): refresh gateway protocol checks

---------

Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
…uisition (openclaw#85764)

* fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition

- Adds optional maxHoldMs parameter to inspectLockPayload
- Inspect now marks locks as stale when held longer than maxHoldMs
- Passes maxHoldMs through inspectLockPayloadForSession
- acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

This ensures that when a live process holds a lock for longer than
maxHoldMs (default 5min), other processes can reclaim it during
acquisition — matching the watchdog's existing enforcement.

Previously shouldReclaim only used staleMs (30min default), meaning
a lock held for 10+ minutes by a live PID would never be reclaimable,
causing 60s timeout failures and gateway freezes.

Closes openclaw#85762

* fix(session-lock): add dead-PID fast-path before retry loop

Adds a fast-path check at the top of acquireSessionWriteLock:
if the lock file's owner PID is dead, remove it immediately
before entering the retry loop. This saves up to timeoutMs (60s)
of futile waiting when the previous lock holder has died.

The shouldReclaim callback already handles this case, but only
iteratively through the retry loop. The fast-path eliminates
that unnecessary delay.

* fix(session-lock): enforce max hold during acquisition

* fix(session-lock): revalidate max hold safely

* fix(session-lock): honor holder max-hold policy

* fix(session-lock): keep cleanup from reclaiming live holders

* fix(session-lock): remove stale locks only when unchanged

* fix(session-lock): skip self-held max-hold reclaim

* fix(ci): refresh gateway protocol checks

---------

Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
…uisition (openclaw#85764)

* fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition

- Adds optional maxHoldMs parameter to inspectLockPayload
- Inspect now marks locks as stale when held longer than maxHoldMs
- Passes maxHoldMs through inspectLockPayloadForSession
- acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

This ensures that when a live process holds a lock for longer than
maxHoldMs (default 5min), other processes can reclaim it during
acquisition — matching the watchdog's existing enforcement.

Previously shouldReclaim only used staleMs (30min default), meaning
a lock held for 10+ minutes by a live PID would never be reclaimable,
causing 60s timeout failures and gateway freezes.

Closes openclaw#85762

* fix(session-lock): add dead-PID fast-path before retry loop

Adds a fast-path check at the top of acquireSessionWriteLock:
if the lock file's owner PID is dead, remove it immediately
before entering the retry loop. This saves up to timeoutMs (60s)
of futile waiting when the previous lock holder has died.

The shouldReclaim callback already handles this case, but only
iteratively through the retry loop. The fast-path eliminates
that unnecessary delay.

* fix(session-lock): enforce max hold during acquisition

* fix(session-lock): revalidate max hold safely

* fix(session-lock): honor holder max-hold policy

* fix(session-lock): keep cleanup from reclaiming live holders

* fix(session-lock): remove stale locks only when unchanged

* fix(session-lock): skip self-held max-hold reclaim

* fix(ci): refresh gateway protocol checks

---------

Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
…uisition (openclaw#85764)

* fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition

- Adds optional maxHoldMs parameter to inspectLockPayload
- Inspect now marks locks as stale when held longer than maxHoldMs
- Passes maxHoldMs through inspectLockPayloadForSession
- acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

This ensures that when a live process holds a lock for longer than
maxHoldMs (default 5min), other processes can reclaim it during
acquisition — matching the watchdog's existing enforcement.

Previously shouldReclaim only used staleMs (30min default), meaning
a lock held for 10+ minutes by a live PID would never be reclaimable,
causing 60s timeout failures and gateway freezes.

Closes openclaw#85762

* fix(session-lock): add dead-PID fast-path before retry loop

Adds a fast-path check at the top of acquireSessionWriteLock:
if the lock file's owner PID is dead, remove it immediately
before entering the retry loop. This saves up to timeoutMs (60s)
of futile waiting when the previous lock holder has died.

The shouldReclaim callback already handles this case, but only
iteratively through the retry loop. The fast-path eliminates
that unnecessary delay.

* fix(session-lock): enforce max hold during acquisition

* fix(session-lock): revalidate max hold safely

* fix(session-lock): honor holder max-hold policy

* fix(session-lock): keep cleanup from reclaiming live holders

* fix(session-lock): remove stale locks only when unchanged

* fix(session-lock): skip self-held max-hold reclaim

* fix(ci): refresh gateway protocol checks

---------

Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
…uisition (openclaw#85764)

* fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition

- Adds optional maxHoldMs parameter to inspectLockPayload
- Inspect now marks locks as stale when held longer than maxHoldMs
- Passes maxHoldMs through inspectLockPayloadForSession
- acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

This ensures that when a live process holds a lock for longer than
maxHoldMs (default 5min), other processes can reclaim it during
acquisition — matching the watchdog's existing enforcement.

Previously shouldReclaim only used staleMs (30min default), meaning
a lock held for 10+ minutes by a live PID would never be reclaimable,
causing 60s timeout failures and gateway freezes.

Closes openclaw#85762

* fix(session-lock): add dead-PID fast-path before retry loop

Adds a fast-path check at the top of acquireSessionWriteLock:
if the lock file's owner PID is dead, remove it immediately
before entering the retry loop. This saves up to timeoutMs (60s)
of futile waiting when the previous lock holder has died.

The shouldReclaim callback already handles this case, but only
iteratively through the retry loop. The fast-path eliminates
that unnecessary delay.

* fix(session-lock): enforce max hold during acquisition

* fix(session-lock): revalidate max hold safely

* fix(session-lock): honor holder max-hold policy

* fix(session-lock): keep cleanup from reclaiming live holders

* fix(session-lock): remove stale locks only when unchanged

* fix(session-lock): skip self-held max-hold reclaim

* fix(ci): refresh gateway protocol checks

---------

Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
…uisition (openclaw#85764)

* fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition

- Adds optional maxHoldMs parameter to inspectLockPayload
- Inspect now marks locks as stale when held longer than maxHoldMs
- Passes maxHoldMs through inspectLockPayloadForSession
- acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

This ensures that when a live process holds a lock for longer than
maxHoldMs (default 5min), other processes can reclaim it during
acquisition — matching the watchdog's existing enforcement.

Previously shouldReclaim only used staleMs (30min default), meaning
a lock held for 10+ minutes by a live PID would never be reclaimable,
causing 60s timeout failures and gateway freezes.

Closes openclaw#85762

* fix(session-lock): add dead-PID fast-path before retry loop

Adds a fast-path check at the top of acquireSessionWriteLock:
if the lock file's owner PID is dead, remove it immediately
before entering the retry loop. This saves up to timeoutMs (60s)
of futile waiting when the previous lock holder has died.

The shouldReclaim callback already handles this case, but only
iteratively through the retry loop. The fast-path eliminates
that unnecessary delay.

* fix(session-lock): enforce max hold during acquisition

* fix(session-lock): revalidate max hold safely

* fix(session-lock): honor holder max-hold policy

* fix(session-lock): keep cleanup from reclaiming live holders

* fix(session-lock): remove stale locks only when unchanged

* fix(session-lock): skip self-held max-hold reclaim

* fix(ci): refresh gateway protocol checks

---------

Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
…uisition (openclaw#85764)

* fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition

- Adds optional maxHoldMs parameter to inspectLockPayload
- Inspect now marks locks as stale when held longer than maxHoldMs
- Passes maxHoldMs through inspectLockPayloadForSession
- acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

This ensures that when a live process holds a lock for longer than
maxHoldMs (default 5min), other processes can reclaim it during
acquisition — matching the watchdog's existing enforcement.

Previously shouldReclaim only used staleMs (30min default), meaning
a lock held for 10+ minutes by a live PID would never be reclaimable,
causing 60s timeout failures and gateway freezes.

Closes openclaw#85762

* fix(session-lock): add dead-PID fast-path before retry loop

Adds a fast-path check at the top of acquireSessionWriteLock:
if the lock file's owner PID is dead, remove it immediately
before entering the retry loop. This saves up to timeoutMs (60s)
of futile waiting when the previous lock holder has died.

The shouldReclaim callback already handles this case, but only
iteratively through the retry loop. The fast-path eliminates
that unnecessary delay.

* fix(session-lock): enforce max hold during acquisition

* fix(session-lock): revalidate max hold safely

* fix(session-lock): honor holder max-hold policy

* fix(session-lock): keep cleanup from reclaiming live holders

* fix(session-lock): remove stale locks only when unchanged

* fix(session-lock): skip self-held max-hold reclaim

* fix(ci): refresh gateway protocol checks

---------

Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
eleboucher pushed a commit to eleboucher/homelab that referenced this pull request May 27, 2026
…026.5.26) (#682)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.5.22` → `2026.5.26` |

---

> ⚠️ **Warning**
>
> Some dependencies could not be looked up. Check the [Dependency Dashboard](issues/567) for more information.

---

### Release Notes

<details>
<summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary>

### [`v2026.5.26`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#2026526)

[Compare Source](https://github.com/openclaw/openclaw/compare/v2026.5.22...v2026.5.26)

##### Highlights

- Faster Gateway and replies: startup avoids repeated plugin, channel, session, usage-cost, warning, scheduled-service, and filesystem scans; visible replies separate user-facing sends from slower follow-up work; Gateway runtime/session caches churn less under load.
- Transcripts are core: transcript-backed meeting summaries, source-provider chunks, cleaned user turns, media provenance, Codex mirrors, WebChat replies, and CLI/TUI replay now use one more reliable transcript path.
- More channels are production-ready: Telegram keeps typing/progress context and forum topics, iMessage handles attachment roots, remote media staging, and duplicate local Messages sources, WhatsApp restores group/media behavior, Discord improves voice playback and model picking, and Signal/iMessage/WhatsApp get reaction approvals.
- Better voice and Talk: realtime Talk runs can be inspected, steered, cancelled, or followed up from Web UI and Discord voice; wake-name handling is more tolerant without letting ambient speech trigger agents.
- Safer content boundaries: Browser snapshot reads honor SSRF policy, system-event text cannot spoof nested prompt markers, fetched file text is wrapped as external content, ClickClack inbound sender allowlists run before agent dispatch, stale device tokens are rejected, and serialized tool-call text is scrubbed from replies.
- Providers, Codex, and local models are steadier: named auth profiles, OpenAI sampling params, Codex app-server resume/timeout/usage-limit recovery, dynamic tool-schema guards, xAI usage-limit surfacing, Ollama top-p normalization, and local approval resolution reduce provider-specific dead ends.
- More reliable install/update/release paths: Alpine installs, trusted runtime fallback roots, stable update channels, Docker/package timeouts, Windows Scheduled Tasks, Windows/macOS proof lanes, Testbox/Crabbox delegation, plugin publish checks, and macOS runner bootstraps all got hardened.
- Better observability: Activity tab, gateway secret-prep traces, tool/model stream progress, explicit fast-mode status, systemd Gateway hygiene, OpenTelemetry LLM spans, release performance evidence, and richer telemetry signals make failures easier to inspect.

##### Changes

- Transcripts: add core transcript capture and source-provider support for transcript-backed meeting summaries, including the renamed Transcripts docs, CLI surface, source-provider chunks, and cleaned user-turn persistence.
- Auth: add named model login profiles and supported credential migration for Hermes, OpenCode, and Codex auth profiles, with explicit opt-out and non-interactive controls. ([#&#8203;85667](https://github.com/openclaw/openclaw/issues/85667)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).
- Diagnostics: trace gateway secret preparation, classify skill/tool usage, surface model stream progress, add OpenTelemetry LLM content spans, and expose alertable telemetry for blocked tools, failover, stale sessions, liveness, oversized payloads, and webhook ingress. ([#&#8203;83019](https://github.com/openclaw/openclaw/issues/83019), [#&#8203;80370](https://github.com/openclaw/openclaw/issues/80370), [#&#8203;86191](https://github.com/openclaw/openclaw/issues/86191))
- Channels: add Signal reaction approvals, iMessage thumb approval reactions, and WhatsApp thumb approval reaction support so mobile approval flows work without textual `/approve` commands. ([#&#8203;85894](https://github.com/openclaw/openclaw/issues/85894), [#&#8203;85952](https://github.com/openclaw/openclaw/issues/85952), [#&#8203;85477](https://github.com/openclaw/openclaw/issues/85477))
- Agents/API: forward OpenAI sampling params through the Gateway and expose estimated context-budget status for active agent runs. ([#&#8203;84094](https://github.com/openclaw/openclaw/issues/84094))
- TUI/status: queue prompts submitted while an agent is busy and show explicit fast-mode state plus richer systemd Gateway hygiene in status output. ([#&#8203;86722](https://github.com/openclaw/openclaw/issues/86722), [#&#8203;87115](https://github.com/openclaw/openclaw/issues/87115), [#&#8203;86976](https://github.com/openclaw/openclaw/issues/86976))
- Exec approvals: hide durable approval actions that are unavailable for the current prompt and keep approval runtime tokens local-only so stale prompts cannot offer misleading controls. ([#&#8203;86270](https://github.com/openclaw/openclaw/issues/86270), [#&#8203;86359](https://github.com/openclaw/openclaw/issues/86359))
- Plugin SDK: add reaction approval helpers and keep diagnostic event root exports discoverable across function-name and alias-bound module graphs. ([#&#8203;86735](https://github.com/openclaw/openclaw/issues/86735), [#&#8203;87084](https://github.com/openclaw/openclaw/issues/87084))
- Android/iOS: add the Android pair-new-gateway action and improve mobile Talk mode surfaces, including iOS realtime Talk mode and Android offline voice/gateway recovery. ([#&#8203;86798](https://github.com/openclaw/openclaw/issues/86798), [#&#8203;86355](https://github.com/openclaw/openclaw/issues/86355)) Thanks [@&#8203;ngutman](https://github.com/ngutman).
- Performance: cache plugin metadata snapshots, package realpaths, stable gateway metadata, model cost indexes, channel resolution, usage-cost indexes, and session/auth hot-path facts so common Gateway and reply paths do less rediscovery. ([#&#8203;84649](https://github.com/openclaw/openclaw/issues/84649), [#&#8203;85843](https://github.com/openclaw/openclaw/issues/85843), [#&#8203;86517](https://github.com/openclaw/openclaw/issues/86517), [#&#8203;86678](https://github.com/openclaw/openclaw/issues/86678))
- Voice: expose shared realtime turn-context tracking through the realtime voice SDK and reuse it for Discord speaker attribution and wake-name context recovery.
- Voice: reuse shared realtime output activity tracking in Google Meet command and node audio bridges, including recent-output checks for local barge-in detection.
- Voice: expose shared realtime output activity tracking through the realtime voice SDK and reuse it for Discord playback activity and barge-in decisions.
- Voice: expose shared realtime consult question matching, speakable-result extraction, and alias-aware forced-consult coordination through the realtime voice SDK, then reuse it in Gateway Talk, Voice Call, and Discord voice paths.
- Voice: share activation-name matching and consult-transcript screening through the realtime voice SDK so Discord, browser voice, and meeting surfaces can reuse one implementation.
- Cron: default `cron.maxConcurrentRuns` to 8 so scheduled automations and their isolated agent turns can make progress in parallel without explicit configuration.
- QA-Lab: add `qa coverage --match <query>` so focused proof selection can discover matching scenarios from existing metadata before running live or remote lanes.
- Discord/model picker: surface an alpha-bucket select (e.g. `A–G (12) · H–N (18) · O–Z (5)`) when the provider list or a provider's model list exceeds 25 items, so configs with `provider/*` wildcards stay one click from the right page instead of paginating through prev/next; falls back to numeric chunks when every item shares the same first letter.
- Control UI: add an ephemeral Activity tab for sanitized live tool activity summaries without persisting raw telemetry. Fixes [#&#8203;12831](https://github.com/openclaw/openclaw/issues/12831). Thanks [@&#8203;BunsDev](https://github.com/BunsDev).
- Build: include `ui:build` in the `full` and `ciArtifacts` profiles of `scripts/build-all.mjs` so `pnpm build` always rebuilds `dist/control-ui` after `tsdown` cleans `dist`, removing the second-command requirement and the missing-asset failure mode for source/runtime installs and CI artifact uploads. ([#&#8203;85206](https://github.com/openclaw/openclaw/issues/85206))
- iOS: improve Talk mode with direct realtime voice sessions, compact toolbar status, and responsive voice waveform feedback. ([#&#8203;86355](https://github.com/openclaw/openclaw/issues/86355)) Thanks [@&#8203;ngutman](https://github.com/ngutman).
- Media: replace the Sharp image backend with Rastermill for metadata, resizing, EXIF orientation, and PNG alpha-preserving optimization so OpenClaw no longer installs Sharp or the WhatsApp Jimp fallback for image processing. ([#&#8203;86437](https://github.com/openclaw/openclaw/issues/86437))
- Codex: update the bundled Codex CLI to 0.134.0 and keep native compaction disabled for budget-triggered app-server turns so OpenClaw owns the recovery boundary. ([#&#8203;86772](https://github.com/openclaw/openclaw/issues/86772))

##### Fixes

- Memory/security: reject prompt-like text submitted through the explicit `memory_store` tool before embedding or storage, matching the existing auto-capture prompt-injection filter. ([#&#8203;87142](https://github.com/openclaw/openclaw/issues/87142))

- Gateway/security: enable the default auth rate limiter for remote non-browser and HTTP gateway auth failures when `gateway.auth.rateLimit` is unset, while preserving the loopback exemption. ([#&#8203;87148](https://github.com/openclaw/openclaw/issues/87148))

- Prompt hardening: route untrusted group prompt metadata through sanitized untrusted structured context while preserving trusted operator-configured group system prompts and aligning the plugin SDK docs/test helpers. ([#&#8203;87144](https://github.com/openclaw/openclaw/issues/87144))

- Security/content boundaries: validate Browser snapshot tab URLs against SSRF policy before ChromeMCP or direct CDP reads, sanitize queued system-event text so untrusted plugin/channel labels cannot spoof nested prompt markers, wrap fetched file text and metadata as external content, apply ClickClack `allowFrom` sender allowlists before agent dispatch, reject RPCs from invalidated device-token clients during rotation, require staged sandbox media refs, and scrub serialized tool-call text from replies. ([#&#8203;78526](https://github.com/openclaw/openclaw/issues/78526), [#&#8203;87094](https://github.com/openclaw/openclaw/issues/87094), [#&#8203;87062](https://github.com/openclaw/openclaw/issues/87062), [#&#8203;83741](https://github.com/openclaw/openclaw/issues/83741), [#&#8203;70707](https://github.com/openclaw/openclaw/issues/70707), [#&#8203;86924](https://github.com/openclaw/openclaw/issues/86924)) Thanks [@&#8203;zsxsoft](https://github.com/zsxsoft), [@&#8203;ttzero25](https://github.com/ttzero25), and [@&#8203;mmaps](https://github.com/mmaps).

- Transcripts/user turns: persist CLI, WebChat, media, follow-up, hook, and Codex-mirror user turns to the admitted session target; keep cleaned transcript text, inline image routing, provenance metadata, replay hooks, and fallback paths idempotent when runtimes fail or restart.

- TUI/status/onboarding/UI: queue busy TUI prompts instead of dropping them, preserve the configured default model during onboarding, show failed tool results as errors, show config-open failures in Control UI, keep status JSON plugin scans healthy, preserve xAI usage-limit errors locally, and expose explicit fast-mode/systemd state. ([#&#8203;86722](https://github.com/openclaw/openclaw/issues/86722), [#&#8203;87000](https://github.com/openclaw/openclaw/issues/87000), [#&#8203;85786](https://github.com/openclaw/openclaw/issues/85786), [#&#8203;87108](https://github.com/openclaw/openclaw/issues/87108), [#&#8203;87001](https://github.com/openclaw/openclaw/issues/87001), [#&#8203;86614](https://github.com/openclaw/openclaw/issues/86614), [#&#8203;87115](https://github.com/openclaw/openclaw/issues/87115), [#&#8203;86976](https://github.com/openclaw/openclaw/issues/86976))

- Plugin commands/SDK: preserve plugin LLM command auth, bind native plugin command dispatch to the host agent's LLM auth, keep `onDiagnosticEvent` exports discoverable through `Function.name`, stabilize diagnostic event root aliases, correlate pathless read diagnostics, suppress transient runner failures in channel command paths, and repair local approval resolution. ([#&#8203;85936](https://github.com/openclaw/openclaw/issues/85936), [#&#8203;87084](https://github.com/openclaw/openclaw/issues/87084), [#&#8203;86977](https://github.com/openclaw/openclaw/issues/86977), [#&#8203;87069](https://github.com/openclaw/openclaw/issues/87069), [#&#8203;86771](https://github.com/openclaw/openclaw/issues/86771))

- Codex/providers: keep WebChat delivery hints out of user prompts, avoid false queued-terminal idle timeouts, share the native hook relay registry, quarantine unsupported dynamic tool schemas, preserve Claude resumed-session system prompts, normalize greedy Ollama `top_p`, preserve per-agent thinking defaults for ingress runs, and avoid native compaction takeover on budget-triggered Codex turns. ([#&#8203;87096](https://github.com/openclaw/openclaw/issues/87096), [#&#8203;73950](https://github.com/openclaw/openclaw/issues/73950), [#&#8203;87049](https://github.com/openclaw/openclaw/issues/87049), [#&#8203;86689](https://github.com/openclaw/openclaw/issues/86689), [#&#8203;86772](https://github.com/openclaw/openclaw/issues/86772))

- Gateway/perf/release: reuse startup-warning metadata and prepared auth stores, avoid cloning live-switch and lifecycle session caches on read paths, defer warning and scheduled-service fallback imports, trim Gateway session/startup/runtime CPU churn, skip duplicate turn session touches, stop chat timeout fallback cascades, drop stale subagent announce history, bound benchmark/watch/kitchen-sink teardown waits, bound macOS/package/onboarding/plugin smoke commands, bound install finalization probes, resolve Parallels npm-update commands from guest `PATH`, and bootstrap raw AWS macOS Node/pnpm commands through `/usr/bin/env`. ([#&#8203;86997](https://github.com/openclaw/openclaw/issues/86997))

- Reply/perf: reduce visible reply delivery latency by preserving Telegram typing/progress context, lazy-loading slash-command startup metadata, avoiding hot-path model hydration, flag-gating Codex profiler timing, deferring context compaction maintenance, and tracking delivery timing. ([#&#8203;86989](https://github.com/openclaw/openclaw/issues/86989), [#&#8203;86990](https://github.com/openclaw/openclaw/issues/86990), [#&#8203;86991](https://github.com/openclaw/openclaw/issues/86991), [#&#8203;86992](https://github.com/openclaw/openclaw/issues/86992), [#&#8203;86993](https://github.com/openclaw/openclaw/issues/86993), [#&#8203;86994](https://github.com/openclaw/openclaw/issues/86994)) Thanks [@&#8203;keshavbotagent](https://github.com/keshavbotagent).

- Reply/source delivery: keep TUI, Control UI, media, TTS, transcript, and Codex source-reply finals live without duplicate terminal events or stale replay artifacts.

- Agents/replay: repair legacy tool results before replay, preserve `sessions_spawn` transcript payloads, restore current guard checks, stage sandboxed workspace media, and keep duplicate transcripts tool display metadata from reappearing. ([#&#8203;82203](https://github.com/openclaw/openclaw/issues/82203), [#&#8203;86934](https://github.com/openclaw/openclaw/issues/86934), [#&#8203;87025](https://github.com/openclaw/openclaw/issues/87025)) Thanks [@&#8203;martingarramon](https://github.com/martingarramon), [@&#8203;vincentkoc](https://github.com/vincentkoc), and [@&#8203;joshavant](https://github.com/joshavant).

- Agents/sessions: handle active-fallback failures in `sessions_send` so fallback routing reports the real failure and does not leave callers with an ambiguous dropped send. ([#&#8203;86638](https://github.com/openclaw/openclaw/issues/86638))

- Agents/hooks/subagents: enforce default hook agent allowlists, recover failed subagent lifecycle completions, and keep node task lifecycle cleanup from closing the Gateway listener. ([#&#8203;86101](https://github.com/openclaw/openclaw/issues/86101))

- Codex: project newer OpenClaw chat history into resumed app-server threads and keep Codex turn timeouts inside the Codex runtime boundary so timeouts do not poison shared app-server clients or fall through to unrelated provider fallback. ([#&#8203;86677](https://github.com/openclaw/openclaw/issues/86677), [#&#8203;86476](https://github.com/openclaw/openclaw/issues/86476)) Thanks [@&#8203;TurboTheTurtle](https://github.com/TurboTheTurtle) and [@&#8203;pashpashpash](https://github.com/pashpashpash).

- Config/doctor/update: narrow profiled tool-section doctor repair, keep runtime-injected legacy web-search provider config out of user-authored config validation, and keep prerelease tags excluded from stable updater resolution. ([#&#8203;87030](https://github.com/openclaw/openclaw/issues/87030), [#&#8203;86818](https://github.com/openclaw/openclaw/issues/86818), [#&#8203;86559](https://github.com/openclaw/openclaw/issues/86559)) Thanks [@&#8203;joshavant](https://github.com/joshavant), [@&#8203;luoyanglang](https://github.com/luoyanglang), and [@&#8203;stevenepalmer](https://github.com/stevenepalmer).

- Doctor/runtime: validate active bundled MCP tool schemas through the same runtime projection path so unsupported MCP input schemas are reported and quarantined instead of poisoning assistant startup.

- CLI/Windows: add a Windows-only stack-size respawn for stack-heavy startup paths, default CLI logs to local timestamps, and validate timeout/banner TTY state more strictly. ([#&#8203;87031](https://github.com/openclaw/openclaw/issues/87031), [#&#8203;85387](https://github.com/openclaw/openclaw/issues/85387)) Thanks [@&#8203;giodl73-repo](https://github.com/giodl73-repo) and [@&#8203;vincentkoc](https://github.com/vincentkoc).

- Locking/security: require owner identity proof before stale plugin lock removal, memoize session lock owner arguments, and avoid writing default exec approval stores unless policy state actually changed. ([#&#8203;86814](https://github.com/openclaw/openclaw/issues/86814), [#&#8203;86964](https://github.com/openclaw/openclaw/issues/86964)) Thanks [@&#8203;Alix-007](https://github.com/Alix-007) and [@&#8203;vincentkoc](https://github.com/vincentkoc).

- Install/release: bound Docker package build, inventory, pack, and tarball preparation with process-group timeouts; pin shrinkwrap patch drift to the pnpm lock; harden macOS restart and dSYM packaging; and run release Docker/live timeout wrappers in the foreground so child processes cannot wedge gates.

- QA/Telegram: bound Telegram user credential tar and broker calls so live proof setup fails with a timeout instead of waiting for the outer Crabbox job deadline.

- QA/Tool Search: bound gateway E2E HTTP probes, run only the fixture plugin, and clean up temporary fixture trees after the compact tool-catalog proof completes.

- Telegram/network: treat `ENETDOWN` as a transient pre-connect network failure so Telegram sends, gateway unhandled-rejection handling, and cron network retries follow the same recovery path as sibling network outages. ([#&#8203;86762](https://github.com/openclaw/openclaw/issues/86762)) Thanks [@&#8203;TurboTheTurtle](https://github.com/TurboTheTurtle).

- Telegram: preserve inbound text entities, overlapping DM replies, account topic cache sidecars, outbound reply context, targeted bot-command mentions, durable group retry targets, forum topic names, and native progress callbacks. ([#&#8203;83873](https://github.com/openclaw/openclaw/issues/83873), [#&#8203;85361](https://github.com/openclaw/openclaw/issues/85361), [#&#8203;85555](https://github.com/openclaw/openclaw/issues/85555), [#&#8203;85656](https://github.com/openclaw/openclaw/issues/85656), [#&#8203;85709](https://github.com/openclaw/openclaw/issues/85709), [#&#8203;86299](https://github.com/openclaw/openclaw/issues/86299), [#&#8203;86553](https://github.com/openclaw/openclaw/issues/86553)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif), [@&#8203;luoyanglang](https://github.com/luoyanglang), and [@&#8203;neeravmakwana](https://github.com/neeravmakwana).

- iMessage: read image attachments from local Messages attachment roots, dedupe duplicate local Messages-source accounts, seed direct DM history, fix image/group media attachment commands, advance catchup cursors after live handling, and keep slash-command acknowledgements in the source conversation. ([#&#8203;82642](https://github.com/openclaw/openclaw/issues/82642), [#&#8203;85475](https://github.com/openclaw/openclaw/issues/85475), [#&#8203;86569](https://github.com/openclaw/openclaw/issues/86569), [#&#8203;86705](https://github.com/openclaw/openclaw/issues/86705), [#&#8203;86706](https://github.com/openclaw/openclaw/issues/86706), [#&#8203;86770](https://github.com/openclaw/openclaw/issues/86770)) Thanks [@&#8203;homer-byte](https://github.com/homer-byte), [@&#8203;TurboTheTurtle](https://github.com/TurboTheTurtle), [@&#8203;swang430](https://github.com/swang430), and [@&#8203;OmarShahine](https://github.com/OmarShahine).

- WhatsApp/QQ/Twitch/IRC/Slack: restore WhatsApp ack identity and group-drop warnings, make QQ Bot media respect `OPENCLAW_HOME`, serialize Twitch auth disconnects, store IRC channel routes canonically, and keep Slack downloaded files out of reply media. ([#&#8203;83833](https://github.com/openclaw/openclaw/issues/83833), [#&#8203;85309](https://github.com/openclaw/openclaw/issues/85309), [#&#8203;85777](https://github.com/openclaw/openclaw/issues/85777), [#&#8203;85794](https://github.com/openclaw/openclaw/issues/85794), [#&#8203;85906](https://github.com/openclaw/openclaw/issues/85906), [#&#8203;86318](https://github.com/openclaw/openclaw/issues/86318), [#&#8203;86697](https://github.com/openclaw/openclaw/issues/86697)) Thanks [@&#8203;sliverp](https://github.com/sliverp), [@&#8203;neeravmakwana](https://github.com/neeravmakwana), and [@&#8203;Kailigithub](https://github.com/Kailigithub).

- Discord/voice: improve voice playback and wake replies, bucket large model picker menus, merge media captions into one message, route metadata through configured proxies, restore numeric channel sends, suppress self-reply echoes, and tighten wake matching without breaking fuzzy wake phrases. ([#&#8203;80227](https://github.com/openclaw/openclaw/issues/80227), [#&#8203;86238](https://github.com/openclaw/openclaw/issues/86238), [#&#8203;86487](https://github.com/openclaw/openclaw/issues/86487), [#&#8203;86571](https://github.com/openclaw/openclaw/issues/86571), [#&#8203;86595](https://github.com/openclaw/openclaw/issues/86595), [#&#8203;86601](https://github.com/openclaw/openclaw/issues/86601))

- Codex: preserve native web-search metadata, keep oversized native thread reuse, bridge CLI API-key auth into the app server, preserve sandbox bootstrap path style, recover context-window prompt errors, honor yolo approval policy, disable native thread personality, and route compaction through Codex auth. ([#&#8203;85378](https://github.com/openclaw/openclaw/issues/85378), [#&#8203;85542](https://github.com/openclaw/openclaw/issues/85542), [#&#8203;85891](https://github.com/openclaw/openclaw/issues/85891), [#&#8203;85909](https://github.com/openclaw/openclaw/issues/85909), [#&#8203;86408](https://github.com/openclaw/openclaw/issues/86408))

- Agents/runtime: enforce session lock max-hold reclaim, release embedded-attempt locks on all exits, treat aborted subagent runs as terminal, avoid runtime model hydration on hot paths, disclose scoped session list counts, derive overflow budgets from provider errors, and keep fallback errors scoped to the active model candidate. ([#&#8203;70473](https://github.com/openclaw/openclaw/issues/70473), [#&#8203;85764](https://github.com/openclaw/openclaw/issues/85764), [#&#8203;86014](https://github.com/openclaw/openclaw/issues/86014), [#&#8203;86134](https://github.com/openclaw/openclaw/issues/86134), [#&#8203;86427](https://github.com/openclaw/openclaw/issues/86427), [#&#8203;86944](https://github.com/openclaw/openclaw/issues/86944)) Thanks [@&#8203;openperf](https://github.com/openperf), [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev), [@&#8203;zhangguiping-xydt](https://github.com/zhangguiping-xydt), and [@&#8203;ferminquant](https://github.com/ferminquant).

- Config/update/doctor: retry config recovery after failed backup restore, skip shell env fallback on Windows, exclude prerelease tags from the stable git channel, support deep config edits, warn instead of aborting on unreadable cron stores, prune stale bundled plugin paths, and avoid duplicate restart prompts when the Gateway is already healthy. ([#&#8203;85739](https://github.com/openclaw/openclaw/issues/85739), [#&#8203;85787](https://github.com/openclaw/openclaw/issues/85787), [#&#8203;86060](https://github.com/openclaw/openclaw/issues/86060), [#&#8203;86260](https://github.com/openclaw/openclaw/issues/86260), [#&#8203;86384](https://github.com/openclaw/openclaw/issues/86384), [#&#8203;86533](https://github.com/openclaw/openclaw/issues/86533)) Thanks [@&#8203;liaoyl830](https://github.com/liaoyl830).

- Install/release: support Alpine CLI installs and runtime floors, prefer trusted startup argv runtime fallback roots, reject stale CLI node runtimes, avoid npm `min-release-age` installer failures, bound npm/package/Docker install phases, restore config parent ownership in Docker, seed Docker lockfile package tarballs before prune, make release/plugin prerelease checks fail closed instead of hanging or false-greening, and use host-visible Crabbox local work roots for Docker-backed proof. ([#&#8203;85491](https://github.com/openclaw/openclaw/issues/85491))

- Windows daemon: keep Scheduled Task gateway launches running on battery power and avoid workgroup-machine prompts for a domain user during task installation. ([#&#8203;59299](https://github.com/openclaw/openclaw/issues/59299))

- Security: avoid printing Gateway tokens in Docker, validate plugin model-pattern regexes safely, escape transcript metadata field names, harden session allowlist glob matching, audit Claude permission overrides under YOLO, and require explicit allow for ACP auto approvals. ([#&#8203;85849](https://github.com/openclaw/openclaw/issues/85849), [#&#8203;85934](https://github.com/openclaw/openclaw/issues/85934), [#&#8203;86046](https://github.com/openclaw/openclaw/issues/86046), [#&#8203;86557](https://github.com/openclaw/openclaw/issues/86557))

- Media/images: replace Sharp with Rastermill, keep EXIF normalization best-effort, normalize HEIC/HEIF before image descriptions, route Codex image API keys through OpenAI, preserve image compression metadata, and auto-scale live tool result caps. ([#&#8203;85776](https://github.com/openclaw/openclaw/issues/85776), [#&#8203;86037](https://github.com/openclaw/openclaw/issues/86037), [#&#8203;86437](https://github.com/openclaw/openclaw/issues/86437), [#&#8203;86857](https://github.com/openclaw/openclaw/issues/86857), [#&#8203;86923](https://github.com/openclaw/openclaw/issues/86923))

- Memory: prevent semantic vector indexes from silently degrading when embeddings are unavailable, stop doctor OOMs on large session stores, preserve sidecar hooks/artifacts, write fallback dream diaries, use CJK-aware dreaming dedupe, and avoid per-file watcher FD fan-out. ([#&#8203;80613](https://github.com/openclaw/openclaw/issues/80613), [#&#8203;82928](https://github.com/openclaw/openclaw/issues/82928), [#&#8203;85060](https://github.com/openclaw/openclaw/issues/85060), [#&#8203;85704](https://github.com/openclaw/openclaw/issues/85704), [#&#8203;85967](https://github.com/openclaw/openclaw/issues/85967), [#&#8203;86701](https://github.com/openclaw/openclaw/issues/86701)) Thanks [@&#8203;brokemac79](https://github.com/brokemac79), [@&#8203;openperf](https://github.com/openperf), and [@&#8203;yaaboo-gif](https://github.com/yaaboo-gif).

- Agents/sessions: include visibility metadata on restricted `sessions_list` results so scoped counts are clearly reported without widening access or exposing hidden-session counts. ([#&#8203;86944](https://github.com/openclaw/openclaw/issues/86944)) Thanks [@&#8203;ferminquant](https://github.com/ferminquant).

- Gateway/DNS: validate wide-area discovery domains before deriving zone paths or writing zone files, so invalid `discovery.wideArea.domain` and `dns setup --domain` values fail with a DNS-name diagnostic instead of falling through to unrelated configuration errors. Thanks [@&#8203;mmaps](https://github.com/mmaps).

- Agents/BTW: route fallback side-question streams through the embedded stream resolver so Anthropic-compatible MiniMax requests use the same capped transport as normal chat. ([#&#8203;86312](https://github.com/openclaw/openclaw/issues/86312)) Thanks [@&#8203;neeravmakwana](https://github.com/neeravmakwana).

- Telegram: treat `/command@TargetBot` bot-command entities as explicit mentions for the addressed bot so `requireMention` groups no longer drop targeted commands or captions. Fixes [#&#8203;84462](https://github.com/openclaw/openclaw/issues/84462). ([#&#8203;86553](https://github.com/openclaw/openclaw/issues/86553)) Thanks [@&#8203;luoyanglang](https://github.com/luoyanglang).

- CI: bound Docker/Bash E2E tarball npm installs with `OPENCLAW_E2E_NPM_INSTALL_TIMEOUT` so package, onboarding, plugin, and upgrade lanes fail instead of hanging on a stuck npm install.

- CI: fail Parallels npm-update smoke jobs after the guest command timeout and cleanup backstop instead of only logging a timeout line.

- CI: bound kitchen-sink RPC HTTP probes so stalled gateway readiness or response bodies fail and retry instead of wedging the walker.

- CI: bound Telegram user Crabbox proof Bot API calls so stalled Telegram responses fail instead of wedging credential and desktop proof cleanup.

- CI: bound MCP channel stdio client initialization so Docker channel proof fails and closes the bridge transport instead of waiting for the outer job timeout.

- CI: keep `OPENCLAW_TESTBOX=1 pnpm check:changed` delegating to Blacksmith Testbox through Crabbox without forwarding local Testbox or worker env into the remote command.

- CI: send KILL after the TERM grace period for manual checkout fetch timeouts so stuck Testbox and workflow checkout retries cannot hang behind a wedged `git fetch`.

- CI: send KILL after the TERM grace period for Bun global install smoke command timeouts so trapped `openclaw` child processes cannot wedge the scheduled install smoke.

- iMessage: thread current channel/account inbound attachment roots into the image tool so iMessage-saved attachments under `~/Library/Messages/Attachments` (including the wildcard `/Users/*/Library/Messages/Attachments` root) are read through the existing inbound path policy instead of being rejected as `path-not-allowed`. Literal `localRoots` stays workspace-scoped. Fixes [#&#8203;30170](https://github.com/openclaw/openclaw/issues/30170). ([#&#8203;86569](https://github.com/openclaw/openclaw/issues/86569))

- QQ Bot: respect `OPENCLAW_HOME` for outbound media path resolution so `<qqmedia>` sends no longer silently fail when `HOME` and `OPENCLAW_HOME` differ (Docker / multi-user hosts). Persisted QQ Bot data (sessions, known users, refs) stays anchored on the OS home for upgrade compatibility. Fixes [#&#8203;83562](https://github.com/openclaw/openclaw/issues/83562). Thanks [@&#8203;sliverp](https://github.com/sliverp).

- Update: report the primary malformed `openclaw.extensions` payload error without adding a duplicate missing-main diagnostic. ([#&#8203;86596](https://github.com/openclaw/openclaw/issues/86596)) Thanks [@&#8203;ferminquant](https://github.com/ferminquant).

- Control UI: keep host-local Markdown file paths inert while preserving app-relative links. ([#&#8203;86620](https://github.com/openclaw/openclaw/issues/86620)) Thanks [@&#8203;BryanTegomoh](https://github.com/BryanTegomoh).

- Gateway: dampen repeated unauthenticated device-required probes per URL while preserving explicit-auth and paired recovery paths. ([#&#8203;86575](https://github.com/openclaw/openclaw/issues/86575)) Thanks [@&#8203;ferminquant](https://github.com/ferminquant).

- IRC: store inbound channel routes with the canonical `channel:#name` target and join transient channel sends before writing. ([#&#8203;85906](https://github.com/openclaw/openclaw/issues/85906)) Thanks [@&#8203;Kailigithub](https://github.com/Kailigithub).

- Usage: surface unknown all-zero model pricing as missing cost entries instead of a confident `$0` total. ([#&#8203;85882](https://github.com/openclaw/openclaw/issues/85882)) Thanks [@&#8203;MichaelZelbel](https://github.com/MichaelZelbel).

- Agents/Codex: honor yolo app-server approval policy only for the full `never` plus `danger-full-access` case. ([#&#8203;85909](https://github.com/openclaw/openclaw/issues/85909)) Thanks [@&#8203;earlvanze](https://github.com/earlvanze).

- Gateway/Gmail: clear Gmail watcher renewal intervals on re-entry so hot reloads do not leak lifecycle timers. ([#&#8203;82947](https://github.com/openclaw/openclaw/issues/82947)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif).

- Logging: exit cleanly on broken stdout/stderr pipes without masking existing failure exit codes. ([#&#8203;80059](https://github.com/openclaw/openclaw/issues/80059)) Thanks [@&#8203;pavelzak](https://github.com/pavelzak).

- Gateway/security: escape transcript metadata field names while extracting oversized session line prefixes. ([#&#8203;85934](https://github.com/openclaw/openclaw/issues/85934)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif).

- Plugins/security: validate manifest model pattern regexes with the safe-regex compiler so unsafe patterns are ignored before matching. ([#&#8203;86046](https://github.com/openclaw/openclaw/issues/86046)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif).

- Discord: route gateway metadata REST lookups through the configured Discord proxy so proxied accounts do not fall back to direct `discord.com` connections before opening the WebSocket. Fixes [#&#8203;80227](https://github.com/openclaw/openclaw/issues/80227). Thanks [@&#8203;Clivilwalker](https://github.com/Clivilwalker).

- Agents/media: hydrate current-turn image attachments from filename-derived MIME types so active vision can see generated or forwarded images whose source omitted an image content type. ([#&#8203;84812](https://github.com/openclaw/openclaw/issues/84812)) Thanks [@&#8203;marchpure](https://github.com/marchpure).

- Agents/fs: point workspace-only scratch-path guidance at in-workspace temp directories while keeping host-root writes rejected by the tool guard. ([#&#8203;86501](https://github.com/openclaw/openclaw/issues/86501)) Thanks [@&#8203;tianxiaochannel-oss88](https://github.com/tianxiaochannel-oss88).

- Agents/media: keep async cron media completions scoped to their run session while preserving direct delivery for stale generated-media success and failure notifications. ([#&#8203;86529](https://github.com/openclaw/openclaw/issues/86529)) Thanks [@&#8203;ai-hpc](https://github.com/ai-hpc).

- Gateway: emit plugin `session_end`/`session_start` hooks when `agent.send` rotates or replaces a session id, keeping hook lifecycle state aligned with `sessions.changed` notifications. Fixes [#&#8203;83507](https://github.com/openclaw/openclaw/issues/83507). ([#&#8203;85875](https://github.com/openclaw/openclaw/issues/85875)) Thanks [@&#8203;brokemac79](https://github.com/brokemac79).

- OpenShell/SSH: reject malformed generated exec commands before sandbox/session setup so unresolved workflow placeholders fail fast instead of reaching the remote shell. Fixes [#&#8203;72373](https://github.com/openclaw/openclaw/issues/72373). Thanks [@&#8203;brokemac79](https://github.com/brokemac79).

- Google: stop normalizing `gemini-3.1-flash-lite` to the retired preview endpoint and update Flash Lite alias guidance to the GA model id. Fixes [#&#8203;86151](https://github.com/openclaw/openclaw/issues/86151). ([#&#8203;86240](https://github.com/openclaw/openclaw/issues/86240)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif).

- Installer: make Alpine apk installs cover Git, verify the Node runtime floor, try `nodejs-current`, and report Alpine version guidance when repositories only provide older Node packages.

- Agents/status: prefer the active Claude CLI OAuth auth label over an unused Anthropic env API-key label for equivalent runtime aliases. Fixes [#&#8203;80184](https://github.com/openclaw/openclaw/issues/80184). ([#&#8203;86570](https://github.com/openclaw/openclaw/issues/86570)) Thanks [@&#8203;brokemac79](https://github.com/brokemac79).

- Agents/media: send direct fallback for generated media still missing after an active requester wake fails. ([#&#8203;85489](https://github.com/openclaw/openclaw/issues/85489)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).

- Agents: derive overflow compaction budgets from provider-reported and synthetic over-budget token counts so confirmed context overflows compact before retrying. ([#&#8203;70473](https://github.com/openclaw/openclaw/issues/70473)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).

- Agents/Codex: recover Codex context-window prompt errors through overflow compaction and surface reset guidance when recovery is exhausted. ([#&#8203;85542](https://github.com/openclaw/openclaw/issues/85542)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).

- Agents/Codex: allow Codex app-server runs to bootstrap from `CODEX_API_KEY` or `OPENAI_API_KEY` when no Codex auth profile is configured.

- Agents/Codex: keep selected Codex runtime routing on OpenAI-Codex while preserving direct OpenAI API-key compaction fallback. ([#&#8203;86408](https://github.com/openclaw/openclaw/issues/86408)) Thanks [@&#8203;funmerlin](https://github.com/funmerlin) and [@&#8203;VACInc](https://github.com/VACInc).

- Agent transcript: include OpenClaw agent session logs when finding local transcript candidates.

- Crabbox: bootstrap raw AWS macOS shell commands wrapped in absolute `time` paths so RSS probes can run Node and pnpm on fresh macOS runners.

- Crabbox: bootstrap raw AWS macOS shell commands even when setup statements precede Node or pnpm usage.

- TUI/local: skip unnecessary secret resolution, gateway model catalog loading, bootstrap, and skill scans in explicit local-model runs so startup reaches the model request faster.

- Sessions/doctor: load large session stores without clone amplification during read-only doctor checks and reclaim stale `sessions.json.*.tmp` sidecars. Fixes [#&#8203;56827](https://github.com/openclaw/openclaw/issues/56827). Thanks [@&#8203;openperf](https://github.com/openperf).

- Tests: clean successful plugin gateway gauntlet isolated temp roots while keeping an explicit preservation switch for failed/debug runs.

- Plugins/perf: reuse derived plugin metadata snapshots for the lifetime of the process so reply-time skill setup no longer rescans plugin metadata on every turn.

- Discord/OpenAI voice: keep wake-name master consults using the current speaker context after ignored ambient transcripts and shorten the default capture silence grace.

- Doctor: skip redundant Gateway restart prompts when a recent supervisor restart leaves the Gateway healthy. Fixes [#&#8203;86518](https://github.com/openclaw/openclaw/issues/86518). ([#&#8203;86533](https://github.com/openclaw/openclaw/issues/86533)) Thanks [@&#8203;liaoyl830](https://github.com/liaoyl830).

- Cron: restore suspended cron lanes to the configured/default concurrency instead of falling back to one after quota or circuit-breaker auto-resume.

- Gateway: keep session-only Control UI tool-start mirrors flowing during diagnostic queue pressure instead of silently dropping non-terminal tool updates.

- Agents/memory: return optional not-found context for missing date-only daily memory reads instead of logging benign first-run `ENOENT` failures. Fixes [#&#8203;82928](https://github.com/openclaw/openclaw/issues/82928). Thanks [@&#8203;galiniliev](https://github.com/galiniliev).

- Discord: merge streamed text captions into following media block replies so captions and attachments send as one message. ([#&#8203;86487](https://github.com/openclaw/openclaw/issues/86487)) Thanks [@&#8203;neeravmakwana](https://github.com/neeravmakwana).

- Gateway: avoid sending duplicate tool-event frames to Control UI connections that are subscribed by both run and session.

- Discord/OpenAI voice: accept broader edge-position fuzzy wake-name transcripts while keeping ambient speech gated.

- Discord/OpenAI voice: accept longer leading wake-name mistranscripts such as "Open Club" for OpenClaw.

- Agents/OpenAI-compatible: stop ModelStudio-compatible chat requests before sending system/tool-only payloads that have no usable user or assistant turn. ([#&#8203;86177](https://github.com/openclaw/openclaw/issues/86177)) Thanks [@&#8203;TurboTheTurtle](https://github.com/TurboTheTurtle).

- Gateway/plugins: reuse plugin package realpath checks while building installed plugin indexes so startup avoids repeated filesystem resolution work.

- Kilo Gateway: send string `stop` sequences as arrays so Kilo accepts OpenAI-compatible chat completions. ([#&#8203;86461](https://github.com/openclaw/openclaw/issues/86461)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif).

- Discord/OpenAI voice: accept leading fuzzy wake-name transcripts such as "Monty" or "Moti" for a Molty agent while keeping ambient speech gated.

- Media understanding: convert HEIC and HEIF images to JPEG before image description providers run so iPhone photos work in direct and configured image-description flows. ([#&#8203;86037](https://github.com/openclaw/openclaw/issues/86037))

- Agents: release embedded-attempt session locks from outer teardown so post-prompt exceptions cannot wedge later requests behind `SessionWriteLockTimeoutError`. Fixes [#&#8203;86014](https://github.com/openclaw/openclaw/issues/86014). Thanks [@&#8203;openperf](https://github.com/openperf).

- Discord/OpenAI voice: rotate Realtime sessions at provider max duration without logging the expected session-expiry event as an error.

- Sessions: skip metadata-only entries during QMD-slugified session lookup so one incomplete row does not block transcript hit resolution. ([#&#8203;86327](https://github.com/openclaw/openclaw/issues/86327)) Thanks [@&#8203;abnershang](https://github.com/abnershang).

- Agents/media: derive bundled plugin local-media trust from plugin tool metadata instead of importing the full plugin registry on subscription paths. ([#&#8203;84409](https://github.com/openclaw/openclaw/issues/84409)) Thanks [@&#8203;samzong](https://github.com/samzong).

- Image tool: keep config-backed custom-provider API keys usable for auto-discovered vision models, including deferred image-tool execution without env keys or auth profiles. ([#&#8203;85733](https://github.com/openclaw/openclaw/issues/85733))

- Memory/local embeddings: run local GGUF embeddings in an isolated worker sidecar and degrade to configured fallback or keyword search on worker failure so native embedding crashes do not take down the Gateway. ([#&#8203;85348](https://github.com/openclaw/openclaw/issues/85348)) Thanks [@&#8203;osolmaz](https://github.com/osolmaz).

- Gateway: clear the runtime config snapshot before `SIGUSR1` in-process restarts so config changes survive the next gateway loop. ([#&#8203;86388](https://github.com/openclaw/openclaw/issues/86388)) Thanks [@&#8203;XuZehan-iCenter](https://github.com/XuZehan-iCenter).

- Models: show OAuth delegation markers as configured `models.json` auth while keeping runtime route usability checks strict. ([#&#8203;86378](https://github.com/openclaw/openclaw/issues/86378)) Thanks [@&#8203;rohitjavvadi](https://github.com/rohitjavvadi).

- Cron: seed active scheduled and manual cron task rows with a progress summary so status surfaces do not look blank while jobs run. ([#&#8203;86313](https://github.com/openclaw/openclaw/issues/86313)) Thanks [@&#8203;ferminquant](https://github.com/ferminquant).

- Cron: preserve unsupported persisted cron payload rows during routine store writes while keeping those rows non-runnable. Fixes [#&#8203;84922](https://github.com/openclaw/openclaw/issues/84922). ([#&#8203;86415](https://github.com/openclaw/openclaw/issues/86415)) Thanks [@&#8203;IWhatsskill](https://github.com/IWhatsskill).

- Updater: exclude prerelease git tags from stable channel resolution so source updates do not check out newer alpha/rc/preview/canary tags. ([#&#8203;86260](https://github.com/openclaw/openclaw/issues/86260)) Thanks [@&#8203;stevenepalmer](https://github.com/stevenepalmer).

- Security/Audit: flag webhook `hooks.token` reuse of active Gateway password auth in `openclaw security audit` while keeping password-mode startup compatibility. ([#&#8203;84338](https://github.com/openclaw/openclaw/issues/84338)) Thanks [@&#8203;coygeek](https://github.com/coygeek).

- QQBot: derive the outbound reply watchdog from configured agent and provider timeouts so slow local model replies are not cut off at five minutes. Fixes [#&#8203;85267](https://github.com/openclaw/openclaw/issues/85267). ([#&#8203;85271](https://github.com/openclaw/openclaw/issues/85271)) Thanks [@&#8203;SymbolStar](https://github.com/SymbolStar).

- Agents/heartbeat: stop heartbeat turns after the first valid `heartbeat_respond` so repeated response loops do not burn tokens. ([#&#8203;86357](https://github.com/openclaw/openclaw/issues/86357)) Thanks [@&#8203;udaymanish6](https://github.com/udaymanish6).

- Tasks: keep retained lost tasks out of default status health counts, explain their cleanup window during maintenance, and prune lost task records after 24 hours instead of the general 7-day terminal retention.

- Memory-core: keep REM dreaming focused on live light-staged memories and mark staged entries as considered so old recall history no longer dominates fresh candidates. ([#&#8203;86302](https://github.com/openclaw/openclaw/issues/86302)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif).

- Memory: abort sync instead of downgrading an existing semantic vector index to FTS-only when the configured embedding provider is temporarily unavailable. ([#&#8203;85704](https://github.com/openclaw/openclaw/issues/85704)) Thanks [@&#8203;yaaboo-gif](https://github.com/yaaboo-gif).

- Telegram: propagate forum topic names through the account-scoped topic cache for native command context and topic create/edit actions. ([#&#8203;86299](https://github.com/openclaw/openclaw/issues/86299)) Thanks [@&#8203;SebTardif](https://github.com/SebTardif).

- Slack: keep downloaded read-only files out of reply media so Slack file reads do not echo files back to the conversation. ([#&#8203;86318](https://github.com/openclaw/openclaw/issues/86318)) Thanks [@&#8203;neeravmakwana](https://github.com/neeravmakwana).

- Cron: accept leading-plus relative durations such as `+5m` for one-shot `--at` schedules. ([#&#8203;86341](https://github.com/openclaw/openclaw/issues/86341)) Thanks [@&#8203;mushuiyu886](https://github.com/mushuiyu886).

- Agents/media: preserve async-started media tool metadata so background generation starts no longer surface generic incomplete-turn warnings while replay stays unsafe. ([#&#8203;85933](https://github.com/openclaw/openclaw/issues/85933)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).

- Docker E2E: dedupe scheduler lane resources so npm/service package lanes are not over-counted and serialized unnecessarily.

- QA/diagnostics: add a collector-backed OpenTelemetry smoke lane, make the OTLP payload leak check scenario-aware, and keep source QA builds from failing on optional dependency imports resolved through pnpm's temp module path.

- Crabbox: bootstrap Git metadata for sparse remote changed gates so raw synced workspaces can run `pnpm check:changed` from the intended diff.

- xAI/LM Studio: avoid buffering ordinary bracketed or `final` prose until stream completion while watching for plain-text tool-call fallbacks.

- Doctor: warn and continue when the cron job store exists but cannot be read so later health checks still run. Fixes [#&#8203;86102](https://github.com/openclaw/openclaw/issues/86102). ([#&#8203;86384](https://github.com/openclaw/openclaw/issues/86384)) Thanks [@&#8203;1052326311](https://github.com/1052326311).

- Discord: suppress a bot's previous reply body and referenced media from prompt context when a user replies to that bot message, while keeping reply metadata for routing. ([#&#8203;86238](https://github.com/openclaw/openclaw/issues/86238)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).

- Discord: restore bare numeric channel IDs for outbound message-tool sends while keeping explicit DM targets unambiguous. ([#&#8203;86571](https://github.com/openclaw/openclaw/issues/86571)) Thanks [@&#8203;joshavant](https://github.com/joshavant).

- Docker E2E: avoid rebuilding the Control UI twice while preparing the shared OpenClaw package tarball for package-backed scenario runs.

- Tests: avoid rebuilding the Control UI twice during the installer Docker smoke now that `pnpm build` includes `ui:build`.

- Tests: give QA config mutation RPCs enough native Windows budget to finish gateway config writes and restart settle after hot scenario runs.

- Tests: keep the gateway restart-inflight QA scenario focused on restart recovery on native Windows by allowing expected embedded prompt handoff errors and using the Windows-safe timeout budget.

- QA-Lab: make the synthetic OpenAI provider honor generic `reply exactly:` directives after required kickoff reads so restart-recovery scenarios do not fall through to generic repo-summary prose.

- Gateway: abort active `agent` RPC runs during forced restart shutdown so stale in-process turns cannot keep writing a session after the Gateway lifecycle restarts.

- Crabbox: sync clean sparse worktrees through a temporary full checkout even when reusing an existing lease so tracked build-time files are not omitted.

- Build: route `scripts/ui.js` through the shared pnpm runner and keep Control UI chunking helpers in sparse-included source so native Windows Corepack builds can produce `dist/control-ui`.

- Tests: give the memory fallback QA scenario enough turn budget to exercise native Windows gateway runs instead of failing on the client timeout while the mock agent is still dispatching.

- Tests: collect QA gateway CPU/RSS metrics on native Windows and give the channel baseline enough turn budget to report slow gateway runs instead of timing out before proof.

- Install/update: bypass npm `min-release-age` policies with `--min-release-age=0` instead of `--before` so hosted installers keep working on npm versions that reject the combined config. ([#&#8203;84749](https://github.com/openclaw/openclaw/issues/84749)) Thanks [@&#8203;TeodoroRodrigo](https://github.com/TeodoroRodrigo).

- Diagnostics: reclaim wedged session lanes when stale active-run bookkeeping blocks queued work despite no forward progress. Fixes [#&#8203;85639](https://github.com/openclaw/openclaw/issues/85639). Thanks [@&#8203;openperf](https://github.com/openperf).

- WebChat: keep message-tool replies visible in the chat while still summarizing internal tool results for the model. Fixes [#&#8203;86347](https://github.com/openclaw/openclaw/issues/86347). Thanks [@&#8203;shakkernerd](https://github.com/shakkernerd).

- Gateway/perf: fail startup benchmark samples when the Gateway process exits before benchmark teardown, including signal deaths after readiness probes.

- Gateway/perf: fail restart benchmark samples when the Gateway exits before benchmark teardown, including clean exits and signal deaths after successful restart probes.

- Agents/tests: keep model catalog visibility on static selection helpers so catalog visibility checks avoid the broad model-selection barrel import.

- Agents/commitments: serialize commitment store load-modify-save writes so concurrent heartbeat and CLI updates no longer lose dismissal, sent, or attempt state. ([#&#8203;81153](https://github.com/openclaw/openclaw/issues/81153)) Thanks [@&#8203;ai-hpc](https://github.com/ai-hpc).

- xAI/LM Studio: promote plain-text tool-call fallbacks into structured tool calls and strip leaked internal tool syntax before user-facing delivery. ([#&#8203;86222](https://github.com/openclaw/openclaw/issues/86222)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).

- CLI: suppress benign self-update version-skew warnings during package post-update finalization.

- Gateway/perf: tighten restart and startup benchmark failure handling so long profiling runs, failed probes, and fresh Linux runners no longer produce false passing or `n/a` results.

- Checks: keep intentional Knip unused-file findings optional so full CI and sparse proof workspaces stay aligned.

- Docker: restore writable `~/.config` in runtime images. Fixes [#&#8203;85968](https://github.com/openclaw/openclaw/issues/85968). Thanks [@&#8203;hkoessler](https://github.com/hkoessler) and [@&#8203;Bartok9](https://github.com/Bartok9).

- Plugin SDK: keep legacy root diagnostic subscriptions connected when built plugin SDK aliases resolve diagnostic helpers through a separate module graph.

- Diagnostics: export alertable OTel and Prometheus signals for blocked tools, model failover, stale sessions, liveness warnings, oversized payloads, and webhook ingress while fixing shared OTLP endpoints with query strings.

- Tests: normalize macOS canonical temp paths in exec allowlists, fs-safe trash assertions, installed plugin matching, Telegram topic-name stores, and built ACPX MCP server expectations so native macOS proof runners cover the intended behavior.

- Codex/app-server: preserve message-tool-only source reply delivery mode on active runs so sub-agent completion wakeups can steer the active Codex turn instead of being rejected. ([#&#8203;86287](https://github.com/openclaw/openclaw/issues/86287)) Thanks [@&#8203;ferminquant](https://github.com/ferminquant).

- Tests: sample the Windows kitchen-sink RPC gateway directly and serialize RSS probes so native runs keep the memory guard active.

- Tests: normalize bundled plugin lifecycle probe paths and state-root lookup so native Windows release sweeps accept valid packaged plugin installs.

- Agents/Claude CLI: route live native Bash permission requests through OpenClaw exec policy so Claude turns no longer stall on `control_request`, and document that OpenClaw exec policy is authoritative. Fixes [#&#8203;80819](https://github.com/openclaw/openclaw/issues/80819). ([#&#8203;86330](https://github.com/openclaw/openclaw/issues/86330), from [#&#8203;81971](https://github.com/openclaw/openclaw/issues/81971)) Thanks [@&#8203;guthirry](https://github.com/guthirry) and [@&#8203;sallyom](https://github.com/sallyom).

- Security audit: warn when YOLO OpenClaw exec policy overrides a restrictive raw Claude `--permission-mode` for managed live sessions. ([#&#8203;86557](https://github.com/openclaw/openclaw/issues/86557)) Thanks [@&#8203;sallyom](https://github.com/sallyom).

- Config: keep benign legacy metadata write anomalies out of default doctor and config command output while preserving explicit anomaly logging for diagnostics.

- Codex: log when implicit app-server `never` approvals are promoted for OpenClaw tool policy, including whether the trigger was a `before_tool_call` hook or trusted tool policy.

- Codex harness: make subscription usage-limit errors without reset times explain that OpenClaw cannot determine the reset and point users to wait until Codex is available, use another Codex account, or switch to another configured model/provider. Thanks [@&#8203;amknight](https://github.com/amknight).

- Google Vertex: support production ADC modes such as Workload Identity Federation, service-account credentials, and metadata-server ADC for the native Vertex transport. ([#&#8203;83971](https://github.com/openclaw/openclaw/issues/83971)) Thanks [@&#8203;damianFelixPago](https://github.com/damianFelixPago).

- Telegram: route normal `[telegram][diag]` polling diagnostics through `runtime.log` while keeping non-diag warnings and persistence failures on `runtime.error`, so healthy polling startup no longer looks like an error. Fixes [#&#8203;82957](https://github.com/openclaw/openclaw/issues/82957). ([#&#8203;82958](https://github.com/openclaw/openclaw/issues/82958)) Thanks [@&#8203;galiniliev](https://github.com/galiniliev).

- Providers/Ollama: strip inline Kimi cloud reasoning prefixes from streamed and final visible replies while keeping ordinary Kimi answers append-only. ([#&#8203;86286](https://github.com/openclaw/openclaw/issues/86286)) Thanks [@&#8203;jason-allen-oneal](https://github.com/jason-allen-oneal).

- Gateway: require Talk secret authority before setup-code handoff can include Talk secrets. ([#&#8203;85690](https://github.com/openclaw/openclaw/issues/85690)) Thanks [@&#8203;ngutman](https://github.com/ngutman).

- Agents: keep fallback error reporting scoped to the active model candidate so stale prior-provider quota/auth text is not reported for later fallback attempts. ([#&#8203;86134](https://github.com/openclaw/openclaw/issues/86134)) Thanks [@&#8203;zhangguiping-xydt](https://github.com/zhangguiping-xydt).

- iMessage: dedupe watcher startup when `channels.imessage.accounts` lists both `default` and a named account that point at the same local Messages source, so the gateway no longer spawns two `imsg rpc` processes or doubles inbound replies; the dedupe is scoped to watcher startup, leaving duplicate accounts addressable for outbound sends, status, and capability listings, and `openclaw doctor` flags the redundant account with a rebinding hint. Fixes [#&#8203;65141](https://github.com/openclaw/openclaw/issues/65141). ([#&#8203;86705](https://github.com/openclaw/openclaw/issues/86705)) Thanks [@&#8203;swang430](https://github.com/swang430).

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4xMDEuMSIsInVwZGF0ZWRJblZlciI6IjQzLjEwMS4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJyZW5vdmF0ZS9jb250YWluZXIiLCJ0eXBlL3BhdGNoIl19-->

Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/682
augusteo added a commit to getboon/openclaw that referenced this pull request May 29, 2026
…col/test churn

- wrappedStreamFn: restructure provider-error-preservation without a throw inside
  finally (oxlint no-unsafe-finally). Same semantics: always reacquire; prefer the
  original stream error over a reacquire takeover error; surface reacquire error
  only when the stream succeeded.
- Revert src/gateway/server-methods/agent.test.ts + GatewayModels.swift to the 5.18
  baseline: the openclaw#85764 cherry-pick conflict-resolution had pulled in openclaw#85256-era
  internal-session-effect tests + protocol fields whose implementation isn't in this
  backport, breaking checks-node-agentic-gateway-methods + checks-fast-bundled-protocol.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
augusteo added a commit to getboon/openclaw that referenced this pull request May 29, 2026
* fix(agents): skip fallback for session coordination errors

Preserve provider fallback metadata when session coordination errors are nested under provider failures.

Co-authored-by: luyao618 <364939526@qq.com>
(cherry picked from commit 6a5a135)

* fix(agents): tolerate in-process session writes during prompt release (openclaw#84250)

Merged via squash.

Prepared head SHA: 33f88fe
Co-authored-by: tianxiaochannel-oss88 <272340815+tianxiaochannel-oss88@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman

(cherry picked from commit 1b77145)

* fix(agents): bound embedded compaction write locks

Fixes the embedded attempt session write-lock watchdog so the fallback max hold time follows the resolved compaction timeout plus the existing lock grace window, instead of inheriting the full run timeout.

Adds regression coverage for the helper and settled-compaction lock lifecycle, plus a changelog entry thanking @luoyanglang.

Verification:
- `pnpm test src/agents/session-write-lock.test.ts src/agents/pi-embedded-runner/run/attempt.test.ts src/agents/pi-embedded-runner/run/attempt.session-lock.test.ts`
- `pnpm check:changed` via Blacksmith Testbox `tbx_01ks8b6vn8se5cg1dfn3te3g47` / https://github.com/openclaw/openclaw/actions/runs/26301988670
- Autoreview clean: `/Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode branch --base origin/main`
- PR CI green on `79e8c5f1a637981d263c0268bf5666967ff4e778`: https://github.com/openclaw/openclaw/actions/runs/26302152844 and https://github.com/openclaw/openclaw/actions/runs/26302152798

Co-authored-by: luoyanglang <hanwanlonga@gmail.com>
(cherry picked from commit 46de078)

* fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition (openclaw#85764)

* fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition

- Adds optional maxHoldMs parameter to inspectLockPayload
- Inspect now marks locks as stale when held longer than maxHoldMs
- Passes maxHoldMs through inspectLockPayloadForSession
- acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

This ensures that when a live process holds a lock for longer than
maxHoldMs (default 5min), other processes can reclaim it during
acquisition — matching the watchdog's existing enforcement.

Previously shouldReclaim only used staleMs (30min default), meaning
a lock held for 10+ minutes by a live PID would never be reclaimable,
causing 60s timeout failures and gateway freezes.

Closes openclaw#85762

* fix(session-lock): add dead-PID fast-path before retry loop

Adds a fast-path check at the top of acquireSessionWriteLock:
if the lock file's owner PID is dead, remove it immediately
before entering the retry loop. This saves up to timeoutMs (60s)
of futile waiting when the previous lock holder has died.

The shouldReclaim callback already handles this case, but only
iteratively through the retry loop. The fast-path eliminates
that unnecessary delay.

* fix(session-lock): enforce max hold during acquisition

* fix(session-lock): revalidate max hold safely

* fix(session-lock): honor holder max-hold policy

* fix(session-lock): keep cleanup from reclaiming live holders

* fix(session-lock): remove stale locks only when unchanged

* fix(session-lock): skip self-held max-hold reclaim

* fix(ci): refresh gateway protocol checks

---------

Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
(cherry picked from commit a1eb765)

* fix(embedded-runner): preserve provider errors on cleanup takeover (openclaw#84321)

Summary:
- The PR preserves provider-facing embedded-runner prompt errors when cleanup detects session takeover, keeps the takeover signal fatal for fallback, and adds focused regressions.
- PR surface: Source +52, Tests +92. Total +144 across 5 files.
- Reproducibility: yes. Source inspection shows current main can let cleanup takeover replace a prior prompt/p ... rror and can normalize a provider-looking takeover wrapper before fallback sees it as coordination failure.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(embedded-runner): preserve takeover during fallback
- PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8405…

Validation:
- ClawSweeper review passed for head 050c779.
- Required merge gates passed before the squash merge.

Prepared head SHA: 050c779
Review: openclaw#84321 (comment)

Co-authored-by: abnershang <abner.shang@gmail.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
(cherry picked from commit 7fbca96)

* fix(agents): release embedded-attempt session lock on every exit path (openclaw#86427)

* fix(agents): release embedded-attempt session lock on every exit path

The embedded run controller acquires its session write lock eagerly at
creation and released it only inside the post-run cleanup block. An
exception thrown in post-prompt processing skipped that block, so the lock
leaked to the live gateway process until the watchdog reclaimed it and
later requests to the session failed with SessionWriteLockTimeoutError.

Add an idempotent dispose() to the lock controller and call it from the
run's outer finally so the eagerly-held lock is released on every exit
path. Normal/aborted/timed-out runs still hand the lock to
acquireForCleanup first, so dispose() is a no-op then (no double release).

Fixes openclaw#86014

* fix: keep session lock teardown comment lean

* docs(changelog): note embedded session lock fix

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
(cherry picked from commit 32ddfc2)

* fix(agents): fence yield abort lock release

(cherry picked from commit 0fe7479)

* fix(agents): memoize session lock owner args

Memoize owner process argv lookups per PID during `cleanStaleLockFiles`, and yield between lock entries so startup cleanup does not monopolize the event loop while inspecting many session locks.

This keeps lock classification semantics unchanged while avoiding repeated synchronous process-args reads for lock clusters owned by the same PID, especially the Windows PowerShell path.

Fixes openclaw#86509.

Verification:
- `git diff --check origin/main...HEAD`
- focused TSX harness against the current-main merge result: `session-lock memo regression harness passed`

Thanks @openperf.

Co-authored-by: openperf <16864032@qq.com>
(cherry picked from commit c430fcd)

* fix(diagnostics): recover orphaned session activity

Recover idle queued sessions whose diagnostic activity retained stale ownerless model or tool calls by classifying them as recoverable session.stuck after the usual recovery gates. Yield the event loop before stale session-lock process inspection so sync process lookup cannot monopolize lock contention paths.

Docs now describe the widened session.stuck telemetry contract for recoverable stale bookkeeping, including ownerless activity. Thanks @samuelsoaress.

Refs openclaw#84903.

Co-authored-by: samuelsoaress <samuelsoares177778@gmail.com>
(cherry picked from commit 286964c)

* [FORK][openclaw#86584] gate owned-write publish on pre-append fingerprint (fixes openclaw#86572)

Carries unmerged upstream PR openclaw#86584 (HEAD d79a3b4) onto the boon 5.18 base
as the same-lane EmbeddedAttemptSessionTakeoverError fence fix for long cron
turns. Fails closed: an external mutation before pi's append fails the trust
gate and still trips the fence (verified by the PR's 303-line test suite incl.
the mixed-interleave negative test).

Backfills base symbols openclaw#86584 assumes (introduced upstream between 5.18 and the
PR base, not carried by the 9 merged race-fix picks):
- session-lock.ts: MAX_BENIGN_SESSION_FENCE_{ADVANCE,REWRITE,REWRITE_RESULT}_BYTES,
  MAX_SAFE_FILE_OFFSET, TRANSCRIPT_ONLY_OPENCLAW_ASSISTANT_MODELS,
  SessionFileFenceSnapshot type, fenceSnapshot state var, ActiveWriteLockState
  type + activeWriteLock store fix (reuse nested writes via {active:true}),
  node:util + string-normalization imports.
- transcript-append.ts: wrap appendSessionTranscriptMessage in
  runWithOwnedSessionTranscriptWriteLock so low-level appends acquire the
  owned-context lock.
- test import fixes (appendSessionTranscriptMessage, withOwned/bindOwned, __testing).

Drop when upstream merges openclaw#86584.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [FORK][openclaw#86584] wire owned-transcript-write context + typecheck cleanup

CRITICAL: wrap promptActiveSession in withOwnedSessionTranscriptWrites and bind
onBlockReply/onBlockReplyFlush to the owned context in attempt.ts. Without this,
pi's own transcript appends during a prompt are NOT recorded as owned, so the
fence trips on them (the exact takeover the backport is meant to prevent). This
wiring is an intermediate-base feature (between 5.18 and openclaw#84250's base) the merged
picks didn't carry. Tests passed before only because they set the context manually.

Also: add releaseHeldLockForAbort to the controller type; drop incidental non-fence
suppressAssistantErrorPersistence passes; remove dead async benign-rewrite cluster
(sessionFence{Advance,Rewrite}IsBenign + readAppendedSessionFileText +
lineMatchesLinearTranscriptMigration + helpers) — our openclaw#84250-based assertSessionFileFence
uses the sync owned-write path, so the async benign-detection variants are unreachable.
tsgo core: 0 errors. 384 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [FORK][openclaw#86584] address codex review: prefix-validate benign advance + preserve provider error

Finding 2 (masking gap, P2): sessionFenceAdvanceIsBenignSync only validated the
APPENDED bytes, so a writer that rewrote the existing prefix AND appended a benign
delivery-mirror/gateway-injected line could be laundered as an owned advance —
masking a genuine external takeover (silent message loss). Now fail closed unless
the current prefix is byte-identical to the trusted readSessionFileFenceSnapshot
text (readSessionFilePrefixSync); absent snapshot text => not benign.

Finding 1 (provider-error masking, P2): wrappedStreamFn's finally let a
reacquireAfterPrompt() takeover error mask the original provider error when the
stream itself threw. Now only surface the reacquire error when the stream
succeeded; otherwise preserve the original failure.

tsgo core: 0 errors. 384 tests pass (benign-advance acceptance + external-mutation
rejection both green).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(release): 2026.5.18-boon.1 — session-takeover hardening (boon fleet build)

Version bump + CHANGELOG for the fork build. Also fixes a backport test-import
gap: attempt.test.ts referenced `attemptTesting` (the __testing export) without
importing it. Full project typecheck (tsgo -b tsconfig.projects.json): 0 errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(ci): no-unsafe-finally in wrappedStreamFn + drop collateral protocol/test churn

- wrappedStreamFn: restructure provider-error-preservation without a throw inside
  finally (oxlint no-unsafe-finally). Same semantics: always reacquire; prefer the
  original stream error over a reacquire takeover error; surface reacquire error
  only when the stream succeeded.
- Revert src/gateway/server-methods/agent.test.ts + GatewayModels.swift to the 5.18
  baseline: the openclaw#85764 cherry-pick conflict-resolution had pulled in openclaw#85256-era
  internal-session-effect tests + protocol fields whose implementation isn't in this
  backport, breaking checks-node-agentic-gateway-methods + checks-fast-bundled-protocol.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: remove vestigial onAssistantErrorMessagePersisted option decls

Address cubic P2 review (PR #2): the option was declared on the guard
and guard-wrapper option types but never forwarded or invoked, so any
provided callback was silently ignored. The companion error-suppression
feature (suppressAssistantErrorPersistence + the agent-runner/followup
caller chain) is deliberately scoped OUT of this 5.18 backport, so the
decls were dead plumbing left over from a cherry-pick. Remove them to
keep the option surface honest; the load-bearing beforeMessagePersist
fence checkpoint (openclaw#86572) is retained.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Yao <364939526@qq.com>
Co-authored-by: xiaotian <tianxiaochannel@gmail.com>
Co-authored-by: 狼哥 <hanwanlonga@gmail.com>
Co-authored-by: njuboy <njuboy11@gmail.com>
Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: abnershang <abner.shang@gmail.com>
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
Co-authored-by: Chunyue Wang <80630709+openperf@users.noreply.github.com>
Co-authored-by: openperf <16864032@qq.com>
Co-authored-by: Samuel Soares da Silva <samuelsoares177778@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
…uisition (openclaw#85764)

* fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition

- Adds optional maxHoldMs parameter to inspectLockPayload
- Inspect now marks locks as stale when held longer than maxHoldMs
- Passes maxHoldMs through inspectLockPayloadForSession
- acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

This ensures that when a live process holds a lock for longer than
maxHoldMs (default 5min), other processes can reclaim it during
acquisition — matching the watchdog's existing enforcement.

Previously shouldReclaim only used staleMs (30min default), meaning
a lock held for 10+ minutes by a live PID would never be reclaimable,
causing 60s timeout failures and gateway freezes.

Closes openclaw#85762

* fix(session-lock): add dead-PID fast-path before retry loop

Adds a fast-path check at the top of acquireSessionWriteLock:
if the lock file's owner PID is dead, remove it immediately
before entering the retry loop. This saves up to timeoutMs (60s)
of futile waiting when the previous lock holder has died.

The shouldReclaim callback already handles this case, but only
iteratively through the retry loop. The fast-path eliminates
that unnecessary delay.

* fix(session-lock): enforce max hold during acquisition

* fix(session-lock): revalidate max hold safely

* fix(session-lock): honor holder max-hold policy

* fix(session-lock): keep cleanup from reclaiming live holders

* fix(session-lock): remove stale locks only when unchanged

* fix(session-lock): skip self-held max-hold reclaim

* fix(ci): refresh gateway protocol checks

---------

Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
…uisition (openclaw#85764)

* fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition

- Adds optional maxHoldMs parameter to inspectLockPayload
- Inspect now marks locks as stale when held longer than maxHoldMs
- Passes maxHoldMs through inspectLockPayloadForSession
- acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

This ensures that when a live process holds a lock for longer than
maxHoldMs (default 5min), other processes can reclaim it during
acquisition — matching the watchdog's existing enforcement.

Previously shouldReclaim only used staleMs (30min default), meaning
a lock held for 10+ minutes by a live PID would never be reclaimable,
causing 60s timeout failures and gateway freezes.

Closes openclaw#85762

* fix(session-lock): add dead-PID fast-path before retry loop

Adds a fast-path check at the top of acquireSessionWriteLock:
if the lock file's owner PID is dead, remove it immediately
before entering the retry loop. This saves up to timeoutMs (60s)
of futile waiting when the previous lock holder has died.

The shouldReclaim callback already handles this case, but only
iteratively through the retry loop. The fast-path eliminates
that unnecessary delay.

* fix(session-lock): enforce max hold during acquisition

* fix(session-lock): revalidate max hold safely

* fix(session-lock): honor holder max-hold policy

* fix(session-lock): keep cleanup from reclaiming live holders

* fix(session-lock): remove stale locks only when unchanged

* fix(session-lock): skip self-held max-hold reclaim

* fix(ci): refresh gateway protocol checks

---------

Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
…uisition (openclaw#85764)

* fix(session-lock): enforce maxHoldMs in shouldReclaim during lock acquisition

- Adds optional maxHoldMs parameter to inspectLockPayload
- Inspect now marks locks as stale when held longer than maxHoldMs
- Passes maxHoldMs through inspectLockPayloadForSession
- acquireSessionWriteLock's shouldReclaim callback now passes maxHoldMs

This ensures that when a live process holds a lock for longer than
maxHoldMs (default 5min), other processes can reclaim it during
acquisition — matching the watchdog's existing enforcement.

Previously shouldReclaim only used staleMs (30min default), meaning
a lock held for 10+ minutes by a live PID would never be reclaimable,
causing 60s timeout failures and gateway freezes.

Closes openclaw#85762

* fix(session-lock): add dead-PID fast-path before retry loop

Adds a fast-path check at the top of acquireSessionWriteLock:
if the lock file's owner PID is dead, remove it immediately
before entering the retry loop. This saves up to timeoutMs (60s)
of futile waiting when the previous lock holder has died.

The shouldReclaim callback already handles this case, but only
iteratively through the retry loop. The fast-path eliminates
that unnecessary delay.

* fix(session-lock): enforce max hold during acquisition

* fix(session-lock): revalidate max hold safely

* fix(session-lock): honor holder max-hold policy

* fix(session-lock): keep cleanup from reclaiming live holders

* fix(session-lock): remove stale locks only when unchanged

* fix(session-lock): skip self-held max-hold reclaim

* fix(ci): refresh gateway protocol checks

---------

Co-authored-by: njuboy11 <njuboy11@users.noreply.github.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling gateway Gateway runtime merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. proof: override Maintainer override for the external PR real behavior proof gate. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. size: S status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Session write lock timeout freezes gateway when session file is large (>300KB)

2 participants