fix(command): retry claude-cli transcript probe to close flush race #81048
Codex review: needs maintainer review before merge. Reviewed May 29, 2026, 1:21 PM ET / 17:21 UTC.

Summary

PR surface: Source +40, Tests +415. Total +455 across 11 files.

Reproducibility: yes. The source path is clear from the current transcript probe and the PR discussion includes live missing-transcript logs; I did not rerun the live Claude CLI/Telegram scenario in this read-only review.

Review metrics: none identified.

Merge readiness

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Review details

Best possible solution: Land this read-side Claude CLI transcript probe fix after maintainer review and green checks, while coordinating with #84234 for the write-side recovery path.

Do we have a high-confidence way to reproduce the issue? Yes. The source path is clear from the current transcript probe and the PR discussion includes live missing-transcript logs; I did not rerun the live Claude CLI/Telegram scenario in this read-only review.

Is this the best way to solve the issue? Yes. The deterministic cwd-based transcript path plus last-wins session id capture is a narrow read-side repair; the separate write-side binding/orphan handling should remain coordinated in the companion PR.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5620229f9f92.
Update: probe widened to 5 attempts / 3250ms after live evidence v1 + v2 were both insufficient

The single-retry shape from the PR body (one 150ms wait, gated on

Evolution after PR submission

Field evidence driving the change (2026-05-13 EDT)

Three live

All three hit the no-matching-jsonl branch, not the within-file branch. v1 didn't retry at all on that branch (it was gated on

What changed in this commit

39/39 in

Risk note

Worst-case probe latency on the genuinely-no-session path is now 3250ms instead of 0ms. That path always feeds into transcript-missing recovery anyway, so the latency budget is still essentially free compared with losing context, but reviewers should weigh whether the "negative path is a prelude to recovery" assumption from the original PR still holds at this magnitude. If a tighter cap is preferred, a 3-step PR is left at its current status; this is a follow-up commit only.
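For readers following the latency discussion, here is a minimal sketch of a bounded retry probe of the kind described above. It is not the PR's code: `probeWithRetry`, `ProbeResult`, and `ASSUMED_DELAYS_MS` are hypothetical names, and the individual delays are an assumption chosen only so that five attempts add up to the stated 3250ms worst case. The comment above gives the totals but not the per-step schedule.

```typescript
// Hypothetical sketch of a bounded retry probe; names and schedule are
// illustrative assumptions, not taken from the PR diff.
type ProbeResult = { found: boolean; attempts: number; waitedMs: number };

// ASSUMED schedule: four waits between five attempts, 250+500+1000+1500 = 3250ms.
const ASSUMED_DELAYS_MS: readonly number[] = [250, 500, 1000, 1500];

async function probeWithRetry(
  check: () => Promise<boolean>,
  delaysMs: readonly number[] = ASSUMED_DELAYS_MS,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<ProbeResult> {
  let attempts = 1;
  let waitedMs = 0;

  // First attempt runs immediately, so the happy path adds no latency.
  if (await check()) {
    return { found: true, attempts, waitedMs };
  }

  // Each further attempt is preceded by exactly one wait from the schedule.
  for (const delay of delaysMs) {
    await sleep(delay);
    waitedMs += delay;
    attempts += 1;
    if (await check()) {
      return { found: true, attempts, waitedMs };
    }
  }

  // Every attempt missed: the caller pays the full schedule before recovery.
  return { found: false, attempts, waitedMs };
}
```

Injecting `sleep` keeps the schedule testable without real timers. Under this assumed schedule, a transcript that appears just before the third check costs 750ms, while the genuinely-no-session path costs the full 3250ms. That worst case is the number the risk note asks reviewers to weigh.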
v3 still raced.

Live amnesia event on this branch's running gateway at 12:37:58 EDT today

That's a clean repro of the failure mode v3's narrative warned about, and

v4: orchestrator owns the path

The orchestrator already mints

The encoding rule was live-verified against claude-cli 2.1.140: every non-

On miss, v4 emits a structured

Why this is the last call site change

If v4 ever misses, it isn't "we picked the wrong filename" or "we probed

Tests
v5 —
Overnight soak update: v5 holding clean

Quick follow-up with fresh data on top of yesterday's comment.

Soak window: v5 deployed at

Result: zero

Main session has stayed bound to a single sessionId across heartbeats, scheduled compaction cycles, and the 04:00 daily rotation. Context continuity from the user's perspective: intact.

Lesson learned on this PR

In hindsight I shipped the initial PR (commit

Going forward I'll hold a similar PR until I have at least 24h of clean soak before opening, rather than treating "passes locally + reproducer no longer fires" as ready. Apologies for the review-cycle churn. Happy to split v5 into its own PR for a cleaner review surface if maintainers prefer, or land as-is.
952f92a to e3df84a
… (v5)

parseCliJsonl was first-wins: it captured the first session_id encountered and never updated. claude-cli emits ephemeral session_ids from SessionStart hooks before the canonical resumed session_id surfaces in the init event and the terminal result event. The orchestrator therefore bound to an ephemeral id whose JSONL never lands on the deterministic path probed by v4, triggering `cli session reset reason=missing-transcript` on every turn that observed a rotation.

Flip parseCliJsonl to last-wins to match the existing parseCliJson and createCliJsonlStreamingParser semantics. Adds a regression test that feeds an ephemeral id followed by a canonical id and asserts the canonical id is captured.
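The last-wins capture described in this commit message can be illustrated with a small standalone sketch. It is not the real `parseCliJsonl`, which handles more than session ids. `captureSessionIdLastWins` is a hypothetical helper, and the event shapes in `sampleOutput` are illustrative assumptions apart from the `session_id` field named above.

```typescript
// Hypothetical sketch of last-wins session id capture over JSONL output.
// Only the `session_id` field name comes from the commit message; the rest
// of each sample event is an illustrative assumption.
function captureSessionIdLastWins(jsonl: string): string | undefined {
  let sessionId: string | undefined;
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.length === 0) continue;

    let event: unknown;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // tolerate non-JSON noise between events
    }
    if (typeof event !== "object" || event === null) continue;

    const candidate = (event as Record<string, unknown>)["session_id"];
    if (typeof candidate === "string" && candidate.length > 0) {
      // Last-wins: a later id overwrites an earlier (possibly ephemeral) one.
      sessionId = candidate;
    }
  }
  return sessionId;
}

// An ephemeral id surfaces first, then the canonical id appears twice.
const sampleOutput = [
  JSON.stringify({ type: "hook", session_id: "ephemeral-123" }),
  JSON.stringify({ type: "init", session_id: "canonical-456" }),
  "not json at all",
  JSON.stringify({ type: "result", session_id: "canonical-456" }),
].join("\n");

console.log(captureSessionIdLastWins(sampleOutput)); // prints "canonical-456"
```

A first-wins variant would return `ephemeral-123` for the same input. That is the binding that sent the orchestrator to a transcript path that never gets written.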
oxlint flagged the `as string` cast on warnSpy.mock.calls[0]?.[0] as unnecessary — the value is already typed as string for the assertion path.
e3df84a to bf260ae
Landed via squash onto main.
Thanks @benjamin1492!
* fix(exec): bind node auto-review commands Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com> Co-authored-by: jesse-merhi <79823012+jesse-merhi@users.noreply.github.com> * fix(exec): honor node runtime policy for auto-review Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com> Co-authored-by: jesse-merhi <79823012+jesse-merhi@users.noreply.github.com> * fix(exec): harden auto-review prompt boundaries Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com> Co-authored-by: jesse-merhi <79823012+jesse-merhi@users.noreply.github.com> * fix(exec): align release validation surfaces Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com> Co-authored-by: jesse-merhi <79823012+jesse-merhi@users.noreply.github.com> * fix(exec): align release validation checks Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com> Co-authored-by: jesse-merhi <79823012+jesse-merhi@users.noreply.github.com> * test(e2e): repair release docker smoke fixtures Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com> Co-authored-by: jesse-merhi <79823012+jesse-merhi@users.noreply.github.com> * fix(exec): resolve auto approvals as runtime Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com> Co-authored-by: jesse-merhi <79823012+jesse-merhi@users.noreply.github.com> * ci: relax native OpenAI live proof timing Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com> Co-authored-by: jesse-merhi <79823012+jesse-merhi@users.noreply.github.com> * fix(exec): include mode in doctor policy warnings Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com> Co-authored-by: jesse-merhi <79823012+jesse-merhi@users.noreply.github.com> * test(release): repair live matrix expectations Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com> Co-authored-by: jesse-merhi <79823012+jesse-merhi@users.noreply.github.com> * fix(tts): centralize directive 
number parsing * fix(provider): bound Vydra and Comfy media downloads * fix(discord): validate error code integers * fix(discord): reject unsafe rate limit headers * ci(release): make plugin publish retries idempotent * perf(agent): lazy load embedded agent cli path * fix(whatsapp): validate inbound timestamps * refactor: share agent harness loader helpers * fix(agents): cap unsafe retry-after delays * perf(agent): defer session resolver for scoped gateway turns * fix(msteams): ignore unsafe retry-after delays * refactor: share store writer queue * fix(slack): reject unsafe inbound timestamps * fix(discord): reject unsafe retry-after delays * fix(qa-matrix): cap fault proxy bodies * fix(discord): bound delivery retry delays * refactor: share cron state parsing * Delete changelog directory * fix(zalouser): reject unsafe inbound timestamps * fix(cli): avoid underscored gateway test export * fix(scripts): cap clawtributor avatar probes * fix(telegram): centralize safe thread id parsing * fix(googlechat): drop invalid inbound timestamps * fix(doctor): label auth health by agent (openclaw#85924) Merged via squash. 
Prepared head SHA: 8c179fc Co-authored-by: giodl73-repo <235387111+giodl73-repo@users.noreply.github.com> Co-authored-by: giodl73-repo <235387111+giodl73-repo@users.noreply.github.com> Reviewed-by: @giodl73-repo * fix(qqbot): validate token expiry lifetimes * fix(openai): validate codex oauth token lifetimes * refactor: share node pairing surface helpers * fix(anthropic): validate oauth token lifetimes * fix(scripts): cap memory FD repro RPC bodies * fix(github-copilot): validate device code lifetimes * fix(msteams): validate oauth token lifetimes * refactor: share cli help argv scan * fix(github-copilot): validate oauth expiry values * fix(scripts): cap realtime smoke responses * fix(chutes): validate oauth token lifetimes * fix(auto-reply): reuse cli sessions for room events * fix(auto-reply): keep room event cli sessions transient * fix(agent-core): reject invalid session timestamps * fix(scripts): cap Claude usage response reads * refactor: centralize skills subsystem * refactor: move skill lifecycle code into skills subsystem * fix: bound skill index cache invalidation * fix: preserve skill snapshot freshness * fix: preserve preloaded skill snapshot entries * refactor: move session skill loader into skills subsystem * fix: preserve empty skill filter short circuit * fix: align empty default skill filter behavior * fix: align skills branch with upstream tar verbose test * fix: drop stale system prompt override imports * refactor: centralize skills runtime paths * refactor: remove stale agents skills barrel * refactor: use direct skills imports * refactor: organize skills subsystem layout * fix: lint centralized skills subsystem * refactor: split skills index follow-up * refactor: centralize skills subsystem * fix: unblock skills centralization checks * fix: route moved skills tests through unit-fast * refactor: centralize skills runtime tests * refactor: share web secret target selection * refactor: centralize safe expiry parsing * fix(exec): normalize unsafe 
timeout values * fix: persist Copilot SDK session bindings Persist GitHub Copilot SDK session ids in the plugin-state SQLite store so separate OpenClaw process turns can resume the same Copilot-side session when the compatibility fingerprint still matches. The fingerprint covers provider/model/cwd, resolved agent id, resolved Copilot home, and auth identity. Plugin-state lookup/register/delete failures are non-fatal, stale rows are invalidated, and reset delete failures use an in-process tombstone so reset does not accidentally reuse a durable binding. Also routes the QQBot token POST through the plugin SDK SSRF guard with capture disabled for the secret-bearing request, preserving the current token lifetime validation from main. Verification: focused Copilot and QQBot Vitest suites, raw channel fetch guard, autoreview clean, Blacksmith Testbox pnpm check:changed tbx_01kst9fwjmsfzwaxqatszcbf40, live local Copilot two-turn smoke with the same SDK session id persisted in SQLite. Refs openclaw#88064 * fix(exec): cap node run timeouts * perf(agent): skip plugin validation for gateway dispatch * fix(scripts): cap firecrawl compare HTML reads * fix(xai): normalize unsafe oauth lifetimes * refactor: share e2e text file helpers * fix(google): normalize unsafe oauth expiry * fix(openai): normalize codex device lifetimes * refactor: reuse e2e text tail helper * test(xai): type device-code note mock * fix(minimax): reject unsafe oauth expiry * fix(ci): cap dependency guard error bodies * fix(google-meet): normalize oauth expiry * fix(command): stabilize claude-cli transcript resume (openclaw#81048) Fix claude-cli transcript resume so session-id rotation and transcript flush timing do not drop valid resume state. - Capture the latest claude-cli session_id from JSONL output. - Resolve Claude project transcript paths through the shared canonical project-dir resolver. - Probe transcript content from the actual CLI process cwd. - Thanks @benjamin1492! 
* refactor: share codex e2e install helpers * fix(feishu): bound streaming token expiry * fix(openshell): cap command timeout config * refactor: centralize timer-safe timeout bounds * refactor: share e2e websocket open helper * fix(minimax): guard oauth token fetches (openclaw#88088) * fix(feishu): normalize app registration poll timers * fix(google): reject unsafe vertex adc lifetimes * fix(scripts): cap npm packument reads * fix(auth): reject unsafe wham reset windows * refactor: share qa report arg parsing * fix(retry): cap unsafe retry delays * fix(sandbox): bound novnc observer token ttl * feat(workboard): add agent coordination tools Summary: - Add Workboard agent coordination tools for list/read/claim/heartbeat/release/comment/proof/unblock flows. - Store artifacts, claims, diagnostics, and notifications in the Workboard SQLite-backed plugin state; surface the new metadata through Gateway, Control UI, docs, and plugin manifest contracts. - Add scoped claim authorization, token redaction, stale diagnostic cleanup, atomic proof artifact writes, and generated i18n metadata. 
Verification: - pnpm test ui/src/i18n/test/translate.test.ts extensions/browser/src/cli/browser-cli-actions-input/register.element.test.ts extensions/workboard/src/store.test.ts extensions/workboard/src/gateway.test.ts extensions/workboard/src/tools.test.ts ui/src/ui/controllers/workboard.test.ts ui/src/ui/views/workboard.test.ts - pnpm ui:i18n:check - env -u OPENCLAW_TESTBOX pnpm check:changed - autoreview --mode local: clean - PR CI passed; Windows checkout failure rerun passed on attempt 2 * perf(gateway): reuse session maintenance config during turns * fix(node-host): cap timeout wrapper delays * fix(talk): cap fast context timeout delay * fix(e2e): harden kitchen sink probe body caps * refactor: share bounded response reader * fix(providers): cap model request timeout delays * fix(oauth): cap request abort timeout delays * test: speed up slow assertions * test: stabilize slow assertion timings * test: shard channel import guardrails * perf(sessions): patch single-entry store writes * refactor: share script bounded response helper * fix(codex): cap responses request timeout delays * fix(scripts): cap gh-read json bodies * fix(lmstudio): cap model fetch timeout delays * feat(ios): default to hosted push relay (openclaw#88096) Merged via squash. 
Prepared head SHA: 75f939a Co-authored-by: ngutman <1540134+ngutman@users.noreply.github.com> Co-authored-by: ngutman <1540134+ngutman@users.noreply.github.com> Reviewed-by: @ngutman * fix(minimax): cap tts timeout delays * build(plugins): externalize copilot runtime * refactor: share codex app server start context * test(file-transfer): remove stale tar fixture awaits * fix(runtime): centralize safe timer timeout resolution * refactor: share ui chat send wrapper * docs(plugins): clarify external plugin installs * fix: close native hook relay replacement race * fix(qa-lab): cap credential broker request timeouts * refactor: share e2e incremental line reader * test(ci): fix main test expectations (openclaw#88122) * fix(copilot): cap oauth request timeouts * fix(oauth): cap tls preflight timeout * build(plugins): externalize tokenjuice * docs(plugins): add external package readmes * perf: reuse gateway session and plugin metadata paths * fix(exec): bind node auto-review to prepared plans * fix(auth): cap GitHub Copilot OAuth timeouts * docs(skills): expand Discrawl archive workflow * fix(discord): cap request timeout signals * fix(agents): preserve rotated compaction session identity Fix `sessions.json` persistence after compaction transcript rotation. When the agent runtime rotates from the pre-compaction session transcript to the post-compaction transcript, post-run consumers now receive the effective OpenClaw session id and session file. Backend CLI session ids remain backend metadata and no longer overwrite the top-level OpenClaw session identity. Refs openclaw#88040. Thanks @1052326311. 
Verification: - `node scripts/run-vitest.mjs src/agents/agent-command.compaction-rotation.test.ts src/agents/agent-command.live-model-switch.test.ts src/agents/command/session-store.test.ts` - Autoreview clean - GitHub CI green on PR head `c3d3c77ddf675bbba0b9ba6681b030a2f69a898c` * fix: keep compaction timeout snapshots continuable * feat(ios): add talk tab realtime playback (openclaw#88105) Merged via squash. Prepared head SHA: f41112a Co-authored-by: ngutman <1540134+ngutman@users.noreply.github.com> Co-authored-by: ngutman <1540134+ngutman@users.noreply.github.com> Reviewed-by: @ngutman * fix(signal): cap container timeout timers * fix(agents): forward ACP spawn attachments Forward initial image/file attachments when spawning ACP subagents through the existing sessions_spawn attachment opt-in. Remove the PR-only acpEnabled config split so ACP uses the same attachment gate as other runtimes. Also fix the PR branch CI fallout: type the browser element CLI request mock and use Vitest env stubs in the Azure speech test to satisfy the changed-path security scan. Verification: - GitHub CI passed on f6ca26b. - Autoreview clean. - Crabbox AWS live OpenAI proof passed: cbx_a576d49493fe / run_081dcc6c6a1b. Thanks @zhangguiping-xydt. * refactor: share e2e bounded response reader * docs(browser): add Notte cloud browser to direct WebSocket CDP providers Notte exposes a CDP-compatible WebSocket gateway at wss://us-prod.notte.cc/sessions/connect?token=<NOTTE_API_KEY> that auto-creates a session on connect — the same shape OpenClaw's existing "Direct WebSocket CDP providers" section was generically framed for (per openclaw#31085). Real behaviour proof (against wss://us-prod.notte.cc/sessions/connect): $ openclaw browser --browser-profile notte open https://example.com opened: https://example.com/ tab: t4 id: 7FE04AC44931A6E1C799DE4ABF0DC807 A screenshot captured against the same session is a 1254x1111 PNG of the rendered example.com page. 
Playwright connectOverCDP flow against the same URL (today): connectOverCDP 695ms context.newCDPSession(page) 169ms session.send('Target.getTargetInfo') → targetId 87ms page.goto('https://example.com') 631ms total 1.8s AI-assisted (Claude Opus 4.7). codex review --base origin/main returned clean. See PR description for the full pre-flight checklist. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> * fix(zalo): cap api request timeouts * fix: stabilize codex supervisor session listing * fix(qa-matrix): cap substrate request timeouts * fix(xiaomi): cap tts request timeouts * refactor: share e2e mock http helpers * docs(skills): require grouped release changelogs * fix(zai): cap endpoint probe timeouts * fix(mattermost): cap dm retry timeouts * perf: reuse provider handles and strict tool schemas * feat: add core session goals (openclaw#87469) * feat: add core session goals * feat: polish session goals in tui * fix: resolve goal tool session stores * fix: keep get goal read-only * fix: migrate legacy goal session slots * fix: persist goal token accounting * fix: validate goal session rows * refactor: remove unshipped goal legacy handling * fix: handle goal commands in local tui * fix: satisfy goal tool display checks * fix: reset goal budget on overdue resume * feat: surface session goals across control surfaces * test: update gateway protocol test import * test: align goal fixture types with protocol * fix: scope selected global transcript usage fallback * fix: scope selected global web subscriptions * fix: preserve selected global agent during chat dispatch * fix: scope chat inject to selected global agents * test: fix timeout mock return types * fix(crestodian): cap probe timeouts * fix: keep live OpenClaw session locks during cleanup (openclaw#88129) Keep session lock cleanup from removing live OpenClaw-owned locks solely because they are old. 
Cleanup now reports age-only stale locks without deleting them, while still removing dead, orphaned, recycled, malformed-old, and non-OpenClaw-owned locks. Update doctor docs and regression coverage for the cleanup/repair contract. Refs openclaw#87779 * fix(agents): cap model scan timeouts * refactor: share script budget number parsing * fix(provider): cap operation timeouts * fix(usage): cap provider usage fetch timeouts * fix: bound default heartbeat run timeout (openclaw#88133) Fixes openclaw#87438. Bound unset heartbeat run timeouts so background heartbeat turns no longer inherit the built-in 48-hour interactive agent default. Timeout precedence is explicit heartbeat timeout, explicit global agent timeout, then heartbeat cadence capped at 600 seconds. Verification: - git diff --check - Testbox tbx_01kstna69zvznn4fq7zrqr04a1: corepack pnpm test src/infra/heartbeat-runner.model-override.test.ts -- --reporter=verbose passed 13 tests - Direct node --import tsx runtime probe verified 300s, 600s, 60s, and 45s timeout precedence cases - Autoreview clean Known CI state: - PR CI run 26661465248 has failures matching latest main CI run 26661386468 at a7820b2; failures are outside this six-file heartbeat/docs diff. * fix(signal): cap client request timeouts * fix(feishu): cap async helper timeouts * refactor: share script bounded response reader * fix: move compaction planning off the event loop Move compaction planning work to a bounded worker-thread path so large transcript planning no longer monopolizes the agent event loop. Extract pure planning helpers, sanitize worker inputs before structured clone, package the worker entrypoint, and keep synchronous fallback only for worker-unavailable cases. Fixes openclaw#86358. 
* fix(browser): cap control fetch timeouts * fix(ci): repair main checks * fix(browser): cap node runtime timeouts * fix(codex-supervisor): centralize session limit parsing * fix(discord): cap monitor helper timeouts * perf: reuse gateway runtime metadata * fix(acp): cap turn timeout timers * refactor: share media temp save wrapper * fix(tts): cap speech provider timeouts * fix(media): cap generation provider timeouts * fix ci mainline checks (openclaw#88137) * fix(infra): cap request body timeouts * ci: stabilize main checks * feat: add skills index * perf: avoid unnecessary skills index maps * refactor: share skill command exposure policy * perf: centralize skill status lookup * refactor: reuse shared skills prompt formatter * perf: reuse resolved skills allowlist * perf: speed up skills filtering * perf: prepare bundled skill allowlist once * perf: use set for bundled skill allowlist * test: preserve real skills status exports * test: share skills entry fixtures * test: remove duplicate skill fixture wrappers * test: complete skills status mock surface * fix(gateway-client): cap stop wait timeout * perf: prefer package-local bundled plugin artifacts * fix(openai): cap codex oauth preflight timeout * fix(supervisor): narrow stored session limit parsing * refactor: share diagnostics timeline span helpers * fix(ci): repair main checks * fix(ci): break skills loading cycle * test: fix main CI regressions * fix(apns): cap relay timeout * fix(infra): cap jsonl socket timeouts * fix(infra): cap shell env timeouts * test: stabilize remaining CI flakes * fix(apns): cap direct timeout paths * Add plugin manifest contract for SecretRef provider integrations (openclaw#82326) * secret-provider-integrations Signed-off-by: sallyom <somalley@redhat.com> * feat(secrets): configure plugin provider presets * secrets: use plugin-managed provider refs Signed-off-by: sallyom <somalley@redhat.com> * fix secretref auth profile service env * test secret provider integration e2e * fix 
secretref plugin config service env * fix secret provider preset schema alignment * stabilize secret provider service proof * validate secret provider plugin integrations * harden secret provider resolver paths * scope secret provider config validation * stabilize openai secret provider proof * fix secret provider metadata proof * stabilize config baseline proof * fix secret provider e2e lint --------- Signed-off-by: sallyom <somalley@redhat.com> Co-authored-by: joshavant <830519+joshavant@users.noreply.github.com> * fix(proxy): cap connect tunnel timeouts * fix: route media completions through requester agent (openclaw#88141) * fix(scripts): cap issue labeler response bodies * refactor: share media understanding post params * fix(infra): cap transport readiness timeouts * ci: reduce main workflow critical path * test(gateway): stabilize live helper shard * refactor: share native approval route gates Share native approval route gate helpers across mainstream channel approval runtimes and keep PR openclaw#87770 green on current main. * fix(channels): centralize stall watchdog timer bounds * perf: resolve native esm plugin sdk imports * test: stabilize infra state shard * fix(nostr): cap profile import relay timers * test(infra): stabilize main CI tests * test(infra): preserve script wrapper fixture * fix(web): cap guarded fetch timeout seconds * fix(zalouser): cap probe timeout timer * refactor: add shared sqlite state database Adds the shared SQLite state database base, moves plugin keyed state into it with doctor migration coverage, and keeps generated Kysely guardrails aligned. Proof: focused SQLite/plugin-state tests, db:kysely:check, lint:kysely, architecture/dependency guards, autoreview, and PR CI all clean. * fix(codex): recover app-server completion stalls Fix Codex app-server completion-stall recovery so replay-safe stdio completion-idle failures retry once, while progress/terminal turn-watch timeouts only surface timeout payloads. 
Also preserve post-tool completion guards for scoped native response deltas and stabilize the oversized CONNECT timeout regression test picked up from latest main. Co-authored-by: Kelaw - Keshav's Agent <keshavbotagent@gmail.com> * fix(ci): repair main normalization checks * fix(zalouser): cap qr login timeouts * fix(dev): cap Discord smoke response bodies * fix(agents): centralize terminal run outcome precedence (openclaw#88136) * fix(agents): centralize terminal run outcome precedence * docs(agents): explain terminal outcome precedence * docs(agents): note terminal outcome helper * fix(agents): preserve pending hard timeout over late completion * test(agents): align global session scoping expectation * Revert "test(agents): align global session scoping expectation" This reverts commit 9b4a0c3. * test(infra): stabilize CONNECT timeout cap test * fix(agents): prioritize hard timeout terminal evidence * fix(gateway): preserve pending hard timeout snapshots * ci: skip bundled dts in artifact build * fix(memory): cap qmd process timeouts * fix(ci): repair main lint gates * test(infra): avoid max fake-timer jumps (openclaw#88155) * fix(whatsapp): cap credential flush timeout * ci: satisfy build profile lint * refactor: share live transport scenario helpers * fix(telegram): cap polling lease wait timer * fix(release): avoid gh api for candidate reads * fix(release): harden candidate run status polling * fix(feishu): reopen retryable bot menu replay * fix(release): avoid gh api in beta smoke * fix(release): build beta smoke REST curl command * test(realtime): stabilize websocket timeout test * test: stabilize realtime websocket timeout * fix(telegram): centralize positive timer bounds * fix(providers): cap local service timers * refactor: share provider oauth runtime helpers * fix(openrouter): cap music stream timeout * fix(release): harden release ci summary lookup * fix(fal): cap video queue deadline * test(ci): stabilize tool search gateway timeout helper * 
fix(reply): hide ACP tool traces from Telegram Telegram's surface renders tool-call traces poorly compared to Discord's. Add a per-channel visibility isolation list (currently just `telegram`) so the dispatch-acp delivery coordinator drops tool/status payloads to those channels and rewrites error payloads to a sanitized message that points to local logs instead of leaking the trace. - New ACP_VISIBILITY_ISOLATED_CHANNELS set + helper prepareAcpPayloadForChannelVisibility - Coordinator picks the effective target channel (originating or direct) and skips delivery when the payload is a tool / status / error trace - 89 lines of test coverage in dispatch-acp.test.ts for the new path --------- Signed-off-by: sallyom <somalley@redhat.com> Co-authored-by: joshavant <830519+joshavant@users.noreply.github.com> Co-authored-by: Vincent Koc <25068+vincentkoc@users.noreply.github.com> Co-authored-by: jesse-merhi <79823012+jesse-merhi@users.noreply.github.com> Co-authored-by: Peter Steinberger <steipete@gmail.com> Co-authored-by: Vincent Koc <vincentkoc@ieee.org> Co-authored-by: Shadow <shadow@openclaw.ai> Co-authored-by: Gio Della-Libera <giodl73@gmail.com> Co-authored-by: giodl73-repo <235387111+giodl73-repo@users.noreply.github.com> Co-authored-by: Ayaan Zaidi <hi@obviy.us> Co-authored-by: Shakker <shakkerdroid@gmail.com> Co-authored-by: Peter Steinberger <peter@steipete.me> Co-authored-by: benjamin1492 <35176637+benjamin1492@users.noreply.github.com> Co-authored-by: Nimrod Gutman <nimrod.gutman@gmail.com> Co-authored-by: ngutman <1540134+ngutman@users.noreply.github.com> Co-authored-by: Dallin Romney <dallinromney@gmail.com> Co-authored-by: xin zhuang <65798732+1052326311@users.noreply.github.com> Co-authored-by: zhang-guiping <zhang.guiping@xydigit.com> Co-authored-by: Lucas Giordano <giordano3102lucas@gmail.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Sally O'Malley <somalley@redhat.com> Co-authored-by: Kevin Lin <kevin@dendron.so> 
Co-authored-by: keshavbotagent <keshavbotagent@gmail.com>
Fix claude-cli transcript resume so session-id rotation and transcript flush timing do not drop valid resume state.

- Capture the latest claude-cli session_id from JSONL output.
- Resolve Claude project transcript paths through the shared canonical project-dir resolver.
- Probe transcript content from the actual CLI process cwd.
- Thanks @benjamin1492!
…026.5.28) (#759) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.5.27` → `2026.5.28` | --- ### Release Notes <details> <summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary> ### [`v2026.5.28`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#2026528) [Compare Source](openclaw/openclaw@v2026.5.27...v2026.5.28) ##### Highlights - Agent and Codex runtime recovery is steadier: subagents keep cwd/workspace separation, hook context stays prompt-local, session locks release on timeout abort while live OpenClaw locks survive cleanup, stale restart continuations are avoided, and Codex app-server/helper failures no longer tear down shared runtime state. ([#​87218](openclaw/openclaw#87218), [#​86875](openclaw/openclaw#86875), [#​87409](openclaw/openclaw#87409), [#​87399](openclaw/openclaw#87399), [#​87375](openclaw/openclaw#87375), [#​88129](openclaw/openclaw#88129)) - Channel delivery and session identity got safer across outbound plugin hooks, Matrix room ids, iMessage reactions/approvals, Slack final replies, Discord recovered tool warnings, runtime-config message actions, WhatsApp profile auth roots, Telegram polling, and Microsoft Teams service URL trust checks. ([#​73706](openclaw/openclaw#73706), [#​75670](openclaw/openclaw#75670), [#​87366](openclaw/openclaw#87366), [#​87451](openclaw/openclaw#87451), [#​87334](openclaw/openclaw#87334), [#​84535](openclaw/openclaw#84535), [#​82492](openclaw/openclaw#82492), [#​83304](openclaw/openclaw#83304), [#​87160](openclaw/openclaw#87160)) - Mobile and chat surfaces got a broader refresh: the iOS Pro UI, hosted push relay default, realtime Talk tab playback, Gateway chat transport, onboarding, Talk permissions, WebChat reconnect delivery, and session picker behavior now preserve more state across reconnects and empty searches. 
([#​87367](openclaw/openclaw#87367), [#​87531](openclaw/openclaw#87531), [#​87682](openclaw/openclaw#87682), [#​88096](openclaw/openclaw#88096), [#​88105](openclaw/openclaw#88105)) Thanks [@​ngutman](https://github.com/ngutman) and [@​BunsDev](https://github.com/BunsDev). - Browser, channel, and automation inputs are stricter: Browser tool timeouts, viewport/tab indices, Gateway ports, cron retry handling, Discord component ids, schema array refs, Telegram callback pages, and channel progress callbacks now reject malformed values earlier and preserve the intended delivery context. ([#​82887](openclaw/openclaw#82887)) - Provider, media, and document coverage expands with Claude Opus 4.8, Fal Krea image schemas, NVIDIA featured models, MiniMax streaming music responses, encrypted PDF extraction, voice model catalogs, GitHub Copilot agent runtime support, and a Codex Supervisor plugin path for delegated Codex workflows. ([#​87845](openclaw/openclaw#87845), [#​87890](openclaw/openclaw#87890), [#​80775](openclaw/openclaw#80775), [#​84764](openclaw/openclaw#84764), [#​87751](openclaw/openclaw#87751), [#​87794](openclaw/openclaw#87794)) - CLI, auth, doctor, and provider paths fail faster and recover more clearly: malformed numeric/version options are rejected, workspace dotenv provider credentials are ignored, heartbeat defaults, OAuth/token lifetimes, and local service startup requests are bounded, agent auth health labels are clearer, legacy `api_key` auth profiles migrate to canonical form, and restart guidance is actionable. ([#​87398](openclaw/openclaw#87398), [#​86281](openclaw/openclaw#86281), [#​87361](openclaw/openclaw#87361), [#​88133](openclaw/openclaw#88133), [#​83655](openclaw/openclaw#83655), [#​87559](openclaw/openclaw#87559), [#​88088](openclaw/openclaw#88088), [#​85924](openclaw/openclaw#85924)) Thanks [@​vincentkoc](https://github.com/vincentkoc) and [@​giodl73-repo](https://github.com/giodl73-repo). 
- Plugin and Gateway hot paths do less repeated work while preserving cache correctness for install records, config JSON parsing, tool search catalogs, session stores, manifest model rows, auto-enabled plugin config, browser tokens, viewer assets, and release-split external plugin packages. ([#​86699](openclaw/openclaw#86699)) - Release, QA, and E2E validation now bound more log, artifact, harness, and cross-OS waits so failing lanes produce proof instead of hanging or false-greening. ##### Changes - Status: show active subagent details in status output. - Diffs: split the default language pack and expand default Diffs language coverage while keeping the host floor aligned. ([#​87370](openclaw/openclaw#87370), [#​87372](openclaw/openclaw#87372)) Thanks [@​RomneyDa](https://github.com/RomneyDa). - ClawHub: add plugin display names plus skill verification and trust surfaces. ([#​87354](openclaw/openclaw#87354), [#​86699](openclaw/openclaw#86699)) Thanks [@​thewilloftheshadow](https://github.com/thewilloftheshadow) and [@​Patrick-Erichsen](https://github.com/Patrick-Erichsen). - iOS: refresh the dev app with Pro Command, Chat, Agents, Settings, hosted push relay defaults, and realtime Talk playback wired to gateway sessions, diagnostics, chat, and realtime Talk. ([#​87367](openclaw/openclaw#87367), [#​88096](openclaw/openclaw#88096), [#​88105](openclaw/openclaw#88105)) Thanks [@​Solvely-Colin](https://github.com/Solvely-Colin) and [@​ngutman](https://github.com/ngutman). - Docs: clarify Codex computer-use setup, paste-token stdin auth setup, macOS gateway sleep troubleshooting, native Codex hook relay recovery, container model auth, install deployment cards, device-token admin gating, CLI setup flow compatibility, Notte cloud browser CDP setup, and backport targets. 
([#​87313](openclaw/openclaw#87313), [#​63050](openclaw/openclaw#63050), [#​87685](openclaw/openclaw#87685)) Thanks [@​bdjben](https://github.com/bdjben), [@​liaoandi](https://github.com/liaoandi), and [@​thewilloftheshadow](https://github.com/thewilloftheshadow). - PDF/tools: use ClawPDF for PDF extraction, support encrypted PDF extraction, and surface MCP structured content in agent tool results. ([#​87670](openclaw/openclaw#87670), [#​87751](openclaw/openclaw#87751)) - Providers: add Claude Opus 4.8 support, Fal Krea image model schemas, NVIDIA featured model catalogs, MiniMax streaming music responses, and provider-backed voice model catalogs. ([#​87845](openclaw/openclaw#87845), [#​87890](openclaw/openclaw#87890), [#​80775](openclaw/openclaw#80775), [#​84764](openclaw/openclaw#84764), [#​87794](openclaw/openclaw#87794)) Thanks [@​eleqtrizit](https://github.com/eleqtrizit) and [@​vincentkoc](https://github.com/vincentkoc). - Codex/GitHub: add the GitHub Copilot agent runtime and the Codex Supervisor plugin package. - Plugins: externalize GitHub Copilot and Tokenjuice as official install-on-demand plugins with npm and ClawHub publish metadata. - Workboard: add agent coordination tools for tracking and handing off active agent work. - Discord: show commentary in progress drafts so live Discord runs expose useful in-progress context. ([#​85200](openclaw/openclaw#85200)) - Plugin SDK: add a reply payload sending hook for plugins that need to deliver channel-owned replies and flatten package types for SDK declarations. ([#​82823](openclaw/openclaw#82823), [#​87165](openclaw/openclaw#87165)) Thanks [@​piersonr](https://github.com/piersonr) and [@​RomneyDa](https://github.com/RomneyDa). - Policy: add policy comparison, ingress-channel conformance, and sandbox-posture conformance checks. 
([#​85572](openclaw/openclaw#85572), [#​85744](openclaw/openclaw#85744), [#​86768](openclaw/openclaw#86768)) ##### Fixes - Agents: fall back to local config pruning when the optional `agents delete` Gateway probe cannot authenticate, so offline installs can still delete agents without removing shared workspaces. - Tighten phone-control mutation authorization \[AI]. ([#​87150](openclaw/openclaw#87150)) Thanks [@​pgondhi987](https://github.com/pgondhi987). - Clarify directive persistence authorization policy \[AI]. ([#​86369](openclaw/openclaw#86369)) Thanks [@​pgondhi987](https://github.com/pgondhi987). - Agents/Codex: keep spawned agent cwd/workspace state separated, forward ACP spawn attachments, keep hook context prompt-local, release session locks on timeout abort and runtime teardown without deleting live OpenClaw-owned locks during cleanup, avoid session event queue self-wait, clean up exec abort listeners, stream assistant deltas incrementally, recover raw missing-thread compaction failures, preserve rotated compaction session identity, keep compaction-timeout snapshots continuable, preserve shared app-server state across startup or helper failures, keep native hook relay alive across restarts and prune stale bridge files, close native hook relay replacement races, keep Claude live tool progress visible for watchdog recovery, suppress abandoned requester completion handoff, route workspace memory through tools, resolve Codex runtime models first, report quarantined dynamic tools, format `skills` command output, bind node auto-review to prepared plans, retry Claude CLI transcript probes, and bound compaction/steering retries. 
([#​87218](openclaw/openclaw#87218), [#​86875](openclaw/openclaw#86875), [#​86123](openclaw/openclaw#86123), [#​88129](openclaw/openclaw#88129), [#​87399](openclaw/openclaw#87399), [#​87375](openclaw/openclaw#87375), [#​72574](openclaw/openclaw#72574), [#​87383](openclaw/openclaw#87383), [#​87400](openclaw/openclaw#87400), [#​83022](openclaw/openclaw#83022), [#​87671](openclaw/openclaw#87671), [#​87738](openclaw/openclaw#87738), [#​87747](openclaw/openclaw#87747), [#​87706](openclaw/openclaw#87706), [#​87546](openclaw/openclaw#87546), [#​87541](openclaw/openclaw#87541), [#​81048](openclaw/openclaw#81048)) Thanks [@​mbelinky](https://github.com/mbelinky), [@​Alix-007](https://github.com/Alix-007), [@​luoyanglang](https://github.com/luoyanglang), [@​yetval](https://github.com/yetval), [@​sjf](https://github.com/sjf), [@​joshavant](https://github.com/joshavant), [@​benjamin1492](https://github.com/benjamin1492), [@​c19354837](https://github.com/c19354837), [@​fuller-stack-dev](https://github.com/fuller-stack-dev), [@​pfrederiksen](https://github.com/pfrederiksen), and [@​dodge1218](https://github.com/dodge1218). - Codex Supervisor: keep real-home app-server MCP session listing on the loaded state path, bound stored history scans, and close WebSocket probes cleanly. 
- Channels: thread canonical session keys into outbound hooks, preserve Matrix room-id case, keep fallback tool warnings mention-inert, retain delivered Slack final replies during late cleanup, continue iMessage polling after denied reactions, suppress duplicate native exec approvals, resolve Gateway message actions against the active runtime config, preserve Telegram SecretRef prompt config and polling keepalives, preserve WhatsApp profile auth roots, QR display, document filenames, and plugin hook config, suppress Discord recovered tool warnings, preserve the Discord voice outbound helper, cap Discord/Signal/Zalo channel request and container timeouts, and block untrusted Teams service URLs while keeping TeamsSDK patterns aligned. ([#​73706](openclaw/openclaw#73706), [#​75670](openclaw/openclaw#75670), [#​87366](openclaw/openclaw#87366), [#​87451](openclaw/openclaw#87451), [#​87465](openclaw/openclaw#87465), [#​87334](openclaw/openclaw#87334), [#​84535](openclaw/openclaw#84535), [#​76262](openclaw/openclaw#76262), [#​83304](openclaw/openclaw#83304), [#​82492](openclaw/openclaw#82492), [#​87581](openclaw/openclaw#87581), [#​77114](openclaw/openclaw#77114), [#​86426](openclaw/openclaw#86426), [#​85529](openclaw/openclaw#85529), [#​87160](openclaw/openclaw#87160)) Thanks [@​zeroaltitude](https://github.com/zeroaltitude), [@​lukeboyett](https://github.com/lukeboyett), [@​jarvis-mns1](https://github.com/jarvis-mns1), [@​xiaotian](https://github.com/xiaotian), [@​funmerlin](https://github.com/funmerlin), [@​joshavant](https://github.com/joshavant), [@​eleqtrizit](https://github.com/eleqtrizit), [@​heyitsaamir](https://github.com/heyitsaamir), [@​amittell](https://github.com/amittell), [@​lidge-jun](https://github.com/lidge-jun), [@​liorb-mountapps](https://github.com/liorb-mountapps), [@​masatohoshino](https://github.com/masatohoshino), [@​bladin](https://github.com/bladin), and [@​giodl73-repo](https://github.com/giodl73-repo). 
- CLI/auth/doctor/providers: reject malformed numeric/timeout/subcommand-version inputs, ignore workspace dotenv provider credentials, wait for respawn child shutdown, bound heartbeat defaults plus Codex, GitHub Copilot, OpenAI, Anthropic, Google, Feishu, LM Studio, MiniMax, Xiaomi TTS, and local-provider OAuth/token/model requests, harden Codex auth probes, label auth health by agent, preserve explicit agentRuntime pins during Codex model migration, warm provider auth off the main thread, honor Codex response timeouts, stop migrating current Claude Haiku 4.5 profiles to Sonnet, bound local service startup, resolve GPT-5.5 without cached catalog, migrate legacy memory auto-provider config, rewrite non-canonical `api_key` auth profiles, and make doctor restart follow-ups actionable. ([#​87398](openclaw/openclaw#87398), [#​86281](openclaw/openclaw#86281), [#​87361](openclaw/openclaw#87361), [#​88133](openclaw/openclaw#88133), [#​83655](openclaw/openclaw#83655), [#​87559](openclaw/openclaw#87559), [#​87719](openclaw/openclaw#87719), [#​88088](openclaw/openclaw#88088), [#​85924](openclaw/openclaw#85924), [#​84362](openclaw/openclaw#84362)) Thanks [@​Patrick-Erichsen](https://github.com/Patrick-Erichsen), [@​samzong](https://github.com/samzong), [@​giodl73-repo](https://github.com/giodl73-repo), [@​alkor2000](https://github.com/alkor2000), [@​mmaps](https://github.com/mmaps), [@​nxmxbbd](https://github.com/nxmxbbd), and [@​vincentkoc](https://github.com/vincentkoc). - Gateway/security/session state: expire browser tokens after auth rotation, scope assistant idempotency dedupe, drain probe client closes, avoid stale restart continuation reuse, preserve retry-after fallbacks and stale rate-limit cooldown probes, bound webchat image and artifact transcript scans, include seconds in inbound metadata timestamps, clear completed session active runs, clear stale chat stream buffers, and evict current plugin-state namespaces at row caps. 
([#​87810](openclaw/openclaw#87810), [#​87833](openclaw/openclaw#87833), [#​75089](openclaw/openclaw#75089)) Thanks [@​joshavant](https://github.com/joshavant) and [@​litang9](https://github.com/litang9). - Config/parsing/network: reject partial numeric parsing, parse provider/Discord retry headers and dates strictly, honor IPv6 and bare IPv6 `no_proxy` entries, preserve empty plugin allowlists, canonicalize secret target array indexes, and reject malformed media content lengths, inspected TCP ports, marketplace content lengths, cron epochs, sandbox stat fields, unsafe duration values, empty config path segments, noncanonical schema array refs, unsafe Telegram callback pages, and invalid Teams attachment-fetch DNS targets. ([#​87883](openclaw/openclaw#87883)) Thanks [@​zhangguiping-xydt](https://github.com/zhangguiping-xydt). - Browser/input hardening: reject invalid tab indexes, excessive viewport resizes, explicit zero CDP ports, malformed geolocation options, unsafe screenshot or permission-grant timeouts, loose response-body limits, invalid cookie expiries, and non-finite Browser tool delays/timeouts. - Cron/automation: retry recurring jobs after transient model rate limits before waiting for the next scheduled slot, and preflight model fallbacks before skipping scheduled work. ([#​82887](openclaw/openclaw#82887)) Thanks [@​chen-zhang-cs-code](https://github.com/chen-zhang-cs-code). - Auto-reply/directives: respect provider and relayed channel metadata during directive persistence so channel-originated decisions keep their intended context. ([#​87683](openclaw/openclaw#87683)) - WhatsApp: resolve the auth directory from the active profile so profile-scoped WhatsApp installs do not drift to the wrong credential root. ([#​82492](openclaw/openclaw#82492)) Thanks [@​lidge-jun](https://github.com/lidge-jun). 
- Gateway/session state: clear completed session active runs, avoid cold-loading providers for MCP inventory, cache single-session child indexes, cap handshake timers, and bound preauth, auth-guard, media, transcript, readiness, and port options. - Channels/replies: preserve channel-owned progress callbacks when verbose output is off, keep group-room progress suppression intact, prefer external session delivery context, escape Discord component id delimiters, force final TUI chat repaints, show Slack reasoning previews, and normalize Discord/Matrix/Mattermost channel numeric options. ([#​87476](openclaw/openclaw#87476), [#​87423](openclaw/openclaw#87423)) - Agents/tool args: harden smart-quoted argument repair for edit arrays and exact escaped arguments so model-produced tool calls recover without corrupting valid input. ([#​86611](openclaw/openclaw#86611)) Thanks [@​ferminquant](https://github.com/ferminquant). - Providers/agents: preserve seeded Anthropic signatures, preserve signed thinking payloads, concatenate signature-delta chunks, preserve DeepSeek `reasoning_content` replay across tier suffixes, apply OpenRouter strict9 ids to Mistral routes, promote Ollama plain-text tool calls, load NVIDIA featured model catalogs, stream MiniMax music generation responses, and recover empty preflight compaction. ([#​87593](openclaw/openclaw#87593), [#​87493](openclaw/openclaw#87493), [#​80775](openclaw/openclaw#80775), [#​84764](openclaw/openclaw#84764)) Thanks [@​Pluviobyte](https://github.com/Pluviobyte) and [@​eleqtrizit](https://github.com/eleqtrizit). - Media/images: skip CLI image cache refs when resolving generated images, allow trusted generated HTML attachments, and bound generated video downloads so stale refs and slow providers fail cleanly. ([#​87523](openclaw/openclaw#87523), [#​87982](openclaw/openclaw#87982)) - File transfer: handle late tar stdin pipe errors after archive validation or unpacking has already settled. 
- Performance: trust install-record caches between reloads, prefer native JSON parsing, reuse unchanged tool-search catalogs, reuse gateway session and plugin metadata paths, skip unchanged store serialization, patch single-entry session writes, add precomputed session patch writers, reduce store clone allocations, cache manifest model catalog rows and auto-enabled plugin config, avoid full session snapshots for entry reads, defer configured Slack full startup, prefer bundled plugin dist entries, and slim current metadata identity caches. ([#​87760](openclaw/openclaw#87760)) - Docker/release/QA: package runtime workspace templates, stream cross-OS served artifacts, preserve sparse Crabbox run artifacts, isolate npm plugin installs per package, reject incompatible package plugin API installs, drop the leftover root Sharp dependency from package manifests after the Rastermill migration, bound OpenClaw instance logs, plugin gauntlet relay logs, MCP channel buffers, kitchen-sink scans, agent-turn assertions, QA-Lab credential broker calls, QA Matrix substrate requests, and release scenario logs, and keep release/google live guards current. ([#​87647](openclaw/openclaw#87647), [#​87477](openclaw/openclaw#87477)) Thanks [@​rohitjavvadi](https://github.com/rohitjavvadi) and [@​vincentkoc](https://github.com/vincentkoc). - Release/CI: bound manual git fetches, ClawHub verifier responses, ClawHub owner metadata, dependency-guard error bodies, Parallels limits, startup/test/memory budget parsing, and diffs viewer build warnings so release lanes fail with useful proof instead of hanging. ([#​87839](openclaw/openclaw#87839)) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 
🔕 **Ignore**: Close this PR and you won't be reminded about these updates again. --- - [ ] If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/759
Summary
Closes #81042.
`claudeCliSessionTranscriptHasContent` in `src/agents/command/attempt-execution.helpers.ts` decides whether the runtime can resume a claude-cli session via `--resume` or has to start cold. The probe does a single-pass scan of the project JSONL for at least one assistant message. The problem is that claude-cli flushes the user-message header to the transcript before the assistant turn lands, so there is a sub-100ms window where the JSONL exists on disk but lacks an assistant message. The current single-pass probe races that flush and returns `false`; the runtime decides "no transcript, cold start," and the prior turn's context is lost. The negative path was also fully silent, so gateway logs gave no signal that a `transcript-missing` reset was actually a flush race rather than a genuinely missing session.
Two small changes:
- The scan is split into a helper that reports `fileExists` and `hasAssistant`. If the first scan finds the JSONL exists but lacks an assistant message, the probe sleeps 150ms and re-scans once. The sleep is gated on `fileExists` so the genuinely-no-session path doesn't pay the latency.
- The negative path now logs a `cliBackendLog.warn` with `sessionId`, `homeDir`, and the per-project diagnostic. This makes the failure mode visible instead of silent.
Closes
Closes #81042
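The retry-gated probe described in the Summary can be sketched as follows. This is a minimal illustration, not the PR's diff: `scanTranscript`, `transcriptHasContent`, the injected `read`/`sleep`/`warn` parameters, the warn wording, and the assumption that assistant turns appear as JSONL lines with `type: "assistant"` are all hypothetical.

```typescript
interface ScanResult {
  fileExists: boolean;
  hasAssistant: boolean;
}

// Assumption: each transcript line is a JSON object and assistant turns
// carry `type: "assistant"`. `null` content stands for "no such file".
function scanTranscript(content: string | null): ScanResult {
  if (content === null) return { fileExists: false, hasAssistant: false };
  const hasAssistant = content.split("\n").some((line) => {
    if (line.trim() === "") return false;
    try {
      const entry = JSON.parse(line) as { type?: unknown } | null;
      return entry !== null && typeof entry === "object" && entry.type === "assistant";
    } catch {
      return false; // a half-flushed line is treated as "not an assistant turn"
    }
  });
  return { fileExists: true, hasAssistant };
}

const RETRY_DELAY_MS = 150;

// Hypothetical probe: one scan, then at most one rescan, gated on fileExists.
async function transcriptHasContent(
  sessionId: string,
  read: () => Promise<string | null>,
  sleep: (ms: number) => Promise<void>,
  warn: (message: string) => void,
): Promise<boolean> {
  let scan = scanTranscript(await read());
  if (scan.hasAssistant) return true;
  if (!scan.fileExists) {
    // Genuinely no session: fail fast without paying the retry latency.
    warn(`transcript probe: no matching jsonl sessionId=${sessionId}`);
    return false;
  }
  // File exists but has no assistant turn yet: likely a flush race.
  await sleep(RETRY_DELAY_MS);
  scan = scanTranscript(await read());
  if (scan.hasAssistant) return true;
  warn(`transcript probe: no assistant message after ${RETRY_DELAY_MS}ms retry sessionId=${sessionId}`);
  return false;
}

// Usage: simulate the flush race. The first read sees only the user header;
// the fake sleep "flushes" the assistant turn before the rescan, so the
// returned promise resolves to true.
async function demo(): Promise<boolean> {
  let content = '{"type":"user"}\n';
  return transcriptHasContent(
    "demo-session",
    async () => content,
    async () => {
      content += '{"type":"assistant"}\n';
    },
    (message) => console.warn(message),
  );
}
```

Injecting `read` and `sleep` keeps the sketch free of filesystem and timer dependencies, which is also what makes the rescan path easy to exercise deterministically in a test.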
Test plan
`src/agents/command/attempt-execution.test.ts` is the established home for these helpers; new tests extend that file:
- Transcript already contains an assistant message: returns `true` immediately, no retry sleep observed. Spies on `globalThis.setTimeout` and asserts no 150ms delay was scheduled.
- Assistant message lands during the retry window: returns `true`. Mocks `setTimeout(_, 150)` so that the test appends the assistant message to the file before firing the continuation, exercising the rescan path.
- JSONL exists but never gains an assistant message: returns `false`, warn fires once with `"after 150ms retry"` and includes the sessionId.
- No matching JSONL: returns `false`, warn fires with `"no matching jsonl"` and `projectCount=<n>`.
`pnpm tsgo` and `pnpm check:test-types` both pass.
Affected scope
`claude-cli` backend only, on the resume path. First-turn / cold-start callers never hit this probe.
Risk
- The 150ms retry only runs on a path that would otherwise end in `transcript-missing` recovery, so the latency budget is essentially free. The truly-no-session path (no `.claude/projects/*/sessionId.jsonl` anywhere) keeps its current latency.
- `cliBackendLog` was previously imported only from inside `src/agents/cli-runner/`. The new import in `src/agents/command/attempt-execution.helpers.ts` is a one-line cross-module reference into `../cli-runner/log.js`; `log.ts` has no transitive imports from `command/`, so there's no cycle. (Verified with `pnpm tsgo` and tests.)
Real Behavior Proof
Behavior or issue addressed: Closes #81042. `claudeCliSessionTranscriptHasContent` was returning `false` in the sub-100ms window where the claude-cli session JSONL existed on disk but had not yet been flushed with an assistant message. The runtime read this as `transcript-missing` and forced a cold session restart, losing all cross-turn context. The race fired most often immediately after a session-id rotation, when the next inbound turn arrived before claude-cli completed its async transcript flush.
Real environment tested: Live OpenClaw 2026.5.7 install on Ubuntu 24.04 (`Linux 6.17.0-23-generic x86_64`), Node v22.22.0, claude-cli backend (Claude Max subscription), Telegram bot channel. The same scan-with-retry refactor proposed in this PR was first applied as a runtime patch to the bundled `dist/attempt-execution.helpers-CENmB56f.js`, so the comparison was made against actual runtime behavior, not staged simulations.
Exact steps or command run after this patch:
1. Applied the runtime patch to `claudeCliSessionTranscriptHasContent` (matches this PR's diff: split scan into a helper returning `{ hasAssistant, fileExists, ... }` per project, retry once after 150ms when `fileExists && !hasAssistant`, log a `cliBackendLog.warn` diagnostic on the negative path).
2. `systemctl --user restart openclaw-gateway`
3. Queried the gateway journal for `missing-transcript` or restart events:
Evidence after fix: Redacted live runtime journal output captured directly from `journalctl --user -u openclaw-gateway` on the actual install.
Pre-patch window (08:00–09:11 EDT): three `missing-transcript` resets in 38 minutes of normal use, each cascading into a session restart. Post-hoc inspection of the JSONL files for these session ids confirmed they did exist on disk; the probe was racing the flush:
User-visible Telegram conversation excerpt at the moment of the 08:45:08 reset, showing the cross-turn amnesia symptom from the user's perspective:
Post-patch window (09:11+ EDT) — full output of the same journal query:
Live Telegram screenshot from the affected install, captured pre-patch on 2026-05-12. The user asks "still working on it?" 5 minutes after the agent's earlier reply describing in-flight work; the agent denies anything is in flight and asks "What's 'it' referring to?". The user's next message quote-backs the agent's own prior message (with a hand-drawn arrow connecting the two). Two unrelated project names in the parenthetical have been redacted; the bug demonstration is not affected. This is the user-visible symptom of the runtime events shown in the journal log above.
Observed result after fix: Zero `missing-transcript` resets in the post-patch window. Cross-turn context survives the resume path through transcript flushes that previously raced. The 150ms retry is gated on `fileExists`, so the no-session path takes the same time as before; this was verified by spot-checking gateway timing logs across cold-start invocations.
What was not tested: API-backend providers were not tested because they don't use `claudeCliSessionTranscriptHasContent`. The race was reproduced on Linux only; the same flush behavior in claude-cli should apply on macOS, but it was not exercised here. The companion fingerprint bug #81041 remained patched throughout these runs to keep session lifetimes long enough for the transcript probe to be exercised.