Skip to content

test(codex): pin completion-idle timeout thread reset#90027

Merged
kevinslin merged 1 commit into
openclaw:mainfrom
harjothkhara:test/codex-89974-timeout-thread-abandonment
Jun 6, 2026
Merged

test(codex): pin completion-idle timeout thread reset#90027
kevinslin merged 1 commit into
openclaw:mainfrom
harjothkhara:test/codex-89974-timeout-thread-abandonment

Conversation

@harjothkhara

@harjothkhara harjothkhara commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Summary

Regression coverage for issue 89974 ("Codex app-server turn idle timeout is surfaced as user interruption"). No runtime behavior changes.

  • Confirms that after a turn_completion_idle_timeout, OpenClaw clears the timed-out Codex app-server thread binding.
  • Confirms the next turn starts a fresh thread (thread/start) instead of resuming the thread that may still contain Codex's generic <turn_aborted> / user-interrupted marker.

The "user interrupted the previous turn on purpose" wording is Codex's own <turn_aborted> rollout marker; OpenClaw cannot change it (turn/interrupt carries no reason), only avoid replaying it. The behavior that abandons the timed-out thread landed in #87781; this test pins it for the exact issue shape.

Collision and current state

  • No direct competing PR currently claims or closes issue 89974; this PR is the only direct regression-coverage PR found for that issue.
  • fix(codex): prevent false completion stalls during native streams #87781 already fixed the runtime thread-abandonment behavior this PR covers. This PR is intentionally a test-only guardrail, not a second runtime fix.
  • This PR intentionally avoids a closing keyword for issue 89974 because it pins one already-fixed path and should not by itself close the broader issue.

Proof and merge path

This verifies the behavior at the code/test level; no live end-to-end repro was run. Because the PR is test-only and introduces no new runtime behavior, the remaining proof-policy blocker needs maintainer judgment, a proof: override label, or separately supplied redacted live logs showing the after-timeout next turn starts fresh.

I am not broadening this PR into a runtime classification/copy fix: that would change the scope, require a different proof lane, and is separate from pinning the #87781 thread-reset behavior.

Validation

  • new regression test: passed
  • run-attempt.turn-watches.test.ts: 56 passed
  • run.codex-app-server-recovery.test.ts: 10 passed
  • oxfmt / oxlint / tsgo / git diff --check: passed

Refs issue 89974. Behavior fixed by #87781.

@openclaw-barnacle openclaw-barnacle Bot added extensions: codex size: S triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels Jun 3, 2026
@clawsweeper

clawsweeper Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed June 4, 2026, 1:35 AM ET / 05:35 UTC.

Summary
Adds a Codex app-server Vitest regression test asserting that a completion-idle timeout clears the resumed thread binding and that the next turn starts a fresh thread.

PR surface: Tests +53. Total +53 across 1 file.

Reproducibility: no. high-confidence live current-main reproduction was established. The source path is clear and the proposed test exercises the fixed timeout-to-fresh-thread behavior, but the contributor supplied only code/test validation.

Review metrics: none identified.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Add redacted terminal/log/live output showing an after-timeout follow-up starts fresh, or ask a maintainer to apply a proof override.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR body explicitly says no live end-to-end repro was run; supplied evidence is code/test validation only, so real behavior proof or a maintainer proof override is still required. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • [P1] External real behavior proof is still code/test-only; the PR body explicitly says no live end-to-end repro was run, so maintainers need a proof override or redacted live logs before merge.

Maintainer options:

  1. Decide the mitigation before merge
    Merge the test after a maintainer proof override or redacted live evidence, while keeping the runtime behavior from the merged completion-stall recovery work as the implementation.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P1] A human maintainer needs to decide whether this test-only external PR can merge with a proof override or should wait for redacted live evidence; there is no narrow automated code repair to make.

Security
Cleared: The diff only changes a Vitest test file and uses existing local helpers, with no dependency, workflow, secret, runtime, or supply-chain surface changed.

Review details

Best possible solution:

Merge the test after a maintainer proof override or redacted live evidence, while keeping the runtime behavior from the merged completion-stall recovery work as the implementation.

Do we have a high-confidence way to reproduce the issue?

No high-confidence live current-main reproduction was established. The source path is clear and the proposed test exercises the fixed timeout-to-fresh-thread behavior, but the contributor supplied only code/test validation.

Is this the best way to solve the issue?

Yes for this PR's scope. A focused regression test in the Codex app-server turn-watch suite is the narrowest maintainable way to pin the already-fixed thread reset without changing runtime behavior.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5a10f46c56a0.

Label changes

Label justifications:

  • P3: This is a low-risk test-only regression guard with no runtime behavior change, so the remaining maintainer work is proof judgment rather than urgent repair.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body explicitly says no live end-to-end repro was run; supplied evidence is code/test validation only, so real behavior proof or a maintainer proof override is still required. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Tests +53. Total +53 across 1 file.

View PR surface stats
Area Files Added Removed Net
Source 0 0 0 0
Tests 1 54 1 +53
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 1 54 1 +53

What I checked:

Likely related people:

  • steipete: Current shallow blame points the central Codex app-server files to Peter Steinberger, and the linked merged runtime-fix PR commits were committed/merged by steipete. (role: recent area contributor and merger; confidence: high; commits: 6f08a1a3dd2a, 5f35ccbdf018, 2d0ff138b698; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/session-binding.ts, extensions/codex/src/app-server/thread-lifecycle.ts)
  • keshavbotagent: Authored the earlier merged completion-stall recovery work in fix(codex): prevent false completion stalls during native streams #87781 that this PR is explicitly pinning with regression coverage. (role: linked runtime fix author; confidence: high; commits: f434af936ecb, 3eb0a1f3f860, 7a8dfeaf0a4c; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/run-attempt.turn-watches.test.ts, src/agents/embedded-agent-runner/run.codex-app-server-recovery.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@harjothkhara harjothkhara force-pushed the test/codex-89974-timeout-thread-abandonment branch from ee5415e to ed8c9a1 Compare June 3, 2026 21:48
@harjothkhara harjothkhara changed the title test(codex): cover thread abandonment after watchdog idle timeout (#89974) test(codex): cover thread abandonment after completion-idle timeout Jun 3, 2026
@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 3, 2026
@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels Jun 3, 2026
@harjothkhara

harjothkhara commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

This PR is intentionally test-only: it adds regression coverage for #89974 and does not change runtime behavior.

The behavior under test appears to have already been fixed by #87781. This PR only pins the regression path: after turn_completion_idle_timeout, OpenClaw clears the timed-out Codex app-server thread binding, so the next turn starts a fresh thread instead of resuming the one that may contain Codex's generic <turn_aborted> / "user interrupted" marker.

Because this is a test-only external PR, I cannot honestly provide new live runtime "real behavior proof" beyond the code/test verification already listed. The red "Real behavior proof" check appears to require maintainer judgment or a proof: override label if maintainers want this regression test merged.

@clawsweeper clawsweeper Bot added the P3 Low-priority cleanup, docs, polish, ergonomics, or speculative work. label Jun 3, 2026
@harjothkhara

harjothkhara commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

Merge-state note: this PR is mergeable with no conflicts. GitHub reports mergeable_state: unstable, but the red Real behavior proof check appears to be the only non-green signal and is not a blocking required check.

This PR is intentionally test-only. It pins the #89974 regression fixed by #87781, so there is no live runtime behavior proof to add. A proof: override clears it, or this can be merged as-is if maintainers are comfortable. Happy to close instead if you'd rather keep just the #87781 fix.

@harjothkhara harjothkhara changed the title test(codex): cover thread abandonment after completion-idle timeout test(codex): pin completion-idle timeout thread reset Jun 4, 2026
Regression coverage for openclaw#89974. Confirms that after a
turn_completion_idle_timeout, OpenClaw clears the timed-out Codex
app-server thread binding and the next turn starts a fresh thread instead
of resuming the thread that may hold Codex's generic <turn_aborted> /
user-interrupted marker. No runtime behavior changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@harjothkhara harjothkhara force-pushed the test/codex-89974-timeout-thread-abandonment branch from ed8c9a1 to 2a6bada Compare June 4, 2026 05:28
@harjothkhara

Copy link
Copy Markdown
Contributor Author

On the Real behavior proof gate

This PR is intentionally test-only (+54/-1, a single regression test) with no runtime change. The behavior it pins — clearing a timed-out Codex thread binding so the next turn starts a fresh thread/start rather than resuming a thread still carrying Codex's <turn_aborted> marker — already shipped in #87781 (merged 2026-05-29).

Because there's no new runtime code here, there's no new runtime behavior to capture live; a "live repro" would only re-demonstrate #87781's merged fix. The test drives the real runCodexAppServerAttempt path with only the external Codex app-server transport stubbed (the same boundary the rest of this file stubs), exercising the genuine idle-timeout → binding-clear → fresh-thread logic.

Per the proof policy this looks like a fit for a maintainer proof: override (test-only guardrail for already-merged behavior). Could a maintainer apply it — or advise if you'd prefer this folded into the #87781 coverage or closed? Happy to adjust scope either way.

@kevinslin

Copy link
Copy Markdown
Contributor

lgtm. thanks for the coverage

@kevinslin kevinslin merged commit e5d1fad into openclaw:main Jun 6, 2026
150 of 151 checks passed
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request Jun 6, 2026
…penclaw#90027)

Regression coverage for openclaw#89974. Confirms that after a
turn_completion_idle_timeout, OpenClaw clears the timed-out Codex
app-server thread binding and the next turn starts a fresh thread instead
of resuming the thread that may hold Codex's generic <turn_aborted> /
user-interrupted marker. No runtime behavior changes.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
849261680 pushed a commit to 849261680/openclaw that referenced this pull request Jun 7, 2026
…penclaw#90027)

Regression coverage for openclaw#89974. Confirms that after a
turn_completion_idle_timeout, OpenClaw clears the timed-out Codex
app-server thread binding and the next turn starts a fresh thread instead
of resuming the thread that may hold Codex's generic <turn_aborted> /
user-interrupted marker. No runtime behavior changes.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
wangmiao0668000666 pushed a commit to wangmiao0668000666/openclaw that referenced this pull request Jun 9, 2026
…penclaw#90027)

Regression coverage for openclaw#89974. Confirms that after a
turn_completion_idle_timeout, OpenClaw clears the timed-out Codex
app-server thread binding and the next turn starts a fresh thread instead
of resuming the thread that may hold Codex's generic <turn_aborted> /
user-interrupted marker. No runtime behavior changes.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
eleboucher pushed a commit to eleboucher/homelab that referenced this pull request Jun 9, 2026
…26.6.5) (#963)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.6.1` → `2026.6.5` |

---

### Release Notes

<details>
<summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary>

### [`v2026.6.5`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#202665)

[Compare Source](openclaw/openclaw@v2026.6.1...v2026.6.5)

##### Highlights

- QQBot now strips model reasoning/thinking scaffolding before native delivery, preventing raw `<thinking>` content from leaking into channel replies. ([#&#8203;89913](openclaw/openclaw#89913), [#&#8203;90132](openclaw/openclaw#90132)) Thanks [@&#8203;openperf](https://github.com/openperf).
- MCP tool results now coerce `resource_link`, `resource`, `audio`, malformed image, and future non-text/image blocks at the materialize boundary, preventing Anthropic 400s and poisoned session history after a tool returns richer MCP content. ([#&#8203;90710](openclaw/openclaw#90710), [#&#8203;90728](openclaw/openclaw#90728)) Thanks [@&#8203;RanSHammer](https://github.com/RanSHammer) and [@&#8203;849261680](https://github.com/849261680).
- Anthropic extended-thinking sessions recover after prompt-cache expiry or Gateway restart because stream start events wait for `message_start`, letting pre-generation signature errors trigger the existing recovery retry. ([#&#8203;90667](openclaw/openclaw#90667), [#&#8203;90697](openclaw/openclaw#90697)) Thanks [@&#8203;openperf](https://github.com/openperf).
- Parallel is now a bundled `web_search` provider with `PARALLEL_API_KEY` discovery, guarded endpoint handling, cache-safe session ids, onboarding picker support, and docs. ([#&#8203;85158](openclaw/openclaw#85158)) Thanks [@&#8203;NormallyGaussian](https://github.com/NormallyGaussian).
- Google Vertex ADC users get static catalog rows and runtime model resolution again, while single-provider cooldown recovery and memory adapter status checks are more reliable. ([#&#8203;90506](openclaw/openclaw#90506), [#&#8203;90609](openclaw/openclaw#90609), [#&#8203;90717](openclaw/openclaw#90717), [#&#8203;90816](openclaw/openclaw#90816)) Thanks [@&#8203;849261680](https://github.com/849261680).
- Matrix can preflight voice notes before mention gating, preserve thread reads/replies through Matrix relations pagination, and carry QA coverage for voice and thread flows. ([#&#8203;78016](openclaw/openclaw#78016), [#&#8203;90415](openclaw/openclaw#90415))
- Auth and plugin install state is more durable: auth profiles now live in SQLite, official npm plugin install records keep their trusted pins, and prerelease fallback integrity checks avoid carrying stale integrity forward. ([#&#8203;89102](openclaw/openclaw#89102), [#&#8203;88585](openclaw/openclaw#88585))
- macOS node mode no longer silently self-reconnects away from a healthy direct Gateway session, reducing unexpected companion app session churn. ([#&#8203;90668](openclaw/openclaw#90668), [#&#8203;90815](openclaw/openclaw#90815)) Thanks [@&#8203;vrurg](https://github.com/vrurg).
- Upgrade and service paths are safer: cron legacy JSON stores migrate during doctor preflight, service env placeholders no longer mask state-dir secrets, WhatsApp startup waits are bounded, and disabled WhatsApp accounts tear down on config reload. ([#&#8203;90072](openclaw/openclaw#90072), [#&#8203;90208](openclaw/openclaw#90208), [#&#8203;90277](openclaw/openclaw#90277), [#&#8203;90488](openclaw/openclaw#90488), [#&#8203;90486](openclaw/openclaw#90486), [#&#8203;87951](openclaw/openclaw#87951), [#&#8203;87965](openclaw/openclaw#87965)) Thanks [@&#8203;MonkeyLeeT](https://github.com/MonkeyLeeT), [@&#8203;sallyom](https://github.com/sallyom), [@&#8203;mcaxtr](https://github.com/mcaxtr), and [@&#8203;MukundaKatta](https://github.com/MukundaKatta).

##### Changes

- Search/providers: add the Parallel bundled web-search plugin, live provider tests, registration contracts, onboarding/docs wiring, and guarded `api.parallel.ai/v1/search` support. ([#&#8203;85158](openclaw/openclaw#85158)) Thanks [@&#8203;NormallyGaussian](https://github.com/NormallyGaussian).
- Matrix/channels: add voice-message preflight and thread-aware read/reply behavior, including Matrix QA scenario wiring and docs for voice-message behavior. ([#&#8203;78016](openclaw/openclaw#78016), [#&#8203;90415](openclaw/openclaw#90415))
- Skills/ClawHub: install ClawHub skills backed by GitHub repositories through the resolved install API, download the pinned GitHub commit, keep install-policy checks, and report install telemetry after success. ([#&#8203;90478](openclaw/openclaw#90478)) Thanks [@&#8203;Patrick-Erichsen](https://github.com/Patrick-Erichsen).
- Google Chat/channels: add native approval card actions and click handling so Google Chat approvals use platform-native cards instead of generic message flow.
- Mobile: Android provider/model screens now surface expiring, unavailable, unresolved, and attention states more clearly, while iOS settings and Talk tabs keep diagnostics, gateway rows, attachment labels, and unavailable Talk controls reachable.
- Memory: QMD search can use the new rerank toggle, and memory adapter status uses the resolved default model identity when checking plain status. ([#&#8203;61834](openclaw/openclaw#61834))
- Docs/tooling: add Parallel search docs, refresh weather-skill guidance toward `web_fetch`, clarify legacy `openai-codex` auth, document release/test helper scripts, and tighten changed-test routing docs for CI/debugging work. ([#&#8203;90028](openclaw/openclaw#90028), [#&#8203;90250](openclaw/openclaw#90250)) Thanks [@&#8203;fuller-stack-dev](https://github.com/fuller-stack-dev).
- Release/process: switch release trains to `YYYY.M.PATCH` monthly patch numbering, keep pre-transition tags compatible, and pin the June 2026 floor at `2026.6.5` after the published beta.
- Platform maintenance: refresh Android, Swift/macOS, Docker, CodeQL, Buildx, Docker build/push, and Codex Action dependencies for this release train. ([#&#8203;74980](openclaw/openclaw#74980), [#&#8203;81757](openclaw/openclaw#81757), [#&#8203;86481](openclaw/openclaw#86481), [#&#8203;86483](openclaw/openclaw#86483), [#&#8203;90601](openclaw/openclaw#90601))
- QQBot: add `/bot-group-allways on|off` slash command (with named-account and default-account support) to toggle whether group messages require an `@mention` before the bot replies, and clear the runtime config snapshot after the write so the new account-level `defaultRequireMention` takes effect immediately without restart. ([#&#8203;91423](openclaw/openclaw#91423)) Thanks [@&#8203;cxyhhhhh](https://github.com/cxyhhhhh).

##### Fixes

- Channel content boundaries: QQBot now strips reasoning/thinking tags before sending, preserving final answers while hiding internal model narration from users. ([#&#8203;89913](openclaw/openclaw#89913), [#&#8203;90132](openclaw/openclaw#90132)) Thanks [@&#8203;openperf](https://github.com/openperf).
- Agents/MCP/providers: coerce non-text/image MCP tool-result blocks before they reach provider converters, preserving valid images and turning richer MCP content into text instead of malformed image blocks. ([#&#8203;90710](openclaw/openclaw#90710), [#&#8203;90728](openclaw/openclaw#90728)) Thanks [@&#8203;RanSHammer](https://github.com/RanSHammer) and [@&#8203;849261680](https://github.com/849261680).
- Anthropic/Codex/ACP/agent recovery: defer Anthropic stream start events until `message_start`, strip stale compaction thinking signatures before Anthropic replay, detect unsigned thinking-only stalls, refresh prompt fences after compaction writes, reject empty completion handoffs, preserve parent streaming-off overrides/shared progress commentary, forward heartbeat metadata to context-engine hooks, and cover Codex session/thread migration edge cases. ([#&#8203;90667](openclaw/openclaw#90667), [#&#8203;90697](openclaw/openclaw#90697), [#&#8203;90163](openclaw/openclaw#90163), [#&#8203;90108](openclaw/openclaw#90108), [#&#8203;89874](openclaw/openclaw#89874), [#&#8203;89505](openclaw/openclaw#89505), [#&#8203;90632](openclaw/openclaw#90632), [#&#8203;89302](openclaw/openclaw#89302), [#&#8203;90729](openclaw/openclaw#90729), [#&#8203;90317](openclaw/openclaw#90317), [#&#8203;90319](openclaw/openclaw#90319)) Thanks [@&#8203;openperf](https://github.com/openperf), [@&#8203;100yenadmin](https://github.com/100yenadmin), and [@&#8203;ooiuuii](https://github.com/ooiuuii).
- Provider/model resolution: preserve Google Vertex ADC auth markers in generated catalogs, re-probe a single-provider primary after cooldown, share Codex model visibility, fail closed for unknown model auth, preserve Codex alias availability, keep unresolved profile refs unknown, and avoid resolving auth while listing models. ([#&#8203;90506](openclaw/openclaw#90506), [#&#8203;90609](openclaw/openclaw#90609), [#&#8203;90717](openclaw/openclaw#90717), [#&#8203;90702](openclaw/openclaw#90702)) Thanks [@&#8203;849261680](https://github.com/849261680).
- Gateway/macOS/mobile: avoid duplicate Gateway probe warnings by identity, rate-limit node pairing requests while preserving paired-node reconnects, keep macOS node mode on a healthy direct Gateway session, keep iOS diagnostics and gateway rows reachable, and avoid Linux ARM Gradle resource tasks during Android builds. ([#&#8203;85791](openclaw/openclaw#85791), [#&#8203;90147](openclaw/openclaw#90147), [#&#8203;90668](openclaw/openclaw#90668), [#&#8203;90815](openclaw/openclaw#90815)) Thanks [@&#8203;giodl73-repo](https://github.com/giodl73-repo) and [@&#8203;vrurg](https://github.com/vrurg).
- TUI/chat/Workboard/auto-reply: optimistic user messages stay stable across stale history reloads, runId reassignment, and abort windows instead of disappearing, jumping, or lingering as ghost rows; Workboard stale lifecycle bulk updates no longer overwrite newer status/provenance; message-tool sends now count as delivery. ([#&#8203;86205](openclaw/openclaw#86205), [#&#8203;89600](openclaw/openclaw#89600), [#&#8203;88592](openclaw/openclaw#88592), [#&#8203;90123](openclaw/openclaw#90123)) Thanks [@&#8203;RomneyDa](https://github.com/RomneyDa).
- Cron/update/service env: doctor config preflight now migrates legacy cron JSON stores into SQLite before runtime reads, service env planning skips unresolved placeholders that would mask state-dir `.env` values, and session transcript rewrites keep registry markers/discriminants consistent. ([#&#8203;90072](openclaw/openclaw#90072), [#&#8203;90208](openclaw/openclaw#90208), [#&#8203;90277](openclaw/openclaw#90277), [#&#8203;90488](openclaw/openclaw#90488)) Thanks [@&#8203;MonkeyLeeT](https://github.com/MonkeyLeeT) and [@&#8203;sallyom](https://github.com/sallyom).
- Security/config/tooling: guard MCP HTTP redirects, protect global agent config defaults, and keep release/test/tooling proof failures bounded and explicit. ([#&#8203;89732](openclaw/openclaw#89732), [#&#8203;90145](openclaw/openclaw#90145))
- Channels: WhatsApp restarts when per-account config changes, bounds background startup waits, closes failed sockets, and preserves reconnect behavior; Mattermost slash commands keep their state on `globalThis`; Feishu streaming cards preserve full merged content; voice-call tracks Twilio streams after connect; ClickClack reply tools respect `toolsAllow`. ([#&#8203;87951](openclaw/openclaw#87951), [#&#8203;87965](openclaw/openclaw#87965), [#&#8203;90486](openclaw/openclaw#90486), [#&#8203;68113](openclaw/openclaw#68113), [#&#8203;90534](openclaw/openclaw#90534), [#&#8203;90181](openclaw/openclaw#90181), [#&#8203;90607](openclaw/openclaw#90607), [#&#8203;89500](openclaw/openclaw#89500)) Thanks [@&#8203;MukundaKatta](https://github.com/MukundaKatta), [@&#8203;mcaxtr](https://github.com/mcaxtr), [@&#8203;infoanton](https://github.com/infoanton), [@&#8203;mushuiyu886](https://github.com/mushuiyu886), and [@&#8203;sahibzada-allahyar](https://github.com/sahibzada-allahyar).
- Feishu: retry transient send rate-limit errors (HTTP 429, per-chat code 230020, tenant-level code 11232) with linear backoff, including SDK responses that fulfill with rate-limit bodies instead of throwing, and route streaming-card sends through the retry wrapper. ([#&#8203;89659](openclaw/openclaw#89659)) Thanks [@&#8203;ladygege](https://github.com/ladygege).
- Release/CI/E2E: main CI guard drift, PR merge diff scoping, live Docker credential staging, base-image qualification, installer Docker classification, Playwright dependency install recovery, API-key auth for Codex live Docker lanes, Parallels option terminators, and JSON-mode progress handling are tighter so release proof fails cleaner. ([#&#8203;90532](openclaw/openclaw#90532), [#&#8203;90287](openclaw/openclaw#90287), [#&#8203;90058](openclaw/openclaw#90058)) Thanks [@&#8203;RomneyDa](https://github.com/RomneyDa), [@&#8203;hxy91819](https://github.com/hxy91819), and [@&#8203;mrunalp](https://github.com/mrunalp).
- Release/CI/E2E: Docker E2E and live Docker harness runs now apply default memory, CPU, and process ceilings while preserving explicit per-lane overrides.
- Release/CI/E2E: plugin lifecycle matrix resource sampling now fails phases that exceed RSS, wall-clock, or CPU ceilings instead of only logging the measurements.
- Release/CI/E2E: Codex npm plugin live assertions now cap transcript discovery and diagnostic log reads so failure proof stays bounded.
- Tests/state isolation: QA Lab valid-tool-call metrics now require runtime tool-call evidence when runtime parity data is available instead of counting tool-backed scenario pass status alone.
- Tests/state isolation: QA Lab runtime parity now fails planned-only tool-call rows without matching tool results instead of treating matching mock plans as real tool evidence.
- Tests/state isolation: provider, media, auth, cron, task, session, sandbox, Gateway, and Codex timeout fixtures now scope more home/state/env data per test, reducing cross-test leakage and making release validation failures less noisy. ([#&#8203;90027](openclaw/openclaw#90027), [#&#8203;89974](openclaw/openclaw#89974))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4xMDEuMSIsInVwZGF0ZWRJblZlciI6IjQzLjEwMS4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJyZW5vdmF0ZS9jb250YWluZXIiLCJ0eXBlL3BhdGNoIl19-->

Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/963
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extensions: codex P3 Low-priority cleanup, docs, polish, ergonomics, or speculative work. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. size: S status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants