test(codex): pin completion-idle timeout thread reset by harjothkhara · Pull Request #90027 · openclaw/openclaw

harjothkhara · 2026-06-03T21:41:52Z

Summary

Regression coverage for issue 89974 ("Codex app-server turn idle timeout is surfaced as user interruption"). No runtime behavior changes.

Confirms that after a turn_completion_idle_timeout, OpenClaw clears the timed-out Codex app-server thread binding.
Confirms the next turn starts a fresh thread (thread/start) instead of resuming the thread that may still contain Codex's generic <turn_aborted> / user-interrupted marker.

The "user interrupted the previous turn on purpose" wording is Codex's own <turn_aborted> rollout marker; OpenClaw cannot change it (turn/interrupt carries no reason), only avoid replaying it. The behavior that abandons the timed-out thread landed in #87781; this test pins it for the exact issue shape.
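The two-turn flow described above can be sketched as a small standalone model. This is only an illustration under stated assumptions: `SessionBindings`, `runTurn`, the `"thread/resume"` method name, and the thread ids are hypothetical stand-ins, not OpenClaw's actual helpers. Only `thread/start` and `turn_completion_idle_timeout` come from the PR text.

```typescript
// Hypothetical, simplified model of the behavior this PR pins.
// None of these names are OpenClaw's real API.
type ThreadBinding = { threadId: string };
type TurnOutcome = "completed" | "turn_completion_idle_timeout";
type ThreadMethod = "thread/start" | "thread/resume";

class SessionBindings {
  private bindings = new Map<string, ThreadBinding>();
  get(sessionId: string): ThreadBinding | undefined {
    return this.bindings.get(sessionId);
  }
  set(sessionId: string, binding: ThreadBinding): void {
    this.bindings.set(sessionId, binding);
  }
  clear(sessionId: string): void {
    this.bindings.delete(sessionId);
  }
}

let nextThread = 1;

// Runs one turn and returns the method used to obtain the thread.
function runTurn(
  bindings: SessionBindings,
  sessionId: string,
  outcome: TurnOutcome,
): ThreadMethod {
  const existing = bindings.get(sessionId);
  const method: ThreadMethod = existing ? "thread/resume" : "thread/start";
  if (!existing) {
    bindings.set(sessionId, { threadId: `thread-${nextThread++}` });
  }
  if (outcome === "turn_completion_idle_timeout") {
    // Abandon the timed-out thread so its abort marker is never replayed.
    bindings.clear(sessionId);
  }
  return method;
}

const bindings = new SessionBindings();
bindings.set("session-a", { threadId: "thread-0" });
const first = runTurn(bindings, "session-a", "turn_completion_idle_timeout");
const second = runTurn(bindings, "session-a", "completed");
console.log(first, second); // thread/resume thread/start
```

The second turn falls through to `thread/start` only because the timeout path removed the binding; without that clear, it would resume the thread that may still hold the `<turn_aborted>` marker.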

Collision and current state

No direct competing PR currently claims or closes issue 89974; this PR is the only direct regression-coverage PR found for that issue.
#87781 (fix(codex): prevent false completion stalls during native streams) already fixed the runtime thread-abandonment behavior this PR covers. This PR is intentionally a test-only guardrail, not a second runtime fix.
This PR intentionally avoids a closing keyword for issue 89974 because it pins one already-fixed path and should not by itself close the broader issue.

Proof and merge path

This verifies the behavior at the code/test level; no live end-to-end repro was run. Because the PR is test-only and introduces no new runtime behavior, the remaining proof-policy blocker needs maintainer judgment, a proof: override label, or separately supplied redacted live logs showing the after-timeout next turn starts fresh.

I am not broadening this PR into a runtime classification/copy fix: that would change the scope, require a different proof lane, and is separate from pinning the #87781 thread-reset behavior.

Validation

new regression test: passed
run-attempt.turn-watches.test.ts: 56 passed
run.codex-app-server-recovery.test.ts: 10 passed
oxfmt / oxlint / tsgo / git diff --check: passed

Refs issue 89974. Behavior fixed by #87781.

clawsweeper · 2026-06-03T21:43:50Z

Codex review: needs real behavior proof before merge. Reviewed June 4, 2026, 1:35 AM ET / 05:35 UTC.

Summary
Adds a Codex app-server Vitest regression test asserting that a completion-idle timeout clears the resumed thread binding and that the next turn starts a fresh thread.

PR surface: Tests +53. Total +53 across 1 file.

Reproducibility: no. No high-confidence live current-main reproduction was established. The source path is clear and the proposed test exercises the fixed timeout-to-fresh-thread behavior, but the contributor supplied only code/test validation.

Review metrics: none identified.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
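The "weaker of the two" rule can be illustrated with a short sketch. It assumes the ranks are ordered strongest to weakest in the same order as the rank legend in this review, and it leaves out the "off-meta tidepool" entry because that one means the rating does not apply. `RANKS` and `overallRank` are hypothetical names, not ClawSweeper's implementation.

```typescript
// Assumed ordering: strongest first, weakest last (taken from the legend order).
const RANKS = [
  "challenger crab",
  "diamond lobster",
  "platinum hermit",
  "gold shrimp",
  "silver shellfish",
  "unranked krab",
] as const;
type Rank = (typeof RANKS)[number];

// Overall readiness is whichever input sits later (weaker) in the list.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) >= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

console.log(overallRank("silver shellfish", "platinum hermit")); // silver shellfish
```

With proof at silver shellfish and patch quality at platinum hermit, the sketch yields silver shellfish, matching the overall rating reported here.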

Rank-up moves:

[P2] Add redacted terminal/log/live output showing an after-timeout follow-up starts fresh, or ask a maintainer to apply a proof override.

Proof guidance:

[P1] Needs real behavior proof before merge: The PR body explicitly says no live end-to-end repro was run; supplied evidence is code/test validation only, so real behavior proof or a maintainer proof override is still required. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

[P1] Real behavior proof for this external PR is still code/test-only; the PR body explicitly says no live end-to-end repro was run, so maintainers need a proof override or redacted live logs before merge.

Maintainer options:

Decide the mitigation before merge: Merge the test after a maintainer proof override or redacted live evidence, while keeping the runtime behavior from the merged completion-stall recovery work as the implementation.
Pause or close: Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

[P1] A human maintainer needs to decide whether this test-only external PR can merge with a proof override or should wait for redacted live evidence; there is no narrow automated code repair to make.

Security
Cleared: The diff only changes a Vitest test file and uses existing local helpers, with no dependency, workflow, secret, runtime, or supply-chain surface changed.

Review details

Best possible solution:

Merge the test after a maintainer proof override or redacted live evidence, while keeping the runtime behavior from the merged completion-stall recovery work as the implementation.

Do we have a high-confidence way to reproduce the issue?

No high-confidence live current-main reproduction was established. The source path is clear and the proposed test exercises the fixed timeout-to-fresh-thread behavior, but the contributor supplied only code/test validation.

Is this the best way to solve the issue?

Yes for this PR's scope. A focused regression test in the Codex app-server turn-watch suite is the narrowest maintainable way to pin the already-fixed thread reset without changing runtime behavior.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5a10f46c56a0.

Label changes

Label justifications:

P3: This is a low-risk test-only regression guard with no runtime behavior change, so the remaining maintainer work is proof judgment rather than urgent repair.
rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. The PR body explicitly says no live end-to-end repro was run; supplied evidence is code/test validation only, so real behavior proof or a maintainer proof override is still required. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Evidence reviewed

PR surface:

Tests +53. Total +53 across 1 file.

View PR surface stats

Area	Files	Added	Removed	Net
Source	0	0	0	0
Tests	1	54	1	+53
Docs	0	0	0	0
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	1	54	1	+53

What I checked:

Changed surface: The PR modifies only run-attempt.turn-watches.test.ts, adding a two-turn regression test plus imports for existing harness and binding helpers. (extensions/codex/src/app-server/run-attempt.turn-watches.test.ts:2869, 2a6badaa9523)
Current timeout cleanup behavior: On current main, timed-out native turns clear the binding for the active thread before retiring the app-server client, which is the runtime behavior the PR pins. (extensions/codex/src/app-server/run-attempt.ts:2198, 5a10f46c56a0)
Binding clear is thread-scoped: clearCodexAppServerBindingForThread preserves a newer/different binding and only unlinks the binding when the stored thread id matches. (extensions/codex/src/app-server/session-binding.ts:309, 5a10f46c56a0)
Fresh start follows missing binding: startOrResumeThread reads the binding, resumes when one is present, and otherwise falls through to thread/start, matching the PR's second-turn assertion. (extensions/codex/src/app-server/thread-lifecycle.ts:335, 5a10f46c56a0)
Linked runtime fix provenance: The linked merged PR fix(codex): prevent false completion stalls during native streams #87781 includes commits for completion-idle recovery and the thread-scoped binding clear; the release compare shows v2026.6.1 contains its merge commit. (extensions/codex/src/app-server/run-attempt.ts:2198, 5a10f46c56a0)
Upstream Codex contract: Codex upstream owns the <turn_aborted> guidance text, and the app-server turn/interrupt client call carries only threadId and turnId, not an abort reason. (../codex/codex-rs/core/src/context/turn_aborted.rs:9, ad2012d645b7)
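The thread-scoped clear and the interrupt payload shape noted in this list can be sketched as follows. This is a minimal illustration, assuming a plain in-memory `Map` as the store; `clearBindingForThread`, `TurnInterruptParams`, and the ids are hypothetical and are not the real `clearCodexAppServerBindingForThread` signature or the actual Codex protocol types.

```typescript
// Hypothetical sketch; the real helper lives in session-binding.ts and may differ.
type StoredBinding = { threadId: string };

function clearBindingForThread(
  store: Map<string, StoredBinding>,
  sessionId: string,
  timedOutThreadId: string,
): boolean {
  const current = store.get(sessionId);
  // Only unlink when the stored binding still points at the timed-out thread;
  // a newer or different binding is preserved.
  if (current?.threadId !== timedOutThreadId) return false;
  store.delete(sessionId);
  return true;
}

// Shape described in the review: ids only, no abort reason field.
type TurnInterruptParams = { threadId: string; turnId: string };
const interrupt: TurnInterruptParams = { threadId: "thread-new", turnId: "turn-1" };

const store = new Map<string, StoredBinding>([
  ["session-a", { threadId: "thread-new" }],
]);
const clearedStale = clearBindingForThread(store, "session-a", "thread-old");
const clearedMatch = clearBindingForThread(store, "session-a", interrupt.threadId);
console.log(clearedStale, clearedMatch, store.size); // false true 0
```

Scoping the clear to a thread id keeps a late timeout from wiping out a binding that another turn has already replaced.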

Likely related people:

steipete: Current shallow blame points the central Codex app-server files to Peter Steinberger, and the linked merged runtime-fix PR commits were committed/merged by steipete. (role: recent area contributor and merger; confidence: high; commits: 6f08a1a3dd2a, 5f35ccbdf018, 2d0ff138b698; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/session-binding.ts, extensions/codex/src/app-server/thread-lifecycle.ts)
keshavbotagent: Authored the earlier merged completion-stall recovery work in fix(codex): prevent false completion stalls during native streams #87781 that this PR is explicitly pinning with regression coverage. (role: linked runtime fix author; confidence: high; commits: f434af936ecb, 3eb0a1f3f860, 7a8dfeaf0a4c; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/run-attempt.turn-watches.test.ts, src/agents/embedded-agent-runner/run.codex-app-server-recovery.test.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

harjothkhara · 2026-06-03T21:57:38Z

This PR is intentionally test-only: it adds regression coverage for #89974 and does not change runtime behavior.

The behavior under test appears to have already been fixed by #87781. This PR only pins the regression path: after turn_completion_idle_timeout, OpenClaw clears the timed-out Codex app-server thread binding, so the next turn starts a fresh thread instead of resuming the one that may contain Codex's generic <turn_aborted> / "user interrupted" marker.

Because this is a test-only external PR, I cannot honestly provide new live runtime "real behavior proof" beyond the code/test verification already listed. The red "Real behavior proof" check appears to require maintainer judgment or a proof: override label if maintainers want this regression test merged.

harjothkhara · 2026-06-04T00:23:34Z

Merge-state note: this PR is mergeable with no conflicts. GitHub reports mergeable_state: unstable, but the red Real behavior proof check appears to be the only non-green signal and is not a blocking required check.

This PR is intentionally test-only. It pins the #89974 regression fixed by #87781, so there is no live runtime behavior proof to add. A proof: override clears it, or this can be merged as-is if maintainers are comfortable. Happy to close instead if you'd rather keep just the #87781 fix.

Regression coverage for openclaw#89974. Confirms that after a turn_completion_idle_timeout, OpenClaw clears the timed-out Codex app-server thread binding and the next turn starts a fresh thread instead of resuming the thread that may hold Codex's generic <turn_aborted> / user-interrupted marker. No runtime behavior changes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

harjothkhara · 2026-06-05T18:26:52Z

On the Real behavior proof gate

This PR is intentionally test-only (+54/-1, a single regression test) with no runtime change. The behavior it pins — clearing a timed-out Codex thread binding so the next turn starts a fresh thread/start rather than resuming a thread still carrying Codex's <turn_aborted> marker — already shipped in #87781 (merged 2026-05-29).

Because there's no new runtime code here, there's no new runtime behavior to capture live; a "live repro" would only re-demonstrate #87781's merged fix. The test drives the real runCodexAppServerAttempt path with only the external Codex app-server transport stubbed (the same boundary the rest of this file stubs), exercising the genuine idle-timeout → binding-clear → fresh-thread logic.

Per the proof policy this looks like a fit for a maintainer proof: override (test-only guardrail for already-merged behavior). Could a maintainer apply it — or advise if you'd prefer this folded into the #87781 coverage or closed? Happy to adjust scope either way.

kevinslin · 2026-06-06T00:41:48Z

lgtm. thanks for the coverage

…penclaw#90027) Regression coverage for openclaw#89974. Confirms that after a turn_completion_idle_timeout, OpenClaw clears the timed-out Codex app-server thread binding and the next turn starts a fresh thread instead of resuming the thread that may hold Codex's generic <turn_aborted> / user-interrupted marker. No runtime behavior changes. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

…26.6.5) (#963) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.6.1` → `2026.6.5` | --- ### Release Notes <details> <summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary> ### [`v2026.6.5`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#202665) [Compare Source](openclaw/openclaw@v2026.6.1...v2026.6.5) ##### Highlights - QQBot now strips model reasoning/thinking scaffolding before native delivery, preventing raw `<thinking>` content from leaking into channel replies. ([#89913](openclaw/openclaw#89913), [#90132](openclaw/openclaw#90132)) Thanks [@openperf](https://github.com/openperf). - MCP tool results now coerce `resource_link`, `resource`, `audio`, malformed image, and future non-text/image blocks at the materialize boundary, preventing Anthropic 400s and poisoned session history after a tool returns richer MCP content. ([#90710](openclaw/openclaw#90710), [#90728](openclaw/openclaw#90728)) Thanks [@RanSHammer](https://github.com/RanSHammer) and [@849261680](https://github.com/849261680). - Anthropic extended-thinking sessions recover after prompt-cache expiry or Gateway restart because stream start events wait for `message_start`, letting pre-generation signature errors trigger the existing recovery retry. ([#90667](openclaw/openclaw#90667), [#90697](openclaw/openclaw#90697)) Thanks [@openperf](https://github.com/openperf). - Parallel is now a bundled `web_search` provider with `PARALLEL_API_KEY` discovery, guarded endpoint handling, cache-safe session ids, onboarding picker support, and docs. ([#85158](openclaw/openclaw#85158)) Thanks [@NormallyGaussian](https://github.com/NormallyGaussian). - Google Vertex ADC users get static catalog rows and runtime model resolution again, while single-provider cooldown recovery and memory adapter status checks are more reliable. 
([#90506](openclaw/openclaw#90506), [#90609](openclaw/openclaw#90609), [#90717](openclaw/openclaw#90717), [#90816](openclaw/openclaw#90816)) Thanks [@849261680](https://github.com/849261680). - Matrix can preflight voice notes before mention gating, preserve thread reads/replies through Matrix relations pagination, and carry QA coverage for voice and thread flows. ([#78016](openclaw/openclaw#78016), [#90415](openclaw/openclaw#90415)) - Auth and plugin install state is more durable: auth profiles now live in SQLite, official npm plugin install records keep their trusted pins, and prerelease fallback integrity checks avoid carrying stale integrity forward. ([#89102](openclaw/openclaw#89102), [#88585](openclaw/openclaw#88585)) - macOS node mode no longer silently self-reconnects away from a healthy direct Gateway session, reducing unexpected companion app session churn. ([#90668](openclaw/openclaw#90668), [#90815](openclaw/openclaw#90815)) Thanks [@vrurg](https://github.com/vrurg). - Upgrade and service paths are safer: cron legacy JSON stores migrate during doctor preflight, service env placeholders no longer mask state-dir secrets, WhatsApp startup waits are bounded, and disabled WhatsApp accounts tear down on config reload. ([#90072](openclaw/openclaw#90072), [#90208](openclaw/openclaw#90208), [#90277](openclaw/openclaw#90277), [#90488](openclaw/openclaw#90488), [#90486](openclaw/openclaw#90486), [#87951](openclaw/openclaw#87951), [#87965](openclaw/openclaw#87965)) Thanks [@MonkeyLeeT](https://github.com/MonkeyLeeT), [@sallyom](https://github.com/sallyom), [@mcaxtr](https://github.com/mcaxtr), and [@MukundaKatta](https://github.com/MukundaKatta). ##### Changes - Search/providers: add the Parallel bundled web-search plugin, live provider tests, registration contracts, onboarding/docs wiring, and guarded `api.parallel.ai/v1/search` support. ([#85158](openclaw/openclaw#85158)) Thanks [@NormallyGaussian](https://github.com/NormallyGaussian). 
- Matrix/channels: add voice-message preflight and thread-aware read/reply behavior, including Matrix QA scenario wiring and docs for voice-message behavior. ([#78016](openclaw/openclaw#78016), [#90415](openclaw/openclaw#90415)) - Skills/ClawHub: install ClawHub skills backed by GitHub repositories through the resolved install API, download the pinned GitHub commit, keep install-policy checks, and report install telemetry after success. ([#90478](openclaw/openclaw#90478)) Thanks [@Patrick-Erichsen](https://github.com/Patrick-Erichsen). - Google Chat/channels: add native approval card actions and click handling so Google Chat approvals use platform-native cards instead of generic message flow. - Mobile: Android provider/model screens now surface expiring, unavailable, unresolved, and attention states more clearly, while iOS settings and Talk tabs keep diagnostics, gateway rows, attachment labels, and unavailable Talk controls reachable. - Memory: QMD search can use the new rerank toggle, and memory adapter status uses the resolved default model identity when checking plain status. ([#61834](openclaw/openclaw#61834)) - Docs/tooling: add Parallel search docs, refresh weather-skill guidance toward `web_fetch`, clarify legacy `openai-codex` auth, document release/test helper scripts, and tighten changed-test routing docs for CI/debugging work. ([#90028](openclaw/openclaw#90028), [#90250](openclaw/openclaw#90250)) Thanks [@fuller-stack-dev](https://github.com/fuller-stack-dev). - Release/process: switch release trains to `YYYY.M.PATCH` monthly patch numbering, keep pre-transition tags compatible, and pin the June 2026 floor at `2026.6.5` after the published beta. - Platform maintenance: refresh Android, Swift/macOS, Docker, CodeQL, Buildx, Docker build/push, and Codex Action dependencies for this release train. 
([#74980](openclaw/openclaw#74980), [#81757](openclaw/openclaw#81757), [#86481](openclaw/openclaw#86481), [#86483](openclaw/openclaw#86483), [#90601](openclaw/openclaw#90601)) - QQBot: add `/bot-group-allways on|off` slash command (with named-account and default-account support) to toggle whether group messages require an `@mention` before the bot replies, and clear the runtime config snapshot after the write so the new account-level `defaultRequireMention` takes effect immediately without restart. ([#91423](openclaw/openclaw#91423)) Thanks [@cxyhhhhh](https://github.com/cxyhhhhh). ##### Fixes - Channel content boundaries: QQBot now strips reasoning/thinking tags before sending, preserving final answers while hiding internal model narration from users. ([#89913](openclaw/openclaw#89913), [#90132](openclaw/openclaw#90132)) Thanks [@openperf](https://github.com/openperf). - Agents/MCP/providers: coerce non-text/image MCP tool-result blocks before they reach provider converters, preserving valid images and turning richer MCP content into text instead of malformed image blocks. ([#90710](openclaw/openclaw#90710), [#90728](openclaw/openclaw#90728)) Thanks [@RanSHammer](https://github.com/RanSHammer) and [@849261680](https://github.com/849261680). - Anthropic/Codex/ACP/agent recovery: defer Anthropic stream start events until `message_start`, strip stale compaction thinking signatures before Anthropic replay, detect unsigned thinking-only stalls, refresh prompt fences after compaction writes, reject empty completion handoffs, preserve parent streaming-off overrides/shared progress commentary, forward heartbeat metadata to context-engine hooks, and cover Codex session/thread migration edge cases. 
([#90667](openclaw/openclaw#90667), [#90697](openclaw/openclaw#90697), [#90163](openclaw/openclaw#90163), [#90108](openclaw/openclaw#90108), [#89874](openclaw/openclaw#89874), [#89505](openclaw/openclaw#89505), [#90632](openclaw/openclaw#90632), [#89302](openclaw/openclaw#89302), [#90729](openclaw/openclaw#90729), [#90317](openclaw/openclaw#90317), [#90319](openclaw/openclaw#90319)) Thanks [@openperf](https://github.com/openperf), [@100yenadmin](https://github.com/100yenadmin), and [@ooiuuii](https://github.com/ooiuuii). - Provider/model resolution: preserve Google Vertex ADC auth markers in generated catalogs, re-probe a single-provider primary after cooldown, share Codex model visibility, fail closed for unknown model auth, preserve Codex alias availability, keep unresolved profile refs unknown, and avoid resolving auth while listing models. ([#90506](openclaw/openclaw#90506), [#90609](openclaw/openclaw#90609), [#90717](openclaw/openclaw#90717), [#90702](openclaw/openclaw#90702)) Thanks [@849261680](https://github.com/849261680). - Gateway/macOS/mobile: avoid duplicate Gateway probe warnings by identity, rate-limit node pairing requests while preserving paired-node reconnects, keep macOS node mode on a healthy direct Gateway session, keep iOS diagnostics and gateway rows reachable, and avoid Linux ARM Gradle resource tasks during Android builds. ([#85791](openclaw/openclaw#85791), [#90147](openclaw/openclaw#90147), [#90668](openclaw/openclaw#90668), [#90815](openclaw/openclaw#90815)) Thanks [@giodl73-repo](https://github.com/giodl73-repo) and [@vrurg](https://github.com/vrurg). - TUI/chat/Workboard/auto-reply: optimistic user messages stay stable across stale history reloads, runId reassignment, and abort windows instead of disappearing, jumping, or lingering as ghost rows; Workboard stale lifecycle bulk updates no longer overwrite newer status/provenance; message-tool sends now count as delivery. 
([#86205](openclaw/openclaw#86205), [#89600](openclaw/openclaw#89600), [#88592](openclaw/openclaw#88592), [#90123](openclaw/openclaw#90123)) Thanks [@RomneyDa](https://github.com/RomneyDa). - Cron/update/service env: doctor config preflight now migrates legacy cron JSON stores into SQLite before runtime reads, service env planning skips unresolved placeholders that would mask state-dir `.env` values, and session transcript rewrites keep registry markers/discriminants consistent. ([#90072](openclaw/openclaw#90072), [#90208](openclaw/openclaw#90208), [#90277](openclaw/openclaw#90277), [#90488](openclaw/openclaw#90488)) Thanks [@MonkeyLeeT](https://github.com/MonkeyLeeT) and [@sallyom](https://github.com/sallyom). - Security/config/tooling: guard MCP HTTP redirects, protect global agent config defaults, and keep release/test/tooling proof failures bounded and explicit. ([#89732](openclaw/openclaw#89732), [#90145](openclaw/openclaw#90145)) - Channels: WhatsApp restarts when per-account config changes, bounds background startup waits, closes failed sockets, and preserves reconnect behavior; Mattermost slash commands keep their state on `globalThis`; Feishu streaming cards preserve full merged content; voice-call tracks Twilio streams after connect; ClickClack reply tools respect `toolsAllow`. ([#87951](openclaw/openclaw#87951), [#87965](openclaw/openclaw#87965), [#90486](openclaw/openclaw#90486), [#68113](openclaw/openclaw#68113), [#90534](openclaw/openclaw#90534), [#90181](openclaw/openclaw#90181), [#90607](openclaw/openclaw#90607), [#89500](openclaw/openclaw#89500)) Thanks [@MukundaKatta](https://github.com/MukundaKatta), [@mcaxtr](https://github.com/mcaxtr), [@infoanton](https://github.com/infoanton), [@mushuiyu886](https://github.com/mushuiyu886), and [@sahibzada-allahyar](https://github.com/sahibzada-allahyar). 
- Feishu: retry transient send rate-limit errors (HTTP 429, per-chat code 230020, tenant-level code 11232) with linear backoff, including SDK responses that fulfill with rate-limit bodies instead of throwing, and route streaming-card sends through the retry wrapper. ([#89659](openclaw/openclaw#89659)) Thanks [@ladygege](https://github.com/ladygege). - Release/CI/E2E: main CI guard drift, PR merge diff scoping, live Docker credential staging, base-image qualification, installer Docker classification, Playwright dependency install recovery, API-key auth for Codex live Docker lanes, Parallels option terminators, and JSON-mode progress handling are tighter so release proof fails cleaner. ([#90532](openclaw/openclaw#90532), [#90287](openclaw/openclaw#90287), [#90058](openclaw/openclaw#90058)) Thanks [@RomneyDa](https://github.com/RomneyDa), [@hxy91819](https://github.com/hxy91819), and [@mrunalp](https://github.com/mrunalp). - Release/CI/E2E: Docker E2E and live Docker harness runs now apply default memory, CPU, and process ceilings while preserving explicit per-lane overrides. - Release/CI/E2E: plugin lifecycle matrix resource sampling now fails phases that exceed RSS, wall-clock, or CPU ceilings instead of only logging the measurements. - Release/CI/E2E: Codex npm plugin live assertions now cap transcript discovery and diagnostic log reads so failure proof stays bounded. - Tests/state isolation: QA Lab valid-tool-call metrics now require runtime tool-call evidence when runtime parity data is available instead of counting tool-backed scenario pass status alone. - Tests/state isolation: QA Lab runtime parity now fails planned-only tool-call rows without matching tool results instead of treating matching mock plans as real tool evidence. - Tests/state isolation: provider, media, auth, cron, task, session, sandbox, Gateway, and Codex timeout fixtures now scope more home/state/env data per test, reducing cross-test leakage and making release validation failures less noisy. 
([#90027](openclaw/openclaw#90027), [#89974](openclaw/openclaw#89974)) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about these updates again. --- - [ ] If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).  Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/963

openclaw-barnacle Bot added the extensions: codex, size: S, and triage: mock-only-proof (Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI.) labels Jun 3, 2026

harjothkhara force-pushed the test/codex-89974-timeout-thread-abandonment branch from ee5415e to ed8c9a1 Compare June 3, 2026 21:48

harjothkhara changed the title ~~test(codex): cover thread abandonment after watchdog idle timeout (#89974)~~ test(codex): cover thread abandonment after completion-idle timeout Jun 3, 2026

harjothkhara mentioned this pull request Jun 3, 2026

Codex app-server turn idle timeout is surfaced as user interruption #89974

Open

clawsweeper Bot added the rating: 🦪 silver shellfish (Thin PR readiness signal; proof, validation, or implementation needs work.) and status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.) labels Jun 3, 2026

openclaw-barnacle Bot added the triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.) label and removed the triage: mock-only-proof (Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI.) label Jun 3, 2026

clawsweeper Bot added the P3 (Low-priority cleanup, docs, polish, ergonomics, or speculative work.) label Jun 3, 2026

harjothkhara changed the title ~~test(codex): cover thread abandonment after completion-idle timeout~~ test(codex): pin completion-idle timeout thread reset Jun 4, 2026

harjothkhara force-pushed the test/codex-89974-timeout-thread-abandonment branch from ed8c9a1 to 2a6bada Compare June 4, 2026 05:28

kevinslin merged commit e5d1fad into openclaw:main Jun 6, 2026
150 of 151 checks passed

Haderach-Ram mentioned this pull request Jun 6, 2026

Ecosystem Digest — 2026-06-06 Haderach-Ram/openclaw-radar#30

Open

github-actions Bot mentioned this pull request Jun 6, 2026

📡 Upstream Digest — 2026-06-06 02:27 UTC curtismercier/openclaw-mods#1022

Open

kevinslin merged 1 commit into openclaw:main from harjothkhara:test/codex-89974-timeout-thread-abandonment
