test(codex): pin completion-idle timeout thread reset#90027
Conversation
|
Codex review: needs real behavior proof before merge. Reviewed June 4, 2026, 1:35 AM ET / 05:35 UTC. Summary PR surface: Tests +53. Total +53 across 1 file. Reproducibility: no. high-confidence live current-main reproduction was established. The source path is clear and the proposed test exercises the fixed timeout-to-fresh-thread behavior, but the contributor supplied only code/test validation. Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Proof guidance:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Merge the test after a maintainer proof override or redacted live evidence, while keeping the runtime behavior from the merged completion-stall recovery work as the implementation. Do we have a high-confidence way to reproduce the issue? No high-confidence live current-main reproduction was established. The source path is clear and the proposed test exercises the fixed timeout-to-fresh-thread behavior, but the contributor supplied only code/test validation. Is this the best way to solve the issue? Yes for this PR's scope. A focused regression test in the Codex app-server turn-watch suite is the narrowest maintainable way to pin the already-fixed thread reset without changing runtime behavior. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 5a10f46c56a0. Label changesLabel justifications:
Evidence reviewedPR surface: Tests +53. Total +53 across 1 file. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
ee5415e to
ed8c9a1
Compare
|
This PR is intentionally test-only: it adds regression coverage for #89974 and does not change runtime behavior. The behavior under test appears to have already been fixed by #87781. This PR only pins the regression path: after Because this is a test-only external PR, I cannot honestly provide new live runtime "real behavior proof" beyond the code/test verification already listed. The red "Real behavior proof" check appears to require maintainer judgment or a |
|
Merge-state note: this PR is mergeable with no conflicts. GitHub reports This PR is intentionally test-only. It pins the #89974 regression fixed by #87781, so there is no live runtime behavior proof to add. A |
Regression coverage for openclaw#89974. Confirms that after a turn_completion_idle_timeout, OpenClaw clears the timed-out Codex app-server thread binding and the next turn starts a fresh thread instead of resuming the thread that may hold Codex's generic <turn_aborted> / user-interrupted marker. No runtime behavior changes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ed8c9a1 to
2a6bada
Compare
|
On the Real behavior proof gate This PR is intentionally test-only ( Because there's no new runtime code here, there's no new runtime behavior to capture live; a "live repro" would only re-demonstrate #87781's merged fix. The test drives the real Per the proof policy this looks like a fit for a maintainer |
|
lgtm. thanks for the coverage |
…penclaw#90027) Regression coverage for openclaw#89974. Confirms that after a turn_completion_idle_timeout, OpenClaw clears the timed-out Codex app-server thread binding and the next turn starts a fresh thread instead of resuming the thread that may hold Codex's generic <turn_aborted> / user-interrupted marker. No runtime behavior changes. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…penclaw#90027) Regression coverage for openclaw#89974. Confirms that after a turn_completion_idle_timeout, OpenClaw clears the timed-out Codex app-server thread binding and the next turn starts a fresh thread instead of resuming the thread that may hold Codex's generic <turn_aborted> / user-interrupted marker. No runtime behavior changes. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…penclaw#90027) Regression coverage for openclaw#89974. Confirms that after a turn_completion_idle_timeout, OpenClaw clears the timed-out Codex app-server thread binding and the next turn starts a fresh thread instead of resuming the thread that may hold Codex's generic <turn_aborted> / user-interrupted marker. No runtime behavior changes. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…26.6.5) (#963) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [ghcr.io/openclaw/openclaw](https://openclaw.ai) ([source](https://github.com/openclaw/openclaw)) | patch | `2026.6.1` → `2026.6.5` | --- ### Release Notes <details> <summary>openclaw/openclaw (ghcr.io/openclaw/openclaw)</summary> ### [`v2026.6.5`](https://github.com/openclaw/openclaw/blob/HEAD/CHANGELOG.md#202665) [Compare Source](openclaw/openclaw@v2026.6.1...v2026.6.5) ##### Highlights - QQBot now strips model reasoning/thinking scaffolding before native delivery, preventing raw `<thinking>` content from leaking into channel replies. ([#​89913](openclaw/openclaw#89913), [#​90132](openclaw/openclaw#90132)) Thanks [@​openperf](https://github.com/openperf). - MCP tool results now coerce `resource_link`, `resource`, `audio`, malformed image, and future non-text/image blocks at the materialize boundary, preventing Anthropic 400s and poisoned session history after a tool returns richer MCP content. ([#​90710](openclaw/openclaw#90710), [#​90728](openclaw/openclaw#90728)) Thanks [@​RanSHammer](https://github.com/RanSHammer) and [@​849261680](https://github.com/849261680). - Anthropic extended-thinking sessions recover after prompt-cache expiry or Gateway restart because stream start events wait for `message_start`, letting pre-generation signature errors trigger the existing recovery retry. ([#​90667](openclaw/openclaw#90667), [#​90697](openclaw/openclaw#90697)) Thanks [@​openperf](https://github.com/openperf). - Parallel is now a bundled `web_search` provider with `PARALLEL_API_KEY` discovery, guarded endpoint handling, cache-safe session ids, onboarding picker support, and docs. ([#​85158](openclaw/openclaw#85158)) Thanks [@​NormallyGaussian](https://github.com/NormallyGaussian). - Google Vertex ADC users get static catalog rows and runtime model resolution again, while single-provider cooldown recovery and memory adapter status checks are more reliable. ([#​90506](openclaw/openclaw#90506), [#​90609](openclaw/openclaw#90609), [#​90717](openclaw/openclaw#90717), [#​90816](openclaw/openclaw#90816)) Thanks [@​849261680](https://github.com/849261680). - Matrix can preflight voice notes before mention gating, preserve thread reads/replies through Matrix relations pagination, and carry QA coverage for voice and thread flows. ([#​78016](openclaw/openclaw#78016), [#​90415](openclaw/openclaw#90415)) - Auth and plugin install state is more durable: auth profiles now live in SQLite, official npm plugin install records keep their trusted pins, and prerelease fallback integrity checks avoid carrying stale integrity forward. ([#​89102](openclaw/openclaw#89102), [#​88585](openclaw/openclaw#88585)) - macOS node mode no longer silently self-reconnects away from a healthy direct Gateway session, reducing unexpected companion app session churn. ([#​90668](openclaw/openclaw#90668), [#​90815](openclaw/openclaw#90815)) Thanks [@​vrurg](https://github.com/vrurg). - Upgrade and service paths are safer: cron legacy JSON stores migrate during doctor preflight, service env placeholders no longer mask state-dir secrets, WhatsApp startup waits are bounded, and disabled WhatsApp accounts tear down on config reload. ([#​90072](openclaw/openclaw#90072), [#​90208](openclaw/openclaw#90208), [#​90277](openclaw/openclaw#90277), [#​90488](openclaw/openclaw#90488), [#​90486](openclaw/openclaw#90486), [#​87951](openclaw/openclaw#87951), [#​87965](openclaw/openclaw#87965)) Thanks [@​MonkeyLeeT](https://github.com/MonkeyLeeT), [@​sallyom](https://github.com/sallyom), [@​mcaxtr](https://github.com/mcaxtr), and [@​MukundaKatta](https://github.com/MukundaKatta). ##### Changes - Search/providers: add the Parallel bundled web-search plugin, live provider tests, registration contracts, onboarding/docs wiring, and guarded `api.parallel.ai/v1/search` support. ([#​85158](openclaw/openclaw#85158)) Thanks [@​NormallyGaussian](https://github.com/NormallyGaussian). - Matrix/channels: add voice-message preflight and thread-aware read/reply behavior, including Matrix QA scenario wiring and docs for voice-message behavior. ([#​78016](openclaw/openclaw#78016), [#​90415](openclaw/openclaw#90415)) - Skills/ClawHub: install ClawHub skills backed by GitHub repositories through the resolved install API, download the pinned GitHub commit, keep install-policy checks, and report install telemetry after success. ([#​90478](openclaw/openclaw#90478)) Thanks [@​Patrick-Erichsen](https://github.com/Patrick-Erichsen). - Google Chat/channels: add native approval card actions and click handling so Google Chat approvals use platform-native cards instead of generic message flow. - Mobile: Android provider/model screens now surface expiring, unavailable, unresolved, and attention states more clearly, while iOS settings and Talk tabs keep diagnostics, gateway rows, attachment labels, and unavailable Talk controls reachable. - Memory: QMD search can use the new rerank toggle, and memory adapter status uses the resolved default model identity when checking plain status. ([#​61834](openclaw/openclaw#61834)) - Docs/tooling: add Parallel search docs, refresh weather-skill guidance toward `web_fetch`, clarify legacy `openai-codex` auth, document release/test helper scripts, and tighten changed-test routing docs for CI/debugging work. ([#​90028](openclaw/openclaw#90028), [#​90250](openclaw/openclaw#90250)) Thanks [@​fuller-stack-dev](https://github.com/fuller-stack-dev). - Release/process: switch release trains to `YYYY.M.PATCH` monthly patch numbering, keep pre-transition tags compatible, and pin the June 2026 floor at `2026.6.5` after the published beta. - Platform maintenance: refresh Android, Swift/macOS, Docker, CodeQL, Buildx, Docker build/push, and Codex Action dependencies for this release train. ([#​74980](openclaw/openclaw#74980), [#​81757](openclaw/openclaw#81757), [#​86481](openclaw/openclaw#86481), [#​86483](openclaw/openclaw#86483), [#​90601](openclaw/openclaw#90601)) - QQBot: add `/bot-group-allways on|off` slash command (with named-account and default-account support) to toggle whether group messages require an `@mention` before the bot replies, and clear the runtime config snapshot after the write so the new account-level `defaultRequireMention` takes effect immediately without restart. ([#​91423](openclaw/openclaw#91423)) Thanks [@​cxyhhhhh](https://github.com/cxyhhhhh). ##### Fixes - Channel content boundaries: QQBot now strips reasoning/thinking tags before sending, preserving final answers while hiding internal model narration from users. ([#​89913](openclaw/openclaw#89913), [#​90132](openclaw/openclaw#90132)) Thanks [@​openperf](https://github.com/openperf). - Agents/MCP/providers: coerce non-text/image MCP tool-result blocks before they reach provider converters, preserving valid images and turning richer MCP content into text instead of malformed image blocks. ([#​90710](openclaw/openclaw#90710), [#​90728](openclaw/openclaw#90728)) Thanks [@​RanSHammer](https://github.com/RanSHammer) and [@​849261680](https://github.com/849261680). - Anthropic/Codex/ACP/agent recovery: defer Anthropic stream start events until `message_start`, strip stale compaction thinking signatures before Anthropic replay, detect unsigned thinking-only stalls, refresh prompt fences after compaction writes, reject empty completion handoffs, preserve parent streaming-off overrides/shared progress commentary, forward heartbeat metadata to context-engine hooks, and cover Codex session/thread migration edge cases. ([#​90667](openclaw/openclaw#90667), [#​90697](openclaw/openclaw#90697), [#​90163](openclaw/openclaw#90163), [#​90108](openclaw/openclaw#90108), [#​89874](openclaw/openclaw#89874), [#​89505](openclaw/openclaw#89505), [#​90632](openclaw/openclaw#90632), [#​89302](openclaw/openclaw#89302), [#​90729](openclaw/openclaw#90729), [#​90317](openclaw/openclaw#90317), [#​90319](openclaw/openclaw#90319)) Thanks [@​openperf](https://github.com/openperf), [@​100yenadmin](https://github.com/100yenadmin), and [@​ooiuuii](https://github.com/ooiuuii). - Provider/model resolution: preserve Google Vertex ADC auth markers in generated catalogs, re-probe a single-provider primary after cooldown, share Codex model visibility, fail closed for unknown model auth, preserve Codex alias availability, keep unresolved profile refs unknown, and avoid resolving auth while listing models. ([#​90506](openclaw/openclaw#90506), [#​90609](openclaw/openclaw#90609), [#​90717](openclaw/openclaw#90717), [#​90702](openclaw/openclaw#90702)) Thanks [@​849261680](https://github.com/849261680). - Gateway/macOS/mobile: avoid duplicate Gateway probe warnings by identity, rate-limit node pairing requests while preserving paired-node reconnects, keep macOS node mode on a healthy direct Gateway session, keep iOS diagnostics and gateway rows reachable, and avoid Linux ARM Gradle resource tasks during Android builds. ([#​85791](openclaw/openclaw#85791), [#​90147](openclaw/openclaw#90147), [#​90668](openclaw/openclaw#90668), [#​90815](openclaw/openclaw#90815)) Thanks [@​giodl73-repo](https://github.com/giodl73-repo) and [@​vrurg](https://github.com/vrurg). - TUI/chat/Workboard/auto-reply: optimistic user messages stay stable across stale history reloads, runId reassignment, and abort windows instead of disappearing, jumping, or lingering as ghost rows; Workboard stale lifecycle bulk updates no longer overwrite newer status/provenance; message-tool sends now count as delivery. ([#​86205](openclaw/openclaw#86205), [#​89600](openclaw/openclaw#89600), [#​88592](openclaw/openclaw#88592), [#​90123](openclaw/openclaw#90123)) Thanks [@​RomneyDa](https://github.com/RomneyDa). - Cron/update/service env: doctor config preflight now migrates legacy cron JSON stores into SQLite before runtime reads, service env planning skips unresolved placeholders that would mask state-dir `.env` values, and session transcript rewrites keep registry markers/discriminants consistent. ([#​90072](openclaw/openclaw#90072), [#​90208](openclaw/openclaw#90208), [#​90277](openclaw/openclaw#90277), [#​90488](openclaw/openclaw#90488)) Thanks [@​MonkeyLeeT](https://github.com/MonkeyLeeT) and [@​sallyom](https://github.com/sallyom). - Security/config/tooling: guard MCP HTTP redirects, protect global agent config defaults, and keep release/test/tooling proof failures bounded and explicit. ([#​89732](openclaw/openclaw#89732), [#​90145](openclaw/openclaw#90145)) - Channels: WhatsApp restarts when per-account config changes, bounds background startup waits, closes failed sockets, and preserves reconnect behavior; Mattermost slash commands keep their state on `globalThis`; Feishu streaming cards preserve full merged content; voice-call tracks Twilio streams after connect; ClickClack reply tools respect `toolsAllow`. ([#​87951](openclaw/openclaw#87951), [#​87965](openclaw/openclaw#87965), [#​90486](openclaw/openclaw#90486), [#​68113](openclaw/openclaw#68113), [#​90534](openclaw/openclaw#90534), [#​90181](openclaw/openclaw#90181), [#​90607](openclaw/openclaw#90607), [#​89500](openclaw/openclaw#89500)) Thanks [@​MukundaKatta](https://github.com/MukundaKatta), [@​mcaxtr](https://github.com/mcaxtr), [@​infoanton](https://github.com/infoanton), [@​mushuiyu886](https://github.com/mushuiyu886), and [@​sahibzada-allahyar](https://github.com/sahibzada-allahyar). - Feishu: retry transient send rate-limit errors (HTTP 429, per-chat code 230020, tenant-level code 11232) with linear backoff, including SDK responses that fulfill with rate-limit bodies instead of throwing, and route streaming-card sends through the retry wrapper. ([#​89659](openclaw/openclaw#89659)) Thanks [@​ladygege](https://github.com/ladygege). - Release/CI/E2E: main CI guard drift, PR merge diff scoping, live Docker credential staging, base-image qualification, installer Docker classification, Playwright dependency install recovery, API-key auth for Codex live Docker lanes, Parallels option terminators, and JSON-mode progress handling are tighter so release proof fails cleaner. ([#​90532](openclaw/openclaw#90532), [#​90287](openclaw/openclaw#90287), [#​90058](openclaw/openclaw#90058)) Thanks [@​RomneyDa](https://github.com/RomneyDa), [@​hxy91819](https://github.com/hxy91819), and [@​mrunalp](https://github.com/mrunalp). - Release/CI/E2E: Docker E2E and live Docker harness runs now apply default memory, CPU, and process ceilings while preserving explicit per-lane overrides. - Release/CI/E2E: plugin lifecycle matrix resource sampling now fails phases that exceed RSS, wall-clock, or CPU ceilings instead of only logging the measurements. - Release/CI/E2E: Codex npm plugin live assertions now cap transcript discovery and diagnostic log reads so failure proof stays bounded. - Tests/state isolation: QA Lab valid-tool-call metrics now require runtime tool-call evidence when runtime parity data is available instead of counting tool-backed scenario pass status alone. - Tests/state isolation: QA Lab runtime parity now fails planned-only tool-call rows without matching tool results instead of treating matching mock plans as real tool evidence. - Tests/state isolation: provider, media, auth, cron, task, session, sandbox, Gateway, and Codex timeout fixtures now scope more home/state/env data per test, reducing cross-test leakage and making release validation failures less noisy. ([#​90027](openclaw/openclaw#90027), [#​89974](openclaw/openclaw#89974)) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about these updates again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4xMDEuMSIsInVwZGF0ZWRJblZlciI6IjQzLjEwMS4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJyZW5vdmF0ZS9jb250YWluZXIiLCJ0eXBlL3BhdGNoIl19--> Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/963
Summary
Regression coverage for issue 89974 ("Codex app-server turn idle timeout is surfaced as user interruption"). No runtime behavior changes.
turn_completion_idle_timeout, OpenClaw clears the timed-out Codex app-server thread binding.thread/start) instead of resuming the thread that may still contain Codex's generic<turn_aborted>/ user-interrupted marker.The "user interrupted the previous turn on purpose" wording is Codex's own
<turn_aborted>rollout marker; OpenClaw cannot change it (turn/interruptcarries no reason), only avoid replaying it. The behavior that abandons the timed-out thread landed in #87781; this test pins it for the exact issue shape.Collision and current state
Proof and merge path
This verifies the behavior at the code/test level; no live end-to-end repro was run. Because the PR is test-only and introduces no new runtime behavior, the remaining proof-policy blocker needs maintainer judgment, a
proof: overridelabel, or separately supplied redacted live logs showing the after-timeout next turn starts fresh.I am not broadening this PR into a runtime classification/copy fix: that would change the scope, require a different proof lane, and is separate from pinning the #87781 thread-reset behavior.
Validation
run-attempt.turn-watches.test.ts: 56 passedrun.codex-app-server-recovery.test.ts: 10 passedgit diff --check: passedRefs issue 89974. Behavior fixed by #87781.