[codex] keep long app-server turns alive through progress#78757

Closed
jonathangu wants to merge 97 commits into openclaw:main from jonathangu:guclaw/gpt55-codex-20260506

Conversation

@jonathangu

Fixes #78756.

Summary

This PR makes Codex app-server turns progress-aware instead of letting the per-request timeout kill an already-started turn after roughly 60 seconds.

The user-facing failure this addresses is a generic channel error after a Codex GPT-5.5 turn has already made progress or sent a visible update:

Something went wrong while processing your request. Please try again, or use /new to start a fresh session.

Root Cause

runCodexAppServerAttempt used the app-server request timeout as a hard timeout after turn/start succeeded. That means a turn could be accepted, perform useful work, then still be marked failed before turn/completed arrived.

The progress clock was also too broad: low-signal account/rate-limit notifications were treated like turn progress, while some useful long-turn phases still had only the short completion wait. Separately, the embedded runner could emit a generic timeout payload even after a messaging tool had already delivered a user-visible update.
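The embedded-runner half of the problem can be illustrated with a small guard. This is a minimal sketch under stated assumptions, not the runner's actual code: `RunOutcome`, `ReplyPayload`, and `timeoutFallbackPayloads` are hypothetical names, and only `didSendViaMessagingTool` and the error text come from this PR description.

```typescript
// Hypothetical shapes for illustration; the real runner types may differ.
type ReplyPayload = { text: string; isError: boolean };
type RunOutcome = { timedOut: boolean; didSendViaMessagingTool: boolean };

// The generic channel error quoted in the summary above.
const GENERIC_FAILURE_TEXT =
  "Something went wrong while processing your request. Please try again, or use /new to start a fresh session.";

// Decide which fallback payloads to emit when a run ends.
function timeoutFallbackPayloads(outcome: RunOutcome): ReplyPayload[] {
  // No timeout means no fallback is needed.
  if (!outcome.timedOut) return [];
  // A messaging tool already delivered a visible update, so a generic
  // error on top of it would contradict what the user has seen.
  if (outcome.didSendViaMessagingTool) return [];
  return [{ text: GENERIC_FAILURE_TEXT, isError: true }];
}
```

With this guard, a timed-out run that already sent a message yields no extra payload, while a timed-out run with no visible output still reports the generic failure.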

Changes

  • Separate the accepted-turn hard timeout from the app-server request timeout.
  • Keep accepted Codex turns alive up to the terminal idle cap when meaningful progress is observed.
  • Increase the post-progress completion idle window from 60s to 5m.
  • Keep the terminal hard cap at 30m so wedged turns still stop.
  • Only refresh turn activity for meaningful current-turn notifications and server requests.
  • Exclude account/* notifications from progress accounting.
  • Suppress a generic timeout payload when didSendViaMessagingTool is already true.
  • Add focused regressions for the long-turn lifecycle and messaging-tool timeout suppression.
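The timing policy in the list above can be sketched as a pure deadline calculation. This is an illustration only, assuming a simplified model in which the idle clock starts when the turn is accepted. The names `TurnClock`, `countsAsProgress`, `observe`, and `deadline`, and the method name `item/updated` in the usage note, are hypothetical. Only the 5m and 30m values and the `account/*` exclusion come from this PR.

```typescript
// Values from the Changes list; constant names are illustrative.
const COMPLETION_IDLE_MS = 5 * 60_000; // post-progress idle window (previously 60s)
const TERMINAL_HARD_CAP_MS = 30 * 60_000; // absolute cap for an accepted turn

type TurnClock = { acceptedAt: number; lastProgressAt: number };

// Only current-turn notifications outside account/* count as progress.
function countsAsProgress(
  method: string,
  turnId: string | undefined,
  currentTurnId: string,
): boolean {
  if (method.startsWith("account/")) return false;
  return turnId === currentTurnId;
}

// Move the idle clock forward on meaningful progress; otherwise leave it untouched.
function observe(
  clock: TurnClock,
  now: number,
  method: string,
  turnId: string | undefined,
  currentTurnId: string,
): TurnClock {
  return countsAsProgress(method, turnId, currentTurnId)
    ? { ...clock, lastProgressAt: now }
    : clock;
}

// The turn is abandoned at whichever comes first: idle expiry or the hard cap.
function deadline(clock: TurnClock): number {
  return Math.min(
    clock.lastProgressAt + COMPLETION_IDLE_MS,
    clock.acceptedAt + TERMINAL_HARD_CAP_MS,
  );
}
```

For example, a clock accepted at `0` has a deadline of 5 minutes. Observing `item/updated` for the current turn at 4 minutes moves the deadline to 9 minutes, while a later `account/rateLimits/updated` leaves it unchanged. Progress at 29 minutes cannot push the deadline past the 30-minute cap.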

Validation

  • pnpm exec vitest run extensions/codex/src/app-server/run-attempt.test.ts → 60 passed
  • pnpm exec vitest run src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts → 42 passed
  • pnpm tsgo:prod → passed
  • pnpm tsgo:test → passed
  • pnpm build → passed

Notes

This does not make stuck turns immortal. The intended policy is: extend while the current turn is doing real work, ignore low-signal account/status chatter for liveness, and keep a separate hard cap for genuinely wedged app-server turns.

steipete and others added 30 commits May 5, 2026 02:42
Normalize WhatsApp onboarding allowlist entries to digit-only WhatsApp IDs and reject invalid owner-phone inputs during prompt validation.

(cherry picked from commit 68a500c)
* fix(telegram): reuse preview for long text finals

* test(qa): cover long telegram finals

* fix(qa): satisfy extension lint

* fix(qa): keep telegram long final fixture to two chunks

* test(telegram): cover three chunk finals

* fix(telegram): force long final preview boundary

(cherry picked from commit e03fe1e)
Bind the default loopback gateway listener only to `127.0.0.1` on Windows so libuv dual-stack `::1` behavior cannot wedge localhost HTTP requests.

Also keeps non-Windows dual-loopback behavior covered, replaces the redundant Windows passthrough test with guard coverage, and adds the required changelog entry.

Fixes openclaw#69674.

Tests:
- pnpm exec oxfmt --check --threads=1 CHANGELOG.md src/gateway/net.ts src/gateway/net.test.ts
- pnpm test src/gateway/net.test.ts
- pnpm check:changed
- GitHub required checks: green

Thanks @SARAMALI15792.

Co-authored-by: saram ali <140950904+SARAMALI15792@users.noreply.github.com>
Co-authored-by: Brad Groux <3053586+BradGroux@users.noreply.github.com>
(cherry picked from commit 978bc53)
…isted] (openclaw#74161)

Summary:
- The PR updates agents skill prompt guidance to require exact `<location>` paths for single- and multi-skill selection, adds prompt assertions, and records the fix in the changelog.
- Reproducibility: yes. Static source reproduction is enough: current main lacks the exact-`<location>` guard  ... illsSection()`, while the PR diff adds it to both selection branches and asserts the resulting prompt text.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix: enforce exact skill paths for all skill matches

Validation:
- ClawSweeper review passed for head 743c984.
- Required merge gates passed before the squash merge.

Prepared head SHA: 743c984
Review: openclaw#74161 (comment)

Co-authored-by: tianguicheng <tianguicheng@xiaomi.com>
Co-authored-by: sallyom <somalley@redhat.com>
(cherry picked from commit c739088)
Accept drive-absolute Windows sandbox Docker bind sources in config and runtime validation while keeping blocked-path and allowed-root comparisons case-insensitive for Windows drive paths.

Also remove a stale WhatsApp setup import that blocked extension lint after the rebase.

Co-authored-by: 6607changchun <84566142+6607changchun@users.noreply.github.com>
Co-authored-by: Brad Groux <3053586+BradGroux@users.noreply.github.com>
(cherry picked from commit d02fbc6)
Adds cap_drop and no-new-privileges hardening for the bundled gateway Docker Compose services.

Thanks @VintageAyu.

(cherry picked from commit f9da484)
…penclaw#77280)

Merged via squash.

Prepared head SHA: f4188b4
Co-authored-by: openperf <80630709+openperf@users.noreply.github.com>
Co-authored-by: openperf <80630709+openperf@users.noreply.github.com>
Reviewed-by: @openperf

(cherry picked from commit 31da1fe)
@clawsweeper
Contributor

clawsweeper Bot commented May 7, 2026

Codex review: needs real behavior proof before merge.

Summary
The PR changes Codex app-server timeout/progress accounting and embedded-runner timeout fallback behavior, while also carrying broad release, workflow, docs, plugin, TUI, and unrelated runtime edits.

Reproducibility: yes, from source inspection, though I did not run a live Codex turn. Current main arms a 60s completion idle timeout and a request-timeout hard abort after turn/start, while every notification refreshes activity. A focused fake app-server harness can reproduce the failure by accepting turn/start, emitting meaningful progress, and delaying turn/completed past 60 seconds.

Real behavior proof
Needs real behavior proof before merge: The PR body lists only tests, typechecks, and build output, and the visible comments contain no redacted live, terminal, log, recording, or linked-artifact proof of the fixed long-turn behavior. Add proof to the PR body. ClawSweeper should then re-review automatically, or a maintainer can comment @clawsweeper re-review.

Next step before merge
Human review is needed because this external draft PR is dirty/unmergeable, missing required real behavior proof, and changes release/workflow surfaces outside the bug boundary.

Security
Needs attention: The external bugfix branch includes unrelated release/publish workflow and build/package changes that affect supply-chain paths.

Review findings

  • [P2] Remove the stale release changelog rewrite — CHANGELOG.md:5
  • [P2] Remove unrelated release workflow changes — .github/workflows/openclaw-release-publish.yml:169
Review details

Best possible solution:

Land a narrow Codex app-server and embedded-runner fix with focused regression tests, a minimal current changelog entry if user-facing, unchanged release/workflow state, and redacted live or terminal proof of a long accepted turn continuing past 60 seconds without a generic timeout response.

Do we have a high-confidence way to reproduce the issue?

Yes, from source inspection, though I did not run a live Codex turn: current main arms a 60s completion idle timeout and a request-timeout hard abort after turn/start, while every notification refreshes activity. A focused fake app-server harness can reproduce the failure by accepting turn/start, emitting meaningful progress, and delaying turn/completed past 60 seconds.
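The reproduction described above can be modeled in virtual time without a real app-server. This is a rough sketch, not the repository's test harness: `simulate`, `Policy`, the policy objects, and the `item/progress` method name are hypothetical, and the old hard timeout is assumed to equal a 60s request timeout.

```typescript
type TurnEvent = { atMs: number; method: string };

type Policy = {
  idleMs: number; // idle window since last counted activity
  hardCapMs: number; // absolute limit measured from turn/start at t=0
  refreshes: (method: string) => boolean; // does this notification count as activity?
};

type Result = "completed" | "idle-timeout" | "hard-timeout";

// Replays events in time order and reports whether turn/completed
// arrives before either timer fires.
function simulate(events: TurnEvent[], policy: Policy): Result {
  let lastActivity = 0; // turn/start is accepted at t=0
  const fired = (): Result =>
    lastActivity + policy.idleMs < policy.hardCapMs ? "idle-timeout" : "hard-timeout";
  for (const e of [...events].sort((a, b) => a.atMs - b.atMs)) {
    const deadlineMs = Math.min(lastActivity + policy.idleMs, policy.hardCapMs);
    if (e.atMs > deadlineMs) return fired();
    if (e.method === "turn/completed") return "completed";
    if (policy.refreshes(e.method)) lastActivity = e.atMs;
  }
  return fired(); // no completion ever arrived
}

// Simplified model of current main: every notification refreshes activity,
// and the hard abort is tied to an assumed 60s request timeout.
const oldPolicy: Policy = { idleMs: 60_000, hardCapMs: 60_000, refreshes: () => true };

// Simplified model of the proposed policy: 5m idle, 30m cap, account/* ignored.
const newPolicy: Policy = {
  idleMs: 5 * 60_000,
  hardCapMs: 30 * 60_000,
  refreshes: (m) => !m.startsWith("account/"),
};

// A turn that makes progress and completes after 90 seconds.
const longTurn: TurnEvent[] = [
  { atMs: 30_000, method: "item/progress" },
  { atMs: 50_000, method: "item/progress" },
  { atMs: 90_000, method: "turn/completed" },
];

console.log(simulate(longTurn, oldPolicy)); // "hard-timeout"
console.log(simulate(longTurn, newPolicy)); // "completed"
```

The same model can check the opposite risk: a turn that emits only `account/rateLimits/updated` notifications and never completes should still end with `idle-timeout` under the new policy.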

Is this the best way to solve the issue?

No, not as submitted: the timeout/progress direction is reasonable, but the branch must be narrowed and rebased because it currently includes unrelated release/workflow/changelog churn and lacks real behavior proof. The safer path is a small Codex-only PR plus proof from a real long turn.

Full review comments:

  • [P2] Remove the stale release changelog rewrite — CHANGELOG.md:5
    This branch changes the active changelog header from Unreleased to a historical 2026.5.6 section and carries release-note churn unrelated to Codex timeouts. Merging it would overwrite current main's active unreleased entries, so keep the bugfix branch rebased and add only a narrow current changelog line if needed.
    Confidence: 0.9
  • [P2] Remove unrelated release workflow changes — .github/workflows/openclaw-release-publish.yml:169
    The Codex timeout fix should not alter release publishing order or child-workflow dispatch behavior. These workflow edits affect package publication and need separate release-owner review, so leave this file unchanged in the bugfix PR.
    Confidence: 0.86

Overall correctness: patch is incorrect
Overall confidence: 0.86

Security concerns:

  • [medium] Unrelated release workflow changes — .github/workflows/openclaw-release-publish.yml:169
    The PR changes .github/workflows/openclaw-release-publish.yml despite the stated Codex timeout scope. Release publish orchestration controls package publication, so this needs removal from the bugfix PR or separate release-owner review.
    Confidence: 0.88

What I checked:

  • Current main uses a 60s Codex completion idle window: CODEX_TURN_COMPLETION_IDLE_TIMEOUT_MS is still 60_000 on current main, matching the reported short post-progress timeout path. (extensions/codex/src/app-server/run-attempt.ts:120, 58fa23b4a2f2)
  • Current main treats every notification as progress: handleNotification calls touchTurnCompletionActivity before filtering by notification method, so account/rate-limit notifications currently refresh the same activity clock as turn/item progress. (extensions/codex/src/app-server/run-attempt.ts:895, 58fa23b4a2f2)
  • Current main keeps the accepted-turn hard timeout tied to request timeout: After turn/start, current main still schedules a hard timeout with Math.max(100, params.timeoutMs), which can abort an accepted long turn before turn/completed. (extensions/codex/src/app-server/run-attempt.ts:1201, 58fa23b4a2f2)
  • PR state is not merge-ready: GitHub API reports draft: true, mergeable: false, mergeable_state: dirty, 97 commits, 414 changed files, 11801 additions, and 1300 deletions. (494c0b440763)
  • Real behavior proof is missing: The PR body lists unit tests, typechecks, and build output, and the only visible PR comment is a ClawSweeper placeholder; there is no redacted live/terminal/log proof of a real long Codex app-server turn after the fix. (494c0b440763)
  • Branch scope is much broader than the Codex timeout fix: The provided PR file list includes release workflows, plugin publishing, package acceptance, docker-compose, version metadata, changelog, docs, plugin update code, TUI, video generation, and Codex app-server files. (494c0b440763)

Likely related people:

  • vincentkoc: Local blame for the relevant current-main Codex app-server timeout/progress lines and embedded-runner timeout fallback lines points to Vincent Koc in the available grafted history. (role: recent maintainer; confidence: medium; commits: 6587832f25; files: extensions/codex/src/app-server/run-attempt.ts, src/agents/pi-embedded-runner/run.ts)
  • steipete: Provided PR history shows repeated recent Codex, release, changelog, and workflow maintenance commits by steipete across the same broad surfaces this dirty branch touches. (role: adjacent maintainer; confidence: medium; commits: c03449678e95, 23319a3cc2b4, 2fc80754cf4f; files: extensions/codex/src/app-server/run-attempt.ts, src/agents/pi-embedded-runner/run.ts, CHANGELOG.md)

Remaining risk / open question:

  • The PR branch is draft and dirty/unmergeable, so review of the intended Codex fix is mixed with stale branch drift.
  • The external PR changes release and publish workflows, which are security-sensitive supply-chain paths unrelated to the stated timeout bug.
  • No real after-fix behavior proof shows a long Codex app-server turn progressing past 60 seconds without the generic channel failure.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 58fa23b4a2f2.

@hclsys
Contributor

hclsys commented May 7, 2026

Independently confirmed on current main: the app-server wrapper's turn/completed idle window uses a fixed 60s clock with no reset on turn/item/agentMessage/delta or hook completions, and account/rateLimits/updated counts as activity. The separation of request-timeout vs. turn-progress idle window in this PR looks correct. Supportive.

@obviyus
Contributor

obviyus commented May 9, 2026

Thanks for the PR. This branch buries the Codex timeout fix in unrelated release/plugin churn, so it is not reviewable as a focused Telegram PR. Please reopen as a narrow PR with only the intended fix.

Labels

agents Agent runtime and tooling app: android App: android app: ios App: ios app: macos App: macos app: web-ui App: web-ui channel: bluebubbles Channel integration: bluebubbles channel: discord Channel integration: discord channel: feishu Channel integration: feishu channel: googlechat Channel integration: googlechat channel: imessage Channel integration: imessage channel: irc channel: line Channel integration: line channel: matrix Channel integration: matrix channel: mattermost Channel integration: mattermost channel: msteams Channel integration: msteams channel: nextcloud-talk Channel integration: nextcloud-talk channel: nostr Channel integration: nostr channel: qa-channel Channel integration: qa-channel channel: qqbot channel: signal Channel integration: signal channel: slack Channel integration: slack channel: synology-chat channel: telegram Channel integration: telegram channel: tlon Channel integration: tlon channel: twitch Channel integration: twitch channel: voice-call Channel integration: voice-call channel: whatsapp-web Channel integration: whatsapp-web channel: zalo Channel integration: zalo channel: zalouser Channel integration: zalouser cli CLI command changes commands Command implementations docker Docker and sandbox tooling docs Improvements or additions to documentation extensions: acpx extensions: anthropic extensions: arcee extensions: byteplus extensions: cerebras extensions: cloudflare-ai-gateway extensions: codex extensions: copilot-proxy Extension: copilot-proxy extensions: deepinfra extensions: deepseek extensions: diagnostics-otel Extension: diagnostics-otel extensions: diagnostics-prometheus extensions: duckduckgo extensions: fal extensions: gradium extensions: huggingface extensions: inworld Extension: inworld extensions: kilocode extensions: kimi-coding extensions: litellm extensions: llm-task Extension: llm-task extensions: lmstudio extensions: lobster Extension: lobster extensions: memory-core Extension: memory-core extensions: 
memory-lancedb Extension: memory-lancedb extensions: memory-wiki extensions: minimax extensions: moonshot extensions: nvidia extensions: open-prose Extension: open-prose extensions: openai extensions: qa-lab extensions: qianfan extensions: senseaudio extensions: stepfun extensions: synthetic extensions: tavily extensions: tencent extensions: together extensions: tokenjuice Changes to the bundled tokenjuice extension extensions: tts-local-cli extensions: venice extensions: vercel-ai-gateway extensions: volcengine extensions: webhooks extensions: xiaomi gateway Gateway runtime plugin: azure-speech Azure Speech plugin plugin: bonjour Plugin integration: bonjour plugin: file-transfer plugin: google-meet plugin: migrate-claude plugin: migrate-hermes scripts Repository scripts size: XL triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Codex app-server turns time out after 60s despite meaningful progress