fix(codex): prevent false completion stalls during native streams#87781

Merged
steipete merged 13 commits into
openclaw:mainfrom
keshavbotagent:fix/codex-raw-assistant-midturn-release
May 29, 2026

Conversation

@keshavbotagent

@keshavbotagent keshavbotagent commented May 28, 2026

Contributor

Summary

  • Keep raw assistant progress on the documented turn/completed terminal guard; progress text is still not treated as a final answer.
  • Count forwarded native Codex response stream deltas, including response.custom_tool_call_input.delta, as active turn activity and attempt progress while a turn is already open.
  • Guard unscoped native response.*.delta notifications so they only refresh a turn when the shared app-server client has one active lease; concurrent leased turns no longer keep each other's watchdogs alive.
  • Keep the post-tool raw-assistant fallback timeout conservative at 5 * 60_000 / 300000 ms for app-server/native paths that do not forward stream deltas, and align runtime docs plus plugin metadata.
  • Preserve recovery behavior for actual completion stalls: replay only replay-safe attempts; for side-effecting/non-replay-safe stalls retire the stuck app-server client, clear stale native thread binding only when it still points at the timed-out thread, release the lane, and surface a recoverable incomplete-turn result.
  • After the one replay-safe stdio retry is exhausted, a repeated turn_completion_idle_timeout now falls through to the normal timeout payload path instead of rejecting the embedded run.
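The delta-activity rule in the bullets above can be sketched as a small predicate. This is only an illustration under stated assumptions: the type names, the `turnId` field, and `activeLeaseCount` are hypothetical and not taken from the PR diff; only the `response.*.delta` method shape and the single-lease rule come from the summary.

```typescript
// Hypothetical shapes; the real notification and lease types are not shown in this thread.
interface NativeNotification {
  method: string;
  turnId?: string; // assumed to be absent on unscoped notifications
}

interface TurnWatchContext {
  activeTurnId: string;
  activeLeaseCount: number; // leases currently held on the shared app-server client
}

// Matches methods such as "response.custom_tool_call_input.delta".
const NATIVE_RESPONSE_DELTA = /^response\.[a-z0-9_.]+\.delta$/;

function isNativeResponseDelta(method: string): boolean {
  return NATIVE_RESPONSE_DELTA.test(method);
}

// Decides only whether a native response delta refreshes the open turn's watchdog.
// Other notification kinds are outside the scope of this sketch.
function countsAsTurnActivity(n: NativeNotification, ctx: TurnWatchContext): boolean {
  if (!isNativeResponseDelta(n.method)) return false;
  // Scoped deltas count directly when they belong to the active turn.
  if (n.turnId !== undefined) return n.turnId === ctx.activeTurnId;
  // Unscoped deltas count only when exactly one lease is active, so
  // concurrent turns cannot keep each other's watchdogs alive.
  return ctx.activeLeaseCount === 1;
}
```

With two concurrent leases, an unscoped delta is ignored for both turns, which is the cross-talk guard described in the third bullet.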

Why this shape

The failure is not that raw assistant progress needs to become deliverable final text. The failure is that native Codex can spend a long time actively streaming a large tool input/patch after a raw assistant progress item has completed, before any completed tool-call notification exists. Without counting those response.*.delta events as activity, OpenClaw can falsely fire turn.completion_idle_timeout while the model is still working.

This keeps the lifecycle semantics strict: final delivery still requires turn/completed; active native stream deltas only keep the watchdog from killing a live turn. The latest review fixes also avoid cross-talk for unscoped deltas and keep retry-exhausted completion-idle failures on the timeout-result path.
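The lifecycle described above can be sketched as a watch that activity only refreshes and that only `turn/completed` can finish. This is a minimal sketch, not the code in `run-attempt.ts`: the class name, method names, and the explicit `now` arguments are assumptions made so the example stays deterministic; the diagnostic fields mirror the `idleMs`, `timeoutMs`, and `lastActivityReason` values quoted later in this thread.

```typescript
// 5 * 60_000 = 300000 ms, the conservative fallback named in the PR summary.
const POST_TOOL_IDLE_TIMEOUT_MS = 5 * 60_000;

type WatchStatus = "completed" | "waiting" | "idle_timeout";

// Hypothetical watch; timestamps are passed in instead of read from a clock.
class CompletionIdleWatch {
  private readonly timeoutMs: number;
  private lastActivityAt: number;
  private lastActivityReason: string;
  private completed = false;

  constructor(timeoutMs: number, startedAt: number) {
    this.timeoutMs = timeoutMs;
    this.lastActivityAt = startedAt;
    this.lastActivityReason = "turn/started";
  }

  // Stream deltas and other activity only push the deadline out.
  noteActivity(now: number, reason: string): void {
    this.lastActivityAt = now;
    this.lastActivityReason = reason;
  }

  // Only the terminal notification marks the turn as finished.
  noteTurnCompleted(): void {
    this.completed = true;
  }

  status(now: number): WatchStatus {
    if (this.completed) return "completed";
    return now - this.lastActivityAt >= this.timeoutMs ? "idle_timeout" : "waiting";
  }

  diagnostics(now: number): { idleMs: number; timeoutMs: number; lastActivityReason: string } {
    return {
      idleMs: now - this.lastActivityAt,
      timeoutMs: this.timeoutMs,
      lastActivityReason: this.lastActivityReason,
    };
  }
}
```

Note that `noteActivity` never changes `completed`, so a long run of deltas can delay the timeout but cannot turn progress text into a final answer.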

Real behavior proof

  • Behavior addressed: Codex-backed Telegram turns could be falsely aborted while native Codex was still streaming response deltas for a large tool input/patch after raw assistant progress, producing the error "Codex stopped before confirming the turn was complete. Some work may already have been performed; verify the current state before retrying."
  • Real environment tested: Keshav's live OpenClaw gateway and Telegram group flow on 2026-05-29 IST.
  • Exact steps or command run after this patch: Keshav built and restarted the gateway from this PR branch after the native-delta runtime fix, then reattempted the Vietnam webpage edit flow that had repeatedly failed with the completion-idle watchdog. The latest pushed head 2aa61b24ea is rebased onto origin/main 040eba1cdc; the final commit on top is test-only stabilization after the runtime proof.
  • Evidence after fix: Copied live output from the Telegram operator thread:
2026-05-29 21:46 IST, Keshav G:
I built and restarted server
Then reattempted the vietnam webpage and yay it worked this time without errors

2026-05-29 23:41 IST, Keshav G:
I have build and restarted
Now do the tests, ci and submit latest proof from latest Heads
  • Observed result after fix: The previously repeated Vietnam webpage edit flow completed without the prior turn.completion_idle_timeout / "Codex stopped before confirming the turn was complete" error after the gateway was built and restarted from the PR branch.
  • What was not tested: The bot did not personally restart the live gateway; Keshav performed the build/restart. Latest GitHub Actions were still running when this proof was added, and local validation is listed below.

Review feedback addressed

Latest head: 2aa61b24ea on origin/main base 040eba1cdc.

  • Plugin config metadata and both Codex harness docs now agree with the runtime default: postToolRawAssistantCompletionIdleTimeoutMs defaults to 300000 ms.
  • Timeout cleanup now clears the persisted Codex app-server binding only if the stored binding still points at the timed-out thread, preserving transient-turn bindings.
  • Unscoped native response deltas now count as activity only when the shared app-server client reports a single active lease; scoped deltas for the active turn still count directly.
  • Retry-exhausted replay-safe stdio turn_completion_idle_timeout failures now surface the timeout payload instead of throwing the prompt error after the one retry has already been used.
  • Added focused regression coverage for binding-preserving cleanup, concurrent-lease unscoped deltas, native response delta activity, and retry-exhausted completion-idle timeout handling.
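The binding-preserving cleanup from the second bullet above amounts to a compare-then-delete. A minimal sketch follows, assuming a hypothetical in-memory `Map` keyed by session; the actual persisted binding store and its field names are not shown in this thread.

```typescript
// Hypothetical binding record; only the idea of a stored thread id comes from the PR text.
interface ThreadBinding {
  threadId: string;
}

// Clears the stored binding only if it still points at the thread that timed out.
// Returns true when a binding was removed.
function clearBindingIfStale(
  bindings: Map<string, ThreadBinding>,
  sessionKey: string,
  timedOutThreadId: string,
): boolean {
  const stored = bindings.get(sessionKey);
  if (stored === undefined || stored.threadId !== timedOutThreadId) {
    // Either nothing is stored or the session has already moved to another
    // thread; leave that newer binding alone.
    return false;
  }
  bindings.delete(sessionKey);
  return true;
}
```

The comparison is what preserves a binding that was rewritten between the start of the timed-out turn and the cleanup.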

Tests / validation

  • git diff --check origin/main...HEAD passed on head 2aa61b24ea.
  • Targeted oxfmt --check passed on changed TypeScript/plugin files on head 2aa61b24ea.
  • Focused Vitest passed on head 2aa61b24ea: 6 files / 82 tests.

Focused Vitest command:

./node_modules/.bin/vitest run \
  extensions/codex/src/app-server/attempt-results.test.ts \
  extensions/codex/src/app-server/attempt-timeouts.test.ts \
  extensions/codex/src/app-server/run-attempt.turn-watches.test.ts \
  extensions/codex/src/app-server/session-binding.test.ts \
  src/agents/embedded-agent-runner/run.codex-app-server-recovery.test.ts \
  --reporter=dot

Related issue

@openclaw-barnacle openclaw-barnacle Bot added labels May 28, 2026:

  • docs (Improvements or additions to documentation)
  • agents (Agent runtime and tooling)
  • extensions: codex
  • size: S
  • triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.)
@clawsweeper

clawsweeper Bot commented May 28, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed May 29, 2026, 6:37 PM ET / 22:37 UTC.

Summary
This PR changes Codex app-server turn watching and recovery so native response deltas refresh activity, completion-idle stalls retry or surface safely, matching-thread binding cleanup is used, and Codex docs/metadata/tests reflect the 300000 ms post-tool guard.

PR surface: Source +154, Tests +481, Docs +10. Total +645 across 21 files.

Reproducibility: yes. The related issue and PR discussion give a high-confidence log/live-flow path: Codex-backed Telegram turns reach raw assistant or item completion, then idle waiting for turn/completed and surface the completion-idle fallback; I did not rerun it locally.

Review metrics: 1 noteworthy metric.

  • Config/default surface: 1 changed default surface. The Codex plugin's postToolRawAssistantCompletionIdleTimeoutMs unset behavior now floors to 300000 ms, which affects existing operator timing without a config edit.
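The changed default above can be illustrated with a small resolver. This is a sketch under loud assumptions: the function name is hypothetical, and the thread does not say whether explicit values below 300000 ms are raised to the floor, so this version leaves explicit positive values untouched and only falls back when the option is unset or unusable.

```typescript
// 5 * 60_000 = 300000 ms, the default named in the PR for
// postToolRawAssistantCompletionIdleTimeoutMs when it is unset.
const DEFAULT_POST_TOOL_IDLE_TIMEOUT_MS = 5 * 60_000;

// Hypothetical resolver. Assumption: explicit positive values are honored as-is.
function resolvePostToolIdleTimeoutMs(configured: number | undefined): number {
  if (configured === undefined || !Number.isFinite(configured) || configured <= 0) {
    return DEFAULT_POST_TOOL_IDLE_TIMEOUT_MS;
  }
  return Math.floor(configured);
}
```

Under this reading, an operator who never set the option now waits up to five minutes before a stuck post-tool turn surfaces, which is the upgrade-sensitive effect the review flags.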

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Result: blocked until stronger real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Post redacted current-head live Telegram or gateway proof for 115f649.
  • [P2] Have a maintainer explicitly accept the longer timeout default and Codex session retry/cleanup behavior before merge.

Proof guidance:

  • [P1] Needs stronger real behavior proof before merge: The PR includes copied live Telegram output, but it names older head 2aa61b2 while current head 115f649 has later runtime commits, so exact-head real behavior proof is still needed; redact private IDs, paths, phone numbers, endpoints, and credentials before posting. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Mantis proof suggestion
A live Telegram lane would materially prove the user-visible Codex-backed completion path that unit tests cannot show. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

telegram live: verify a Codex-backed Telegram turn on head 115f6493 completes a large native tool-input stream without turn_completion_idle_timeout.

Risk before merge

  • [P1] The live proof in the PR body is not for current head 115f649, and later runtime commits changed the completion-idle retry and watchdog semantics.
  • [P2] Changing the unset post-tool raw-assistant timeout to a 300000 ms floor can make existing Codex setups wait longer before surfacing a stuck turn.
  • [P1] Retry, client retirement, and thread-binding cleanup affect Codex session continuity and availability when native app-server turns stall.

Maintainer options:

  1. Prove current head before merge (recommended)
    Rebuild/restart from 115f649 and post redacted live Telegram or gateway output showing the large native response-delta flow completes without the completion-idle timeout.
  2. Accept the longer timeout default
    Maintainers can intentionally accept the 300000 ms unset default after deciding that slower failure surfacing is preferable to false aborts for existing Codex users.
  3. Pause if live proof cannot be produced
    If current-head live proof is not feasible, keep this paused or split to a narrower low-risk diagnostics/recovery change before landing runtime behavior changes.

Next step before merge

  • [P1] Needs human maintainer handling for exact-head live proof and upgrade-sensitive Codex runtime/session semantics; I did not find a narrow code defect for an automated repair PR.

Security
Cleared: No concrete security or supply-chain regression was found; the diff does not add dependencies, secrets handling, workflow permissions, or downloaded code execution.

Review details

Best possible solution:

Land only after current head has redacted live Telegram/gateway proof and maintainers explicitly accept the longer timeout default plus retry/session cleanup semantics; keep narrower unsafe-stall diagnostics in #87793 if still needed.

Do we have a high-confidence way to reproduce the issue?

Yes. The related issue and PR discussion give a high-confidence log/live-flow path: Codex-backed Telegram turns reach raw assistant or item completion, then idle waiting for turn/completed and surface the completion-idle fallback; I did not rerun it locally.

Is this the best way to solve the issue?

Unclear until current-head proof is posted. The implementation follows the right Codex app-server boundary and has focused tests, but the changed default, retry, and session cleanup semantics need maintainer acceptance before this is the best merge path.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 63a06e312dfe.

Label changes

Label justifications:

  • P1: The PR addresses a current Codex-backed Telegram workflow regression that can leave real users with partial side effects and no final answer.
  • merge-risk: 🚨 compatibility: The PR changes default timeout behavior for an existing Codex plugin config field, affecting upgrades and existing unset configurations.
  • merge-risk: 🚨 session-state: The PR changes retry and binding cleanup behavior for Codex native threads, which can affect whether sessions resume or start fresh after stalls.
  • merge-risk: 🚨 availability: The PR changes watchdog, retry, and client-retirement paths that decide whether Codex turns stall, recover, or release the OpenClaw lane.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR includes copied live Telegram output, but it names older head 2aa61b2 while current head 115f649 has later runtime commits, so exact-head real behavior proof is still needed; redact private IDs, paths, phone numbers, endpoints, and credentials before posting. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
  • mantis: telegram-visible-proof: Mantis should capture Telegram visible proof. The PR changes Telegram-visible Codex completion behavior and is well suited to a short live Telegram proof showing the turn completes instead of timing out.
Evidence reviewed

PR surface:

Source +154, Tests +481, Docs +10. Total +645 across 21 files.

Area       Files  Added  Removed  Net
Source     13     182    28       +154
Tests      6      515    34       +481
Docs       2      56     46       +10
Config     0      0      0        0
Generated  0      0      0        0
Other      0      0      0        0
Total      21     753    108      +645

Acceptance criteria:

  • [P1] Build and restart the gateway from head 115f649, then rerun the affected Codex-backed Telegram flow and capture redacted live output.
  • [P1] Run the focused Codex app-server and embedded-runner Vitest files listed in the PR body after any further runtime edits.
  • [P1] Keep git diff --check origin/main...HEAD and targeted oxfmt checks green after any proof-refresh commits.

What I checked:

Likely related people:

  • @steipete: Current-main blame and recent history place the Codex app-server turn-watch/recovery implementation and related lifecycle work primarily in Peter Steinberger's recent commits. (role: recent area contributor; confidence: high; commits: e6b011823ebb, 659bcc5e5b59, 545490c5920d; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/attempt-turn-watches.ts, src/agents/embedded-agent-runner/run.ts)
  • @vincentkoc: Recent Codex app-server auth/startup routing commits touch adjacent runtime continuity and binding behavior that this PR also affects. (role: adjacent owner; confidence: medium; commits: f1cc8f0cfc7c, 859eb0666282; files: extensions/codex/src/app-server/run-attempt.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added labels May 28, 2026:

  • rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)
  • P2 (Normal backlog priority with limited blast radius.)
  • merge-risk: 🚨 compatibility (🚨 May break existing users, config, migrations, defaults, or upgrade paths.)
  • merge-risk: 🚨 session-state (🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state.)
  • merge-risk: 🚨 availability (🚨 May cause crashes, hangs, restart loops, stalls, or process outages.)
@keshavbotagent keshavbotagent force-pushed the fix/codex-raw-assistant-midturn-release branch from 2575552 to a59d953 Compare May 28, 2026 21:19
@keshavbotagent

Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a59d953faf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread extensions/codex/src/app-server/attempt-timeouts.ts Outdated

Contributor Author

Post-fix live proof has been added to the PR body.

Redacted summary:

  • Gateway was rebuilt/restarted on this branch and observed running with service version 2026.5.28.
  • Runtime checkout was fix/codex-raw-assistant-midturn-release at a59d953faf.
  • Built dist contains the PR behavior: postToolRawAssistantCompletionIdleTimeoutMs default normalization to 60000, the post-tool completion idle watch, and structured turn_completion_idle_timeout failure metadata.
  • A fresh isolated Codex validation turn completed successfully after two read-only tool calls: model.completed and session.ended status=success at 2026-05-28T21:39:08.429Z.
  • Delivered result: LIVE_PROOF_OK branch=fix/codex-raw-assistant-midturn-release head=a59d953faf.
  • No turn.completion_idle_timeout or retryable app-server failure occurred in that post-restart proof.

Exact chat/session/thread/turn/run/tool-call IDs, auth profile details, local usernames, and local paths are intentionally omitted.

Contributor Author

Post-fix recurrence update, redacted:

After rebuild/restart on fix/codex-raw-assistant-midturn-release at a59d953faf, a separate live side-effecting run still hit the same app-server terminal-signal failure:

  • 2026-05-28T21:37:04.144Z: live run started for a multi-file itinerary/data-model edit.
  • The run completed many tool calls/results, so it was not replay-safe/no-output.
  • Codex emitted assistant progress text: "I'm changing the data model now...".
  • 2026-05-28T21:40:22.199Z: OpenClaw recorded turn.completion_idle_timeout with idleMs=60001, timeoutMs=60000, lastActivityReason=notification:rawResponseItem/completed, lastNotificationMethod=rawResponseItem/completed, lastNotificationItemType=message, lastNotificationItemRole=assistant.
  • The run ended as session.ended status=error with "codex app-server turn idle timed out waiting for turn/completed".

This confirms the current PR scope: it keeps completion semantics safe and handles replay-safe recovery, but it does not solve non-replay-safe post-tool assistant-progress stalls with side effects. Those need a separate recovery strategy or an upstream app-server fix. Exact chat/session/thread/turn/run/tool-call IDs, usernames, and local paths omitted.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels May 28, 2026
@keshavbotagent keshavbotagent changed the title fix(codex): recover replay-safe app-server stalls fix(codex): recover app-server completion stalls May 28, 2026

Contributor Author

Implemented the next fix after the post-restart recurrence.

The recurrence was a side-effecting/progress timeout, so auto-replay remains intentionally blocked. New commit 06703bc1a6 changes that path to:

  • clear the stale Codex native thread binding when the native turn times out;
  • retire the stuck app-server client and release the OpenClaw lane;
  • classify assistant-progress/tool-activity completion-idle stalls as recoverable timeout outcomes;
  • return the recoverable timeout payload instead of throwing the raw "codex app-server turn idle timed out waiting for turn/completed" prompt error.
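The path listed above can be sketched as a plan builder. This is an illustrative sketch only: the step names, the `CompletionStall` shape, and the rule that "replay-safe" means no tool activity and no assistant progress are assumptions drawn from the wording in this thread, not from the diff, and the step order simply follows the bullets.

```typescript
// Hypothetical step names for illustration.
type RecoveryStep =
  | "replay_attempt"
  | "clear_matching_thread_binding"
  | "retire_app_server_client"
  | "release_lane"
  | "return_recoverable_timeout_payload";

interface CompletionStall {
  hadToolActivity: boolean;
  hadAssistantProgress: boolean;
}

// Assumption: a stall is replay-safe only if the attempt produced no output at all.
function isReplaySafe(stall: CompletionStall): boolean {
  return !stall.hadToolActivity && !stall.hadAssistantProgress;
}

function planCompletionStallRecovery(stall: CompletionStall): RecoveryStep[] {
  if (isReplaySafe(stall)) {
    return ["replay_attempt"];
  }
  // Side effects may already exist, so never replay; clean up and report instead.
  return [
    "clear_matching_thread_binding",
    "retire_app_server_client",
    "release_lane",
    "return_recoverable_timeout_payload",
  ];
}
```

The side-effecting run from the recurrence report (many tool calls plus progress text) would take the second branch, so the user sees a recoverable timeout result rather than a raw prompt error.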

Validation run:

  • git diff --check
  • pnpm exec oxfmt --check on changed TypeScript files
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-codex.config.ts extensions/codex/src/app-server/attempt-results.test.ts extensions/codex/src/app-server/run-attempt.turn-watches.test.ts
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/embedded-agent-runner/run.codex-app-server-recovery.test.ts
  • pnpm tsgo:core
  • pnpm tsgo:extensions

Remaining live validation: rebuild/restart on head 06703bc1a6 and confirm the same side-effecting stall now returns the recoverable message and does not poison the next native thread binding.

@keshavbotagent

Contributor Author

@clawsweeper re-review
@codex re-review

@clawsweeper

clawsweeper Bot commented May 29, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Breezy!

@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. labels May 29, 2026
@openclaw-barnacle openclaw-barnacle Bot added the proof: supplied External PR includes structured after-fix real behavior proof. label May 29, 2026
@steipete steipete self-assigned this May 29, 2026
@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 👍

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. labels May 29, 2026
@keshavbotagent

Contributor Author

@clawsweeper hatch

@keshavbotagent

Contributor Author

@clawsweeper review
@codex review

@clawsweeper

clawsweeper Bot commented May 29, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Swish!

@github-actions

github-actions Bot commented May 29, 2026

Contributor

Dependency graph guard cleared

This PR no longer has blocked dependency graph changes. A future dependency graph change requires a fresh /allow-dependencies-change comment after the guard blocks that new head SHA.

  • Current SHA: c005ba9b702b389efb88b2bdfce0cacebc16b75f

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c005ba9b70

Comment on lines +2403 to +2405

shouldSurfaceCodexCompletionTimeout =
  attempt.codexAppServerFailure?.kind === "turn_completion_idle_timeout" &&
  attempt.timedOut;

[P2] Gate timeout surfacing on completion watchdogs

When the Codex app-server watch fires for the progress or terminal watchdog, run-attempt.ts still reports the broad kind: "turn_completion_idle_timeout" and only distinguishes the actual watchdog in turnWatchTimeoutKind. This condition ignores that narrower field, so a turn_progress_idle_timeout or turn_terminal_idle_timeout now bypasses the normal prompt-error/failover path and is returned as the recoverable completion-timeout payload even though the retry resolver explicitly rejects non-completion watchdogs. Please also require turnWatchTimeoutKind === "completion" here (or handle the legacy-absent case deliberately).
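The narrower gate this comment asks for could look like the sketch below. The `AttemptLike` shape is hypothetical, and placing `turnWatchTimeoutKind` on the failure object is an assumption, since the comment does not say where the field lives. The sketch also makes one deliberate choice for the legacy case: a missing `turnWatchTimeoutKind` is treated as not a completion watchdog.

```typescript
type TurnWatchTimeoutKind = "completion" | "progress" | "terminal";

// Hypothetical shape; only the field names quoted in the review comment are from the source.
interface AttemptLike {
  timedOut: boolean;
  codexAppServerFailure?: {
    kind: string;
    turnWatchTimeoutKind?: TurnWatchTimeoutKind; // assumed location of the narrower field
  };
}

function shouldSurfaceCodexCompletionTimeout(attempt: AttemptLike): boolean {
  const failure = attempt.codexAppServerFailure;
  return (
    failure?.kind === "turn_completion_idle_timeout" &&
    // Progress and terminal watchdogs share the broad kind, so require the narrow one too.
    failure?.turnWatchTimeoutKind === "completion" &&
    attempt.timedOut
  );
}
```

With this gate, a progress or terminal watchdog timeout keeps following the normal prompt-error/failover path instead of being returned as a recoverable completion-timeout payload.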

@steipete

Contributor

Behavior addressed: Codex app-server replay-safe completion-idle stalls retry once; non-completion turn-watch timeouts no longer replay; retry-exhausted/non-stdio completion timeouts surface user timeout payloads; native response deltas no longer disarm the post-tool completion guard incorrectly.
Real environment tested: local macOS plus Blacksmith Testbox.
Exact steps or command run after this patch:

  • node scripts/run-vitest.mjs src/infra/net/http-connect-tunnel.test.ts extensions/codex/src/app-server/run-attempt.turn-watches.test.ts src/agents/embedded-agent-runner/run.codex-app-server-recovery.test.ts
  • node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.turn-watches.test.ts src/agents/embedded-agent-runner/run.codex-app-server-recovery.test.ts extensions/codex-supervisor/src/supervisor.test.ts extensions/codex-supervisor/src/plugin-tools.test.ts
  • node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.extensions.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/extensions-test.tsbuildinfo
  • node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo
  • node scripts/run-oxlint.mjs extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.turn-watches.test.ts src/agents/embedded-agent-runner/run.codex-app-server-recovery.test.ts src/agents/embedded-agent-runner/run/codex-app-server-recovery.ts src/agents/embedded-agent-runner/run/types.ts
  • /Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode branch --base origin/main
  • pnpm check:changed (Blacksmith Testbox tbx_01ksty84e7wby9y3ydsw5tmnjc)

Evidence after fix: focused tests passed (4 files/78 tests and 5 files/102 tests); final autoreview clean; final Testbox changed gate exit 0; GitHub exact-head checks green.
Observed result after fix: completion-idle stalls recover once only when replay-safe and from stdio; progress/terminal watchdog timeouts surface without replay; scoped native deltas keep post-tool guard semantics; oversized CONNECT timeout test no longer leaks unhandled rejection.
What was not tested: live upstream Codex app-server future native raw response delta stream; covered by local harness/unit tests.

Labels

  • agents (Agent runtime and tooling)
  • docs (Improvements or additions to documentation)
  • extensions: codex
  • extensions: codex-supervisor (Extension: codex-supervisor)
  • mantis: telegram-visible-proof (Mantis should capture Telegram visible proof.)
  • merge-risk: 🚨 availability (🚨 May cause crashes, hangs, restart loops, stalls, or process outages.)
  • merge-risk: 🚨 compatibility (🚨 May break existing users, config, migrations, defaults, or upgrade paths.)
  • merge-risk: 🚨 session-state (🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state.)
  • P1 (High-priority user-facing bug, regression, or broken workflow.)
  • proof: supplied (External PR includes structured after-fix real behavior proof.)
  • rating: 🦪 silver shellfish (Thin PR readiness signal; proof, validation, or implementation needs work.)
  • size: L
  • status: 📣 needs proof (The PR needs real behavior proof before ClawSweeper can clear the contributor ask.)

3 participants