Skip to content

fix(codex): make post-tool raw assistant timeout configurable#84135

Closed
rozmiarD wants to merge 2 commits into
openclaw:mainfrom
rozmiarD:fix/codex-post-tool-raw-assistant-timeout
Closed

fix(codex): make post-tool raw assistant timeout configurable#84135
rozmiarD wants to merge 2 commits into
openclaw:mainfrom
rozmiarD:fix/codex-post-tool-raw-assistant-timeout

Conversation

@rozmiarD

@rozmiarD rozmiarD commented May 19, 2026

Copy link
Copy Markdown
Contributor

Motivation

Codex app-server currently uses the assistant completion idle timeout for the post-tool raw assistant completion guard. That keeps stuck post-tool turns fail-fast, but couples two different behaviors: final assistant release and post-tool completion wait. Heavy or trusted local workloads may need a longer post-tool wait without weakening final assistant release.

Root cause / diagnosis

Pre-tool raw assistant completion can represent final assistant output and should still release the session after the assistant idle budget.

Post-tool raw assistant completion or progress happens after a tool handoff and can represent post-tool synthesis or terminal-event wait.

Current code uses turnAssistantCompletionIdleTimeoutMs for the post-tool guard. That timeout defaults to 10s. Existing tests show this was intentional fail-fast behavior, so this PR adds a separate configurable budget rather than globally changing assistant release semantics.

Implementation

Added appServer.postToolRawAssistantCompletionIdleTimeoutMs.

Added runtime option and config resolution for the post-tool raw assistant completion guard.

Post-tool raw assistant guard now uses the dedicated timeout. If unset, behavior remains compatible by falling back to the assistant completion idle timeout.

Pre-tool raw assistant release remains unchanged.

Updated both Codex harness docs pages so the post-tool raw assistant paragraph mentions the dedicated completion-idle guard instead of only turn/completed or the terminal watchdog.

What did not change

  • Pre-tool raw assistant release.
  • turn/completed handling.
  • Raw tool-output completion handling.
  • Dynamic tool diagnostics.
  • Dynamic tool execution timeout.
  • requestTimeoutMs.
  • Terminal idle hard fallback.

Tests

  • git diff --check origin/main..HEAD passed after rebasing onto origin/main at 5d775122c1f6.
  • Local focused run passed on Node 24.15.0: CI=1 timeout 300s node scripts/run-vitest.mjs run --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/config.test.ts extensions/codex/src/app-server/run-attempt.test.ts -t "post-tool raw assistant|Codex app-server config|raw assistant progress".
  • Crabbox SSH focused run passed on a Linux lab target with the same command: 2 files passed, 46 tests passed, 183 skipped.

Regression coverage

Existing pre-tool raw assistant release behavior remains covered.

New coverage proves post-tool raw assistant completion can use a longer dedicated timeout than assistant release.

Config coverage proves the appServer field is accepted and resolved, and unknown fields remain rejected.

Existing post-tool timeout coverage remains valid; the default behavior still falls back to the assistant completion idle timeout.

Risk

Longer post-tool timeout may delay stuck-turn detection when operators configure it too high.

Too-short timeout can interrupt legitimate post-tool synthesis.

Keeping a separate config avoids weakening final assistant release and avoids global timeout changes.

Proof / evidence

git diff --stat origin/main..HEAD after the rebase:

CHANGELOG.md                                       |   1 +
docs/plugins/codex-harness-reference.md            |  44 ++++----
docs/plugins/codex-harness.md                      |  44 ++++----
extensions/codex/openclaw.plugin.json              |   9 ++
extensions/codex/src/app-server/config.test.ts     |  13 +++
extensions/codex/src/app-server/config.ts          |  12 +++
extensions/codex/src/app-server/run-attempt.test.ts | 113 +++++++++++++++++++++
extensions/codex/src/app-server/run-attempt.ts     |  22 +++-
8 files changed, 217 insertions(+), 41 deletions(-)

Before: post-tool raw assistant completion armed turn.completion_idle_timeout with turnAssistantCompletionIdleTimeoutMs, so the guard was tied to the final assistant release budget.

After: post-tool raw assistant completion arms the same completion-idle guard with postToolRawAssistantCompletionIdleTimeoutMs, falling back to turnAssistantCompletionIdleTimeoutMs when unset. Pre-tool raw assistant release still uses the assistant release path.

Docs now state that post-tool raw assistant progress keeps waiting for turn/completed while a completion-idle guard stays armed, using appServer.postToolRawAssistantCompletionIdleTimeoutMs when configured and falling back to the assistant completion idle timeout otherwise.

Real behavior proof

Behavior addressed: Post-tool raw assistant completion after a tool handoff uses the dedicated completion-idle timeout when configured instead of finishing on the shorter final assistant release timeout.

Real environment tested: OpenClaw PR head 9a410e3a176c rebased on upstream main 5d775122c1f6, executed through Crabbox static SSH on a Linux lab target. The runtime entrypoint was the production runCodexAppServerAttempt Codex app-server path from this PR head.

Exact steps or command run after this patch: crabbox run --provider ssh --target linux --static-host <redacted> --static-user <redacted> --static-work-root <redacted> --shell -- 'timeout 120s node --import tsx - <<EOF ... EOF'.

Evidence after fix: Terminal output from the Crabbox run:

OPENCLAW_PR_84135_CRABBOX_RUNTIME_PROOF
{
  "head": "9a410e3a176c",
  "base": "5d775122c1f6",
  "runtime": "OpenClaw Codex app-server runCodexAppServerAttempt",
  "configured": {
    "turnCompletionIdleTimeoutMs": 500,
    "turnAssistantCompletionIdleTimeoutMs": 5,
    "postToolRawAssistantCompletionIdleTimeoutMs": 100,
    "turnTerminalIdleTimeoutMs": 500
  },
  "observed": {
    "toolSuccess": false,
    "settledAfter20ms": false,
    "aborted": true,
    "timedOut": true,
    "promptError": "codex app-server turn idle timed out waiting for turn/completed",
    "interruptMethod": "turn/interrupt",
    "warningTimeoutMs": 100,
    "warningLastActivityReason": "notification:rawResponseItem/completed"
  }
}
command complete in 8.206s total=8.901s
run summary sync=680ms command=8.206s total=8.901s sync_skipped=true exit=0

Observed result after fix: The turn did not settle after the 5ms assistant-release budget (settledAfter20ms: false). After the post-tool raw assistant event, the completion watchdog used the configured 100ms budget (warningTimeoutMs: 100) and retained lastActivityReason: "notification:rawResponseItem/completed"; the stuck turn then aborted through turn/interrupt with the existing completion-idle error.

What was not tested: A live external model request was outside this proof. The proof exercised the OpenClaw app-server runtime path that owns this timeout decision, plus focused local and Crabbox regression runs for the changed files.

AI assistance

This PR was AI-assisted. I reviewed the diff, tests, and behavior before submitting.

@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation extensions: codex size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 19, 2026
@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
Adds appServer.postToolRawAssistantCompletionIdleTimeoutMs to the Codex plugin config/docs/manifest and uses it for post-tool raw assistant completion idle guarding, with tests and a changelog entry.

Reproducibility: yes. for source-level behavior, though I did not run mutating tests in this read-only review. Current main uses turnAssistantCompletionIdleTimeoutMs for the post-tool raw assistant guard, and the PR body includes Crabbox terminal proof of the configured 100 ms path after the patch.

PR rating
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🐚 platinum hermit
Summary: The PR has solid focused implementation, docs, tests, and terminal runtime proof, with only the opt-in availability tradeoff left for maintainer judgment.

Rank-up moves:

  • none
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (terminal): The PR body now includes redacted Crabbox terminal output from a Linux lab target showing the configured post-tool timeout path after the patch.

Risk before merge

Maintainer options:

  1. Accept the opt-in timeout tradeoff (recommended)
    Merge after maintainers are comfortable that defaults remain unchanged and only operators who configure a larger value can delay stuck-turn interruption.
  2. Pause for the default-semantics decision
    If maintainers do not want a new config surface before deciding the default behavior, pause this PR and keep the product question in the linked follow-up issue.

Next step before merge
No repair lane is needed; the next action is maintainer review of the new opt-in timeout surface and availability tradeoff.

Security
Cleared: No concrete security or supply-chain concern was found; the diff is limited to Codex timeout config/runtime handling, docs, tests, manifest metadata, and changelog text.

Review details

Best possible solution:

Ship the compatible, config-gated timeout if maintainers accept the opt-in delay tradeoff, and leave any default-budget semantic change to #84137.

Do we have a high-confidence way to reproduce the issue?

Yes for source-level behavior, though I did not run mutating tests in this read-only review. Current main uses turnAssistantCompletionIdleTimeoutMs for the post-tool raw assistant guard, and the PR body includes Crabbox terminal proof of the configured 100 ms path after the patch.

Is this the best way to solve the issue?

Yes for the conservative path: the PR preserves current defaults and scopes the change to an opt-in config value. The broader default-budget semantics are better handled in the linked follow-up issue rather than folded into this PR.

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body now includes redacted Crabbox terminal output from a Linux lab target showing the configured post-tool timeout path after the patch.
  • add rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🐚 platinum hermit, patch quality is 🐚 platinum hermit, and The PR has solid focused implementation, docs, tests, and terminal runtime proof, with only the opt-in availability tradeoff left for maintainer judgment.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body now includes redacted Crabbox terminal output from a Linux lab target showing the configured post-tool timeout path after the patch.
  • remove rating: 🦪 silver shellfish: Current PR rating is rating: 🐚 platinum hermit, so this older rating label is no longer current.
  • remove status: 📣 needs proof: Current PR status label is status: 👀 ready for maintainer look.

Label justifications:

  • P2: This is a normal Codex plugin runtime/config improvement with limited blast radius and explicit regression coverage.
  • merge-risk: 🚨 availability: Configuring a high post-tool timeout can intentionally keep a stuck Codex turn alive longer before interruption.
  • rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🐚 platinum hermit, patch quality is 🐚 platinum hermit, and The PR has solid focused implementation, docs, tests, and terminal runtime proof, with only the opt-in availability tradeoff left for maintainer judgment.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body now includes redacted Crabbox terminal output from a Linux lab target showing the configured post-tool timeout path after the patch.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body now includes redacted Crabbox terminal output from a Linux lab target showing the configured post-tool timeout path after the patch.

What I checked:

  • Current main behavior: Current main arms the post-tool raw assistant completion guard with turnAssistantCompletionIdleTimeoutMs, so the post-tool path is tied to the shorter assistant-release budget. (extensions/codex/src/app-server/run-attempt.ts:1916, 5d775122c1f6)
  • Runtime change: The PR resolves a dedicated post-tool timeout from runtime options or plugin config with fallback to the assistant timeout, then passes it to armTurnCompletionIdleWatch for post-tool raw assistant completion. (extensions/codex/src/app-server/run-attempt.ts:1923, 9a410e3a176c)
  • Config and manifest surface: The PR adds the new field to runtime/config types, strict config keys, zod validation, runtime option resolution, and plugin manifest UI hints/schema. (extensions/codex/src/app-server/config.ts:384, 9a410e3a176c)
  • Docs alignment: Both Codex harness docs now document the new field and say post-tool raw assistant progress keeps waiting for turn/completed while a completion-idle guard uses the dedicated value when configured. Public docs: docs/plugins/codex-harness.md. (docs/plugins/codex-harness.md:526, 9a410e3a176c)
  • Regression coverage: The new test configures a 5 ms assistant-release timeout and a 100 ms post-tool timeout, asserts the run is still unsettled after 20 ms, and verifies the completion warning used the 100 ms budget. (extensions/codex/src/app-server/run-attempt.test.ts:3752, 9a410e3a176c)
  • Contributor proof: The PR body includes redacted Crabbox static-SSH terminal output from a Linux lab target showing settledAfter20ms: false, warningTimeoutMs: 100, warningLastActivityReason: notification:rawResponseItem/completed, and turn/interrupt after timeout on head 9a410e3a176c. (9a410e3a176c)

Likely related people:

  • @steipete: Peter Steinberger introduced and repeatedly refactored/hardened the Codex app-server harness and has the largest shortlog count across the touched Codex app-server/docs/manifest files. (role: feature-history owner; confidence: high; commits: dd26e8c44d4e, 31a0b7bd42a5, 659bcc5e5b59; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/config.ts, docs/plugins/codex-harness.md)
  • Ayaan Zaidi: Current blame on main for the central post-tool raw assistant timeout branch points to Ayaan Zaidi’s recent commit across the Codex app-server files, though the commit title suggests broad carry-forward history rather than narrow ownership. (role: recent area carrier; confidence: low; commits: d7a90ebea61a; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/config.ts, docs/plugins/codex-harness.md)
  • @vincentkoc: Vincent Koc has recent merged work in the Codex app-server/auth/runtime test area, including reused app-server harness coverage and bound auth profile startup behavior. (role: adjacent runtime contributor; confidence: medium; commits: f1cc8f0cfc7c, b5dfeaab4ce3; files: extensions/codex/src/app-server/run-attempt.test.ts, extensions/codex/src/app-server/run-attempt.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5d775122c1f6.

Copy link
Copy Markdown
Contributor Author

Opened follow-up design issue #84137 for the semantic/default-budget question that was intentionally left out of this conservative PR.

This PR keeps the current default behavior and adds an operator-configurable post-tool budget. The follow-up issue tracks the higher-risk alternative: changing the default semantics so post-tool raw assistant completion/progress gets a distinct longer budget by default while preserving short pre-tool final assistant release.

@openclaw-barnacle openclaw-barnacle Bot added triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 19, 2026
@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. labels May 19, 2026

Copy link
Copy Markdown
Contributor Author

Addressed the docs alignment finding in 9d40415.

The post-tool raw assistant paragraph now documents that the turn still waits for turn/completed while a completion-idle guard remains armed, using appServer.postToolRawAssistantCompletionIdleTimeoutMs when configured and falling back to the assistant completion idle timeout otherwise.

I also updated the PR body proof section to remove the stale local pnpm note and to state the contributor-side proof limits explicitly: Blacksmith Testbox requires a GitHub organization app install, and direct AWS Crabbox credentials are not available in this contributor environment.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added the merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. label May 19, 2026
@rozmiarD rozmiarD force-pushed the fix/codex-post-tool-raw-assistant-timeout branch from 9d40415 to 9a410e3 Compare May 20, 2026 12:34
@rozmiarD

Copy link
Copy Markdown
Contributor Author

Updated the PR to exact head 9a410e3 rebased on upstream main 5d77512 and replaced the old proof section with Crabbox static-SSH runtime output.

Runtime proof exercised the production OpenClaw Codex app-server runCodexAppServerAttempt path on a Linux lab target. Observed result: post-tool raw assistant progress did not settle after the 5ms assistant-release window, then used the configured 100ms post-tool completion guard with lastActivityReason=notification:rawResponseItem/completed and interrupted the stuck turn through turn/interrupt.

Focused regression also passed locally and through Crabbox: 2 files passed, 46 tests passed, 183 skipped.

@clawsweeper re-review

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 20, 2026
@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 20, 2026
@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

✨ Hatched: 🥚 common Moonlit Proofling

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: stacks clean commits.
Image traits: location merge queue dock; accessory little merge flag; palette plum, gold, and soft gray; mood watchful; pose standing beside its cracked shell; shell soft speckled shell; lighting gentle morning glow; background delicate sparkle particles.
Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Moonlit Proofling in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@vincentkoc

Copy link
Copy Markdown
Member

Maintainer stack note: this PR should stay narrow.

Relationship to the current Codex app-server stack:

Best maintainer framing: merge this only as the conservative config knob. Keep default semantics, subagent handoff, and long-reply truncation as related but separate follow-up work.

@vincentkoc

Copy link
Copy Markdown
Member

maintainer update: I created #84974 as the replacement PR because this source branch has maintainerCanModify=false, so I could not rebase/update it directly.

No functional disagreement with the patch here. #84974 keeps the contributor authorship, adds the docs/changelog alignment, and now has focused Vitest + Crabbox + Blacksmith Testbox changed-gate proof:

  • Crabbox fresh PR focused run: run_b0341d2ba17d / cbx_2b471a82c510.
  • Blacksmith Testbox changed gate: tbx_01ks5f478hn9sm2mcwbfzhzf2m, exit 0.

I would leave this PR as the contributor source/credit trail and use #84974 for the mergeable maintainer branch.

@vincentkoc

Copy link
Copy Markdown
Member

Maintainer closeout: closing this source PR as superseded by #84974, now squash-merged at 60d200f.

This is not a functional rejection of the patch. #84974 was the writable maintainer replacement because this branch could not be updated directly; it kept the same conservative timeout behavior, fixed the post-release changelog placement under Unreleased, rebased onto current main, and passed the focused Codex app-server checks plus required GitHub checks.

Keeping #84137 open for the separate default-semantics / subagent handoff decision.

@vincentkoc vincentkoc closed this May 22, 2026
@rozmiarD rozmiarD deleted the fix/codex-post-tool-raw-assistant-timeout branch May 22, 2026 05:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs Improvements or additions to documentation extensions: codex merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. P2 Normal backlog priority with limited blast radius. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: S status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants