Fix Codex raw tool-output watchdog by joshavant · Pull Request #82378 · openclaw/openclaw

joshavant · 2026-05-16T01:00:07Z (edited by steipete)

Summary

Keep Codex app-server's short turn-completion idle watchdog armed after raw custom_tool_call_output notifications.
Add a Codex regression for raw tool-output completion silence.
Add a Telegram isolated-ingress regression showing different chats can interleave while one lane remains blocked.
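The isolated-ingress property covered by the Telegram regression can be illustrated with a minimal per-chat lane queue. This is a sketch under stated assumptions, not the code in extensions/telegram/src/polling-session.ts: the ChatLanes class, its enqueue method, the demo function, and the chat ids are all hypothetical. Each chat keeps a promise tail, so work for one chat is serialized while other chats proceed independently.

```typescript
// Hypothetical per-chat lane queue (not OpenClaw's polling-session.ts code).
// Tasks for the same chat run strictly one after another; tasks for
// different chats do not wait on each other.
class ChatLanes {
  // Tail of each chat's queue. Stored tails never reject (see enqueue).
  private readonly tails = new Map<string, Promise<void>>();

  enqueue(chatId: string, task: () => Promise<void>): Promise<void> {
    const previous = this.tails.get(chatId) ?? Promise.resolve();
    // Start this task only after the previous task in the same lane settles.
    const next = previous.then(task);
    // Swallow failures in the stored tail so one failed task cannot block the lane.
    this.tails.set(chatId, next.catch(() => undefined));
    return next;
  }
}

// Usage: chat-a's first task blocks until released; chat-b keeps flowing.
async function demo(): Promise<string[]> {
  const lanes = new ChatLanes();
  const order: string[] = [];
  let releaseA: () => void = () => {};
  const blockA = new Promise<void>((resolve) => {
    releaseA = () => resolve();
  });

  const a1 = lanes.enqueue("chat-a", async () => {
    order.push("a1:start");
    await blockA;
    order.push("a1:end");
  });
  const a2 = lanes.enqueue("chat-a", async () => {
    order.push("a2");
  });
  const b1 = lanes.enqueue("chat-b", async () => {
    order.push("b1");
  });
  const b2 = lanes.enqueue("chat-b", async () => {
    order.push("b2");
  });

  // chat-b drains completely while chat-a is still blocked.
  await Promise.all([b1, b2]);
  order.push("b-done");

  releaseA();
  await Promise.all([a1, a2]);
  return order;
}
```

In demo, both chat-b tasks finish and "b-done" is recorded before "a1:end", while chat-a's second task waits for its own lane. The lane map is never pruned in this sketch; a long-running gateway would need to drop idle tails.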

Verification

node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts extensions/telegram/src/polling-session.test.ts (post-rebase: 2 files, 151 tests passed)
pnpm tsgo:extensions
node scripts/run-oxlint.mjs extensions/telegram/src/polling-session.ts extensions/telegram/src/polling-session.test.ts extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts
oxfmt --check --threads=1 CHANGELOG.md extensions/telegram/src/polling-session.ts extensions/telegram/src/polling-session.test.ts extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts
git diff --check
/Users/steipete/Projects/agent-scripts/skills/codex-review/scripts/codex-review --parallel-tests "node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts extensions/telegram/src/polling-session.test.ts" (clean: no accepted/actionable findings reported)

Real behavior proof

Behavior addressed: Telegram-triggered Codex turns can stall after Codex emits a raw custom_tool_call_output item without a matching app-server item/completed; OpenClaw should use the short turn-completion watchdog instead of waiting for the terminal idle timeout.

Real environment tested: Local OpenClaw gateway with a real Telegram bot in a regular Telegram group, plus a local fake Codex app-server protocol endpoint that deterministically emitted the raw tool-output sequence.

Exact steps or command run after this patch: Started the temporary HITL runner from /private/tmp/openclaw-82274-hitl-runner.mjs, sent the requested bot mention in the Telegram group, and let the runner observe the gateway and fake Codex app-server logs through completion.

Evidence after fix: Runtime log excerpt from /private/tmp/openclaw-82274-hitl-2026-05-16T00-19-30-920Z/fake-codex.jsonl: thread/start, turn/start, item/tool/call, and rawResponseItem/completed with item.type=custom_tool_call_output were recorded during the same live Telegram-triggered gateway turn.

Observed result after fix: The gateway runtime log recorded "codex app-server turn idle timed out waiting for completion", and the proof summary marked shortCompletionIdleTimeout=true and terminalIdleTimeout=false.

What was not tested: A real Codex binary was not used for the live HITL proof; the Codex app-server side was a protocol fake so the problematic raw event sequence was deterministic.
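The evidence check described above (the four recorded methods, with the raw completion carrying item.type=custom_tool_call_output) can be sketched as a scripted fake sequence plus an in-order matcher over a JSONL log. The record shape (method plus params.item.type) is an assumption for illustration; the real Codex app-server messages and the fields written to fake-codex.jsonl may differ, and rawToolOutputScript, toJsonl, and containsRawToolOutputSequence are hypothetical helpers.

```typescript
// Assumed shape of one recorded protocol message; the real Codex app-server
// messages carry more fields than this sketch models.
interface RecordedMessage {
  method: string;
  params?: { item?: { type?: string } };
}

// Hypothetical scripted sequence: a tool call whose raw output completes,
// after which the script emits nothing further (silence).
function rawToolOutputScript(): RecordedMessage[] {
  return [
    { method: "thread/start" },
    { method: "turn/start" },
    { method: "item/tool/call" },
    {
      method: "rawResponseItem/completed",
      params: { item: { type: "custom_tool_call_output" } },
    },
  ];
}

// One JSON object per line, as in a .jsonl log.
function toJsonl(messages: RecordedMessage[]): string {
  return messages.map((m) => JSON.stringify(m)).join("\n");
}

// True if the log contains the expected methods in order (other lines may be
// interleaved) and the raw completion has item.type=custom_tool_call_output.
function containsRawToolOutputSequence(jsonl: string): boolean {
  const expected = ["thread/start", "turn/start", "item/tool/call", "rawResponseItem/completed"];
  let matched = 0;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const message = JSON.parse(line) as RecordedMessage;
    if (message.method !== expected[matched]) continue;
    if (
      message.method === "rawResponseItem/completed" &&
      message.params?.item?.type !== "custom_tool_call_output"
    ) {
      continue;
    }
    matched += 1;
    if (matched === expected.length) return true;
  }
  return false;
}

const sequenceFound = containsRawToolOutputSequence(toJsonl(rawToolOutputScript())); // true
```

A deterministic script like this is what makes the proof repeatable: the same silence follows the same raw completion on every run, which a real Codex binary does not guarantee.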

Fixes #82274.

clawsweeper · 2026-05-16T01:04:54Z

Codex review: needs maintainer review before merge.

Summary
The branch keeps the Codex app-server short completion watchdog armed for raw custom_tool_call_output completions, updates Codex harness docs and changelog, and adds Codex plus Telegram ingress regression tests.

Reproducibility: yes. Source inspection on current main shows a current-turn raw custom_tool_call_output notification disarms the short completion watchdog, and the linked production report shows that exact last notification type before a 30-minute terminal idle timeout.

Real behavior proof
Sufficient (logs): The PR body includes after-fix logs from a real Telegram gateway/group run with a deterministic fake Codex app-server showing the short completion watchdog fired and the terminal watchdog did not.

Next step before merge
No automated repair is needed; the remaining action is maintainer review and landing under the protected maintainer label and normal checks.

Security
Cleared: The diff changes timeout handling, tests, docs, and changelog text without adding dependencies, scripts, permissions, package-resolution changes, or secret-handling surfaces.

Review details

Best possible solution:

Land the narrow watchdog mitigation with its regression tests and docs, then close the linked production issue once the merge reaches main.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection on current main shows a current-turn raw custom_tool_call_output notification disarms the short completion watchdog, and the linked production report shows that exact last notification type before a 30-minute terminal idle timeout.

Is this the best way to solve the issue?

Yes. Keeping the short watchdog armed only for the raw tool-output handoff is the narrow OpenClaw-side mitigation, and the branch now updates the public Codex harness contract to match.

Acceptance criteria:

node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts extensions/telegram/src/polling-session.test.ts
pnpm tsgo:extensions
node scripts/run-oxlint.mjs extensions/telegram/src/polling-session.ts extensions/telegram/src/polling-session.test.ts extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts
oxfmt --check --threads=1 CHANGELOG.md docs/plugins/codex-harness.md extensions/telegram/src/polling-session.ts extensions/telegram/src/polling-session.test.ts extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts
git diff --check

What I checked:

Current-main source gap: On current main, current-turn notifications touch activity, but a non-terminal current-turn rawResponseItem/completed that is not a tracked dynamic item/completed or last current-turn item falls through to disarmTurnCompletionIdleWatch(), leaving only the terminal watchdog for raw tool-output silence. (extensions/codex/src/app-server/run-attempt.ts:1251, 31cc26230f35)
Branch runtime fix: The PR patch adds rawToolOutputCompletion, rearms the short completion watchdog for current-turn raw custom_tool_call_output completions, and excludes that raw completion from the generic disarm branch. (extensions/codex/src/app-server/run-attempt.ts:1266, 9421608aede3)
Regression coverage: The PR adds a Codex regression expecting the short completion timeout after a raw custom_tool_call_output completion and a Telegram isolated-ingress regression for different chats interleaving while one lane remains blocked. (extensions/codex/src/app-server/run-attempt.test.ts:1712, 9421608aede3)
Docs follow-up addressed: The latest PR commit updates the Codex harness docs so raw custom_tool_call_output completions are documented as keeping the short post-tool watchdog armed, resolving the earlier ClawSweeper P3 docs finding. Public docs: docs/plugins/codex-harness.md. (docs/plugins/codex-harness.md:529, 9421608aede3)
Related production report: The linked issue reports OpenClaw 2026.5.12 production logs where the terminal idle timeout's last notification was rawResponseItem/completed with lastNotificationItemType: "custom_tool_call_output"; later comments add a separate production reproduction.
Telegram review context: The Telegram maintainer note requires real Telegram proof for transport/ingress behavior; the PR body provides a real Telegram gateway/group run with a deterministic fake Codex app-server endpoint for the raw event sequence. (.agents/maintainer-notes/telegram.md:1, 31cc26230f35)
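The branch logic described in the first two items can be modeled as a small pure function. This is a simplified sketch, not the code in run-attempt.ts: the AppServerEvent shape, the action names, shortWatchdogAction, and idleOutcome are hypothetical; the existing tracked dynamic item/completed and last current-turn item exceptions are reduced to boolean flags; and the short completion timeout is assumed to be shorter than the 30-minute terminal idle timeout. With the fix disabled, the raw custom_tool_call_output completion falls through to the disarm branch; with it enabled, the short watchdog stays armed, matching the shortCompletionIdleTimeout=true / terminalIdleTimeout=false proof summary.

```typescript
// Hypothetical, simplified model of the current-turn branch (not run-attempt.ts).
type ShortWatchdogAction = "ignore" | "arm-short" | "disarm-short";

interface AppServerEvent {
  method: string;
  currentTurn: boolean;
  terminal: boolean;
  trackedDynamicItemCompleted?: boolean; // stands in for an existing exception
  lastCurrentTurnItem?: boolean; // stands in for an existing exception
  itemType?: string;
}

function shortWatchdogAction(event: AppServerEvent, withRawToolOutputFix: boolean): ShortWatchdogAction {
  // Events from other turns and terminal events are outside this sketch.
  if (!event.currentTurn || event.terminal) return "ignore";
  // The branch's new case: a raw custom_tool_call_output completion.
  const rawToolOutputCompletion =
    withRawToolOutputFix &&
    event.method === "rawResponseItem/completed" &&
    event.itemType === "custom_tool_call_output";
  if (event.trackedDynamicItemCompleted || event.lastCurrentTurnItem || rawToolOutputCompletion) {
    return "arm-short";
  }
  // Generic fall-through: only the terminal idle watchdog remains.
  return "disarm-short";
}

// Which idle watchdog fires if the app-server goes silent after `events`.
// Assumes the short completion timeout is shorter than the terminal one.
function idleOutcome(events: AppServerEvent[], withRawToolOutputFix: boolean) {
  let shortArmed = false;
  for (const event of events) {
    const action = shortWatchdogAction(event, withRawToolOutputFix);
    if (action === "arm-short") shortArmed = true;
    else if (action === "disarm-short") shortArmed = false;
  }
  return { shortCompletionIdleTimeout: shortArmed, terminalIdleTimeout: !shortArmed };
}

const sequence: AppServerEvent[] = [
  { method: "item/tool/call", currentTurn: true, terminal: false },
  {
    method: "rawResponseItem/completed",
    currentTurn: true,
    terminal: false,
    itemType: "custom_tool_call_output",
  },
];

const withoutFix = idleOutcome(sequence, false); // { shortCompletionIdleTimeout: false, terminalIdleTimeout: true }
const withFix = idleOutcome(sequence, true); // { shortCompletionIdleTimeout: true, terminalIdleTimeout: false }
```

Keying the exception on both the method and the item type keeps it narrow: in this model, other raw completions (for example an item type of message) still take the generic disarm path.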

Likely related people:

joshavant: Current main includes recent Josh Avant work in the same Codex app-server run-attempt path, and this branch extends that notification-watchdog surface, so his involvement goes beyond having authored this PR. (role: recent Codex notification-liveness contributor; confidence: high; commits: ea16a5e9e10c; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/run-attempt.test.ts)
funmerlin: Commit metadata and the patch for the adjacent quiescent app-server fix show funmerlin added the existing last item/completed watchdog exception near the new raw tool-output exception. (role: recent Codex watchdog fix author; confidence: high; commits: 127156a88a29; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/run-attempt.test.ts)
steipete: Recent history shows Peter Steinberger authored the stalled Telegram ingress backlog work adjacent to the Telegram regression coverage touched here, and he added the docs clarification commit on this branch. (role: recent Telegram ingress contributor; confidence: high; commits: 25a8f5f3f852, 9421608aede3; files: extensions/telegram/src/polling-session.ts, extensions/telegram/src/polling-session.test.ts, docs/plugins/codex-harness.md)

Remaining risk / open question:

This read-only review did not rerun the PR's verification commands; it relies on source inspection, the branch diff, and the reported verification/proof in the PR body.
The real behavior proof uses a deterministic fake Codex app-server, so it proves OpenClaw's handling of the raw event sequence but not an upstream Codex binary fix.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 31cc26230f35.

steipete · 2026-05-16T02:07:04Z

Landed via rebase onto main.

Gate: node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts extensions/telegram/src/polling-session.test.ts -> 151 passed
Gate: oxfmt --check --threads=1 ... + git diff --check -> passed
Gate: Codex review helper with the same focused tests -> clean, no accepted/actionable findings
Gate: Real behavior proof -> passed after PR body newline fix
Source head: dbb5433
Landed commits: 44a3301, 5d93fb3, cf7c46d

Thanks @joshavant!

openclaw-barnacle Bot added the channel: telegram, extensions: codex, size: M, and maintainer labels May 16, 2026

clawsweeper Bot added the proof: sufficient label May 16, 2026 (ClawSweeper judged the real behavior proof convincing)

steipete force-pushed the codex/82274-raw-tool-output-watchdog branch from 6373b3c to 9421608 May 16, 2026 01:35

openclaw-barnacle Bot added the docs label May 16, 2026

joshavant and others added 3 commits May 16, 2026 02:55

fix(codex): keep raw tool output watchdog armed (37620cb)
docs(changelog): add codex watchdog entry (c08f6fc)
docs(codex): clarify raw tool output watchdog (dbb5433)

steipete force-pushed the codex/82274-raw-tool-output-watchdog branch from 9421608 to dbb5433 May 16, 2026 01:57

steipete merged commit cf7c46d into main May 16, 2026
113 of 114 checks passed

steipete deleted the codex/82274-raw-tool-output-watchdog branch May 16, 2026 02:06

Haderach-Ram mentioned this pull request May 16, 2026

Ecosystem Digest — 2026-05-16 Haderach-Ram/openclaw-radar#9

Open

clawsweeper Bot mentioned this pull request May 17, 2026

fix(codex): guard post-tool raw assistant terminal gaps #82816

Merged

steipete mentioned this pull request May 17, 2026

Codex app-server can timeout/fallback during near-window progressing turns #81114

Closed

mazetsoligarh-cell mentioned this pull request May 29, 2026

Codex app-server idle watchdog fires after image_generation_call raw completed item #87948

Closed
