
Fix Codex raw tool-output watchdog #82378

Merged
steipete merged 3 commits into main from codex/82274-raw-tool-output-watchdog
May 16, 2026

Conversation

@joshavant joshavant commented May 16, 2026

Contributor

Summary

  • Keep Codex app-server's short turn-completion idle watchdog armed after raw custom_tool_call_output notifications.
  • Add a Codex regression for raw tool-output completion silence.
  • Add a Telegram isolated-ingress regression showing different chats can interleave while one lane remains blocked.
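The isolated-ingress behavior in the last bullet can be pictured with a small model. This is a minimal sketch, not the code in polling-session.ts: the `IngressLanes` class and its method names are hypothetical, and it assumes one lane per chat ID with at most one in-flight update per lane.

```typescript
// Hypothetical model of per-chat ingress lanes; NOT the polling-session.ts implementation.
type QueuedUpdate = { chatId: string; update: string };

class IngressLanes {
  private readonly queues = new Map<string, string[]>();
  private readonly busy = new Set<string>();

  // Queue an update on the lane that belongs to its chat.
  push(chatId: string, update: string): void {
    const queue = this.queues.get(chatId) ?? [];
    queue.push(update);
    this.queues.set(chatId, queue);
  }

  // Take the head update of every idle lane that has work, and mark those lanes busy.
  takeRunnable(): QueuedUpdate[] {
    const runnable: QueuedUpdate[] = [];
    this.queues.forEach((queue, chatId) => {
      if (this.busy.has(chatId) || queue.length === 0) return;
      const update = queue.shift() as string;
      this.busy.add(chatId);
      runnable.push({ chatId, update });
    });
    return runnable;
  }

  // Mark a lane idle again once its in-flight update has finished.
  complete(chatId: string): void {
    this.busy.delete(chatId);
  }

  // Number of updates still waiting on a lane (excludes the in-flight one).
  pending(chatId: string): number {
    return this.queues.get(chatId)?.length ?? 0;
  }
}
```

In this model, a chat whose in-flight update never completes only holds back its own queue; updates for other chats keep being picked up by `takeRunnable()`.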

Verification

  • node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts extensions/telegram/src/polling-session.test.ts (post-rebase: 2 files, 151 tests passed)
  • pnpm tsgo:extensions
  • node scripts/run-oxlint.mjs extensions/telegram/src/polling-session.ts extensions/telegram/src/polling-session.test.ts extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts
  • oxfmt --check --threads=1 CHANGELOG.md extensions/telegram/src/polling-session.ts extensions/telegram/src/polling-session.test.ts extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts
  • git diff --check
  • /Users/steipete/Projects/agent-scripts/skills/codex-review/scripts/codex-review --parallel-tests "node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts extensions/telegram/src/polling-session.test.ts" (clean: no accepted/actionable findings reported)

Real behavior proof

Behavior addressed: Telegram-triggered Codex turns can stall after Codex emits a raw custom_tool_call_output item without a matching app-server item/completed; OpenClaw should use the short turn-completion watchdog instead of waiting for the terminal idle timeout.

Real environment tested: Local OpenClaw gateway with a real Telegram bot in a regular Telegram group, plus a local fake Codex app-server protocol endpoint that deterministically emitted the raw tool-output sequence.

Exact steps or command run after this patch: Started the temporary HITL runner from /private/tmp/openclaw-82274-hitl-runner.mjs, sent the requested bot mention in the Telegram group, and let the runner observe the gateway and fake Codex app-server logs through completion.

Evidence after fix: Runtime log excerpt from /private/tmp/openclaw-82274-hitl-2026-05-16T00-19-30-920Z/fake-codex.jsonl: thread/start, turn/start, item/tool/call, and rawResponseItem/completed with item.type=custom_tool_call_output were recorded during the same live Telegram-triggered gateway turn.

Observed result after fix: Gateway runtime log excerpt recorded "codex app-server turn idle timed out waiting for completion"; the proof summary marked shortCompletionIdleTimeout=true and terminalIdleTimeout=false.

What was not tested: A real Codex binary was not used for the live HITL proof; the Codex app-server side was a protocol fake so the problematic raw event sequence was deterministic.

Fixes #82274.

@openclaw-barnacle openclaw-barnacle Bot added the channel: telegram, extensions: codex, size: M, and maintainer labels May 16, 2026
@clawsweeper

clawsweeper Bot commented May 16, 2026

Contributor

Codex review: needs maintainer review before merge.

Summary
The branch keeps the Codex app-server short completion watchdog armed for raw custom_tool_call_output completions, updates Codex harness docs and changelog, and adds Codex plus Telegram ingress regression tests.

Reproducibility: yes. Source inspection on current main shows a current-turn raw custom_tool_call_output notification disarms the short completion watchdog, and the linked production report shows that exact last notification type before a 30-minute terminal idle timeout.

Real behavior proof
Sufficient (logs): The PR body includes after-fix logs from a real Telegram gateway/group run with a deterministic fake Codex app-server showing the short completion watchdog fired and the terminal watchdog did not.

Next step before merge
No automated repair is needed; the remaining action is maintainer review and landing under the protected maintainer label and normal checks.

Security
Cleared: The diff changes timeout handling, tests, docs, and changelog text without adding dependencies, scripts, permissions, package-resolution changes, or secret-handling surfaces.

Review details

Best possible solution:

Land the narrow watchdog mitigation with its regression tests and docs, then close the linked production issue once the merge reaches main.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection on current main shows a current-turn raw custom_tool_call_output notification disarms the short completion watchdog, and the linked production report shows that exact last notification type before a 30-minute terminal idle timeout.

Is this the best way to solve the issue?

Yes. Keeping the short watchdog armed only for the raw tool-output handoff is the narrow OpenClaw-side mitigation, and the branch now updates the public Codex harness contract to match.

Acceptance criteria:

  • node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts extensions/telegram/src/polling-session.test.ts
  • pnpm tsgo:extensions
  • node scripts/run-oxlint.mjs extensions/telegram/src/polling-session.ts extensions/telegram/src/polling-session.test.ts extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts
  • oxfmt --check --threads=1 CHANGELOG.md docs/plugins/codex-harness.md extensions/telegram/src/polling-session.ts extensions/telegram/src/polling-session.test.ts extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts
  • git diff --check

What I checked:

  • Current-main source gap: On current main, current-turn notifications touch activity, but a non-terminal current-turn rawResponseItem/completed that is not a tracked dynamic item/completed or last current-turn item falls through to disarmTurnCompletionIdleWatch(), leaving only the terminal watchdog for raw tool-output silence. (extensions/codex/src/app-server/run-attempt.ts:1251, 31cc26230f35)
  • Branch runtime fix: The PR patch adds rawToolOutputCompletion, rearms the short completion watchdog for current-turn raw custom_tool_call_output completions, and excludes that raw completion from the generic disarm branch. (extensions/codex/src/app-server/run-attempt.ts:1266, 9421608aede3)
  • Regression coverage: The PR adds a Codex regression expecting the short completion timeout after a raw custom_tool_call_output completion and a Telegram isolated-ingress regression for different chats interleaving while one lane remains blocked. (extensions/codex/src/app-server/run-attempt.test.ts:1712, 9421608aede3)
  • Docs follow-up addressed: The latest PR commit updates the Codex harness docs so raw custom_tool_call_output completions are documented as keeping the short post-tool watchdog armed, resolving the earlier ClawSweeper P3 docs finding. Public docs: docs/plugins/codex-harness.md. (docs/plugins/codex-harness.md:529, 9421608aede3)
  • Related production report: The linked issue reports OpenClaw 2026.5.12 production logs where the terminal idle timeout's last notification was rawResponseItem/completed with lastNotificationItemType: "custom_tool_call_output"; later comments add a separate production reproduction.
  • Telegram review context: The Telegram maintainer note requires real Telegram proof for transport/ingress behavior; the PR body provides a real Telegram gateway/group run with a deterministic fake Codex app-server endpoint for the raw event sequence. (.agents/maintainer-notes/telegram.md:1, 31cc26230f35)
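The source gap and runtime fix described in the first two bullets can be summarized as a decision over incoming notifications. The sketch below is a simplified illustration under stated assumptions, not the code in run-attempt.ts: the `TurnNotification` shape, the `decideShortWatchdog` function, and the action names are hypothetical, and the "ignore" result for other-turn notifications is an assumption. Only the notification method names, the `custom_tool_call_output` item type, and the idea of a raw tool-output completion check come from the review.

```typescript
// Hypothetical shapes and names for illustration only; not taken from run-attempt.ts.
type TurnNotification = {
  method: string; // e.g. "item/completed" or "rawResponseItem/completed"
  itemType?: string; // e.g. "custom_tool_call_output"
  isCurrentTurn: boolean;
  isTrackedDynamicItem?: boolean;
  isLastCurrentTurnItem?: boolean;
};

type ShortWatchdogAction = "arm" | "disarm" | "ignore";

// True for the raw tool-output handoff that the fix treats as a completion signal.
function isRawToolOutputCompletion(n: TurnNotification): boolean {
  return (
    n.isCurrentTurn &&
    n.method === "rawResponseItem/completed" &&
    n.itemType === "custom_tool_call_output"
  );
}

// Decide what happens to the short turn-completion idle watchdog.
function decideShortWatchdog(n: TurnNotification): ShortWatchdogAction {
  // Assumption: notifications for other turns do not touch this turn's watchdog.
  if (!n.isCurrentTurn) return "ignore";
  // Existing exceptions: tracked dynamic item/completed, or the last current-turn item.
  if (n.method === "item/completed" && (n.isTrackedDynamicItem || n.isLastCurrentTurnItem)) {
    return "arm";
  }
  // New exception from this PR: raw custom_tool_call_output keeps the short watchdog armed.
  if (isRawToolOutputCompletion(n)) return "arm";
  // Everything else falls through to the generic disarm branch.
  return "disarm";
}
```

With the raw tool-output branch removed, the same notification would reach the final `return "disarm"`, which matches the pre-fix behavior where only the terminal watchdog remained.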

Likely related people:

  • joshavant: Beyond authoring this PR, Josh Avant has recent work on current main in the same Codex app-server run-attempt path, and this branch extends that notification watchdog surface. (role: recent Codex notification-liveness contributor; confidence: high; commits: ea16a5e9e10c; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/run-attempt.test.ts)
  • funmerlin: Commit metadata and the patch for the adjacent quiescent app-server fix show funmerlin added the existing last item/completed watchdog exception near the new raw tool-output exception. (role: recent Codex watchdog fix author; confidence: high; commits: 127156a88a29; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/run-attempt.test.ts)
  • steipete: Recent history shows Peter Steinberger authored the stalled Telegram ingress backlog work adjacent to the Telegram regression coverage touched here, and he added the docs clarification commit on this branch. (role: recent Telegram ingress contributor; confidence: high; commits: 25a8f5f3f852, 9421608aede3; files: extensions/telegram/src/polling-session.ts, extensions/telegram/src/polling-session.test.ts, docs/plugins/codex-harness.md)

Remaining risk / open question:

  • This read-only review did not rerun the PR's verification commands; it relies on source inspection, the branch diff, and the reported verification/proof in the PR body.
  • The real behavior proof uses a deterministic fake Codex app-server, so it proves OpenClaw's handling of the raw event sequence but not an upstream Codex binary fix.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 31cc26230f35.

@clawsweeper clawsweeper Bot added the proof: sufficient label (ClawSweeper judged the real behavior proof convincing.) May 16, 2026
@steipete steipete force-pushed the codex/82274-raw-tool-output-watchdog branch from 6373b3c to 9421608 May 16, 2026 01:35
@openclaw-barnacle openclaw-barnacle Bot added the docs label (Improvements or additions to documentation) May 16, 2026
@steipete steipete force-pushed the codex/82274-raw-tool-output-watchdog branch from 9421608 to dbb5433 May 16, 2026 01:57
@steipete steipete merged commit cf7c46d into main May 16, 2026
113 of 114 checks passed
@steipete steipete deleted the codex/82274-raw-tool-output-watchdog branch May 16, 2026 02:06
@steipete

Contributor

Landed via rebase onto main.

  • Gate: node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts extensions/telegram/src/polling-session.test.ts -> 151 passed
  • Gate: oxfmt --check --threads=1 ... + git diff --check -> passed
  • Gate: Codex review helper with the same focused tests -> clean, no accepted/actionable findings
  • Gate: Real behavior proof -> passed after PR body newline fix
  • Source head: dbb5433
  • Landed commits: 44a3301, 5d93fb3, cf7c46d

Thanks @joshavant!


Labels

channel: telegram, docs, extensions: codex, maintainer, proof: sufficient, size: M

Development

Successfully merging this pull request may close these issues.

2026.5.12: Telegram isolated-ingress HOL blocking + Codex app-server stalls mid-turn after custom_tool_call_output → 30 min idle timeout
