Fix Codex raw tool-output watchdog #82378
Codex review: needs maintainer review before merge.
Summary
Reproducibility: yes. Source inspection on current main shows a current-turn raw
Real behavior proof
Next step before merge
Security
Review details
Best possible solution: Land the narrow watchdog mitigation with its regression tests and docs, then close the linked production issue once the merge reaches main.
Do we have a high-confidence way to reproduce the issue? Yes. Source inspection on current main shows a current-turn raw
Is this the best way to solve the issue? Yes. Keeping the short watchdog armed only for the raw tool-output handoff is the narrow OpenClaw-side mitigation, and the branch now updates the public Codex harness contract to match.
Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 31cc26230f35.
6373b3c to 9421608
9421608 to dbb5433
Landed via rebase onto
Thanks @joshavant!
Summary
custom_tool_call_output notifications.
Verification
node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts extensions/telegram/src/polling-session.test.ts (post-rebase: 2 files, 151 tests passed)
pnpm tsgo:extensions
node scripts/run-oxlint.mjs extensions/telegram/src/polling-session.ts extensions/telegram/src/polling-session.test.ts extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts
oxfmt --check --threads=1 CHANGELOG.md extensions/telegram/src/polling-session.ts extensions/telegram/src/polling-session.test.ts extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts
git diff --check
/Users/steipete/Projects/agent-scripts/skills/codex-review/scripts/codex-review --parallel-tests "node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts extensions/telegram/src/polling-session.test.ts" (clean: no accepted/actionable findings reported)
Real behavior proof
Behavior addressed: Telegram-triggered Codex turns can stall after Codex emits a raw custom_tool_call_output item without a matching app-server item/completed; OpenClaw should use the short turn-completion watchdog instead of waiting for the terminal idle timeout.
Real environment tested: Local OpenClaw gateway with a real Telegram bot in a regular Telegram group, plus a local fake Codex app-server protocol endpoint that deterministically emitted the raw tool-output sequence.
Exact steps or command run after this patch: Started the temporary HITL runner from /private/tmp/openclaw-82274-hitl-runner.mjs, sent the requested bot mention in the Telegram group, and let the runner observe the gateway and fake Codex app-server logs through completion.
Evidence after fix: Runtime log excerpt from /private/tmp/openclaw-82274-hitl-2026-05-16T00-19-30-920Z/fake-codex.jsonl: thread/start, turn/start, item/tool/call, and rawResponseItem/completed with item.type=custom_tool_call_output were recorded during the same live Telegram-triggered gateway turn.
Observed result after fix: Gateway runtime log excerpt recorded "codex app-server turn idle timed out waiting for completion"; the proof summary marked shortCompletionIdleTimeout=true and terminalIdleTimeout=false.
What was not tested: A real Codex binary was not used for the live HITL proof; the Codex app-server side was a protocol fake so the problematic raw event sequence was deterministic.
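For readers who want the watchdog choice in concrete terms, here is a minimal sketch of the decision described above. It is not the code in run-attempt.ts: the function name, the trimmed event type, and both timeout values are hypothetical and chosen only for illustration. The sketch assumes that a raw custom_tool_call_output item arms the short completion watchdog and that a later item/completed disarms it.

```typescript
// Illustrative sketch only: names and timeout values are assumptions,
// not taken from the OpenClaw source.
type AppServerEvent =
  | { method: "item/tool/call"; itemId: string }
  | { method: "rawResponseItem/completed"; item: { type: string } }
  | { method: "item/completed"; itemId: string };

const SHORT_COMPLETION_IDLE_MS = 10_000; // assumed value
const TERMINAL_IDLE_MS = 300_000; // assumed value

// Returns the idle timeout to arm after replaying the events seen so far
// in the current turn.
function selectIdleTimeoutMs(events: AppServerEvent[]): number {
  let rawToolOutputPending = false;
  for (const event of events) {
    if (event.method === "rawResponseItem/completed") {
      // Only the raw tool-output handoff arms the short watchdog.
      if (event.item.type === "custom_tool_call_output") {
        rawToolOutputPending = true;
      }
    } else if (event.method === "item/completed") {
      // A matching completion arrived, so fall back to the long timeout.
      rawToolOutputPending = false;
    }
  }
  return rawToolOutputPending ? SHORT_COMPLETION_IDLE_MS : TERMINAL_IDLE_MS;
}

// The sequence recorded in the proof: a tool call followed by a raw
// tool-output item and no item/completed.
const stalledTurn: AppServerEvent[] = [
  { method: "item/tool/call", itemId: "tool-1" },
  { method: "rawResponseItem/completed", item: { type: "custom_tool_call_output" } },
];
const armedTimeoutMs = selectIdleTimeoutMs(stalledTurn);
```

With the stalled sequence, the sketch selects the short timeout, which corresponds to the shortCompletionIdleTimeout=true and terminalIdleTimeout=false result reported in the proof summary.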
Fixes #82274.