fix(codex): ignore account updates for turn liveness #79667

Merged
joshavant merged 2 commits into main from fix/codex-app-server-liveness-78756 on May 9, 2026

Conversation

@joshavant

Contributor

Summary

  • Problem: Codex app-server runs could stay blocked after a visible message-tool delivery when turn/completed was lost, because unrelated account/rate-limit notifications refreshed active-turn liveness.
  • Why it matters: Telegram/Discord users could see a successfully delivered reply while the OpenClaw run stayed active until a later generic timeout path fired, risking a duplicate timeout message.
  • What changed: limit Codex app-server completion liveness refreshes to notifications for the current turn, and suppress the generic timeout payload when messaging-tool delivery evidence already exists.
  • What did NOT change (scope boundary): no provider auth, model selection, channel send semantics, or config surface changes.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Integrations
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: Codex app-server run after a visible Telegram message-tool delivery loses turn/completed while account/rate-limit updates continue.
  • Real environment tested: local OpenClaw gateway with live OpenAI Codex app-server path and live Telegram delivery; proxy injected the failure mode by blackholing after message-tool completion and continuing account updates.
  • Exact steps or command run after this patch: ran the gateway against the proxy mode blackhole-after-message-tool-completed-and-inject-account, sent live Telegram agent messages, and observed the CLI JSON result plus gateway/proxy logs.
  • Evidence after fix: current main-based fix branch marker OC78756_FIXED_MAIN_RERUN returned status: timeout, payloads: [], durationMs: 66765; gateway logged codex app-server turn idle timed out waiting for completion while the proxy continued account update injections.
  • Observed result after fix: the app-server stops on completion-idle liveness instead of extending the active turn from account notifications, and the runner does not emit a duplicate generic timeout payload after visible message delivery evidence.
  • What was not tested: additional non-Telegram channels in live mode; the core suppression path is generic and covered by a runner regression test.
  • Before evidence: unpatched reproductions on current main and 2026.5.7 showed the account/rate-limit updates extending the stuck turn and the fallback timeout payload path after visible delivery.

Root Cause (if applicable)

  • Root cause: run-attempt.ts refreshed turn-completion activity for every app-server notification, including account/rate-limit updates unrelated to the current turn. Separately, the embedded runner's timeout fallback did not account for already-visible messaging-tool delivery evidence.
  • Missing detection / guardrail: no regression test covered non-turn app-server notifications during a missing turn/completed condition, and no runner test covered timeout suppression after messaging-tool delivery.
  • Contributing context (if known): the user-visible reply can be delivered before the app-server completion signal arrives, so liveness and fallback behavior need to respect that split.
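
A minimal sketch of the liveness decision described in the root cause, under stated assumptions: the notification and turn shapes, the CompletionLiveness class, and the "account/updated" method name used in the usage note are hypothetical and not taken from run-attempt.ts. Only the idea of gating the refresh behind a current-turn predicate comes from this PR.

```typescript
// Hypothetical shapes: the real types in run-attempt.ts are not shown in this PR.
interface AppServerNotification {
  method: string;
  params?: { threadId?: string; turnId?: string };
}

interface ActiveTurn {
  threadId: string;
  turnId: string;
}

// Plays the role of the isTurnNotification(...) predicate named in the review:
// a notification counts only when it carries the active thread and turn ids.
function isTurnNotification(n: AppServerNotification, turn: ActiveTurn): boolean {
  return n.params?.threadId === turn.threadId && n.params?.turnId === turn.turnId;
}

class CompletionLiveness {
  private lastActivityMs: number;
  private readonly turn: ActiveTurn;
  private readonly idleTimeoutMs: number;

  constructor(turn: ActiveTurn, idleTimeoutMs: number, startMs: number) {
    this.turn = turn;
    this.idleTimeoutMs = idleTimeoutMs;
    this.lastActivityMs = startMs;
  }

  // Refresh liveness only for notifications that belong to the current turn.
  observe(n: AppServerNotification, nowMs: number): void {
    if (!isTurnNotification(n, this.turn)) return; // unscoped account/rate-limit updates stop here
    this.lastActivityMs = nowMs;
  }

  // True once no current-turn notification has arrived within the idle window.
  isIdle(nowMs: number): boolean {
    return nowMs - this.lastActivityMs >= this.idleTimeoutMs;
  }
}
```

With this shape, calling observe({ method: "account/updated" }, now) leaves the idle clock untouched, so a lost turn/completed still ends in a completion-idle timeout. The pre-fix behavior would correspond to updating lastActivityMs before the predicate check.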

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/codex/src/app-server/run-attempt.test.ts; src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts
  • Scenario the test should lock in: account/rate-limit updates do not reset current-turn completion-idle liveness, and timeout fallback payloads are suppressed after messaging-tool delivery evidence.
  • Why this is the smallest reliable guardrail: these tests isolate the two faulty decisions without requiring live provider/channel credentials in CI.
  • Existing test that already covers this (if any): none before this PR.
  • If no new test is added, why not: N/A.

User-visible / Behavior Changes

Codex app-server runs that already visibly delivered a message now stop cleanly on completion-idle timeout when the completion signal is lost, without extending liveness from background account updates or sending a duplicate generic timeout reply.

Diagram (if applicable)

Before:
message delivered -> turn/completed lost -> account updates refresh liveness -> later generic timeout payload can surface

After:
message delivered -> turn/completed lost -> account updates ignored for turn liveness -> completion-idle timeout -> no duplicate payload

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS local gateway; isolated 2026.5.7 package install also tested locally
  • Runtime/container: Node 22 local runtime
  • Model/provider: OpenAI / gpt-5.5
  • Integration/channel (if any): Telegram live delivery
  • Relevant config (redacted): live credentials from local config/auth files; no credentials included in logs or PR artifacts

Steps

  1. Configure Codex app-server command to run through the local failure proxy.
  2. Start OpenClaw gateway with Telegram and Codex enabled.
  3. Send an agent request that performs a visible Telegram message.action call.
  4. Have the proxy blackhole app-server messages after the message-tool completion, including turn/completed, while injecting account/rate-limit updates.
  5. Observe CLI JSON result, gateway liveness log, and proxy trace.
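
The failure mode injected in step 4 can be sketched as a small state machine. This is an illustration only, not the proxy used for the reproduction: the class name, the caller-supplied completion predicate, and the "account/updated" and "tool/completed" method names are all hypothetical.

```typescript
// Hypothetical notification shape for a JSON-RPC style app-server stream.
type ProxyNotification = { method: string; params?: Record<string, unknown> };

class BlackholeAfterMessageTool {
  private blackholed = false;
  private readonly isMessageToolCompletion: (n: ProxyNotification) => boolean;

  // The predicate is supplied by the caller because the real wire format
  // for a message-tool completion is not shown in this PR.
  constructor(isMessageToolCompletion: (n: ProxyNotification) => boolean) {
    this.isMessageToolCompletion = isMessageToolCompletion;
  }

  // Returns what to forward downstream for one upstream notification.
  // The message-tool completion itself is forwarded; everything after it,
  // including turn/completed, is dropped.
  forward(n: ProxyNotification): ProxyNotification[] {
    if (this.blackholed) return [];
    if (this.isMessageToolCompletion(n)) this.blackholed = true;
    return [n];
  }

  // Intended to be called on a timer; injects account noise only once blackholed.
  injectAccountUpdate(): ProxyNotification[] {
    return this.blackholed ? [{ method: "account/updated" }] : [];
  }
}
```

Driving a gateway through a proxy of this shape yields the stream the fix targets: a visible delivery, no completion signal, and a steady trickle of account notifications.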

Expected

  • The run times out on app-server completion-idle liveness.
  • Account/rate-limit notifications do not extend the stuck turn.
  • No generic timeout payload is emitted after the visible message-tool delivery.

Actual

  • Current main-based fix branch: marker OC78756_FIXED_MAIN_RERUN, status: timeout, payloads: [], durationMs: 66765, gateway log codex app-server turn idle timed out waiting for completion.
  • Isolated 2026.5.7 patched install: marker OC78756_REL57_FIXED_RERUN, status: timeout, payloads: [], durationMs: 71715, gateway log codex app-server turn idle timed out waiting for completion.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: focused regression tests; formatter check; changed gate; live Telegram/OpenAI reproduction on the current main-based fix branch; live Telegram/OpenAI reproduction against an isolated 2026.5.7 install with the same minimal dist patch.
  • Edge cases checked: account/rate-limit updates after blackholed completion; existing visible messaging-tool delivery before timeout.
  • What you did not verify: other live channels beyond Telegram.

Verification commands run:

pnpm test extensions/codex/src/app-server/run-attempt.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts -- --reporter=verbose
pnpm exec oxfmt --check --threads=1 extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts src/agents/pi-embedded-runner/run.ts src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts
pnpm check:changed
git diff --check

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: a legitimate non-turn app-server notification that previously kept a run alive while completion was missing no longer does so.
    • Mitigation: completion liveness now tracks only the current turn's notifications; startup/login/account status still flows normally, just not as proof of active completion progress.
  • Risk: suppressing generic timeout payloads could hide a timeout from the user.
    • Mitigation: suppression only applies when the runner already has messaging-tool delivery evidence, meaning a user-visible send occurred before the timeout.
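
The second mitigation can be sketched as a small decision function. The result shape, field names, and placeholder timeout text below are assumptions for illustration; the actual guard lives in src/agents/pi-embedded-runner/run.ts and may be structured differently.

```typescript
// Hypothetical result shape; field names are illustrative, not taken from run.ts.
interface AttemptResult {
  promptTimedOut: boolean;
  payloads: string[];
  hasMessagingDeliveryEvidence: boolean;
}

// Placeholder wording, not the runner's real timeout message.
const GENERIC_TIMEOUT_TEXT = "The request timed out before a reply was produced.";

function resolveTimeoutPayloads(result: AttemptResult): string[] {
  // No timeout, or real payloads already exist: pass them through unchanged.
  if (!result.promptTimedOut || result.payloads.length > 0) return result.payloads;
  // A user-visible send already happened, so a generic timeout reply would be a duplicate.
  if (result.hasMessagingDeliveryEvidence) return [];
  // Otherwise the user has seen nothing, so the timeout must still be surfaced.
  return [GENERIC_TIMEOUT_TEXT];
}
```

The ordering matters: the evidence check only runs on the branch that would otherwise emit the generic payload, so a timeout with no prior delivery is still reported to the user.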

@openclaw-barnacle openclaw-barnacle Bot added the labels agents (Agent runtime and tooling), extensions: codex, size: S, and maintainer (Maintainer-authored PR) on May 9, 2026
@clawsweeper

clawsweeper Bot commented May 9, 2026

Contributor

Codex review: needs maintainer review before merge.

Summary
The branch scopes Codex app-server notification liveness to current-turn notifications, suppresses generic timeout payloads after messaging-tool delivery evidence, adds focused regression tests, and updates the changelog.

Reproducibility: yes. Source inspection shows current main refreshes liveness before current-turn filtering, and the PR body includes live Telegram/OpenAI repro evidence for the lost-completion plus account-update path; I did not run the live path in this read-only review.

Real behavior proof
Sufficient (logs): The PR body includes copied after-fix live gateway/proxy output for the Telegram/OpenAI path and redacted environment details, which is sufficient real behavior proof for this non-visual runtime change.

Next step before merge
No repair lane is needed; maintainers should review, land, or adjust the linked issue-closing scope for this active protected-label PR.

Security
Cleared: The diff changes local timeout/liveness logic, tests, and changelog only; it does not add dependencies, CI execution, permissions, secret handling, or new network surfaces.

Review details

Best possible solution:

Land this focused fix after maintainer approval and merge gates if it is intended to close the linked timeout report; otherwise adjust the closing reference and keep broader timeout policy work tracked separately.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection shows current main refreshes liveness before current-turn filtering, and the PR body includes live Telegram/OpenAI repro evidence for the lost-completion plus account-update path; I did not run the live path in this read-only review.

Is this the best way to solve the issue?

Yes for the stated PR scope. Moving the liveness refresh behind isTurnNotification(...) and guarding the generic timeout payload with existing delivery evidence are narrow changes in the owning runtime paths; the only maintainer call is whether this fully closes the broader linked issue.

Acceptance criteria:

  • pnpm test extensions/codex/src/app-server/run-attempt.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts -- --reporter=verbose
  • pnpm exec oxfmt --check --threads=1 extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts src/agents/pi-embedded-runner/run.ts src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts
  • pnpm check:changed
  • git diff --check

What I checked:

  • Current-main liveness bug path: Current main refreshes turn-completion activity before checking whether the notification belongs to the active thread/turn, so unscoped account/rate-limit notifications can still refresh active-turn liveness. (extensions/codex/src/app-server/run-attempt.ts:1042, 0d277e9533af)
  • Current-turn filter exists immediately after the refresh: The existing current-turn predicate checks threadId and turnId; the PR moves liveness refresh behind that predicate instead of adding a broader cache or special-case account filter. (extensions/codex/src/app-server/run-attempt.ts:2031, 0d277e9533af)
  • Current-main duplicate timeout payload path: Current main emits an explicit prompt-timeout payload whenever the prompt timed out and no payloads were produced; the PR adds a messaging-delivery evidence guard to this branch. (src/agents/pi-embedded-runner/run.ts:2393, 0d277e9533af)
  • PR regression coverage: The diff adds a Codex app-server test for account/rate-limit notifications not refreshing completion liveness and an embedded-runner test for suppressing generic timeout payloads after messaging-tool delivery. (extensions/codex/src/app-server/run-attempt.test.ts:979, b82eb56df402)
  • Real behavior proof in PR body: The PR body includes after-fix live Telegram/OpenAI app-server proof with a proxy-injected lost completion signal plus continued account updates, showing timeout status with empty payloads and gateway idle-timeout logs. (b82eb56df402)
  • GitHub checks: The unauthenticated check-runs API returned 111 check runs for the PR head and no failed, cancelled, timed-out, or incomplete check runs in the first 100 returned check runs inspected. (b82eb56df402)

Likely related people:

  • steipete: Recent commits introduced the Codex app-server turn-activity liveness behavior and centralized messaging delivery evidence used by the runner path this PR edits. (role: recent maintainer and feature-history owner; confidence: high; commits: c22f414c6976, 9e9df8f2c578; files: extensions/codex/src/app-server/run-attempt.ts, src/agents/pi-embedded-runner/run.ts, src/agents/pi-embedded-runner/delivery-evidence.ts)
  • pashpashpash: Recent main history shows several Codex app-server and Codex routing changes around the same runtime surface shortly before this PR. (role: recent adjacent Codex app-server maintainer; confidence: medium; commits: 3f217964d1f9, 1c3399010815, c8f3fecad6fe; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/event-projector.ts)

Remaining risk / open question:

Codex review notes: model gpt-5.5, reasoning high; reviewed against 0d277e9533af.

@clawsweeper clawsweeper Bot added the label proof: sufficient (ClawSweeper judged the real behavior proof convincing.) on May 9, 2026
@joshavant joshavant merged commit 5fdef4c into main May 9, 2026
112 of 114 checks passed
@joshavant joshavant deleted the fix/codex-app-server-liveness-78756 branch May 9, 2026 05:38
rdevaul pushed a commit to rdevaul/sybilclaw that referenced this pull request May 9, 2026
Cherry-pick from upstream 5fdef4c. Fixes codex app-server
completion liveness by ignoring account updates during turn.

Conflicts resolved by taking upstream versions (pre-existing
conflict markers in our fork are preserved by this cherry-pick).
rdevaul added a commit to rdevaul/sybilclaw that referenced this pull request May 9, 2026
6 upstream cherry-picks applied:
- fix(telegram): mirror outbound replies to session transcript
- fix(gateway): harden macOS update restart lifecycle
- fix(telegram): harden command menu cache keys
- Reduce Telegram command menu CPU work
- fix(status): show codex usage for codex harness
- fix(codex): ignore account updates for turn liveness (openclaw#79667)

Skipped: imessage, whatsapp, slack fixes (deleted in fork)
lykeion-dev pushed a commit to lykeion-dev/openclaw--rev that referenced this pull request May 14, 2026
* fix codex app-server completion liveness

* docs changelog codex liveness fix
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
* fix codex app-server completion liveness

* docs changelog codex liveness fix
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
* fix codex app-server completion liveness

* docs changelog codex liveness fix
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
* fix codex app-server completion liveness

* docs changelog codex liveness fix

Labels

  • agents (Agent runtime and tooling)
  • extensions: codex
  • maintainer (Maintainer-authored PR)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • size: S

Development

Successfully merging this pull request may close these issues.

Codex app-server turns time out after 60s despite meaningful progress

1 participant