Bug type
Behavior bug (incorrect output/state without crash)
Beta release blocker
No
Summary
OpenClaw 2026.5.18 still loses productive Codex app-server turns when the last observed current-turn notification is item/completed and no turn/completed follows.
The already-merged fixes for #78756 and #82171 appear to be present in this installation. The current behavior is therefore not a missing-fix case, but a remaining recovery/turn-semantics problem:
- the session lane enters processing
- diagnostics report active_work_without_progress, lastProgress=codex_app_server:notification:item/completed, and recovery=none
- after turnCompletionIdleTimeoutMs, OpenClaw aborts the run
- no useful visible recovery/status is delivered for the failed work
- already-started work is not resumed
This makes chat lanes look silent or stuck and can drop real work after a completed tool call.
Steps to reproduce
- Run OpenClaw with a user-facing chat lane, reproduced here in Discord and Telegram direct chat.
- Configure an OpenAI GPT model to use the Codex app-server runtime.
- Disable model fallbacks to avoid hiding the Codex failure behind Anthropic fallback.
- Set plugins.entries.codex.config.appServer.turnCompletionIdleTimeoutMs to 180000 to prove which watchdog fires.
- In Discord, ask the agent to do a multi-step file-producing task, for example building a static multi-page web presence from existing project drafts.
- Observe that the assistant completes one tool item and then no turn/completed arrives.
- Watch diagnostics until the completion-idle timeout fires.
Relevant redacted config used during the test:
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "openai/gpt-5.5",
        "fallbacks": []
      },
      "timeoutSeconds": 900
    }
  },
  "plugins": {
    "entries": {
      "codex": {
        "config": {
          "appServer": {
            "turnCompletionIdleTimeoutMs": 180000
          }
        }
      }
    }
  }
}
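For anyone re-running the reproduction, the two preconditions in this config (no fallbacks, a known completion-idle timeout) could be checked with something like the sketch below. The helper name check_repro_config is hypothetical and not part of OpenClaw; it only reads the keys shown above.

```python
import json

# Same redacted config as in this report.
CONFIG_TEXT = """
{
  "agents": {"defaults": {"model": {"primary": "openai/gpt-5.5", "fallbacks": []},
                          "timeoutSeconds": 900}},
  "plugins": {"entries": {"codex": {"config": {
      "appServer": {"turnCompletionIdleTimeoutMs": 180000}}}}}
}
"""


def check_repro_config(cfg, expected_idle_ms=180000):
    """Return a list of problems that would make the reproduction ambiguous."""
    problems = []
    model = cfg.get("agents", {}).get("defaults", {}).get("model", {})
    if model.get("fallbacks"):
        # A fallback provider could hide the Codex app-server failure.
        problems.append("fallbacks are configured")
    app_server = (cfg.get("plugins", {}).get("entries", {}).get("codex", {})
                  .get("config", {}).get("appServer", {}))
    if app_server.get("turnCompletionIdleTimeoutMs") != expected_idle_ms:
        problems.append("turnCompletionIdleTimeoutMs differs from the tested value")
    return problems


problems = check_repro_config(json.loads(CONFIG_TEXT))
```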
Discord reproduction sequence from the session JSONL:
2026-05-19T08:07:43.604Z user prompt from Discord
2026-05-19T08:07:43.995Z assistant toolCall: bash mkdir -p /home/casper/.openclaw/workspace/artifacts/maria-ward-smartphone-start/site/assets/img
2026-05-19T08:07:44.092Z toolResult: completed exitCode 0 durationMs 0
No subsequent assistant work was written for the requested site build before timeout. The only filesystem result was directory creation.
Expected behavior
OpenClaw should not silently drop a productive Codex app-server turn after a completed tool item if the turn is still expected to continue.
At minimum, if OpenClaw decides the app-server turn is unrecoverably incomplete because turn/completed never arrived, it should:
- release the session lane
- send a visible channel status explaining the failed turn
- preserve enough state to allow the user to retry/resume
- avoid misleading explanations such as user/UI interruption when the log cause is turn_completion_idle_timeout
- avoid losing already-started work without a user-visible failure/recovery message
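A minimal sketch of the minimum behavior listed above, to make the expectation concrete. It does not reflect OpenClaw internals: the Lane class, its fields, the handler name, and the 'retry' command in the message are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Lane:
    """Hypothetical stand-in for a session lane; not an OpenClaw type."""
    session_key: str
    state: str = "processing"
    outbox: list = field(default_factory=list)  # messages visible in the channel
    resume: Optional[dict] = None               # state kept for retry/resume


def on_turn_completion_idle_timeout(lane, turn_id, last_notification, idle_ms):
    """Handle a turn whose turn/completed never arrived."""
    # 1. Release the lane so queued messages are not stuck behind the dead turn.
    lane.state = "idle"
    # 2. Preserve enough state for a later retry/resume.
    lane.resume = {
        "turnId": turn_id,
        "reason": "turn_completion_idle_timeout",
        "lastNotification": last_notification,
    }
    # 3. Tell the user what happened, naming the logged cause
    #    rather than implying a user/UI abort.
    lane.outbox.append(
        f"The last turn was aborted after {idle_ms // 1000}s without turn/completed "
        f"(reason: turn_completion_idle_timeout, last event: {last_notification}). "
        "Work may be incomplete; send 'retry' to resume."
    )
    return lane
```

With the values from the Discord log in this report (idleMs 180003, last notification item/completed), the lane would end up idle, with a resume record and one visible status message.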
Better behavior would distinguish:
- completed tool call followed by expected assistant continuation
- genuinely terminal item completion
- missing/late turn/completed
- app-server still computing vs. app-server protocol dead-air
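The distinctions above could be expressed as a small classifier, sketched below. This is an illustration only: the item type strings ("tool_call", "agent_message") and the server_reports_activity signal are assumptions, not confirmed Codex app-server protocol values.

```python
from enum import Enum


class StallKind(Enum):
    COMPLETED = "completed"
    STILL_COMPUTING = "still_computing"
    TOOL_RESULT_AWAITING_CONTINUATION = "tool_result_awaiting_continuation"
    TERMINAL_ITEM_MISSING_TURN_COMPLETED = "terminal_item_missing_turn_completed"
    PROTOCOL_DEAD_AIR = "protocol_dead_air"


def classify_stall(last_item_type, turn_completed_seen, server_reports_activity):
    """Classify a quiet turn from the last completed item and liveness signals.

    last_item_type: hypothetical label for the last item/completed payload.
    server_reports_activity: hypothetical liveness signal from the app-server.
    """
    if turn_completed_seen:
        return StallKind.COMPLETED
    if server_reports_activity:
        # The server is alive and working; silence is not yet a failure.
        return StallKind.STILL_COMPUTING
    if last_item_type == "tool_call":
        # A tool result normally implies more assistant work is expected.
        return StallKind.TOOL_RESULT_AWAITING_CONTINUATION
    if last_item_type == "agent_message":
        # Final text was produced; only the turn/completed marker is missing.
        return StallKind.TERMINAL_ITEM_MISSING_TURN_COMPLETED
    return StallKind.PROTOCOL_DEAD_AIR
```

Under these assumptions, the Discord case in this report (a completed bash tool item, no turn/completed, no further activity) would map to TOOL_RESULT_AWAITING_CONTINUATION.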
Actual behavior
The run is aborted after the completion idle timeout. Diagnostics explicitly say recovery=none.
In the Discord reproduction, only a directory was created; no requested site files were produced. The user saw typing/activity disappear and no useful recovery surfaced.
Subsequent status questions can create confusing assistant explanations that imply a user/UI abort, even though the durable gateway evidence for the original run points to turn_completion_idle_timeout.
OpenClaw version
OpenClaw 2026.5.18 (50a2481)
Operating system
Ubuntu
Install method
npm global
Model
gpt-5.5
Provider / routing chain
openai-codex/gpt-5.5 -> Codex app-server harness -> OpenClaw embedded run -> Discord/Telegram chat lane
Additional provider/model setup details
Fallbacks were disabled during the primary test ("fallbacks": []). This was intentional to avoid an Anthropic fallback hiding the Codex app-server failure.
turnCompletionIdleTimeoutMs was deliberately raised to 180000 during testing. The same pattern had previously been observed around the default shorter idle behavior; raising the timeout made it clear which watchdog fired.
Earlier tests with fallbacks enabled caused additional confusing behavior: OpenClaw fell back to Anthropic, then hit context overflow/compaction and separate message tool delivery errors.
Related issues/PRs:
Logs, screenshots, and evidence
Gateway log excerpts:
2026-05-19T08:04:04.805Z [agent/embedded]
strict-agentic execution contract active:
runId=fa6f5365-411f-4028-8985-a9ec7a9b35a4
sessionId=ac54314e-d1ad-4145-b8fe-932309953759
provider=openai-codex/gpt-5.5 harness=codex
2026-05-19T08:07:04.822Z [diagnostic]
stalled session:
sessionId=ac54314e-d1ad-4145-b8fe-932309953759
sessionKey=agent:main:discord:channel:1497109509825626232
state=processing age=142s queueDepth=1
reason=active_work_without_progress
classification=stalled_agent_run
activeWorkKind=embedded_run
lastProgress=codex_app_server:notification:item/completed
lastProgressAge=141s
recovery=none
2026-05-19T08:07:34.819Z [diagnostic]
stalled session:
sessionId=ac54314e-d1ad-4145-b8fe-932309953759
sessionKey=agent:main:discord:channel:1497109509825626232
state=processing age=172s queueDepth=1
reason=active_work_without_progress
classification=stalled_agent_run
activeWorkKind=embedded_run
lastProgress=codex_app_server:notification:item/completed
lastProgressAge=171s
recovery=none
2026-05-19T08:07:43.435Z [agent/embedded]
codex app-server turn idle timed out waiting for completion
{
threadId: "019e3f2d-b7f2-7443-ab96-4e72fe219fe1",
turnId: "019e3f43-8034-7001-88af-70ffeb9bdb43",
idleMs: 180003,
timeoutMs: 180000,
lastActivityReason: "notification:item/completed",
lastNotificationMethod: "item/completed"
}
2026-05-19T08:07:43.457Z [agent/embedded]
codex app-server client retired after timed-out turn
{
threadId: "019e3f2d-b7f2-7443-ab96-4e72fe219fe1",
turnId: "019e3f43-8034-7001-88af-70ffeb9bdb43",
reason: "turn_completion_idle_timeout",
clearedSharedClient: true
}
2026-05-19T08:07:44.198Z [agent/embedded]
embedded run failover decision
{
runId: "fa6f5365-411f-4028-8985-a9ec7a9b35a4",
stage: "assistant",
decision: "surface_error",
failoverReason: "timeout",
profileFailureReason: "timeout",
provider: "openai-codex",
model: "gpt-5.5",
fallbackConfigured: false,
timedOut: true,
aborted: true
}
While diagnosing the Discord stall from Telegram, the Telegram direct session itself hit the same failure mode.
2026-05-19T08:14:59.977Z [agent/embedded]
strict-agentic execution contract active:
runId=6e9f7eb1-5418-4d5c-aabc-df8a1e7f7619
sessionId=9578d939-b2fd-4ec9-b65b-8a93348ca570
provider=openai-codex/gpt-5.5 harness=codex
2026-05-19T08:17:38.070Z [diagnostic]
stalled session:
sessionId=9578d939-b2fd-4ec9-b65b-8a93348ca570
sessionKey=agent:main:telegram:direct:287384854
state=processing age=129s queueDepth=1
reason=active_work_without_progress
classification=stalled_agent_run
activeWorkKind=embedded_run
lastProgress=codex_app_server:notification:item/completed
lastProgressAge=129s
recovery=none
2026-05-19T08:18:08.068Z [diagnostic]
stalled session:
sessionId=9578d939-b2fd-4ec9-b65b-8a93348ca570
sessionKey=agent:main:telegram:direct:287384854
state=processing age=159s queueDepth=1
reason=active_work_without_progress
classification=stalled_agent_run
activeWorkKind=embedded_run
lastProgress=codex_app_server:notification:item/completed
lastProgressAge=159s
recovery=none
2026-05-19T08:18:29.525Z [agent/embedded]
codex app-server turn idle timed out waiting for completion
{
threadId: "019e3ef4-0e36-7b32-b9e1-36b98cc115a8",
turnId: "019e3f4d-7f38-74e2-82fc-2557e24a98b1",
idleMs: 180001,
timeoutMs: 180000,
lastActivityReason: "notification:item/completed",
lastNotificationMethod: "item/completed"
}
2026-05-19T08:18:30.061Z [agent/embedded]
embedded run failover decision
{
runId: "6e9f7eb1-5418-4d5c-aabc-df8a1e7f7619",
stage: "assistant",
decision: "surface_error",
failoverReason: "timeout",
profileFailureReason: "timeout",
provider: "openai-codex",
model: "gpt-5.5",
fallbackConfigured: false,
timedOut: true,
aborted: true
}
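To check other gateway logs for the same signature, the key=value fields of a "stalled session" diagnostic can be pulled out with a short script. This is a rough sketch that assumes the diagnostic keeps the key=value layout shown above; the function names are hypothetical.

```python
import re

# Matches tokens such as state=processing or lastProgressAge=141s.
KEY_VALUE = re.compile(r"(\w+)=(\S+)")

SAMPLE = """2026-05-19T08:07:04.822Z [diagnostic]
stalled session:
sessionId=ac54314e-d1ad-4145-b8fe-932309953759
sessionKey=agent:main:discord:channel:1497109509825626232
state=processing age=142s queueDepth=1
reason=active_work_without_progress
classification=stalled_agent_run
activeWorkKind=embedded_run
lastProgress=codex_app_server:notification:item/completed
lastProgressAge=141s
recovery=none
"""


def parse_stalled_session(block):
    """Return the key=value fields of one diagnostic block as a dict of strings."""
    return dict(KEY_VALUE.findall(block))


def is_item_completed_stall(fields):
    """True when the block matches the failure signature described in this issue."""
    return (
        fields.get("reason") == "active_work_without_progress"
        and fields.get("lastProgress", "").endswith("item/completed")
        and fields.get("recovery") == "none"
    )


fields = parse_stalled_session(SAMPLE)
```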
Impact and severity
Severity: high for user-facing chat lanes using Codex app-server.
Impact:
- User-facing Discord/Telegram lanes can appear silent or stuck.
- Real work may be dropped after a completed tool call.
- Diagnostics say recovery=none, leaving no clear user-facing recovery path.
- The failure can be confused with a user/UI abort even though logs show turn_completion_idle_timeout.
- Increasing turnCompletionIdleTimeoutMs only delays the abort; it does not solve recovery.
Additional information
Why #78756 and #82171 do not fully cover this:
The fixes appear to be present and working in a narrow sense:
- account/rate-limit updates are not prolonging this stall indefinitely
- the session does not wait for the 30-minute terminal cap
- the configured completion-idle watchdog fires
However, that still leaves a correctness/recovery gap:
- productive work can be aborted after the last observed item/completed
- no useful visible recovery is emitted
- no resume/retry path is provided
- the lane is not self-healing in a user-meaningful way
This looks like a remaining bug adjacent to #82171: the fail-fast behavior prevents long hangs, but it does not provide correct turn semantics or recovery when turn/completed is missing.
Suggested fix direction:
- Preserve and expose a structured recovery result when turn_completion_idle_timeout fires after item/completed.
- Emit a visible channel message when a user-facing lane aborts due to missing turn/completed, including the last completed item/tool and retry guidance.
- Add a retry/resume mechanism that restarts the turn with a compact summary of already-completed tool calls and their results.
- Improve app-server protocol handling so that if the final observed current-turn item is a tool result, OpenClaw does not treat silence as terminal without preserving recovery.
- Add diagnostics that distinguish:
  - turn/completed missing after assistant final text
  - turn/completed missing after tool result where more assistant work is expected
  - raw response completion stalls
  - user/UI aborts
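As an illustration of the retry/resume suggestion, the restart prompt could carry a compact summary of the tool calls that already finished. The sketch below is one possible shape, not an existing OpenClaw mechanism; CompletedToolCall and build_resume_prompt are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CompletedToolCall:
    """Hypothetical record of one finished tool call from the session JSONL."""
    tool: str
    command: str
    exit_code: int


def build_resume_prompt(original_prompt, completed):
    """Build a restart prompt that summarizes work already done."""
    lines = [
        "The previous turn was aborted (turn_completion_idle_timeout) before it finished.",
        "Already-completed tool calls:",
    ]
    if completed:
        for index, call in enumerate(completed, start=1):
            lines.append(f"{index}. {call.tool}: {call.command} (exitCode {call.exit_code})")
    else:
        lines.append("(none)")
    lines.append("Continue the original request without repeating completed steps:")
    lines.append(original_prompt)
    return "\n".join(lines)


# The single completed step from the Discord reproduction in this report.
done = [
    CompletedToolCall(
        tool="bash",
        command="mkdir -p /home/casper/.openclaw/workspace/artifacts/"
                "maria-ward-smartphone-start/site/assets/img",
        exit_code=0,
    )
]
prompt = build_resume_prompt("Build the static multi-page site from the project drafts.", done)
```

Keeping the summary to one line per tool call is a deliberate choice here: it avoids replaying full tool output into the retried turn, which matters given the context overflow seen in the earlier fallback tests.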
Workaround in this environment: avoid the Codex app-server runtime for user-facing chat lanes until this recovery gap is fixed. For OpenAI GPT models, forcing harness=pi is only viable if the OpenAI provider credentials have api.responses.write; otherwise the normal OpenAI Responses API path fails with HTTP 401.