You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In v2026.5.27, isolated agentTurn cron runs using github-copilot/claude-sonnet-4.6 (and likely any github-copilot model on the anthropic-messages transport) silently hang for ~365s with zero progress events, then abort internally with promptError: "This operation was aborted". The cron run is then misclassified as status: ok in the cron history (despite the trajectory's finalStatus: error), suppressing all failure alerts — the cron appears healthy while delivering nothing.
This is the github-copilot variant of the watchdog-hang pattern fixed for claude-cli in #86895 / PR #87546. The fix in #87546 only added stream-json progress tracking for the claude-cli backend; the anthropic-messages transport used by github-copilot's claude models has no equivalent progress signal, so the gateway watchdog cannot distinguish a working-but-silent stream from a wedged one.
Steps to reproduce
Configure an isolated agentTurn cron with model: "github-copilot/claude-sonnet-4.6", no fallbacks, timeoutSeconds: 600, simple bash-then-message-tool prompt.
Trigger the cron (manual or scheduled).
Observe: trajectory file freezes at prompt.submitted for ~365s, then model.completed fires with aborted: true. No tool calls are made. Discord delivery never happens.
Cron history records status: ok, delivered: false, deliveryStatus: not-requested.
Reproduces deterministically across all my morning runs (5/5 attempts at 5:00, 5:15, 5:30, 5:45, 6:00 AM PT today, plus an additional manual run at 6:28 AM PT, all hung identically with durations 366,400–372,640 ms).
Expected behavior
Either:
The github-copilot anthropic-messages transport emits stream/heartbeat progress so the gateway's stuck-session watchdog can distinguish active streams from hung ones, OR
A short transport-level idle timeout (e.g., 30–60s with no first token) triggers a transport error that the failover chain can catch, OR
The cron runner correctly records status: error when the trajectory's finalStatus is error, so existing failure-alert plumbing fires.
Actual behavior
Run hangs for ~6 min with no progress; aborts internally; cron records "success"; nothing delivered; no alerts.
The straightforward fix — adding ["github-copilot/gpt-5.5", "github-copilot/claude-opus-4.6"] as payload.fallbacks — does not work. I confirmed this with a controlled retry: cron history shows fallbackUsed: false, delivered: false, status: ok after a fresh 374-second hang. The fallback chain is loaded but never triggers.
The reason is that pi-ai's failover classifier keys on specific patterns (the documented "An unknown error occurred" stream-wrapper text, HTTP 429, transport errors, etc.). The internal AbortError: "This operation was aborted" from the gateway's stuck-session watchdog does not match any of those patterns, so failover is silently inert for this exact failure mode.
Working mitigation: change the primary to a model on a different transport (github-copilot/gpt-5.5 via openai-responses). That bypasses the broken anthropic-messages code path entirely. Configured fallbacks back to sonnet/opus are still inert if gpt-5.5 ever hits the same class of issue.
Failover-classifier gap: internal AbortError from the stuck-session watchdog is not on pi-ai's failover-worthy error list. Configured fallback chains do not fire for this failure mode. The classifier should probably treat aborted: true + assistantTexts: [] + toolMetas: [] as a failover-worthy outcome regardless of the specific error string.
Cron-runner status misclassification: when the underlying trajectory ends with finalStatus: error and aborted: true, the cron runner still records status: ok in the run history. This means failureAlert blocks don't fire, consecutiveErrors stays at zero, the daily Cron Health Monitor reports the cron as healthy, and the user has no signal that anything is wrong — the cron silently dies for days while reporting green. Suggest the cron runner inspect trace.artifacts.data.finalStatus (or equivalently model.completed.data.aborted combined with empty assistantTexts) to set the correct status, and report deliveryStatus: failed when didSendViaMessagingTool: false on a payload whose prompt requires delivery.
Bugs 2 and 3 are likely worth separate issues, but I've kept them together here because they compose into the user-facing symptom: a cron that hangs, never delivers, never falls back, and never alerts.
Bug type
Crash (process/app hangs)
Beta release blocker
No
Summary
In v2026.5.27, isolated
agentTurncron runs usinggithub-copilot/claude-sonnet-4.6(and likely any github-copilot model on theanthropic-messagestransport) silently hang for ~365s with zero progress events, then abort internally withpromptError: "This operation was aborted". The cron run is then misclassified asstatus: okin the cron history (despite the trajectory'sfinalStatus: error), suppressing all failure alerts — the cron appears healthy while delivering nothing.This is the github-copilot variant of the watchdog-hang pattern fixed for claude-cli in #86895 / PR #87546. The fix in #87546 only added stream-json progress tracking for the claude-cli backend; the
anthropic-messagestransport used by github-copilot's claude models has no equivalent progress signal, so the gateway watchdog cannot distinguish a working-but-silent stream from a wedged one.Steps to reproduce
agentTurncron withmodel: "github-copilot/claude-sonnet-4.6", no fallbacks,timeoutSeconds: 600, simple bash-then-message-tool prompt.prompt.submittedfor ~365s, thenmodel.completedfires withaborted: true. No tool calls are made. Discord delivery never happens.status: ok,delivered: false,deliveryStatus: not-requested.Reproduces deterministically across all my morning runs (5/5 attempts at 5:00, 5:15, 5:30, 5:45, 6:00 AM PT today, plus an additional manual run at 6:28 AM PT, all hung identically with durations 366,400–372,640 ms).
Expected behavior
Either:
status: errorwhen the trajectory'sfinalStatusiserror, so existing failure-alert plumbing fires.Actual behavior
Run hangs for ~6 min with no progress; aborts internally; cron records "success"; nothing delivered; no alerts.
Trajectory artifact (
trace.artifacts.data){ "finalStatus": "error", "aborted": true, "externalAbort": false, "timedOut": false, "idleTimedOut": false, "timedOutDuringCompaction": false, "timedOutDuringToolExecution": false, "promptError": "This operation was aborted", "promptErrorSource": "prompt", "compactionCount": 0, "assistantTexts": [], "itemLifecycle": { "startedCount": 0, "completedCount": 0, "activeCount": 0 }, "toolMetas": [], "didSendViaMessagingTool": false, "messagingToolSentTexts": [], "messagingToolSentTargets": [] }Cron run history (same run, recorded by cron)
{ "action": "finished", "status": "ok", "runAtMs": 1779973200025, "durationMs": 372636, "provider": "github-copilot", "model": "claude-sonnet-4.6", "deliveryStatus": "not-requested", "delivery": { "delivered": false, "fallbackUsed": false } }Note the contradiction:
status: okanddelivered: falseon the cron-side, vsfinalStatus: errorandaborted: trueon the trajectory-side.Environment
~/.nvm/versions/node/v26.2.0/lib/node_modules/openclaw/dist/index.jsgithub-copilot(token exchanged viaproxy.business.githubcopilot.com, enterprise SKUcopilot_for_business_seat_quota, freshly refreshed mid-failure)github-copilot/claude-sonnet-4.6viaanthropic-messagesAPIisolatedbash + message(action=send) + message(action=upload-file)Related issues / context
runtime-pluginsphase (78–80s, before model invocation). Distinct stall point, same overall family.Mitigation in use
The straightforward fix — adding
["github-copilot/gpt-5.5", "github-copilot/claude-opus-4.6"]aspayload.fallbacks— does not work. I confirmed this with a controlled retry: cron history showsfallbackUsed: false,delivered: false,status: okafter a fresh 374-second hang. The fallback chain is loaded but never triggers.The reason is that pi-ai's failover classifier keys on specific patterns (the documented "An unknown error occurred" stream-wrapper text, HTTP 429, transport errors, etc.). The internal
AbortError: "This operation was aborted"from the gateway's stuck-session watchdog does not match any of those patterns, so failover is silently inert for this exact failure mode.Working mitigation: change the primary to a model on a different transport (
github-copilot/gpt-5.5viaopenai-responses). That bypasses the brokenanthropic-messagescode path entirely. Configured fallbacks back to sonnet/opus are still inert if gpt-5.5 ever hits the same class of issue.Three distinct bugs in this one symptom
github-copilot+anthropic-messagestransport opens a stream, never emits a chunk, and is aborted by the internal watchdog ~365s later. (Sibling of Webchat: substantive tool turns intermittently hang in post-tool-result generation → ~365s abort_embedded_run while the host event loop stays healthy, and the turn is silently lost — v2026.5.22 (a374c3a) #86895, fix in Fix Claude live tool progress for watchdog recovery #87546 only covered claude-cli.)AbortErrorfrom the stuck-session watchdog is not on pi-ai's failover-worthy error list. Configured fallback chains do not fire for this failure mode. The classifier should probably treataborted: true+assistantTexts: []+toolMetas: []as a failover-worthy outcome regardless of the specific error string.finalStatus: errorandaborted: true, the cron runner still recordsstatus: okin the run history. This meansfailureAlertblocks don't fire,consecutiveErrorsstays at zero, the daily Cron Health Monitor reports the cron as healthy, and the user has no signal that anything is wrong — the cron silently dies for days while reporting green. Suggest the cron runner inspecttrace.artifacts.data.finalStatus(or equivalentlymodel.completed.data.abortedcombined with emptyassistantTexts) to set the correct status, and reportdeliveryStatus: failedwhendidSendViaMessagingTool: falseon a payload whose prompt requires delivery.Bugs 2 and 3 are likely worth separate issues, but I've kept them together here because they compose into the user-facing symptom: a cron that hangs, never delivers, never falls back, and never alerts.