Bug type
Behavior bug (incorrect output/state without crash)
Beta release blocker
No
Summary
On 2026.5.28, when an openai-codex auth profile hits its subscription cap and the upstream reports a "next reset in N days" timestamp, OpenClaw stores that timestamp verbatim in auth-state.json as blockedUntil. With fallbacks: [], the probe-during-cooldown path short-circuits on hasFallbackCandidates, so the profile is never re-probed and stays blocked for days, even after the rolling cap has recovered.
Steps to reproduce
Install OpenClaw 2026.5.28 and @openclaw/codex@2026.5.28, configure with agents.defaults.model.primary: openai-codex/gpt-5.5 and fallbacks: [].
Drive enough usage to exhaust the rolling weekly cap (in this case, an accidental heartbeat firing every 30 min for ~24 hours).
Observe the upstream returns: You've reached your Codex subscription usage limit. Next reset in 6 days, Jun 7 at 3:43 PM UTC.
Check auth-state.json at agents/main/agent/auth-state.json:
Wait 3 days. Observe that every scheduled cron lane logs decision=skip_candidate ... Provider openai-codex is in cooldown (suspending lanes). No model calls are made.
Run openclaw infer model run --prompt "say hello in one word" directly. It returns successfully, so the upstream API is callable. The block is purely stale OpenClaw-side state.
Expected behavior
After the upstream's rolling cap recovers (which happens before the reported "next reset" since it's a rolling window, not a discrete reset), OpenClaw should re-probe the primary and resume serving calls. With no fallback configured, recovery probing should still happen, since "is the primary callable yet?" is a recovery question, not a fallback-switching question.
Actual behavior
The profile stays blocked until blockedUntil arrives literally, regardless of actual API state. In dist/model-fallback-DRgKirrj.js:
The early return on !hasFallbackCandidates means with fallbacks: [], no probe ever fires. Gateway logs confirm: ~250 skip_candidate entries over 3 days, zero attempts at the actual upstream.
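To make the failure mode concrete, here is a small hypothetical reduction of the behavior described above. It is not the code from dist/model-fallback-DRgKirrj.js. Only shouldProbePrimaryDuringCooldown, hasFallbackCandidates and blockedUntil are names from this report; the types, the decide and replayTicks helpers, the epoch-millisecond timestamps and the 30-minute tick are assumptions made for illustration.

```typescript
// Hypothetical reduction of the reported behavior. This is NOT the contents
// of dist/model-fallback-DRgKirrj.js; only shouldProbePrimaryDuringCooldown,
// hasFallbackCandidates and blockedUntil come from the issue text.

interface ProbeInput {
  nowMs: number;                  // current time, epoch ms (assumed representation)
  blockedUntil: number;           // stored block expiry, epoch ms
  hasFallbackCandidates: boolean; // false when fallbacks: []
}

type Decision = "attempt" | "skip_candidate";

function shouldProbePrimaryDuringCooldown(input: ProbeInput): boolean {
  // The early return the issue describes: no fallbacks means no probe.
  if (!input.hasFallbackCandidates) return false;
  // Placeholder for whatever throttle checks the real function applies.
  return true;
}

function decide(input: ProbeInput): Decision {
  if (input.nowMs >= input.blockedUntil) return "attempt"; // block has expired
  return shouldProbePrimaryDuringCooldown(input) ? "attempt" : "skip_candidate";
}

// Replays cron ticks starting at the moment the block is armed (t = 0).
function replayTicks(
  tickMs: number,
  durationMs: number,
  blockMs: number,
  hasFallbackCandidates: boolean,
): { attempts: number; skips: number } {
  let attempts = 0;
  let skips = 0;
  for (let t = 0; t < durationMs; t += tickMs) {
    const d = decide({ nowMs: t, blockedUntil: blockMs, hasFallbackCandidates });
    if (d === "attempt") attempts++;
    else skips++;
  }
  return { attempts, skips };
}

const DAY_MS = 24 * 60 * 60 * 1000;
// 3 days of 30-minute ticks against a 6-day block, with fallbacks: [].
console.log(replayTicks(30 * 60 * 1000, 3 * DAY_MS, 6 * DAY_MS, false));
```

With these illustrative numbers all 144 ticks are skipped and none reaches the upstream, which is the same pattern as the gateway logs: only skip_candidate entries until blockedUntil passes.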
OpenClaw version
2026.5.28
Operating system
Ubuntu 24.04
Install method
npm global
Model
openai-codex/gpt-5.5
Provider / routing chain
openclaw -> @openclaw/codex@2026.5.28 -> openai (ChatGPT Plus OAuth)
Additional provider/model setup details
Single auth profile: openai-codex:<account> (OAuth, ChatGPT Plus subscription)
agents.defaults.model.fallbacks: [] (no fallback configured)
Logs, screenshots, and evidence
Jun 02 12:06:10 [model-fallback/decision] decision=candidate_failed
requested=openai-codex/gpt-5.5 candidate=openai-codex/gpt-5.5
reason=rate_limit next=none
detail=You've reached your Codex subscription usage limit. Next reset in 6 days, Jun 7 at 3:43 PM UTC.
Jun 02 14:30:00 [model-fallback/decision] decision=skip_candidate
requested=openai-codex/gpt-5.5 candidate=openai-codex/gpt-5.5
reason=rate_limit next=none
detail=Provider openai-codex is in cooldown (suspending lanes)
(repeats every scheduled cron tick for 3 days)
The auth-state.json snippet above. Direct openclaw infer model run succeeded immediately after manually clearing blockedUntil.
Impact and severity
Affected: any single-host OpenClaw install with one upstream and fallbacks: [] that hits a subscription cap.
Severity: blocks workflow — scheduled crons and channel replies stop posting for the entire duration of blockedUntil.
Frequency: triggered once per cap exhaustion, then sticks until manual intervention.
Consequence: agents go silent for days. In our case, 3 days of no replies to scheduled Telegram interactions and four daily cron jobs not firing.
Two suggested minimal fixes (either alone would have prevented this):
Cap blockedUntil for subscription_limit reasons. Store min(reportedReset, now + MAX_SUBSCRIPTION_BLOCK_MS). With a cap of e.g. 1 hour, the profile gets re-probed an hour later; if still exhausted, the upstream returns the same error and the block is re-armed; if recovered, work resumes. Keep the reported timestamp in a separate expectedFullResetAt field for display only.
Drop the hasFallbackCandidates short-circuit for recovery probes. Split shouldProbePrimaryDuringCooldown into "should we try a fallback now?" (legitimately needs fallback candidates) and "should we re-probe the primary now?" (doesn't). The recovery-probe branch should fire on any time-based throttle regardless of fallback configuration.
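The two suggestions above could look roughly like the following sketch. It is one possible shape under stated assumptions, not a patch against OpenClaw. blockedUntil, expectedFullResetAt, MAX_SUBSCRIPTION_BLOCK_MS and hasFallbackCandidates are names from this report; armSubscriptionBlock, shouldTryFallback, shouldReprobePrimary, lastProbeAt, minProbeIntervalMs and the epoch-millisecond timestamps are hypothetical.

```typescript
// Hypothetical sketch of both suggested fixes; not OpenClaw's actual code.

const MAX_SUBSCRIPTION_BLOCK_MS = 60 * 60 * 1000; // 1 hour, the example cap from the issue

interface BlockState {
  blockedUntil: number;        // epoch ms: earliest time of the next probe
  expectedFullResetAt: number; // epoch ms: upstream-reported reset, display only
}

// Fix 1: cap the stored block, keep the reported reset separately.
function armSubscriptionBlock(reportedResetMs: number, nowMs: number): BlockState {
  return {
    blockedUntil: Math.min(reportedResetMs, nowMs + MAX_SUBSCRIPTION_BLOCK_MS),
    expectedFullResetAt: reportedResetMs,
  };
}

interface CooldownContext {
  nowMs: number;
  blockedUntil: number;
  lastProbeAt: number;        // assumed field: time of the last recovery probe
  minProbeIntervalMs: number; // assumed throttle between recovery probes
  hasFallbackCandidates: boolean;
}

// Fix 2a: "should we try a fallback now?" legitimately needs fallback candidates.
function shouldTryFallback(ctx: CooldownContext): boolean {
  return ctx.hasFallbackCandidates && ctx.nowMs < ctx.blockedUntil;
}

// Fix 2b: "should we re-probe the primary now?" is time-based only and
// deliberately ignores hasFallbackCandidates.
function shouldReprobePrimary(ctx: CooldownContext): boolean {
  if (ctx.nowMs >= ctx.blockedUntil) return true; // block has expired
  return ctx.nowMs - ctx.lastProbeAt >= ctx.minProbeIntervalMs;
}

// Example with the timestamps from the log excerpt (the year is an assumption).
const armedAt = Date.UTC(2026, 5, 2, 12, 6, 10);
const reportedReset = Date.UTC(2026, 5, 7, 15, 43);
const state = armSubscriptionBlock(reportedReset, armedAt);
console.log(new Date(state.blockedUntil).toISOString());
console.log(new Date(state.expectedFullResetAt).toISOString());
```

Either function alone bounds how long the profile stays silent: the first by shortening the stored block, the second by letting a throttled probe through while the block is still armed. If the upstream is still exhausted, the probe fails with the same error and the block is re-armed.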
Workaround currently in place: hourly cron clearing blockedUntil for subscription_limit blocks where blockedUntil > now + 12h AND lastFailureAt < now - 6h.
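The workaround's selection rule can be written as a small predicate. This is a sketch only: the real layout of auth-state.json is not reproduced in this report, so the ProfileBlock shape, the reason field name and the epoch-millisecond timestamps are assumptions, and isStaleSubscriptionBlock and clearStaleBlock are hypothetical helpers.

```typescript
// Hypothetical sketch of the workaround's rule; the auth-state.json schema
// is assumed, not taken from OpenClaw.

const HOUR_MS = 60 * 60 * 1000;

interface ProfileBlock {
  reason?: string;        // assumed field name; "subscription_limit" is from the issue
  blockedUntil?: number;  // epoch ms (assumed representation)
  lastFailureAt?: number; // epoch ms (assumed representation)
}

// True when the block is a subscription_limit block that still has more than
// 12 hours to run and whose last failure is more than 6 hours old.
function isStaleSubscriptionBlock(p: ProfileBlock, nowMs: number): boolean {
  return (
    p.reason === "subscription_limit" &&
    p.blockedUntil !== undefined &&
    p.lastFailureAt !== undefined &&
    p.blockedUntil > nowMs + 12 * HOUR_MS &&
    p.lastFailureAt < nowMs - 6 * HOUR_MS
  );
}

// Returns a copy without blockedUntil when the rule matches, else the input.
function clearStaleBlock(p: ProfileBlock, nowMs: number): ProfileBlock {
  if (!isStaleSubscriptionBlock(p, nowMs)) return p;
  const next: ProfileBlock = { ...p };
  delete next.blockedUntil;
  return next;
}
```

The 6-hour condition keeps the cron from clearing a block that was armed recently, so a profile that is genuinely exhausted is retried at most a few times per day rather than every hour.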
Additional setup details: compaction.maxActiveTranscriptBytes: "500kb", truncateAfterCompaction: true; auth.cooldowns: {} (defaults)
Additional information
[...] quota_wait state separate from reauth_required). This bug is the concrete shape of one of the problems that #54278 (Feature request: native Codex quota/auth diagnosis plus brokered reauth execution) describes.