Description
When all configured LLM providers fail authentication simultaneously, the gateway continues running (process stays alive, LaunchAgent healthy) but is completely unable to process any messages. No notification is sent to the user via any channel.
Steps to Reproduce
- Configure primary model (e.g., openai-codex/gpt-5.3-codex) with OAuth
- Configure fallback model (e.g., anthropic/claude-opus-4-6) with token auth
- Wait for the OAuth access token to expire (~60-90 min for Codex OAuth)
- If the auto-refresh fails silently, both primary and fallback will return 401
- Observe: gateway process stays alive, LaunchAgent shows healthy, but no messages are processed and no alert is sent
What I Observed
From gateway.err.log (2026-02-18, UTC):
03:59:27 - Embedded agent failed: openai-codex: LLM request timed out | anthropic: HTTP 401 Invalid bearer token
05:16:12 - Embedded agent failed: openai-codex: HTTP 401 Invalid bearer token | anthropic: HTTP 401 Invalid bearer token
06:16:48 - Embedded agent failed: openai-codex: HTTP 401 | anthropic: HTTP 401
07:10:02 - Embedded agent failed: openai-codex: HTTP 401 | anthropic: HTTP 401
08:03:09 - Embedded agent failed: Codex cooldown | Anthropic cooldown
The gateway was effectively dead for ~4 hours with no user-facing notification. The only way to detect the issue was to manually inspect gateway.err.log.
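Until the gateway detects this itself, the pattern in the excerpt above can be caught from outside. The following is a minimal TypeScript sketch. It assumes every entry follows the "Embedded agent failed: provider: reason | provider: reason" layout shown above. The function names are hypothetical. The sketch flags an entry when every provider segment mentions HTTP 401, an invalid bearer token, or a cooldown, and it reports the longest run of such entries. Successful turns may not be written to the error log, so the streak counts consecutive failure entries rather than consecutive turns.

```typescript
// Hypothetical external health check for gateway.err.log-style lines.
// Assumes the "Embedded agent failed: provider: reason | provider: reason"
// layout seen in the excerpt; the real log format may differ.

const MARKER = "Embedded agent failed:";
// Matches an HTTP 401, an invalid bearer token message, or a cooldown notice.
const AUTH_OR_COOLDOWN = /\b401\b|invalid bearer token|cooldown/i;

// True when every provider segment on the line is an auth failure or cooldown.
function isAllProvidersAuthFailure(line: string): boolean {
  const idx = line.indexOf(MARKER);
  if (idx === -1) return false;
  const segments = line
    .slice(idx + MARKER.length)
    .split("|")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  return segments.length > 0 && segments.every((s) => AUTH_OR_COOLDOWN.test(s));
}

// Longest run of consecutive "all providers failed auth" entries.
function longestAuthOutageStreak(lines: string[]): number {
  let current = 0;
  let longest = 0;
  for (const line of lines) {
    if (!line.includes(MARKER)) continue; // ignore unrelated log lines
    if (isAllProvidersAuthFailure(line)) {
      current += 1;
      longest = Math.max(longest, current);
    } else {
      current = 0; // e.g. a timeout breaks the auth-only streak
    }
  }
  return longest;
}
```

On the five entries above, the 03:59:27 line does not count because one provider timed out. The remaining four form a streak of 4. A periodic job could alert whenever the streak reaches 2 or more.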
Expected Behavior
When all configured models fail auth (and the failure persists for, say, 2+ consecutive turns), the gateway should:
- Send a notification to the configured admin contact (e.g., via the first available channel, or by posting to a webhook)
- Increase the heartbeat frequency or emit a health event
- Surface the failure in openclaw status as an error state (not just "running")
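The expected behavior could be sketched as a small per-turn monitor. This is a hypothetical TypeScript sketch, not an existing OpenClaw API. AuthOutageMonitor, recordTurn, ProviderResult and the notifyAdmin callback are illustrative names. The monitor counts consecutive turns in which every attempted provider returned HTTP 401. It alerts once when the count reaches the threshold (2 here, matching the suggestion above). It also exposes an "error" health state that a status command could report instead of "running".

```typescript
// Hypothetical sketch of the requested alerting; not an existing OpenClaw API.

type ProviderResult = { provider: string; ok: boolean; status?: number };
type GatewayHealth = "running" | "error";

class AuthOutageMonitor {
  private consecutiveAuthFailures = 0;
  private alerted = false;
  private readonly threshold: number;
  private readonly notifyAdmin: (message: string) => void;

  constructor(threshold: number, notifyAdmin: (message: string) => void) {
    this.threshold = threshold;
    this.notifyAdmin = notifyAdmin;
  }

  // Call once per turn with the outcome of every provider that was attempted.
  recordTurn(results: ProviderResult[]): void {
    if (results.some((r) => r.ok)) {
      // At least one provider answered, so any outage is over.
      this.consecutiveAuthFailures = 0;
      this.alerted = false;
      return;
    }
    const allAuthFailed =
      results.length > 0 && results.every((r) => r.status === 401);
    if (!allAuthFailed) return; // mixed failures: neither recovery nor a confirmed auth outage

    this.consecutiveAuthFailures += 1;
    if (this.consecutiveAuthFailures >= this.threshold && !this.alerted) {
      this.alerted = true; // alert once per outage, not on every failing turn
      const providers = results.map((r) => r.provider).join(", ");
      this.notifyAdmin(
        `All LLM providers failed auth for ${this.consecutiveAuthFailures} consecutive turns (${providers})`,
      );
    }
  }

  // What a status command could report instead of a bare "running".
  health(): GatewayHealth {
    return this.consecutiveAuthFailures >= this.threshold ? "error" : "running";
  }
}
```

The monitor alerts only once per outage and resets only when some provider succeeds. This keeps it from flooding the admin channel on every turn while auth stays broken. Turns with other failures, such as a timeout or a cooldown without a status code, leave the count unchanged.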
Environment
- OpenClaw v2026.2.17
- macOS, LaunchAgent
- Primary: openai-codex/gpt-5.3-codex (OAuth)
- Fallback: anthropic/claude-opus-4-6 (OAT token)
- Root cause: Codex OAuth access token expired, auto-refresh failed silently
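The root cause was an access-token refresh that failed silently. One possible mitigation is to refresh ahead of expiry and report refresh errors explicitly. The sketch below assumes a generic OAuth token shape and an injected refresh function. OAuthToken, getValidAccessToken, refreshFn and onRefreshFailure are hypothetical names, and the 5-minute skew is an arbitrary illustrative default.

```typescript
// Hypothetical sketch: refresh an OAuth access token before it expires and
// surface refresh failures instead of silently keeping an expired token.

interface OAuthToken {
  accessToken: string;
  refreshToken: string;
  expiresAtMs: number; // absolute expiry time, epoch milliseconds
}

async function getValidAccessToken(
  current: OAuthToken,
  refreshFn: (refreshToken: string) => Promise<OAuthToken>,
  onRefreshFailure: (err: unknown) => void,
  nowMs: number = Date.now(),
  skewMs: number = 5 * 60 * 1000, // refresh this long before expiry (assumed value)
): Promise<OAuthToken> {
  if (nowMs < current.expiresAtMs - skewMs) {
    return current; // still comfortably valid, no refresh needed
  }
  try {
    return await refreshFn(current.refreshToken);
  } catch (err) {
    // Report the failure (e.g. to an admin channel or the health state) and
    // rethrow, so callers see a refresh error rather than a later opaque 401.
    onRefreshFailure(err);
    throw err;
  }
}
```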