Bug: Telegram 'typing' indicator persists due to unhandled API rate limit failover loop #27360

@z7-btc

Bug Description

The Telegram sendChatAction('typing') indicator sometimes persists indefinitely after the agent has sent its final reply. This makes the assistant appear to be stuck or in an infinite loop from the user's perspective.

Root Cause Analysis

Based on live diagnostics, this issue is not caused by a simple hanging background process. The root cause is an unhandled state transition within the agent session manager when encountering API rate limit errors from the LLM provider.

Log Evidence:
The logs show a cascade of two FailoverError messages: "⚠️ API rate limit reached. Please try again later." and "No available auth profile for ... (all in cooldown or unavailable)."

Sample log entries:

{"subsystem":"agent/embedded","1":"embedded run agent end: runId=... isError=true error=\"⚠️ API rate limit reached. Please try again later.\""}
{"subsystem":"diagnostic","1":"lane task error: lane=main durationMs=... error=\"FailoverError: ⚠️ API rate limit reached. Please try again later.\""}

Mechanism:

  1. An agent makes a call to an LLM provider (e.g., google-gemini-cli).
  2. The provider returns a rate limit error.
  3. The OpenClaw gateway enters a failover/retry loop, attempting to use backup models (qwen-portal, etc.).
  4. This retry loop does not appear to terminate the parent session cleanly: the session stays in an 'active' or 'running' state internally.
  5. Because the session is never marked as 'finished', the TypingManager (or equivalent) for the Telegram channel never receives the signal to stop sending sendChatAction.
  6. The 'typing' indicator remains stuck until the gateway is manually restarted, which clears the hung session.
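
Telegram clears a chat action on its own after about five seconds, so an indicator that persists implies something keeps re-sending it. One way to avoid depending on a 'finished' signal is to tie the refresh timer to the run's promise instead of the session state. This is a minimal sketch, assuming a hypothetical `withTypingIndicator` wrapper (not an existing OpenClaw function) and an injected `sendTyping` callback standing in for the real sendChatAction call:

```typescript
// Hypothetical wrapper: keeps the typing indicator alive only while `run` is pending.
// `sendTyping` stands in for the channel's sendChatAction('typing') call.
async function withTypingIndicator<T>(
  sendTyping: () => Promise<void>,
  run: () => Promise<T>,
  intervalMs = 4000, // assumed refresh period, below Telegram's ~5 s expiry
): Promise<T> {
  const tick = (): void => {
    // A failed indicator refresh should never fail the run itself.
    void sendTyping().catch(() => {});
  };
  tick();
  const timer = setInterval(tick, intervalMs);
  try {
    return await run();
  } finally {
    // Runs on success and on any thrown error, including a FailoverError.
    clearInterval(timer);
  }
}
```

With this shape, a run that rejects with a rate limit error stops the indicator as soon as the rejection propagates. A run whose promise never settles would still keep it alive, so the failover loop also needs to be bounded.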

Steps to Reproduce

The bug is intermittent and hard to reproduce on demand, because it requires triggering a real API rate limit.

  1. Configure multiple LLM providers for failover.
  2. Perform actions that rapidly consume API tokens (e.g., many parallel sub-agents, frequent complex cron jobs) to trigger a rate limit error from the primary provider.
  3. Observe the Telegram chat.
  4. When the agent fails to respond due to the rate limit, the 'typing' indicator may get stuck.

Expected Behavior

When an agent run fails due to a terminal error like rate limiting (after all retries are exhausted), the session should be cleanly marked as 'error' or 'finished', and the sendChatAction('typing') loop for that turn must be terminated.
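
The expected behavior can be sketched as a bounded failover pass that always resolves to a terminal status. This is illustrative only: `runWithFailover` and `RunOutcome` are hypothetical names, not OpenClaw's actual session manager API, and each provider is modeled as a plain async function.

```typescript
// Hypothetical outcome type: a run always ends in one of two terminal states.
type RunOutcome<T> =
  | { status: "finished"; value: T }
  | { status: "error"; errors: unknown[] };

// Tries each provider at most once, in order. The loop is bounded by the
// length of `providers`, so it cannot spin once every provider has failed.
async function runWithFailover<T>(
  providers: Array<() => Promise<T>>,
): Promise<RunOutcome<T>> {
  const errors: unknown[] = [];
  for (const callProvider of providers) {
    try {
      return { status: "finished", value: await callProvider() };
    } catch (err) {
      errors.push(err); // e.g. a rate limit error; fall through to the next provider
    }
  }
  // All providers exhausted: report a terminal error instead of staying 'running'.
  return { status: "error", errors };
}
```

Because the function resolves in both cases, a caller can stop the typing loop unconditionally once it returns and then decide whether to send a reply or an error message.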

Actual Behavior

The session appears to hang in a retry loop, preventing the typing indicator from being cleared.

Impact

  • Severely degraded user experience.
  • Makes the assistant appear unreliable and broken.
  • May lead to resource leaks if many sessions get stuck in this state.

This issue was diagnosed and submitted by an OpenClaw agent on behalf of a user.
