feat(agent): RPM-based pre-emptive throttling using x-ratelimit response headers #7489

@Tranquil-Flow

Description

Problem

Providers like Anthropic, OpenAI, and OpenRouter enforce RPM (requests per minute) and TPM (tokens per minute) rate limits. When the agent hits these limits, the provider returns HTTP 429 and the agent enters expensive retry/failover loops. The rate limit information is already present in response headers (x-ratelimit-remaining-requests, x-ratelimit-reset-requests), but currently Hermes only uses these for display (/usage command) — not for throttling.

Proposed solution

Add pre-emptive RPM throttling: after each API response, capture the rate limit headers. Before the next API call, if x-ratelimit-remaining-requests is critically low (at or below a configurable threshold, default 2), sleep until the request window resets, as reported by x-ratelimit-reset-requests. This avoids RPM-triggered 429s for RPM-limited providers.
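A minimal sketch of the header check described above, not the actual Hermes implementation. The function names `parse_reset_seconds` and `seconds_to_wait` are hypothetical. It assumes the headers arrive as a mapping with lowercased keys, and that the reset value is either a plain number of seconds or a duration string such as `1s` or `6m0s`; other providers may use a different format, in which case the sketch simply declines to throttle.

```python
import math
import re
from typing import Mapping, Optional

# Assumption: reset values look like "1s", "6m0s", "120ms", or a plain number.
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_reset_seconds(value: str) -> Optional[float]:
    """Convert a reset header value to seconds, or None if unrecognised."""
    value = value.strip()
    if not value:
        return None
    try:
        seconds = float(value)  # plain numeric form, e.g. "60"
    except ValueError:
        seconds = None
    if seconds is not None:
        return seconds if math.isfinite(seconds) and seconds >= 0 else None

    total, pos = 0.0, 0
    for match in _DURATION_PART.finditer(value):
        if match.start() != pos:  # unexpected characters between parts
            return None
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    return total if pos == len(value) else None


def seconds_to_wait(headers: Mapping[str, str], threshold: int = 2) -> float:
    """Return how long to sleep before the next call (0.0 means no throttling)."""
    remaining_raw = headers.get("x-ratelimit-remaining-requests")
    reset_raw = headers.get("x-ratelimit-reset-requests")
    if remaining_raw is None or reset_raw is None:
        return 0.0  # provider did not send the headers; do nothing
    try:
        remaining = int(remaining_raw)
    except ValueError:
        return 0.0
    if remaining > threshold:
        return 0.0
    reset = parse_reset_seconds(reset_raw)
    return reset if reset is not None else 0.0
```

With a threshold of 2, a response carrying `x-ratelimit-remaining-requests: 1` and `x-ratelimit-reset-requests: 6m0s` would suggest a 360-second wait, while a response with 50 requests remaining would suggest none. Failing open on missing or unparseable headers keeps the throttle from blocking providers that do not send them.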

Scope

  1. Fix non-streaming header capture: _capture_rate_limits() is currently only called after streaming responses (line 4597 of run_agent.py). Non-streaming paths never capture headers. This is a prerequisite.
  2. New agent/rpm_throttler.py: RPMThrottler class with a maybe_throttle(state, provider) method
  3. Integration: wire the throttle check into _interruptible_api_call and the streaming variant in run_agent.py
  4. Provider set: RPM_THROTTLE_PROVIDERS in model_metadata.py for providers with reliable x-ratelimit-* headers (Anthropic, OpenAI, OpenRouter, Nous)
  5. Config: rpm_throttle_threshold field in custom_providers for user override (default: 2)
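The scope items above could fit together roughly as follows. This is a sketch under stated assumptions, not the proposed module itself: only the names `RPMThrottler`, `maybe_throttle(state, provider)`, and `RPM_THROTTLE_PROVIDERS` come from the issue. The `RateLimitState` dataclass, the lowercase provider identifiers, the `max_sleep` cap, the float return value, and the injectable `sleep`/`clock` parameters are all hypothetical additions for illustration and testability.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: providers are identified by lowercase strings like these.
RPM_THROTTLE_PROVIDERS = frozenset({"anthropic", "openai", "openrouter", "nous"})


@dataclass
class RateLimitState:
    """Hypothetical snapshot of the last captured rate limit headers."""
    remaining_requests: Optional[int]
    reset_seconds: Optional[float]   # seconds until reset, as of captured_at
    captured_at: float               # monotonic clock reading at capture time


class RPMThrottler:
    def __init__(
        self,
        threshold: int = 2,
        max_sleep: float = 60.0,  # hypothetical safety cap, not in the issue
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.max_sleep = max_sleep
        self._sleep = sleep
        self._clock = clock

    def maybe_throttle(self, state: Optional[RateLimitState], provider: str) -> float:
        """Sleep if the request budget is nearly spent; return seconds slept."""
        if state is None or provider not in RPM_THROTTLE_PROVIDERS:
            return 0.0
        if state.remaining_requests is None or state.reset_seconds is None:
            return 0.0
        if state.remaining_requests > self.threshold:
            return 0.0
        # Part of the window may already have passed since the headers arrived.
        elapsed = self._clock() - state.captured_at
        wait = min(max(state.reset_seconds - elapsed, 0.0), self.max_sleep)
        if wait > 0:
            self._sleep(wait)
        return wait
```

A call site in the API path might look like `throttler.maybe_throttle(last_state, provider)` immediately before issuing the request. Subtracting the elapsed time matters because the agent may spend several seconds on tool calls between responses, so the window can be partly or fully reset by the time the next request is ready.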

What this is NOT

Related

Metadata

Labels: P3 (Low: cosmetic, nice to have); comp/agent (Core agent loop, run_agent.py, prompt builder); type/feature (New feature or request)
