Problem
Providers like Anthropic, OpenAI, and OpenRouter enforce RPM (requests per minute) and TPM (tokens per minute) rate limits. When the agent hits these limits, the provider returns HTTP 429 and the agent enters expensive retry/failover loops. The rate limit information is already present in response headers (x-ratelimit-remaining-requests, x-ratelimit-reset-requests), but currently Hermes only uses these for display (/usage command) — not for throttling.
Proposed solution
Add pre-emptive RPM throttling: after each API response, capture the rate limit headers. Before the next API call, if remaining-requests is critically low (≤ configurable threshold, default 2), sleep until the window resets. This avoids 429s entirely for RPM-limited providers.
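A minimal sketch of the capture-then-check decision described above, assuming OpenAI-style header values where x-ratelimit-reset-requests is a duration string such as "1s" or "6m0s". Providers that report the reset in another format (for example a timestamp) are not parsed here, and the function returns "don't wait" for them. The names parse_reset_duration and throttle_delay are hypothetical. The headers argument is assumed to be a case-insensitive mapping such as httpx.Headers, or a plain dict with lowercase keys.

```python
import re
from typing import Mapping, Optional

# One "<number><unit>" component of an OpenAI-style duration, e.g. the "6m"
# and "0s" in "6m0s". "ms" is tried before "m" and "s" so "20ms" is read as
# milliseconds rather than minutes.
_COMPONENT = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_FULL = re.compile(r"(?:\d+(?:\.\d+)?(?:ms|h|m|s))+")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_reset_duration(value: str) -> Optional[float]:
    """Convert a reset value like "1s" or "6m0s" to seconds; None if unparseable."""
    value = value.strip()
    if not _FULL.fullmatch(value):
        return None
    # findall returns (number, unit) tuples because the pattern has two groups.
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in _COMPONENT.findall(value))


def throttle_delay(headers: Mapping[str, str], threshold: int = 2) -> float:
    """Seconds to sleep before the next request; 0.0 means send immediately."""
    remaining_raw = headers.get("x-ratelimit-remaining-requests")
    reset_raw = headers.get("x-ratelimit-reset-requests")
    if remaining_raw is None or reset_raw is None:
        return 0.0  # no rate-limit info: behave exactly as without throttling
    try:
        remaining = int(remaining_raw)
    except ValueError:
        return 0.0
    if remaining > threshold:
        return 0.0  # plenty of requests left in this window
    reset = parse_reset_duration(reset_raw)
    return reset if reset is not None else 0.0
```

Missing or unparseable headers yield 0.0, so a provider that omits them keeps today's behaviour, and the 429 retry/failover path remains the fallback.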
Scope
- Fix non-streaming header capture: _capture_rate_limits() is currently only called after streaming responses (line 4597 of run_agent.py). Non-streaming paths never capture headers. This is a prerequisite.
- New: agent/rpm_throttler.py, an RPMThrottler class with a maybe_throttle(state, provider) method.
- Integration: wire the throttle check into _interruptible_api_call and its streaming variant in run_agent.py.
- Provider set: RPM_THROTTLE_PROVIDERS in model_metadata.py, listing providers with reliable x-ratelimit-* headers (Anthropic, OpenAI, OpenRouter, Nous).
- Config: an rpm_throttle_threshold field in custom_providers for user override (default: 2).
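One possible shape for agent/rpm_throttler.py, offered only as a sketch. The RateLimitState dataclass, the provider identifier strings, the injectable clock and sleep parameters, and the max_wait safety cap are assumptions that are not part of the proposal. The reset time is stored as an absolute monotonic deadline, so time already spent between capturing the headers and making the next call is not slept a second time.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Local stand-in for RPM_THROTTLE_PROVIDERS (proposed for model_metadata.py);
# the identifier strings are assumptions.
RPM_THROTTLE_PROVIDERS = frozenset({"anthropic", "openai", "openrouter", "nous"})


@dataclass
class RateLimitState:
    """Hypothetical snapshot filled in by header capture after each response."""
    remaining_requests: Optional[int] = None
    reset_at: Optional[float] = None  # clock() value at which the RPM window resets


class RPMThrottler:
    def __init__(
        self,
        threshold: int = 2,
        max_wait: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.threshold = threshold
        self.max_wait = max_wait  # assumed cap so a bad header cannot stall the agent
        self._clock = clock
        self._sleep = sleep

    def maybe_throttle(self, state: Optional[RateLimitState], provider: str) -> float:
        """Sleep until the window resets if remaining requests <= threshold.

        Returns the number of seconds slept (0.0 when no throttling happened).
        """
        if provider not in RPM_THROTTLE_PROVIDERS or state is None:
            return 0.0
        if state.remaining_requests is None or state.reset_at is None:
            return 0.0  # headers were never captured for this provider
        if state.remaining_requests > self.threshold:
            return 0.0
        # Clamp: a window that already reset means no wait; cap very long waits.
        wait = min(max(0.0, state.reset_at - self._clock()), self.max_wait)
        if wait > 0.0:
            self._sleep(wait)
        return wait
```

In _interruptible_api_call and the streaming variant, the check would sit just before the request is dispatched, e.g. throttler.maybe_throttle(state, provider), with threshold taken from rpm_throttle_threshold when a custom provider sets it. Injecting clock and sleep keeps the class testable without any real waiting.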
What this is NOT
Related
agent/rate_limit_tracker.py