feat(agent): RPM-based pre-emptive throttling using x-ratelimit response headers

## Problem

Providers like Anthropic, OpenAI, and OpenRouter enforce **RPM (requests per minute)** and **TPM (tokens per minute)** rate limits. When the agent hits these limits, the provider returns HTTP 429 and the agent enters expensive retry/failover loops. The rate limit information is already present in response headers (`x-ratelimit-remaining-requests`, `x-ratelimit-reset-requests`), but currently Hermes only uses these for display (`/usage` command) — not for throttling.

## Proposed solution

Add **pre-emptive RPM throttling**: after each API response, capture the rate limit headers. Before the *next* API call, if `x-ratelimit-remaining-requests` is at or below a configurable threshold (default 2), sleep until the window resets, as reported by `x-ratelimit-reset-requests`. This should avoid most RPM-triggered 429s on providers that report these headers; TPM-triggered 429s are out of scope and can still occur.
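A minimal sketch of the decision described above, not the proposed implementation. The function name `throttle_delay` and the helper `_to_float` are hypothetical, and the sketch assumes the reset header carries a plain number of seconds. Providers may instead send duration strings or timestamps, which the existing parser in `agent/rate_limit_tracker.py` would presumably normalize first.

```python
from typing import Mapping, Optional


def _to_float(value: Optional[str]) -> Optional[float]:
    """Parse a header value as a float; return None if absent or not numeric."""
    try:
        return float(value) if value is not None else None
    except ValueError:
        return None


def throttle_delay(headers: Mapping[str, str], threshold: int = 2) -> float:
    """Return seconds to sleep before the next call, or 0.0 to proceed.

    Assumption: x-ratelimit-reset-requests is a plain number of seconds.
    """
    remaining = _to_float(headers.get("x-ratelimit-remaining-requests"))
    reset_in = _to_float(headers.get("x-ratelimit-reset-requests"))
    if remaining is None or reset_in is None:
        return 0.0  # no usable data: never block on a guess
    if remaining > threshold:
        return 0.0  # enough request budget left in this window
    return max(reset_in, 0.0)
```

Returning `0.0` when headers are missing or unparseable keeps the throttle fail-open: a provider that omits the headers behaves exactly as it does today.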

### Scope

1. **Fix non-streaming header capture** — `_capture_rate_limits()` is currently only called after streaming responses (line 4597 of `run_agent.py`). Non-streaming paths never capture headers. This is a prerequisite.
2. **New `agent/rpm_throttler.py`** — `RPMThrottler` class with `maybe_throttle(state, provider)` method
3. **Integration** — wire throttle check into `_interruptible_api_call` and the streaming variant in `run_agent.py`
4. **Provider set** — `RPM_THROTTLE_PROVIDERS` in `model_metadata.py` for providers with reliable `x-ratelimit-*` headers (Anthropic, OpenAI, OpenRouter, Nous)
5. **Config** — `rpm_throttle_threshold` field in `custom_providers` for user override (default: 2)
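The scope items above could fit together roughly as follows. This is a sketch under stated assumptions: `RateLimitState` and its fields are hypothetical stand-ins for whatever `_capture_rate_limits()` stores, the provider slugs are guesses at how `RPM_THROTTLE_PROVIDERS` would be keyed, and the injectable `sleep` and `clock` parameters exist only to make the class testable.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical provider slugs; the real set would live in model_metadata.py.
RPM_THROTTLE_PROVIDERS = frozenset({"anthropic", "openai", "openrouter", "nous"})


@dataclass
class RateLimitState:
    """Hypothetical snapshot of the most recently captured headers."""
    remaining_requests: Optional[int]
    reset_seconds: Optional[float]  # seconds until reset, as of captured_at
    captured_at: float              # clock reading when headers were captured


class RPMThrottler:
    def __init__(
        self,
        threshold: int = 2,  # would come from rpm_throttle_threshold
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self._sleep = sleep
        self._clock = clock

    def maybe_throttle(self, state: Optional[RateLimitState], provider: str) -> float:
        """Sleep if the request budget is nearly spent; return seconds slept."""
        if provider not in RPM_THROTTLE_PROVIDERS or state is None:
            return 0.0
        if state.remaining_requests is None or state.reset_seconds is None:
            return 0.0
        if state.remaining_requests > self.threshold:
            return 0.0
        # Subtract time already elapsed since capture so we do not oversleep.
        wait = state.reset_seconds - (self._clock() - state.captured_at)
        if wait <= 0:
            return 0.0  # the window has already reset
        self._sleep(wait)
        return wait
```

Accounting for elapsed time matters because the agent may spend several seconds on tool calls between capturing the headers and making the next request. A real integration into `_interruptible_api_call` would likely also need the sleep itself to be interruptible, which this sketch does not attempt.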

### What this is NOT

- Not concurrency-based throttling (that's #7479 for z.ai/Kimi)
- Not token-based throttling (would require estimating next call's token cost — future work)

## Related

- Phase 1 (concurrency semaphore): #7479
- Existing header parser: `agent/rate_limit_tracker.py`

feat(agent): RPM-based pre-emptive throttling using x-ratelimit response headers #7489
