Skip to content

Track rate limit headers for proactive throttling #5449

@kshitijk4poor

Description

@kshitijk4poor

Problem

Hermes only reacts to rate limits after receiving a 429. Response headers like X-RateLimit-Remaining and X-RateLimit-Reset (returned by OpenAI, Anthropic, and most providers) are ignored. This means Hermes burns through its quota at full speed, then hits a wall.

Proposed Solution

Parse X-RateLimit-Remaining / X-RateLimit-Reset / X-RateLimit-Limit from successful API responses in the retry loop. When remaining quota drops below a threshold (e.g. 10%), insert a small adaptive delay before the next API call to smooth out request rate and avoid hard 429s.

Lightweight — just header parsing + optional delay, no queuing system needed.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P3Low — cosmetic, nice to havecomp/agentCore agent loop, run_agent.py, prompt buildertype/featureNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions