Problem
Hermes only reacts to rate limits after receiving a 429. Response headers like X-RateLimit-Remaining and X-RateLimit-Reset (returned by OpenAI, Anthropic, and most providers) are ignored. This means Hermes burns through its quota at full speed, then hits a wall.
Proposed Solution
Parse X-RateLimit-Remaining / X-RateLimit-Reset / X-RateLimit-Limit from successful API responses in the retry loop. When remaining quota drops below a threshold (e.g. 10%), insert a small adaptive delay before the next API call to smooth out request rate and avoid hard 429s.
Lightweight — just header parsing + optional delay, no queuing system needed.
Problem
Hermes only reacts to rate limits after receiving a 429. Response headers like
X-RateLimit-RemainingandX-RateLimit-Reset(returned by OpenAI, Anthropic, and most providers) are ignored. This means Hermes burns through its quota at full speed, then hits a wall.Proposed Solution
Parse
X-RateLimit-Remaining/X-RateLimit-Reset/X-RateLimit-Limitfrom successful API responses in the retry loop. When remaining quota drops below a threshold (e.g. 10%), insert a small adaptive delay before the next API call to smooth out request rate and avoid hard 429s.Lightweight — just header parsing + optional delay, no queuing system needed.