Closed as not planned
Labels: bug (Something isn't working)
Description
Problem
When a model provider returns a 429 rate limit error, the current behavior is to fail the request and potentially fall back to another profile/model. There's no automatic retry with delay.
Current Behavior
- Rate limits are detected (`FailoverReason: 'rate_limit'`)
- The profile gets marked with a cooldown (`calculateAuthProfileCooldownMs`)
- The request fails immediately
Proposed Behavior
For 429 responses:
- Check the `Retry-After` header if present
- Apply exponential backoff: 1s, 2s, 4s, 8s, 16s (max 3-5 retries)
- Retry the same request
- Only fail/fallback after exhausting retries
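A minimal sketch of the proposed flow (all names here are illustrative, not from the codebase; the `Retry-After` parsing handles only the delay-seconds form, not the HTTP-date form):

```typescript
// Compute the delay before retrying a 429 response.
// `retryAfterHeader` is the raw Retry-After value, if the provider sent one.
function computeRetryDelayMs(attempt: number, retryAfterHeader?: string): number {
  if (retryAfterHeader !== undefined) {
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  }
  // Fall back to exponential backoff: 1s, 2s, 4s, 8s, capped at 16s.
  return Math.min(1000 * 2 ** attempt, 16_000);
}

// Retry wrapper around a provider call: retry on rate limits only,
// and rethrow (triggering the existing fail/fallback path) once retries
// are exhausted.
async function withRateLimitRetry<T>(
  call: () => Promise<T>,
  isRateLimit: (err: unknown) => boolean,
  retryAfterOf: (err: unknown) => string | undefined,
  maxRetries = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (!isRateLimit(err) || attempt >= maxRetries) throw err;
      const delay = computeRetryDelayMs(attempt, retryAfterOf(err));
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```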
Why This Matters
Rate limits are often transient. Batch operations that hit rate limits shouldn't fail entirely; they should wait and retry. This is especially important for:
- Long-running batch jobs
- Multi-model operations
- High-throughput scenarios
Implementation Notes
The retry logic should integrate at the API call level, possibly in:
- `src/agents/pi-embedded-runner/run/attempt.ts`
- Or a wrapper around the provider SDK calls
The existing `calculateAuthProfileCooldownMs` already shows the backoff pattern in use for profile cooldowns.
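As one possible shape for the SDK-wrapper option, here is a hedged sketch of a generic wrapper the runner could apply around provider calls. `runWithBackoff` and `RateLimitError` are hypothetical names, and the jitter is an assumption borrowed from common backoff practice, not from the existing cooldown code:

```typescript
// Illustrative error type carrying an optional Retry-After-derived delay.
class RateLimitError extends Error {
  constructor(public retryAfterMs?: number) {
    super("rate_limit");
  }
}

// Retry rate-limited calls with exponential backoff plus a small random
// jitter; rethrow any other error immediately, and rethrow the last
// rate-limit error once retries are exhausted so the existing
// cooldown/failover path can take over.
async function runWithBackoff<T>(call: () => Promise<T>, maxRetries = 3): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastErr = err;
      if (!(err instanceof RateLimitError)) throw err;
      if (attempt === maxRetries) break;
      const base = err.retryAfterMs ?? 1000 * 2 ** attempt;
      const jitter = Math.random() * 250; // avoid thundering-herd retries
      await new Promise((resolve) => setTimeout(resolve, base + jitter));
    }
  }
  throw lastErr;
}
```

For example, a call that rate-limits twice and then succeeds would complete on the third attempt without surfacing an error to the caller.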
Prior Art
- OpenAI SDK has built-in retry
- Anthropic SDK has configurable retry
- Most HTTP clients support retry middleware