
feat: Add automatic retry with exponential backoff for rate limits (429) #8894

@joetomasone


Problem

When a model provider returns a 429 rate limit error, the current behavior is to fail the request and potentially fall back to another profile/model. There's no automatic retry with delay.

Current Behavior

  • Rate limits are detected (FailoverReason: 'rate_limit')
  • Profile gets marked with a cooldown (calculateAuthProfileCooldownMs)
  • Request fails immediately

Proposed Behavior

For 429 responses:

  1. If the response includes a Retry-After header, wait for the duration it specifies
  2. Otherwise, apply exponential backoff (1s, 2s, 4s, 8s, 16s), capped at a configurable maximum of 3-5 retries
  3. Retry the same request
  4. Only fail/fallback after exhausting retries
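
The delay selection in steps 1 and 2 could be sketched roughly as follows. This is a minimal illustration, not existing project code: the names parseRetryAfterMs and retryDelayMs and both constants are hypothetical, and the 16s cap is simply taken from the schedule listed above. It assumes Retry-After may arrive either as a number of seconds or as an HTTP date, the two forms the HTTP specification allows.

```typescript
// Hypothetical helpers for illustration; none of these names exist in the codebase.
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 16000;

/**
 * Convert a Retry-After header value (delta-seconds or HTTP-date) to milliseconds.
 * Returns undefined when the header is missing or cannot be parsed.
 */
function parseRetryAfterMs(
  value: string | null | undefined,
  nowMs: number = Date.now(),
): number | undefined {
  if (value == null) return undefined;
  const trimmed = value.trim();
  if (trimmed === "") return undefined;
  // Form 1: a non-negative integer number of seconds.
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // Form 2: an HTTP-date; wait until that moment, never a negative time.
  const dateMs = Date.parse(trimmed);
  if (isNaN(dateMs)) return undefined;
  return Math.max(0, dateMs - nowMs);
}

/**
 * Delay before retry number `attempt` (0-based).
 * A usable Retry-After header wins; otherwise 1s, 2s, 4s, 8s, 16s, capped at 16s.
 */
function retryDelayMs(
  attempt: number,
  retryAfter?: string | null,
  nowMs: number = Date.now(),
): number {
  const fromHeader = parseRetryAfterMs(retryAfter, nowMs);
  if (fromHeader !== undefined) return fromHeader;
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
}
```

A real implementation would probably also add jitter and an upper bound on how long a server-supplied Retry-After is honored; both are left out here to keep the sketch close to the steps listed above.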

Why This Matters

Rate limits are often transient. Batch operations that hit a rate limit shouldn't fail outright; they should wait and retry. This is especially important for:

  • Long-running batch jobs
  • Multi-model operations
  • High-throughput scenarios

Implementation Notes

The retry logic should integrate at the API call level, possibly in:

  • src/agents/pi-embedded-runner/run/attempt.ts
  • Or a wrapper around the provider SDK calls
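
As a rough sketch of the wrapper option, the function below retries any async call that fails with a 429 and rethrows everything else, so the existing failover path still runs once retries are exhausted. All names here (withRateLimitRetry, RetryOptions, RateLimitError) are hypothetical, and the error shape (a status field plus an optional retryAfterMs) is an assumption; the errors thrown by actual provider SDKs may look different and would need adapting.

```typescript
// Assumed error shape for illustration; real provider SDK errors may differ.
interface RateLimitError {
  status: number;
  retryAfterMs?: number; // already-parsed Retry-After, if the caller has one
}

interface RetryOptions {
  maxRetries?: number; // retries after the initial call
  baseDelayMs?: number;
  maxDelayMs?: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

function isRateLimitError(err: unknown): err is RateLimitError {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: unknown }).status === 429
  );
}

async function withRateLimitRetry<T>(
  call: () => Promise<T>,
  options: RetryOptions = {},
): Promise<T> {
  const {
    maxRetries = 5,
    baseDelayMs = 1000,
    maxDelayMs = 16000,
    sleep = (ms: number) =>
      new Promise<void>((resolve) => setTimeout(resolve, ms)),
  } = options;

  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Non-429 errors and exhausted retries propagate to the caller,
      // which can then apply the existing cooldown/fallback handling.
      if (!isRateLimitError(err) || attempt >= maxRetries) throw err;
      const backoff = Math.min(baseDelayMs * Math.pow(2, attempt), maxDelayMs);
      await sleep(err.retryAfterMs ?? backoff);
    }
  }
}
```

A call site might then look like `await withRateLimitRetry(() => callProvider(request), { maxRetries: 3 })`, where callProvider stands in for whatever function currently issues the request.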

The existing calculateAuthProfileCooldownMs shows the backoff pattern already in use for profile cooldowns.

Prior Art

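  • The OpenAI SDK retries failed requests automatically, including 429s (configurable via maxRetries in the TypeScript SDK)
  • The Anthropic SDK has the same kind of configurable retry (also maxRetries in the TypeScript SDK)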
  • Most HTTP clients support retry middleware


Labels: bug (Something isn't working)
