Skip to content

feat(middleware): retry with rotated auth key on rate-limit errors #423

@alexey-pelykh

Description

@alexey-pelykh

Context

Part of #415 Phase 1, item 4. When an agent has multiple auth profiles configured (auth: ["key1", "key2"]), a rate-limit or auth error from the CLI should trigger a retry with the next key in the rotation.

Scope

  1. Detect rate-limit / auth errors from CLI subprocess output:

    • Exit codes indicating rate limits
    • Stderr patterns (e.g., "rate limit", "429", "quota exceeded")
    • `AgentRunResult.errorSubtype === "rate_limit"`
    • `AgentRunResult.stopReason` indicating auth failure
  2. When detected AND agent has multiple keys:

    • Advance rotation index to next key
    • Retry the execution with the new key injected
    • Cap retries at auth.length (try each key at most once per message)
  3. When detected AND agent has single key or auth: false:

    • No retry — surface the error as today

Design considerations

  • Retry happens at the ChannelBridge / agent-runner level, not inside the runtime
  • Each retry spawns a fresh CLI process (consistent with existing per-message model)
  • Retry budget: one attempt per key in the array, no more
  • Logging: log which key failed and which is being tried next (profile ID, not the key itself)

Tests

  • Rate-limit error with multi-key config triggers retry with next key
  • Retry succeeds with second key → normal response returned
  • All keys fail → error surfaced to user
  • Single key config → no retry attempted
  • auth: false → no retry attempted
  • Retry count never exceeds number of configured keys

Depends on

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions