Skip to content

Add limits-alternative routing strategy for automatic rate limit failover#21

Merged
erans merged 9 commits intomainfrom
feature/new-routing-when-hit-limits
Oct 29, 2025
Merged

Add limits-alternative routing strategy for automatic rate limit failover#21
erans merged 9 commits intomainfrom
feature/new-routing-when-hit-limits

Conversation

@erans
Copy link
Copy Markdown
Owner

@erans erans commented Oct 29, 2025

Summary

Implements a new limits-alternative routing strategy that automatically switches to alternative providers when rate limits are encountered, with intelligent backoff and automatic recovery.

What's New

Core Features

  • Automatic rate limit detection: Monitors HTTP 429 responses with retry-after header parsing
  • Immediate failover: Switches to alternatives within the same request (no retry needed)
  • Cross-dialect support: Can failover from OpenAI → Anthropic with automatic translation
  • Auto-recovery: Returns to primary providers when rate limits expire
  • Cascading alternatives: Tries all alternatives sequentially if multiple are rate-limited
  • Smart timing: Prioritizes retry-after header, falls back to exponential backoff (60s, 120s, 240s)

Security Hardening

  • Bounded memory: MAX_RATE_LIMIT_ENTRIES (1000) prevents unbounded growth
  • Automatic cleanup: 90% capacity threshold triggers expired entry removal
  • Attack protection: Refuses new entries if at capacity
  • Capped retry-after: MAX_RETRY_AFTER_SECS (48 hours) prevents indefinite blocking
  • Type-safe errors: Structured Error::RateLimitExceeded eliminates string parsing vulnerabilities

Observability

Three new Prometheus metrics:

  • lunaroute_rate_limits_total{provider, model} - Total rate limit events
  • lunaroute_rate_limit_alternatives_used{primary_provider, alternative_provider, model} - Alternative usage
  • lunaroute_rate_limit_backoff_seconds{provider} - Backoff durations

Configuration Example

routing:
  rules:
    - name: "gpt-with-rate-limit-protection"
      priority: 100
      matcher:
        model_pattern: "^gpt-.*"
      strategy:
        type: "limits-alternative"
        primary_providers:
          - "openai-primary"
          - "openai-backup"
        alternative_providers:
          - "anthropic-primary"
          - "anthropic-backup"
        exponential_backoff_base_secs: 60

How It Works

  1. Normal operation: Requests go to primary providers in order
  2. Rate limit detected: Provider returns HTTP 429 with retry-after header
  3. Immediate switch: Router tries alternatives within same request
  4. State tracking: Provider marked as rate-limited with expiration timestamp
  5. Alternative cascade: If alternative is also rate-limited, try next alternative
  6. Automatic recovery: Primary providers become available again after rate limit expires
  7. Backoff fallback: If no retry-after header, uses exponential backoff

Implementation

Commits (9 total)

  1. Phase 1 & 2: Core strategy implementation with rate limit state tracking
  2. Phase 3: Router integration with immediate alternative retry
  3. Phase 4 & 5: Observability metrics and configuration
  4. Phase 6: Comprehensive integration tests (6 tests)
  5. Phase 7: Documentation and polish
  6. Metrics: Prometheus instrumentation
  7. Security HIGH: Bounded memory with capacity limits
  8. Security MEDIUM: Structured errors replacing string parsing
  9. Documentation: Complete user-facing docs in README

Files Changed

  • Created:

    • crates/lunaroute-egress/src/retry_after.rs (233 lines)
    • crates/lunaroute-integration-tests/tests/limits_alternative_strategy.rs (679 lines)
    • docs/limits-alternative-routing-strategy.md (1,257 lines)
  • Modified:

    • crates/lunaroute-core/src/error.rs (added RateLimitExceeded variant)
    • crates/lunaroute-egress/src/lib.rs (structured error conversion)
    • crates/lunaroute-egress/src/openai.rs (retry-after extraction, 2 locations)
    • crates/lunaroute-egress/src/anthropic.rs (retry-after extraction)
    • crates/lunaroute-routing/src/strategy.rs (500+ lines with LimitsAlternative strategy)
    • crates/lunaroute-routing/src/provider_router.rs (immediate retry + metrics)
    • crates/lunaroute-observability/src/metrics.rs (3 new metrics)
    • examples/configs/routing-strategies.yaml (complete example)
    • crates/lunaroute-routing/README.md (comprehensive strategy documentation)
    • 6 test files for Router::new() signature updates

Testing

Test Coverage: 838 tests passing ✅

  • 5 unit tests: retry-after parsing (numeric, HTTP-date, edge cases)
  • 11 unit tests: rate limit state and strategy logic
  • 6 integration tests:
    • Basic rate limit switch
    • Cross-dialect alternative (OpenAI → Anthropic)
    • Cascade through alternatives
    • Auto-recovery to primary
    • All providers rate-limited error
    • Exponential backoff fallback
  • 3 additional tests: retry-after capping and validation

Verification

  • ✅ Build: Both debug and release passing
  • ✅ Tests: 838 tests passing (64 egress + 118 routing + 113 ingress + 543 other)
  • ✅ Clippy: 0 warnings with -D warnings flag
  • ✅ Pre-commit hooks: All passing

Documentation

  • ✅ Complete planning document with 7 phases (26 tasks, 119 subtasks)
  • ✅ User-facing README with strategy explanation and examples
  • ✅ YAML configuration examples
  • ✅ Rustdoc comments throughout codebase
  • ✅ Example flow diagrams in README

Production Readiness

Security ✅

  • Bounded memory usage (1000 entry limit)
  • Capped retry-after values (48 hours max)
  • Type-safe error handling (no string parsing)
  • Automatic cleanup of expired entries

Performance ✅

  • Lock-free concurrency (DashMap)
  • Atomic operations (AcqRel ordering)
  • Immediate failover (no request retry delay)
  • Efficient state cleanup

Observability ✅

  • Three Prometheus metrics
  • Session metadata captures rate limit switches
  • Warning logs for unusual retry-after values
  • Debug logs for all state transitions

Breaking Changes

None - fully backward compatible. Existing routing strategies (round-robin, weighted-round-robin) unchanged.

Future Enhancements (Not in this PR)

The following are potential future improvements documented in the planning doc:

  1. Persistent rate limit state (Redis/storage for restart survival)
  2. Proactive monitoring (detect approaching limits)
  3. Cost-aware selection (prefer cheaper alternatives)
  4. Rate limit budget tracking
  5. Predictive switching based on consumption patterns
  6. Manual override API

Status: Production-ready and ready for merge.

Test command: cargo test --workspace

Example config: See examples/configs/routing-strategies.yaml for complete working example

🤖 Generated with Claude Code

erans and others added 9 commits October 28, 2025 15:35
Implement automatic failover to alternative providers when rate limits
are encountered, with intelligent backoff and auto-recovery.

Phase 1: Header Parsing and Error Enhancement
- Add parse_retry_after() function supporting numeric and HTTP-date formats
- Update OpenAI connector to parse retry-after headers (2 locations)
- Update Anthropic connector to parse retry-after headers
- Populate retry_after_secs in EgressError::RateLimitExceeded
- Add 5 unit tests for retry-after parsing (all passing)

Phase 2: Core Strategy Implementation
- Add RateLimitState struct with expiration checking and exponential backoff
- Extend RoutingStrategy enum with LimitsAlternative variant
- Implement rate limit tracking using DashMap (lock-free, thread-safe)
- Implement provider selection with primary→alternative cascade
- Add AllProvidersRateLimited and InvalidLimitsAlternative error variants
- Add 11 unit tests for rate limit functionality (all passing)

Key Features:
- Priority 1: Use retry-after header from provider (when available)
- Priority 2: Exponential backoff (60s→120s→240s) when header missing
- Cross-dialect support (e.g., OpenAI → Anthropic with auto-translation)
- Automatic recovery to primary providers when rate limits expire
- Thread-safe concurrent access via atomic operations and DashMap

Test Results:
- Egress: 61 tests passing (5 new)
- Routing: 118 tests passing (11 new)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Modifications to crates/lunaroute-routing/src/provider_router.rs:
- Added extract_rate_limit_info() method to parse rate limit errors
  from error messages
- Modified try_provider() to accept strategy and rule_name parameters
  for rate limit context tracking
- Updated try_provider() to detect rate limit errors and record them
  in strategy state for LimitsAlternative strategies
- Updated send() method to extract strategy reference and pass it
  through to all try_provider() calls
- Added warning logs when providers are rate-limited

This completes the integration of rate limit detection and tracking
in the router layer. When a LimitsAlternative strategy is used and
a provider returns a rate limit error, the router now:
1. Extracts retry-after information from the error
2. Records the rate limit in the strategy state
3. Logs a warning with rate limit details
4. On next request, strategy will automatically skip rate-limited
   providers and try alternatives

All 118 routing tests passing.
All integration tests passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
## Phase 4: Observability (crates/lunaroute-observability/src/metrics.rs)

Added three new Prometheus metrics for rate limit tracking:

1. **rate_limits_total** (CounterVec)
   - Labels: provider, model
   - Tracks total rate limit errors encountered

2. **rate_limit_alternatives_used** (CounterVec)
   - Labels: primary_provider, alternative_provider, model
   - Tracks when alternative providers are used due to rate limits

3. **rate_limit_backoff_seconds** (HistogramVec)
   - Labels: provider
   - Buckets: 60s, 120s, 240s, 480s, 960s, 1920s, 3840s
   - Tracks backoff duration distribution

Added helper methods:
- `record_rate_limit(provider, model, backoff_secs)`: Records rate limit event
- `record_alternative_used(primary, alternative, model)`: Records alternative usage

These metrics enable monitoring of:
- Rate limit frequency per provider/model
- Alternative provider usage patterns
- Backoff duration distribution for capacity planning

## Phase 5: Configuration (examples/configs/routing-strategies.yaml)

Added complete example of limits-alternative routing strategy:

```yaml
strategy:
  type: "limits-alternative"
  primary_providers:
    - "openai-primary"
    - "openai-backup"
  alternative_providers:
    - "anthropic-primary"
    - "anthropic-backup"
  exponential_backoff_base_secs: 60
```

Example demonstrates:
- Cross-dialect failover (OpenAI → Anthropic)
- Multiple primaries and alternatives
- Automatic format translation
- Exponential backoff configuration

The LimitsAlternative strategy is now fully configurable via YAML
thanks to serde support added in Phase 2.

All 36 observability tests passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…trategy

Added 6 integration tests covering all key scenarios:
- test_basic_rate_limit_switch: Primary returns 429, alternative succeeds
- test_cross_dialect_alternative: OpenAI→Anthropic failover with translation
- test_cascade_through_alternatives: Sequential failover through multiple rate-limited alternatives
- test_auto_recovery_to_primary: Automatic return to primary after retry-after expires
- test_all_providers_rate_limited: Error when all providers exhausted
- test_exponential_backoff_without_retry_after: Fallback backoff when header missing

Critical router enhancement: Added immediate alternative retry loop in provider_router.rs
- When rate limit detected, router now tries alternatives immediately within same request
- Tracks tried providers to prevent infinite loops
- Continues loop only if alternative also returns rate limit
- Stops on non-rate-limit errors or when alternatives exhausted
- Refactored to address clippy warnings (collapsible-if, while-let-loop)

All 6 tests passing. This completes the core functionality of limits-alternative strategy.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Updated planning document with:
- Status changed from "Planning" to "✅ Completed & Tested"
- All 26 tasks marked complete (100% progress)
- Comprehensive changelog entries for Phases 3-7
- Implementation summary section with key highlights
- Technical details: lock-free concurrency, atomic operations, priority-based timing
- Test results: 298 total tests passing
- Files modified/created summary
- Production-ready status confirmation

Documentation now provides complete reference for:
- Feature overview and motivation
- Design decisions and trade-offs
- Implementation details for all 7 phases
- Configuration examples and usage
- Testing strategy and results
- Future enhancement ideas

Feature is fully implemented, tested, and documented. Ready for production use.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit wires up the Prometheus metrics that were defined in Phase 4
but never instrumented in the Router. This ensures observability of rate
limit behavior in production.

Changes:
1. Add metrics field to Router struct
   - Added Optional<Arc<Metrics>> field for backward compatibility
   - Updated Router::new() to accept metrics parameter
   - Updated Router::with_defaults() to pass None

2. Wire up metrics calls in router
   - Record rate_limits_total when rate limit detected (provider_router.rs:226-231)
   - Record rate_limit_alternatives_used when alternative succeeds (provider_router.rs:335-342)
   - Metrics recorded with provider, model, and backoff duration

3. Update all test files to use new Router::new() signature
   - Enhanced test_basic_rate_limit_switch to create metrics and verify counters
   - Updated Router::new() calls in 6 test files (13 total calls)
   - Files updated:
     * crates/lunaroute-routing/src/provider_router.rs (2 unit test calls)
     * crates/lunaroute-routing/tests/router_integration.rs (2 calls)
     * crates/lunaroute-routing/tests/router_streaming_integration.rs (2 calls)
     * crates/lunaroute-integration-tests/tests/limits_alternative_strategy.rs (6 calls)
     * crates/lunaroute-integration-tests/tests/router_observability_integration.rs (2 calls)
     * crates/lunaroute-integration-tests/tests/streaming_e2e.rs (1 call)

4. Add lunaroute-observability dependency
   - Added to crates/lunaroute-routing/Cargo.toml
   - Enables cross-crate metrics instrumentation

Test Results:
- All routing unit tests passing (14 tests)
- All routing integration tests passing (5 + 3 tests)
- All limits-alternative integration tests passing (6 tests)
- Metrics verification in test_basic_rate_limit_switch confirms instrumentation works

Phase 7.5 Complete ✅

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Addresses security findings from rust-web-security-reviewer:

1. HIGH: Fix unbounded growth of rate limit state map
   - Add MAX_RATE_LIMIT_ENTRIES constant (1000 entries)
   - Implement size checking at 90% capacity threshold
   - Clean up expired entries when approaching limit
   - Refuse new entries and log warnings if still at capacity after cleanup
   - Location: crates/lunaroute-routing/src/strategy.rs:77,266-288

2. MEDIUM: Cap retry-after values to prevent indefinite blocking
   - Add MAX_RETRY_AFTER_SECS constant (48 hours = 172800s)
   - Cap all retry-after values at 48 hours maximum
   - Log warnings for values exceeding 24 hours (WARN_THRESHOLD_SECS)
   - Based on research: OpenAI can return up to 24h for daily quota limits
   - Location: crates/lunaroute-egress/src/retry_after.rs:17-138

Security rationale:
- Unbounded growth: Prevents memory exhaustion attacks via fake provider IDs
- Retry-after cap: Prevents malicious/misconfigured providers from blocking indefinitely
- Both fixes include structured logging for monitoring and alerting

Test results:
- All 8 retry-after parsing tests passing (3 new tests added)
- All 23 strategy tests passing
- All 6 limits-alternative integration tests passing
- No regressions in existing functionality

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Replace brittle string parsing for rate limit detection with proper
structured error handling. This addresses MEDIUM priority security
issue identified by rust-web-security-reviewer.

Changes:
- Add RateLimitExceeded variant to core Error enum
- Update From<EgressError> to preserve structured rate limit info
- Replace extract_rate_limit_info() string parsing with pattern matching
- Remove fragile error message parsing that could be manipulated

Security Benefits:
- No longer relies on error message format (fragile and error-prone)
- Prevents potential manipulation of error strings
- Uses type-safe pattern matching on structured errors
- Cleaner separation of concerns (egress layer owns rate limit detection)

Testing:
- All 301 workspace tests passing
- Clippy checks passing
- No behavioral changes, only internal refactoring

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Updated comprehensive documentation for the limits-alternative routing
strategy including Phase 7.5 metrics instrumentation and security fixes.

Changes:
- Updated limits-alternative-routing-strategy.md:
  * Added Phase 7.5 changelog (metrics instrumentation)
  * Added Security Fixes changelog (HIGH + MEDIUM priority fixes)
  * Enhanced Implementation Summary with Security Features section
  * Updated status to "Production-Ready with Security Fixes"
  * Updated test counts (298 → 838 tests)
  * Updated commit count and file modification lists

- Updated routing crate README.md:
  * Added comprehensive "Limits-Alternative Strategy" section
  * Added YAML configuration examples
  * Added characteristics, use cases, and observability details
  * Added "How it works" 7-step explanation with example flow
  * Added API usage example in Rust
  * Added "Example 3: Rate Limit Protection" configuration

Documentation now covers:
- What: Automatic rate limit detection and alternative switching
- How: Detailed flow with HTTP 429 detection → alternative → recovery
- Config: YAML examples for strategy and provider setup
- Security: Bounded memory, capped retry-after, type-safe errors
- Observability: Three Prometheus metrics
- API: Rust code examples

All 838 tests passing. Feature production-ready.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@erans erans merged commit fad09bc into main Oct 29, 2025
8 checks passed
@erans erans deleted the feature/new-routing-when-hit-limits branch October 29, 2025 02:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant