[FEATURE][PLUGIN]: Create Retry with Backoff plugin #1002

@crivetimihai

Overview

Create a Retry with Backoff plugin that annotates tool and resource responses with retry and backoff policy metadata, giving callers guidance on how to handle failed operations.

Plugin Requirements

Plugin Details

  • Name: RetryWithBackoffPlugin
  • Type: Self-contained (native) plugin
  • File Location: plugins/retry_with_backoff/
  • Complexity: Medium

Functionality

  • Add retry policy metadata to tool and resource responses
  • Configurable backoff strategies (exponential, linear, fixed)
  • Per-tool and per-error-type retry policies
  • Integration with HTTP status codes and error patterns
  • Circuit breaker pattern support
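The three backoff strategies listed above could be computed roughly as follows. This is a minimal sketch, not the plugin's implementation: the function name `backoff_delay_ms` and its signature are hypothetical, the parameter defaults mirror `default_policy` in the configuration schema, and the exact formulas (1-based attempt numbering, symmetric jitter, clamping after jitter) are assumptions the issue does not specify.

```python
import random


def backoff_delay_ms(attempt, strategy="exponential", base_ms=200,
                     multiplier=2.0, max_backoff_ms=5000,
                     jitter=False, jitter_max_percent=20, rng=random):
    """Return the delay in milliseconds before retry number `attempt` (1-based).

    Hypothetical helper; formulas are assumptions, not taken from the issue.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")

    if strategy == "exponential":
        delay = base_ms * multiplier ** (attempt - 1)  # 200, 400, 800, ...
    elif strategy == "linear":
        delay = base_ms * attempt                      # 200, 400, 600, ...
    elif strategy == "fixed":
        delay = base_ms                                # 200, 200, 200, ...
    else:
        raise ValueError(f"unknown backoff strategy: {strategy!r}")

    delay = min(delay, max_backoff_ms)

    if jitter:
        # Assumed semantics: shift the delay by up to +/- jitter_max_percent.
        spread = delay * jitter_max_percent / 100.0
        delay += rng.uniform(-spread, spread)

    # Clamp again so jitter never pushes the delay outside [0, max_backoff_ms].
    return int(max(0, min(delay, max_backoff_ms)))
```

With the default policy values, the un-jittered exponential delays for the first three retries are 200 ms, 400 ms, and 800 ms, and any later delay is capped at `max_backoff_ms`.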

Hook Integration

  • Primary Hooks: tool_post_invoke, resource_post_fetch
  • Purpose: Provide retry guidance for failed operations
  • Behavior: Annotate responses with retry metadata based on error type
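The annotation behavior described above might look something like the sketch below. It deliberately avoids the plugin framework's hook signatures, which this issue does not describe, and works on plain dictionaries instead. The function name `annotate_retry`, the `retry` metadata key, and its field names are all hypothetical; the status lists are copied from `retry_conditions` in the configuration schema, and `attempt` is assumed to count calls already made (1 means the first call has just failed).

```python
from typing import Any, Dict, Optional

# Copied from `retry_conditions` in the configuration schema.
RETRY_ON_STATUS = {429, 500, 502, 503, 504}
NEVER_RETRY_STATUS = {400, 401, 403, 404}


def annotate_retry(metadata: Dict[str, Any], status_code: Optional[int],
                   attempt: int, max_retries: int = 2, base_ms: int = 200,
                   multiplier: float = 2.0,
                   max_backoff_ms: int = 5000) -> Dict[str, Any]:
    """Return a copy of `metadata` with a hypothetical `retry` entry added."""
    retryable = (status_code in RETRY_ON_STATUS
                 and status_code not in NEVER_RETRY_STATUS)
    retries_done = attempt - 1
    should_retry = retryable and retries_done < max_retries

    # Exponential backoff without jitter, capped at max_backoff_ms.
    delay = int(min(base_ms * multiplier ** retries_done, max_backoff_ms))

    annotated = dict(metadata)  # leave the caller's dict untouched
    annotated["retry"] = {
        "should_retry": should_retry,
        "attempt_count": attempt,
        "max_retries": max_retries,
        "next_retry_delay_ms": delay if should_retry else None,
    }
    return annotated
```

Note that the sketch only advises: it never re-invokes the tool itself, which matches the issue's framing of the plugin as a metadata annotator rather than a retry executor.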

Configuration Schema

plugins:
  - name: "RetryWithBackoff"
    kind: "plugins.retry_with_backoff.retry.RetryWithBackoffPlugin"
    description: "Annotate retry/backoff policy in metadata"
    version: "0.1.0"
    hooks: ["tool_post_invoke", "resource_post_fetch"]
    mode: "permissive"
    priority: 7
    config:
      # Default retry policy
      default_policy:
        max_retries: 2
        backoff_strategy: "exponential"
        backoff_base_ms: 200
        backoff_multiplier: 2.0
        max_backoff_ms: 5000
        jitter: true
        jitter_max_percent: 20
      
      # HTTP status code policies
      status_code_policies:
        429:  # Rate limited
          max_retries: 5
          backoff_strategy: "exponential"
          backoff_base_ms: 1000
          max_backoff_ms: 30000
        500:  # Internal server error
          max_retries: 3
          backoff_strategy: "exponential"
          backoff_base_ms: 500
        502:  # Bad gateway
          max_retries: 3
          backoff_strategy: "linear"
          backoff_base_ms: 1000
        503:  # Service unavailable
          max_retries: 5
          backoff_strategy: "exponential"
          backoff_base_ms: 2000
        504:  # Gateway timeout
          max_retries: 2
          backoff_strategy: "fixed"
          backoff_base_ms: 3000
      
      # Tool-specific policies
      tool_policies:
        web_scraper:
          max_retries: 3
          backoff_base_ms: 1000
          retry_on_status: [429, 500, 502, 503, 504]
        api_caller:
          max_retries: 5
          backoff_strategy: "exponential"
          backoff_base_ms: 500
        database_query:
          max_retries: 2
          backoff_strategy: "linear"
          backoff_base_ms: 1000
      
      # Error pattern matching
      error_patterns:
        - pattern: "timeout"
          max_retries: 3
          backoff_base_ms: 2000
        - pattern: "connection.*reset"
          max_retries: 2
          backoff_base_ms: 1000
        - pattern: "temporary.*failure"
          max_retries: 4
          backoff_base_ms: 500
      
      # Circuit breaker integration
      circuit_breaker:
        enabled: true
        failure_threshold: 5
        recovery_timeout_ms: 30000
        half_open_max_calls: 3
      
      # Retry conditions
      retry_conditions:
        retry_on_status: [429, 500, 502, 503, 504]
        retry_on_timeout: true
        retry_on_connection_error: true
        never_retry_status: [400, 401, 403, 404]
      
      # Metadata annotation
      metadata:
        include_policy: true
        include_next_retry_time: true
        include_attempt_count: true
        include_circuit_breaker_state: true

Acceptance Criteria

  • Plugin implements RetryWithBackoffPlugin class
  • Multiple backoff strategies (exponential, linear, fixed)
  • Per-tool and per-status-code policies
  • Error pattern matching for retry decisions
  • Circuit breaker pattern support
  • Metadata annotation with retry guidance
  • Jitter support to avoid thundering-herd retries
  • Plugin manifest and documentation created
  • Unit tests with >90% coverage
  • Integration tests with various error scenarios
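For the circuit breaker criterion, a small state machine along the following lines could serve as a starting point. It is a sketch under stated assumptions: the class name `CircuitBreaker`, its method names, and the state strings are hypothetical, and the constructor parameters reuse the `circuit_breaker` setting names from the configuration schema. It also assumes that a single success closes the circuit and that any failure during the half-open state reopens it; the issue does not define either rule. The injectable `clock` exists only to make the timeout testable.

```python
import time


class CircuitBreaker:
    """Minimal closed / open / half_open state machine (illustrative only)."""

    def __init__(self, failure_threshold=5, recovery_timeout_ms=30000,
                 half_open_max_calls=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout_ms = recovery_timeout_ms
        self.half_open_max_calls = half_open_max_calls
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0
        self._half_open_calls = 0

    def allow_call(self) -> bool:
        """Return True if a call (or retry) should be attempted now."""
        if self.state == "open":
            elapsed_ms = (self._clock() - self._opened_at) * 1000
            if elapsed_ms < self.recovery_timeout_ms:
                return False
            # Recovery timeout elapsed: let a limited number of probes through.
            self.state = "half_open"
            self._half_open_calls = 0
        if self.state == "half_open":
            if self._half_open_calls >= self.half_open_max_calls:
                return False
            self._half_open_calls += 1
        return True

    def record_success(self) -> None:
        # Assumption: one success is enough to close the circuit.
        self._failures = 0
        self.state = "closed"

    def record_failure(self) -> None:
        if self.state == "half_open":
            self._trip()  # a failed probe reopens the circuit
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = "open"
        self._opened_at = self._clock()
        self._failures = 0
```

The `state` attribute is the kind of value that `include_circuit_breaker_state` could expose in the response metadata. A unit test can pass a fake `clock` to step through the recovery timeout without sleeping.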

Priority

Medium - Reliability and resilience feature
