[ARCH] Add resilience patterns (retry, circuit breaker) #112

@erikdarlingdata

Summary

There is no retry logic and no circuit breaker anywhere in the codebase. A single connection failure fails immediately and silently, with no recovery attempt. Background tasks are started fire-and-forget with no error tracking.

Source: Comprehensive themed agent review (Architect + Developer)

Current State

  • Connection failures: immediate fail, no retry
  • Alert checks: catch (Exception) { Logger.Warning(...) } - the failure is logged as a warning and otherwise swallowed
  • Background tasks: _ = CheckForUpdatesOnStartupAsync() - fire-and-forget
  • Offline servers: every check attempt fails immediately (no fast-fail / cooldown)
  • No Polly or equivalent library in use

Checklist

Retry Logic

  • Add retry with exponential backoff for transient SQL connection failures
  • Add retry for email sending (SMTP can have transient failures)
  • Distinguish transient vs permanent failures (SqlException error codes)
  • Note: Lite already has RetryHelper.ExecuteWithRetryAsync() - evaluate extending to Dashboard
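
As a starting point for discussion, here is a minimal sketch of retry with exponential backoff. It is not the existing Lite RetryHelper.ExecuteWithRetryAsync(), whose signature is not shown in this issue. The names RetrySketch, ExecuteAsync, and isTransient, and the default values, are assumptions made for illustration. The transient/permanent decision is passed in as a predicate so the caller can inspect SqlException error codes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not the Lite RetryHelper.
public static class RetrySketch
{
    public static async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        Func<Exception, bool> isTransient,   // caller decides what is worth retrying
        int maxAttempts = 3,                 // assumed default
        TimeSpan? baseDelay = null,          // assumed default: 200 ms
        CancellationToken cancellationToken = default)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan delay = baseDelay ?? TimeSpan.FromMilliseconds(200);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(cancellationToken).ConfigureAwait(false);
            }
            // Permanent failures, and the final attempt, are not caught here,
            // so they propagate to the caller unchanged.
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Exponential backoff: baseDelay, 2x, 4x, ...
                double factor = Math.Pow(2, attempt - 1);
                TimeSpan wait = TimeSpan.FromMilliseconds(delay.TotalMilliseconds * factor);
                await Task.Delay(wait, cancellationToken).ConfigureAwait(false);
            }
        }
    }
}
```

For SQL connections the predicate could check SqlException.Number, for example treating -2 (client timeout) and 1205 (deadlock victim) as transient while letting login or permission errors fail immediately. The exact list of codes is a decision for this issue, not something the sketch settles.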

Circuit Breaker

  • Implement circuit breaker for offline servers (stop hammering after N failures)
  • Fast-fail for known-offline servers instead of waiting for timeout
  • Auto-recovery: periodically test circuit to detect server coming back online
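
The three items above could be sketched as a small per-server state machine. This is one possible shape, not an existing class in the project: ServerCircuitBreaker, AllowRequest, RecordSuccess, and RecordFailure are hypothetical names, and the threshold and cooldown are parameters the caller would choose. The clock is injected so the behavior can be tested without waiting.

```csharp
using System;

// Hypothetical per-server circuit breaker; all names are illustrative.
public sealed class ServerCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _cooldown;
    private readonly Func<DateTime> _utcNow;
    private readonly object _gate = new object();

    private int _consecutiveFailures;
    private bool _isOpen;
    private bool _probeInFlight;
    private DateTime _openedAtUtc;

    public ServerCircuitBreaker(int failureThreshold, TimeSpan cooldown, Func<DateTime> utcNow = null)
    {
        if (failureThreshold < 1)
            throw new ArgumentOutOfRangeException(nameof(failureThreshold));
        _failureThreshold = failureThreshold;
        _cooldown = cooldown;
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    // Returns false when the server is considered offline, so the caller
    // can skip the check instead of waiting for a connection timeout.
    public bool AllowRequest()
    {
        lock (_gate)
        {
            if (!_isOpen) return true;                                  // closed: normal operation
            if (_utcNow() - _openedAtUtc < _cooldown) return false;     // open: fast-fail
            if (_probeInFlight) return false;                           // a trial is already running
            _probeInFlight = true;                                      // half-open: allow one trial
            return true;
        }
    }

    public void RecordSuccess()
    {
        lock (_gate)
        {
            _consecutiveFailures = 0;
            _isOpen = false;
            _probeInFlight = false;
        }
    }

    public void RecordFailure()
    {
        lock (_gate)
        {
            _consecutiveFailures++;
            // A failed trial, or N consecutive failures, (re)opens the circuit
            // and restarts the cooldown.
            if (_probeInFlight || _consecutiveFailures >= _failureThreshold)
            {
                _isOpen = true;
                _openedAtUtc = _utcNow();
                _probeInFlight = false;
            }
        }
    }
}
```

A caller would keep one instance per monitored server, call AllowRequest() before each check, and report the outcome with RecordSuccess() or RecordFailure(). Polly provides an equivalent policy if the project adopts it.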

Fire-and-Forget Cleanup

  • Wrap all fire-and-forget async calls with exception logging
  • Replace the _ = SomeAsync() pattern with a helper, e.g. _ = SafeFireAndForget(SomeAsync())
  • Remove .Wait() call in MainWindow.xaml.cs:236 (blocks UI thread)
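
A minimal sketch of what the proposed SafeFireAndForget helper might look like. The helper does not exist yet, so its signature here is an assumption: it takes the task plus an onError callback, because the signature of the project's Logger is not shown in this issue. TaskSketch is a placeholder class name. Treating cancellation as a non-error is also an assumption.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper; the class name and onError callback are assumptions.
public static class TaskSketch
{
    // Awaits the task so that any exception is observed and reported
    // instead of being lost when the returned Task is discarded.
    public static async Task SafeFireAndForget(Task task, Action<Exception> onError)
    {
        if (task == null) throw new ArgumentNullException(nameof(task));
        if (onError == null) throw new ArgumentNullException(nameof(onError));

        try
        {
            await task.ConfigureAwait(false);
        }
        catch (OperationCanceledException)
        {
            // Assumed policy: cancellation is expected and not reported.
        }
        catch (Exception ex)
        {
            onError(ex);
        }
    }
}
```

A call site would then read something like `_ = TaskSketch.SafeFireAndForget(CheckForUpdatesOnStartupAsync(), ex => Logger.Warning(...))`, with the logger call filled in to match the project's actual API. For the .Wait() item, the usual fix is to make the calling handler async and await the task, which keeps the UI thread free.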

Connection Management

  • Add configurable connection timeout (currently hardcoded)
  • Add connection pool configuration (Min/Max Pool Size)
  • Consider Polly library for standardized resilience policies
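
The timeout and pool settings could be surfaced through SqlConnectionStringBuilder, roughly as below. This sketch assumes the project uses the Microsoft.Data.SqlClient package (System.Data.SqlClient exposes the same properties). ConnectionSettings and its default values are hypothetical; the current hardcoded timeout is not stated in this issue.

```csharp
using System;
using Microsoft.Data.SqlClient; // assumption: could also be System.Data.SqlClient

// Hypothetical settings object; property names and defaults are illustrative.
public sealed class ConnectionSettings
{
    public int ConnectTimeoutSeconds { get; set; } = 15;
    public int MinPoolSize { get; set; } = 0;
    public int MaxPoolSize { get; set; } = 100;

    // Applies the configured values on top of an existing connection string.
    public string Apply(string baseConnectionString)
    {
        var builder = new SqlConnectionStringBuilder(baseConnectionString)
        {
            ConnectTimeout = ConnectTimeoutSeconds,
            MinPoolSize = MinPoolSize,
            MaxPoolSize = MaxPoolSize
        };
        return builder.ConnectionString;
    }
}
```

A shorter connect timeout also helps the offline-server case, since each failed attempt returns sooner, but it does not replace a circuit breaker.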

Priority

Tier 3 - Medium

Metadata

Labels

enhancement (New feature or request)
