
[automatic failover] Implement client initialization and safe startup of MultiDbClient #3613

Merged
atakavci merged 50 commits into redis:feature/automatic-failover-1 from
atakavci:failover/initHealthy
Jan 22, 2026

Conversation

@atakavci
Collaborator

Overview

This PR implements a robust client initialization and safe startup mechanism for MultiDbClient that ensures connections are established with at least one healthy database before becoming available. The implementation uses an asynchronous, event-driven approach that:

  • Starts fast: Returns a usable connection as soon as the highest-weighted healthy database is determined (doesn't wait for all databases)
  • Fails safe: Only completes successfully when at least one database passes health checks
  • Weight-aware selection: Evaluates databases by weight priority, selecting the highest-weighted healthy database as initial primary
  • Scales gracefully: Additional databases are added asynchronously as they become ready
  • Supports both connection types: Unified architecture for regular and PubSub connections

This refactoring eliminates the previous blocking initialization that required all databases to be connected before returning, replacing it with a more efficient async flow.

Key Changes

1. Safe Startup with Async Database Initialization

The new initialization flow ensures MultiDbClient starts safely with at least one healthy database:

Previous Behavior (Blocking):

// Old: All databases must connect before returning
Map<RedisURI, RedisDatabaseImpl<SC>> databases = new ConcurrentHashMap<>();
for (DatabaseConfig config : databaseConfigs.values()) {
    RedisDatabaseImpl<SC> db = createRedisDatabase(config);  // Blocks until connected
    databases.put(config.getRedisURI(), db);
}
waitForInitialHealthyDatabase(databases);  // Blocks again, on health checks
return new StatefulRedisMultiDbConnectionImpl<>(databases);

New Behavior (Async with Weight-Based Selection):

// New: Return as soon as highest-weighted healthy database is determined
CompletableFuture<MC> connectAsync(Map<RedisURI, DatabaseConfig> configs) {
    // 1. Start all connections in parallel
    DatabaseFutureMap<SC> databaseFutures = createDatabaseFutures(configs);

    // 2. Create health check futures for each database
    Map<RedisURI, CompletableFuture<HealthStatus>> healthFutures =
        createHealthStatusFutures(databaseFutures);

    // 3. Wait for enough results to determine highest-weighted healthy database
    //    - Checks databases in weight order (highest first)
    //    - Returns when a healthy database is found OR higher-weighted ones fail/unhealthy
    return buildFuture(configs, databases, databaseFutures, healthFutures);
}
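The first two steps can be illustrated with plain `CompletableFuture`s. The sketch below is a simplified, hypothetical model rather than the PR's code: endpoints are plain strings, `Status` stands in for `HealthStatus`, and `connectAll`/`healthFutures` are illustrative names. It starts every connection before waiting on any of them, then chains each connection future into a "first health result" future; a missing health check counts as HEALTHY, and a failed connection makes the health future fail too.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical model of steps 1 and 2 of connectAsync(); not Lettuce code.
final class ParallelStartupSketch {

    enum Status { HEALTHY, UNHEALTHY }

    // Step 1: start every connection before waiting on any of them, so they run in parallel.
    static <C> Map<String, CompletableFuture<C>> connectAll(
            List<String> uris, Function<String, CompletableFuture<C>> connector) {
        Map<String, CompletableFuture<C>> futures = new LinkedHashMap<>();
        for (String uri : uris) {
            futures.put(uri, connector.apply(uri));
        }
        return futures;
    }

    // Step 2: derive one "first health result" future per database.
    // A failed connection makes the corresponding health future fail as well.
    static <C> Map<String, CompletableFuture<Status>> healthFutures(
            Map<String, CompletableFuture<C>> connectFutures,
            Function<C, CompletableFuture<Status>> healthCheck) {
        Map<String, CompletableFuture<Status>> result = new LinkedHashMap<>();
        for (Map.Entry<String, CompletableFuture<C>> e : connectFutures.entrySet()) {
            result.put(e.getKey(), e.getValue().thenCompose(conn -> firstStatus(conn, healthCheck)));
        }
        return result;
    }

    // healthCheck == null models "no health check configured": HEALTHY as soon as connected.
    private static <C> CompletableFuture<Status> firstStatus(
            C conn, Function<C, CompletableFuture<Status>> healthCheck) {
        if (healthCheck == null) {
            return CompletableFuture.completedFuture(Status.HEALTHY);
        }
        return healthCheck.apply(conn);
    }
}
```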

Key Improvements:

  • Non-blocking: All database connections happen in parallel
  • Smart startup: Returns as soon as highest-weighted healthy database is determined (doesn't wait for lower-priority databases)
  • Weight-aware: Always selects highest-weighted healthy database as initial primary
  • Resilient: Continues adding databases asynchronously after initial connection
  • Fail-safe: Only succeeds if at least one database is healthy

2. Event-Driven Health Check Integration

The initialization process waits for health check results before selecting the initial primary database:

Health Check Flow:

  1. Database connection established → RedisDatabaseImpl created
  2. Health check registered (if configured) → Async health check starts
  3. Health status future created → Waits for first result
  4. When health status determined → Evaluate for primary selection
  5. Highest-weighted healthy database determined → Connection future completes
  6. Remaining databases → Added asynchronously via RedisDatabaseAsyncCompletion

Weight-Based Selection Algorithm:

// Databases sorted by weight (descending: highest weight first)
for (DatabaseConfig config : sortedByWeightDesc) {
    RedisDatabaseImpl db = databases.get(config.getRedisURI());

    // If connection not yet established, wait for it
    if (db == null) return null;

    // If health check not yet complete, wait for result
    if (db.getHealthCheck() != null && db.getHealthCheckStatus() == UNKNOWN)
        return null;

    // If this database is UNHEALTHY or FAILED, skip to next (lower weight)
    if (db.getHealthCheck() != null && !db.getHealthCheckStatus().isHealthy()) {
        continue;  // Try next database
    }

    // Found highest-weighted healthy database!
    return db;  // This becomes the initial primary
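}

// Loop ended without a healthy database: no primary can be selected
// (the all-failed case is reported separately, see section 6)
return null;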

Selection Logic:

  1. Sort databases by weight (descending)
  2. Check highest-weighted database first
  3. If not ready yet → wait for connection/health check
  4. If failed/unhealthy → skip to next database
  5. If healthy → select as primary and return immediately
  6. Repeat for next database until healthy one found
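The same walk can be written as a pure function over explicit per-database states. This is a hypothetical sketch, not the PR's implementation; `State` and `Db` are illustrative types. Unlike the pseudocode above, it keeps a failed connection (`FAILED`) apart from one that is still in progress (`PENDING`), so a failed higher-weighted database is skipped rather than waited on, as the Selection Logic list describes. One way to drive it is to call it again after every connection or health result and complete the startup future on the first non-null answer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the weight-ordered primary selection; not Lettuce code.
final class PrimarySelectionSketch {

    // PENDING: still connecting, or the health check has no first result yet.
    // FAILED:  the connection attempt failed.
    enum State { PENDING, HEALTHY, UNHEALTHY, FAILED }

    static final class Db {
        final String name;
        final double weight;
        final State state;

        Db(String name, double weight, State state) {
            this.name = name;
            this.weight = weight;
            this.state = state;
        }
    }

    // Returns the initial primary, or null when no decision can be made yet
    // (or when nothing is healthy; that case is left to a separate all-failed check).
    static Db select(List<Db> databases) {
        List<Db> byWeight = new ArrayList<>(databases);
        byWeight.sort(Comparator.comparingDouble((Db d) -> d.weight).reversed());
        for (Db db : byWeight) {
            switch (db.state) {
                case PENDING:
                    return null; // a higher-weighted database may still turn healthy: wait
                case HEALTHY:
                    return db;   // highest-weighted healthy database
                default:
                    break;       // UNHEALTHY or FAILED: try the next lower weight
            }
        }
        return null;
    }
}
```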

3. New Abstract Base Class

AbstractRedisMultiDbConnectionBuilder - Consolidates safe startup logic for all connection types.

Core Responsibilities:

  • Parallel async connection establishment
  • Health check coordination and waiting
  • Weight-based primary database selection
  • Async completion handling for late-arriving databases
  • Failure detection (all databases unhealthy)

Type Parameters:

  • MC - Multi-database connection type (regular or PubSub)
  • SC - Single connection type (StatefulRedisConnection or StatefulRedisPubSubConnection)
  • K - Key type
  • V - Value type

4. Async Completion for Late-Arriving Databases

RedisDatabaseAsyncCompletion - New component that handles databases completing after initial connection:

class RedisDatabaseAsyncCompletion<SC> {
    private final List<CompletableFuture<RedisDatabaseImpl<SC>>> databaseFutures;

    void whenComplete(BiConsumer<RedisDatabaseImpl<SC>, Throwable> action) {
        databaseFutures.forEach(future -> future.whenComplete(action));
    }
}

Usage in Connection Initialization:

// Connection returned immediately with initial primary database
MC connection = createMultiDbConnection(
    selectedPrimary,           // First healthy database
    currentDatabases,          // Databases ready now
    codec,
    healthStatusManager,
    asyncCompletion            // Handles remaining databases
);

// Late-arriving databases added automatically
asyncCompletion.whenComplete((db, error) -> {
    if (db != null) {
        connection.addDatabase(db);  // Added to live connection
    }
});
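One property of `CompletableFuture.whenComplete` matters here: for a future that is already complete, the action runs immediately in the calling thread; for a pending one, it runs later in the thread that completes it. The hypothetical, generic stand-in below (type `T` replaces `RedisDatabaseImpl<SC>`) makes that consequence visible: databases that were ready before the callback was registered are delivered to it as well, so the receiving side should tolerate duplicates, and failures arrive as `(null, error)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

// Hypothetical stand-in for RedisDatabaseAsyncCompletion; not Lettuce code.
final class AsyncCompletionSketch<T> {

    private final List<CompletableFuture<T>> futures;

    AsyncCompletionSketch(List<CompletableFuture<T>> futures) {
        this.futures = new ArrayList<>(futures);
    }

    // Registers the action on every future. Already-completed futures invoke it right away
    // in the calling thread; pending ones invoke it later in the thread that completes them.
    void whenComplete(BiConsumer<? super T, ? super Throwable> action) {
        futures.forEach(f -> f.whenComplete(action));
    }
}
```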

Benefits:

  • Connection usable immediately (no waiting for all databases)
  • Additional capacity added seamlessly as databases become ready
  • Failed databases don't block startup
  • Automatic integration with health monitoring and circuit breakers

5. Complete Initialization Flow

Step-by-Step Process:

┌─────────────────────────────────────────────────────────────┐
│ 1. Client.connectAsync() called                             │
└─────────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│ 2. Create HealthStatusManager                               │
└─────────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│ 3. Start ALL database connections in parallel               │
│    - DB1: connectAsync(uri1) → Future<DB1>                  │
│    - DB2: connectAsync(uri2) → Future<DB2>                  │
│    - DB3: connectAsync(uri3) → Future<DB3>                  │
└─────────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│ 4. Create health check futures for each database            │
│    - If health check configured: wait for result            │
│    - If no health check: immediately HEALTHY                │
└─────────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│ 5. Determine highest-weighted healthy database              │
│    - Sort databases by weight (descending)                  │
│    - Check highest weight first                             │
│    - If not ready → wait for connection/health check        │
│    - If failed/unhealthy → skip to next database            │
│    - If healthy → select as primary                         │
│    - Return when highest-weighted healthy found             │
└─────────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│ 6. Create MultiDbConnection with selected primary           │
│    - Primary: Highest-weighted healthy database             │
│    - Databases: All currently ready databases               │
│    - AsyncCompletion: Handler for remaining databases       │
└─────────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│ 7. Return connection to user (READY TO USE)                 │
└─────────────────────────────────────────────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│ 8. Remaining databases added asynchronously                 │
│    - DB2 completes → added to connection                    │
│    - DB3 completes → added to connection                    │
│    - Failed DBs → logged, not added                         │
└─────────────────────────────────────────────────────────────┘

6. Failure Handling

All Databases Unhealthy:

if (checkIfAllFailed(healthStatusFutures)) {
    connectionFuture.completeExceptionally(
        new RedisConnectionException("No healthy database available !!")
    );
}

Partial Failures:

  • Connection succeeds if at least one database is healthy
  • Failed databases logged but don't block startup
  • Failed databases can be retried later via health checks
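A minimal sketch of what an all-failed check over per-database health futures could look like, assuming each future completes with a health status or fails when the connection fails. `AllFailedSketch`, its `HealthStatus` enum and `failIfAllFailed` are hypothetical, and `IllegalStateException` stands in for `RedisConnectionException`. The key point is that a still-pending future counts as "not failed yet", so startup only fails once every database has produced a final, non-healthy answer.

```java
import java.util.Collection;
import java.util.concurrent.CompletableFuture;

// Hypothetical all-failed check; HealthStatus is a stand-in enum, not Lettuce's type.
final class AllFailedSketch {

    enum HealthStatus { HEALTHY, UNHEALTHY }

    // True only when every database has a final answer and none of them is HEALTHY.
    static boolean checkIfAllFailed(Collection<CompletableFuture<HealthStatus>> healthFutures) {
        for (CompletableFuture<HealthStatus> f : healthFutures) {
            if (!f.isDone()) {
                return false; // still pending: it may yet become healthy
            }
            if (!f.isCompletedExceptionally() && f.join() == HealthStatus.HEALTHY) {
                return false;
            }
        }
        return true;
    }

    // Fails the startup future once the check holds; completeExceptionally is a no-op
    // if the future was already completed (for example with a selected primary).
    static <T> void failIfAllFailed(Collection<CompletableFuture<HealthStatus>> healthFutures,
            CompletableFuture<T> connectionFuture) {
        if (checkIfAllFailed(healthFutures)) {
            connectionFuture.completeExceptionally(new IllegalStateException("No healthy database available"));
        }
    }
}
```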

7. Refactored Connection Builders

MultiDbAsyncConnectionBuilder (Regular Connections)

  • Before: 393 lines of complex async logic
  • After: 49 lines extending AbstractRedisMultiDbConnectionBuilder
  • Implements connectAsync() → delegates to client.connectAsync()
  • Implements createMultiDbConnection() → creates StatefulRedisMultiDbConnectionImpl

MultiDbAsyncPubSubConnectionBuilder (PubSub Connections - New)

  • 50 lines of code
  • Implements connectAsync() → delegates to client.connectPubSubAsync()
  • Implements createMultiDbConnection() → creates StatefulRedisMultiDbPubSubConnectionImpl
  • Mirrors regular builder structure for consistency
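The split between the abstract base class and the two builders is a template-method design. The sketch below illustrates it with hypothetical names and toy connection types (`ToyConnection`, `ToyPubSubConnection`); health checks and weight-based selection are left out, and the primary is passed in explicitly. The shared flow is written once in the base class; each subclass only supplies `connectAsync` and `createMultiDbConnection`, and the `MC`/`SC` type parameters keep regular and Pub/Sub builders apart.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical template-method sketch; MC = multi-db connection type, SC = single connection type.
abstract class MultiDbBuilderSketch<MC, SC> {

    // Hook 1: open one single-endpoint connection (regular or Pub/Sub).
    protected abstract CompletableFuture<SC> connectAsync(String uri);

    // Hook 2: wrap the primary plus the connections that are ready right now.
    protected abstract MC createMultiDbConnection(String primaryUri, Map<String, SC> ready);

    // Shared flow: start all connections in parallel; once the primary is connected,
    // build MC from whatever is ready at that moment.
    final CompletableFuture<MC> connect(String primaryUri, List<String> uris) {
        Map<String, CompletableFuture<SC>> futures = new LinkedHashMap<>();
        for (String uri : uris) {
            futures.put(uri, connectAsync(uri));
        }
        CompletableFuture<SC> primary = futures.get(primaryUri);
        if (primary == null) {
            CompletableFuture<MC> failed = new CompletableFuture<>();
            failed.completeExceptionally(new IllegalArgumentException("Unknown primary: " + primaryUri));
            return failed;
        }
        return primary.thenApply(p -> {
            Map<String, SC> ready = new LinkedHashMap<>();
            futures.forEach((uri, f) -> {
                if (f.isDone() && !f.isCompletedExceptionally()) {
                    ready.put(uri, f.join());
                }
            });
            return createMultiDbConnection(primaryUri, ready);
        });
    }
}

// Toy stand-ins for StatefulRedisConnection / StatefulRedisPubSubConnection.
final class ToyConnection {
    final String uri;
    ToyConnection(String uri) { this.uri = uri; }
}

final class ToyPubSubConnection {
    final String uri;
    ToyPubSubConnection(String uri) { this.uri = uri; }
}

final class RegularBuilderSketch extends MultiDbBuilderSketch<String, ToyConnection> {
    @Override
    protected CompletableFuture<ToyConnection> connectAsync(String uri) {
        return CompletableFuture.completedFuture(new ToyConnection(uri));
    }

    @Override
    protected String createMultiDbConnection(String primaryUri, Map<String, ToyConnection> ready) {
        return "regular(primary=" + primaryUri + ", ready=" + ready.size() + ")";
    }
}

final class PubSubBuilderSketch extends MultiDbBuilderSketch<String, ToyPubSubConnection> {
    @Override
    protected CompletableFuture<ToyPubSubConnection> connectAsync(String uri) {
        return CompletableFuture.completedFuture(new ToyPubSubConnection(uri));
    }

    @Override
    protected String createMultiDbConnection(String primaryUri, Map<String, ToyPubSubConnection> ready) {
        return "pubsub(primary=" + primaryUri + ", ready=" + ready.size() + ")";
    }
}
```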

8. API Changes for Safe Startup

MultiDbClient Interface

Updated method signatures to support both regular and PubSub async connections:

Before:

MultiDbConnectionFuture<String, String> connectAsync();
<K, V> MultiDbConnectionFuture<K, V> connectAsync(RedisCodec<K, V> codec);

After:

// Regular connections
MultiDbConnectionFuture<StatefulRedisMultiDbConnection<String, String>> connectAsync();
<K, V> MultiDbConnectionFuture<StatefulRedisMultiDbConnection<K, V>> connectAsync(RedisCodec<K, V> codec);

// PubSub connections (new)
<K, V> MultiDbConnectionFuture<StatefulRedisMultiDbPubSubConnection<K, V>> connectPubSubAsync(RedisCodec<K, V> codec);
MultiDbConnectionFuture<StatefulRedisMultiDbPubSubConnection<String, String>> connectPubSubAsync();

Rationale: Clear separation between regular and PubSub connection types prevents accidental misuse and provides better type safety.

MultiDbConnectionFuture

Updated to support any multi-database connection type:

// Before: Tied to specific connection type
class MultiDbConnectionFuture<K, V>
    extends BaseConnectionFuture<StatefulRedisMultiDbConnection<K, V>>

// After: Generic over connection type
class MultiDbConnectionFuture<C extends BaseRedisMultiDbConnection>
    extends BaseConnectionFuture<C>

Benefits:

  • Single future type for both regular and PubSub connections
  • Type safety enforced at compile time
  • Consistent API across connection types

9. Implementation Improvements

MultiDbClientImpl - Simplified Client Implementation

Before (Blocking Initialization):

public <K, V> StatefulRedisMultiDbConnection<K, V> connect(RedisCodec<K, V> codec) {
    HealthStatusManager healthStatusManager = createHealthStatusManager();

    Map<RedisURI, RedisDatabaseImpl<StatefulRedisConnection<K, V>>> databases = new ConcurrentHashMap<>();
    for (Map.Entry<RedisURI, DatabaseConfig> entry : databaseConfigs.entrySet()) {
        RedisURI uri = entry.getKey();
        DatabaseConfig config = entry.getValue();
        // BLOCKS: Synchronous connection creation
        RedisDatabaseImpl<StatefulRedisConnection<K, V>> database = createRedisDatabase(config, codec, healthStatusManager);
        databases.put(uri, database);
    }

    // BLOCKS: Wait for all health checks
    waitForInitialHealthyDatabase(statusTracker, databases);

    return createMultiDbConnection(databases, codec, healthStatusManager);
}

After (Async Initialization with Blocking Get):

public <K, V> StatefulRedisMultiDbConnection<K, V> connect(RedisCodec<K, V> codec) {
    // Create builder for async initialization
    AbstractRedisMultiDbConnectionBuilder<...> builder = createConnectionBuilder(codec);

    // Start async initialization (returns immediately)
    CompletableFuture<StatefulRedisMultiDbConnection<K, V>> future =
        builder.connectAsync(databaseConfigs);

    // Convert to MultiDbConnectionFuture (executes callbacks off event loop)
    MultiDbConnectionFuture<...> connectionFuture =
        MultiDbConnectionFuture.from(future, getResources().eventExecutorGroup());

    // Block until the highest-weighted healthy database is selected
    return connectionFuture.get();
}

Key Changes:

  • Delegates to async builder for initialization logic
  • Uses MultiDbConnectionFuture to prevent event loop blocking
  • Synchronous method now just wraps async implementation
  • ~150 lines of duplicate logic removed
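One common way to keep user callbacks off the event loop is to complete a second future from a task submitted to another executor. The helper below is a hypothetical sketch of that idea, not the actual `MultiDbConnectionFuture.from(...)` implementation; `completeOn` is an illustrative name. Non-async dependent stages registered before completion run on the executor's thread; stages added after completion run in the thread that registers them.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical "complete on another executor" helper; not Lettuce code.
final class ExecutorHopSketch {

    // Returns a future that mirrors `source` but is completed from a task run by `executor`,
    // so callbacks registered on it do not run on the thread that completed `source`
    // (for example an event-loop thread).
    static <T> CompletableFuture<T> completeOn(CompletableFuture<T> source, Executor executor) {
        CompletableFuture<T> result = new CompletableFuture<>();
        source.whenComplete((value, error) -> {
            Runnable completion = () -> {
                if (error != null) {
                    result.completeExceptionally(error);
                } else {
                    result.complete(value);
                }
            };
            try {
                executor.execute(completion);
            } catch (RejectedExecutionException rejected) {
                completion.run(); // executor shut down: complete inline rather than never
            }
        });
        return result;
    }
}
```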

New Factory Methods:

protected <K, V> MultiDbAsyncConnectionBuilder<K, V> createConnectionBuilder(RedisCodec<K, V> codec) {
    return new MultiDbAsyncConnectionBuilder<>(this, getResources(), codec);
}

protected <K, V> MultiDbAsyncPubSubConnectionBuilder<K, V> createPubSubConnectionBuilder(RedisCodec<K, V> codec) {
    return new MultiDbAsyncPubSubConnectionBuilder<>(this, getResources(), codec);
}

StatefulRedisMultiDbConnectionImpl - Support for Async Completion

New Constructor:

public StatefulRedisMultiDbConnectionImpl(
    RedisDatabaseImpl<C> initialDatabase,              // Pre-selected primary
    Map<RedisURI, RedisDatabaseImpl<C>> connections,   // Currently ready databases
    ClientResources resources,
    RedisCodec<K, V> codec,
    DatabaseConnectionFactory<C, K, V> connectionFactory,
    HealthStatusManager healthStatusManager,
    RedisDatabaseAsyncCompletion<C> completion) {      // Handler for late arrivals

    // Use provided initial database instead of searching
    this.current = initialDatabase;
    if (current == null) {
        throw new IllegalStateException("No healthy database found");
    }

    // Register callback for late-arriving databases
    if (completion != null) {
        completion.whenComplete(this::onDatabaseCompletion);
    }
}

Late Database Addition:

private void onDatabaseCompletion(RedisDatabaseImpl<C> db, Throwable e) {
    if (db != null) {
        doByExclusiveLock(() -> {
            databases.putIfAbsent(db.getRedisURI(), db);
            // Database automatically participates in health monitoring and failover
        });
    }
}
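A self-contained sketch of the late-addition path, assuming `doByExclusiveLock` is backed by the write lock of a `ReentrantReadWriteLock` (the PR does not say how it is implemented). `DatabaseRegistrySketch` and its methods are hypothetical; `D` stands in for `RedisDatabaseImpl<C>`, and the URI is passed explicitly instead of being read from the database. `putIfAbsent` keeps the instance that is already registered, which matters because a database that was ready at construction time can also be delivered to the completion callback.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical registry; the lock choice is an assumption, not taken from the PR.
final class DatabaseRegistrySketch<D> {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, D> databases = new LinkedHashMap<>();

    DatabaseRegistrySketch(Map<String, D> initiallyReady) {
        databases.putAll(initiallyReady);
    }

    // Completion callback: a failed database (db == null) is not added; a real client would log `error`.
    void onDatabaseCompletion(String uri, D db, Throwable error) {
        if (db == null) {
            return;
        }
        doByExclusiveLock(() -> databases.putIfAbsent(uri, db)); // keep an already-registered instance
    }

    D get(String uri) {
        return doByReadLock(() -> databases.get(uri));
    }

    int size() {
        return doByReadLock(databases::size);
    }

    private void doByExclusiveLock(Runnable action) {
        lock.writeLock().lock();
        try {
            action.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    private <R> R doByReadLock(Supplier<R> action) {
        lock.readLock().lock();
        try {
            return action.get();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```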

StatusTracker - Async-Only API

  • Removed: waitForHealthStatus() - synchronous blocking method
  • Kept: waitForHealthStatusAsync() - event-driven async method
  • Aligns with async-first initialization approach
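`waitForHealthStatusAsync()` turns status-change notifications into a future. A minimal sketch of that conversion follows; the class, its `HealthStatus` enum and the `onStatusChange` hook are hypothetical, and only the method name comes from the PR. The future completes with the first status other than UNKNOWN, and a status that is already known is returned as a completed future.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical status tracker; not Lettuce's StatusTracker.
final class StatusTrackerSketch {

    enum HealthStatus { UNKNOWN, HEALTHY, UNHEALTHY }

    private final Map<String, HealthStatus> current = new HashMap<>();
    private final Map<String, List<CompletableFuture<HealthStatus>>> waiters = new HashMap<>();

    // Completes with the first non-UNKNOWN status for the endpoint.
    synchronized CompletableFuture<HealthStatus> waitForHealthStatusAsync(String endpoint) {
        HealthStatus known = current.getOrDefault(endpoint, HealthStatus.UNKNOWN);
        if (known != HealthStatus.UNKNOWN) {
            return CompletableFuture.completedFuture(known);
        }
        CompletableFuture<HealthStatus> future = new CompletableFuture<>();
        waiters.computeIfAbsent(endpoint, k -> new ArrayList<>()).add(future);
        return future;
    }

    // Called by the health-check machinery whenever an endpoint's status changes.
    void onStatusChange(String endpoint, HealthStatus status) {
        List<CompletableFuture<HealthStatus>> toComplete;
        synchronized (this) {
            current.put(endpoint, status);
            if (status == HealthStatus.UNKNOWN) {
                return;
            }
            toComplete = waiters.remove(endpoint);
        }
        // Complete outside the lock so dependent callbacks never run while it is held.
        if (toComplete != null) {
            toComplete.forEach(f -> f.complete(status));
        }
    }
}
```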

Benefits

1. Fast and Safe Startup

  • Before: Wait for ALL databases to connect and complete health checks
  • After: Return as soon as highest-weighted healthy database is determined (skips waiting for lower-priority databases)
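  • Impact: Shorter startup whenever lower-weighted databases connect or finish their health checks later than the selected primary; the gain grows with the number of databases and the spread of health-check times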

2. Resilient Initialization 🛡️

  • Before: All databases must succeed or entire connection fails
  • After: Succeeds with at least one healthy database
  • Impact: More reliable in environments with intermittent connectivity

3. Non-Blocking Async Flow 🔄

  • Before: Synchronous blocking during initialization
  • After: Fully async with event-driven health check coordination
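  • Impact: Better resource utilization; no threads are blocked during initialization (the synchronous connect() still blocks its caller until the initial primary is selected)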

4. Weight-Based Primary Selection ⚖️

  • Automatically selects highest-weighted healthy database as initial primary
  • Ensures best available database is used from the start
  • Respects user-defined priority configuration

5. Graceful Degradation 📉

  • Connection usable immediately with one database
  • Additional databases added seamlessly as they become ready
  • Failed databases don't impact already-established connection

6. Reduced Code Duplication 🔧

  • ~400 lines of common logic consolidated into AbstractRedisMultiDbConnectionBuilder
  • Single source of truth for initialization logic
  • Easier to maintain and extend

7. Better Type Safety 🔒

  • Generic type parameters prevent mixing regular and PubSub connections
  • Compile-time enforcement of connection type compatibility
  • Clearer API with explicit connection types

8. Improved Testability

  • Removed ~280 lines of unit tests for internal implementation details
  • Focus on behavior testing rather than implementation testing
  • Easier to mock and test individual components

Startup Behavior Comparison

Scenario: 3 Databases with Different Health Check Times

Configuration:

  • DB1 (weight: 1.0): Health check takes 5 seconds
  • DB2 (weight: 0.8): Health check takes 1 second
  • DB3 (weight: 0.5): Health check takes 10 seconds

Before (Blocking Initialization):

Time 0s:  Start connecting to DB1, DB2, DB3
Time 1s:  DB2 health check completes (HEALTHY)
Time 5s:  DB1 health check completes (HEALTHY)
Time 10s: DB3 health check completes (HEALTHY)
Time 10s: ✅ Connection returned to user
          Primary: DB1 (highest weight)

Total startup time: 10 seconds (waited for all databases)

After (Async Initialization with Weight-Based Selection):

Time 0s:  Start connecting to DB1, DB2, DB3 (parallel)
          Checking in weight order: DB1 (1.0) → DB2 (0.8) → DB3 (0.5)

Time 1s:  DB2 health check completes (HEALTHY)
          DB1 still not ready (health check pending)
          Wait for DB1 result (higher weight)

Time 5s:  DB1 health check completes (HEALTHY)
Time 5s:  ✅ Connection returned to user
          Primary: DB1 (highest-weighted healthy)

Time 10s: DB3 health check completes (HEALTHY)
          DB3 added to connection

Total startup time: 5 seconds (waited for highest-weighted healthy)
Improvement: startup time halved, 5s instead of 10s (didn't wait for DB3)

Scenario: Highest-Weighted Database Fails

Configuration:

  • DB1 (weight: 1.0): Connection fails immediately
  • DB2 (weight: 0.8): Health check takes 2 seconds (HEALTHY)
  • DB3 (weight: 0.5): Health check takes 3 seconds (HEALTHY)

Before:

Time 0s:  Start connecting to DB1, DB2, DB3
Time 0s:  DB1 connection fails
Time 2s:  DB2 health check completes (HEALTHY)
Time 3s:  DB3 health check completes (HEALTHY)
Time 3s:  ✅ Connection returned
          Primary: DB2 (highest weight among healthy)

Total startup time: 3 seconds

After:

Time 0s:  Start connecting to DB1, DB2, DB3 (parallel)
          Checking in weight order: DB1 (1.0) → DB2 (0.8) → DB3 (0.5)

Time 0s:  DB1 connection fails (FAILED)
          Skip DB1, check next: DB2

Time 2s:  DB2 health check completes (HEALTHY)
Time 2s:  ✅ Connection returned
          Primary: DB2 (highest-weighted healthy, DB1 failed)

Time 3s:  DB3 health check completes (HEALTHY)
          DB3 added to connection

Total startup time: 2 seconds (didn't wait for DB3)
Improvement: ~33% shorter startup, 2s instead of 3s
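Both timelines follow a simple rule that can be checked with a back-of-the-envelope model (hypothetical, connection setup cost ignored): the old flow returns when the last database has a result, while the new flow returns once every database down to and including the highest-weighted healthy one has a result.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Back-of-the-envelope model of the two startup timelines; not Lettuce code.
final class StartupTimelineSketch {

    static final class Db {
        final double weight;
        final double resultAt;  // seconds until the health result (or connection failure) is known
        final boolean healthy;

        Db(double weight, double resultAt, boolean healthy) {
            this.weight = weight;
            this.resultAt = resultAt;
            this.healthy = healthy;
        }
    }

    // Before: the connection is returned only after every database has a result.
    static double blockingStartup(List<Db> dbs) {
        double t = 0;
        for (Db db : dbs) {
            t = Math.max(t, db.resultAt);
        }
        return t;
    }

    // After: walk by descending weight; decide once every database inspected up to and
    // including the first healthy one has a result.
    static double weightedStartup(List<Db> dbs) {
        List<Db> byWeight = new ArrayList<>(dbs);
        byWeight.sort(Comparator.comparingDouble((Db d) -> d.weight).reversed());
        double t = 0;
        for (Db db : byWeight) {
            t = Math.max(t, db.resultAt);
            if (db.healthy) {
                return t;
            }
        }
        return t; // nothing healthy: the failure is reported once the last result arrives
    }
}
```

For the two scenarios this yields 10s versus 5s and 3s versus 2s, matching the timelines above.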

Migration Notes

API Changes

The MultiDbConnectionFuture type parameter has changed from <K, V> to <C extends BaseRedisMultiDbConnection>. This provides better type safety but may require updates to code that explicitly declares the future type.

Before:

MultiDbConnectionFuture<String, String> future = client.connectAsync();

After:

MultiDbConnectionFuture<StatefulRedisMultiDbConnection<String, String>> future = client.connectAsync();

Code that never spells out the future type (for example, code that chains calls directly on client.connectAsync()) compiles unchanged, because type inference handles the new parameter.

Behavioral Changes

Startup Timing:

  • Connections returned as soon as highest-weighted healthy database is determined
  • Does NOT wait for lower-weighted databases to complete
  • Does NOT return immediately with the first healthy database if higher-weighted databases are still pending
  • Waits for higher-weighted databases to complete (success or failure) before selecting lower-weighted ones

Failure Handling:

  • Partial failures no longer block startup
  • Connection succeeds with at least one healthy database
  • Failed databases can be retried via health check mechanisms

Testing

  • ✅ Existing integration tests continue to pass
  • ✅ Apart from the startup timing and failure handling changes listed under Behavioral Changes, behavior is unchanged from the user's perspective
  • ✅ Unit tests for internal implementation details removed (focus on behavior, not implementation)
  • ✅ Async flow tested via integration tests

Related Issues

This implementation addresses:

  1. Safe startup requirement: Connection only succeeds with at least one healthy database
  2. Performance optimization: Fast startup by not waiting for all databases
  3. Design preference: Uses inheritance (base + child classes) instead of handling all connection types in one class
  4. Async-first approach: Fully event-driven initialization without blocking

Summary

This PR implements a production-ready safe startup mechanism for MultiDbClient that:

  • Guarantees safety: Never returns a connection without at least one healthy database
  • Optimizes performance: Returns connection as soon as highest-weighted healthy database is determined (doesn't wait for lower-priority databases)
  • Handles failures gracefully: Skips failed/unhealthy databases and selects next best option
  • Respects priorities: Weight-based selection ensures best available database is always used
  • Scales seamlessly: Lower-priority databases added asynchronously without blocking startup
  • Maintains compatibility: Existing code continues to work without changes, except code that explicitly declares the MultiDbConnectionFuture type (see Migration Notes)

Key Metrics:

  • Code reduction: ~400 lines of duplicate logic eliminated
  • Startup improvement: depends on database count and health-check timing; 2x (10s → 5s) and 1.5x (3s → 2s) in the two scenarios above
  • Reliability: Works with 1-N healthy databases (previously required all)
  • Type safety: Compile-time enforcement of connection types
  • Smart selection: Always picks highest-weighted healthy database, not just first available

The implementation provides a solid foundation for automatic failover by ensuring the client always starts in a known-good state with the best available healthy database, while remaining responsive and resilient to partial failures.

@atakavci atakavci requested review from ggivo and tishun January 15, 2026 12:32
@jit-ci

jit-ci bot commented Jan 15, 2026

❌ Security scan failed

Security scan failed: Branch failover/initHealthy does not exist in the remote repository


💡 Need to bypass this check? Comment @sera bypass to override.

@atakavci
Collaborator Author

@sera bypass

@jit-ci

jit-ci bot commented Jan 18, 2026

❌ Security scan failed

Security scan failed: Branch failover/initHealthy does not exist in the remote repository


💡 Need to bypass this check? Comment @sera bypass to override.

Contributor

@ggivo ggivo left a comment

LGTM

Added a test optimisation suggestion/question, and a follow-up question that I think we need to investigate and probably address.

// Given: Highest weighted endpoint has hanging health check, second is healthy
CountDownLatch hangLatch = new CountDownLatch(1);
HealthCheckStrategySupplier hangingSupplier = (uri, options) -> new TestHealthCheckStrategy(
HealthCheckStrategy.Config.builder().interval(100).timeout(5000).numProbes(1).build(), endpoint -> {
Contributor

any reason not to reduce the timeout to speed up the test?

Collaborator Author

my original plan was to go for one last round on reducing test duration for multiDb tests.
now it has to wait till i can spare some time; i will create a ticket for that.

public void close() {
healthStatusManager.close();
databases.values().forEach(db -> db.getConnection().close());
databases.values().forEach(RedisDatabaseImpl::close);
Contributor

@ggivo ggivo Jan 22, 2026

On a related note, I think we should consider registering StatefulRedisMultiDbConnectionImpl—and likely also HealthCheckManager—with AbstractRedisClient.closeableResources.

Otherwise, if MultiDbClient is shut down without explicitly closing all connections first, we may end up leaking resources.

Collaborator Author

will submit a pr for this.

@atakavci atakavci merged commit 22086e1 into redis:feature/automatic-failover-1 Jan 22, 2026
7 checks passed
atakavci added a commit that referenced this pull request Feb 6, 2026
* [automatic-failover] Improve extensibility that is needed for automatic-failover feature (#3507)

* - improve extensibility that will be needed in aa-failover feature

* - suppresswarnings and remove casting

* [automatic-failover] Draft implementation for automatic-failover (#3508)

* - draft implementation for automatic-failover

* - remove commented out tests

* - format

* - fix failing test

* - fix flaky test

* - fix multidbpubsub subscriptions handover test

* - wait for subscriptions with failing test

* [automatic-failover] Make AbstractRedisClient implement BaseRedisClient (#3513)

* - move BaseRedisClient to core package and add it to AbstractRedisClient

* - add override annotations to AbstractRedisClient

* [automatic-failover]  Support for dynamic add/remove endpoints (#3517)

* - Add/Remove databases safely

* - secure switchToDatabase

* - guard listeners and db switch against race conditions.

* feedbacks from @ggivo
- add close to both MultiDbConnection and CircuitBreaker
- skip switchToDatabase when source and destination are the same db

* - add test around attempt to switch to same db

* [automatic-failover] Support double-threshold logic with circuitbreaker (#3522)

* - simplfy tracking exceptions check
- add metrics evaluation tests for double-threshold
- add more tests on CB evaluates metrics and state transition, including edge cases

* - tune number of success/failures in test case

* - Add recordResult(Throwable), recordSuccess(), and recordFailure() public methods to CircuitBreaker
- Add getSnapshot() public method to expose metrics directly
- Change getMetrics() to package-private (internal use only)
- Simplify handleFailure() in endpoint implementations to use recordResult()
- Update all tests to use new public API
- Drop repeating test case shouldOpenImmediatelyWhenMinimumCountReachedAndRateIsZero

* - fix test cases; drop unnecessary calls to evaluateMetrics when there is call to recordFailure

* [automatic failover] Implement sliding time window metrics tracker (#3521)

* abstract clock for easy testing

* Improve LockFreeSlidingWindowMetrics: fix bugs and add tests

Bug Fixes:
- Fix: Ensure snapshot metrics remain accurate after a full window rotation
- Fix: events recorded exactly at bucket boundaries were miscounted
- Enforce window size % bucket size == 0
- Move LockFreeSlidingWindowMetricsUnitTests to correct package
  (io.lettuce.core.failover.metrics)

* remove unused reset methods

* extract interface for MetricsSnapshot

   - remove snapshotTime - not used & not correctly calculated
   - remove reset metrics - unused as of now

* add LockFreeSlidingWindowMetrics benchmark test

* performance tests moved to metrics package

* replace with port from resilience4j

* update copyrights

* format

* clean up javadocs

* clean up
   - fix incorrect javadoc
   - fix failing benchmark

* [automatic failover] Hide failover metrics implementation

 - CircuitBreakerMetrics, MetricsSnapshot - public
 - metrics implementation details stay inside io.lettuce.core.failover.metrics
 - Update CircuitBreaker to obtain its metrics via CircuitBreakerMetricsFactory.createLockFree()

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* rename createLockFree -> createDefaultMetrics

* address review comments by @atakavci

    - remove CircuitBreakerMetrics, CircuitBreakerMetricsImpl
    - rename SlidingWindowMetrics -> CircuitBreakerMetrics

* format

* Enforce min-window size of 2 buckets

  Current implementation requires at least 2 buckets window
        With windowSize=1, only one node is created with next=null
        When updateWindow() advances the window it sets HEAD to headNext, which is null for a single-node window
        On the next call to updateWindow(), tries to access head.next but head is now null, causing:
        NullPointerException: Cannot read field "next" because "head" is null

* Clean-up benchmark

   - benchmark matrix
       threads (1,4)
       window_size ("2", "30", "180")
   - performs 1_000_000 ops in simulated 5min test window
   - benchmark record events
   - benchmark record & read snapshot

* remove MetricsPerformanceTests.java

  - no reliable way to assert on performance, instead added basic benchmark test to benchmark recording/snapshot reading average times
 - gc benchmarks are available for local testing

* reset method removed

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @atakavci

Co-authored-by: atakavci <a_takavci@yahoo.com>

* Update src/main/java/io/lettuce/core/failover/metrics/CircuitBreakerMetrics.java

Co-authored-by: Tihomir Krasimirov Mateev <tihomir.mateev@redis.com>

* add missing license header and javadoc

* add missing license header and javadoc

* correct author for jmh failover metrics

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: atakavci <a_takavci@yahoo.com>
Co-authored-by: Tihomir Krasimirov Mateev <tihomir.mateev@redis.com>

* [automatic failover] CAE-1861: Atomic lock-free metrics reset on CircuitBreaker state transitions (#3527)

* abstract clock for easy testing

* Improve LockFreeSlidingWindowMetrics: fix bugs and add tests

Bug Fixes:
- Fix: Ensure snapshot metrics remain accurate after a full window rotation
- Fix: events recorded exactly at bucket boundaries were miscounted
- Enforce window size % bucket size == 0
- Move LockFreeSlidingWindowMetricsUnitTests to correct package
  (io.lettuce.core.failover.metrics)

* remove unused reset methods

* extract interface for MetricsSnapshot

   - remove snapshotTime - not used & not correctly calculated
   - remove reset metrics - unused as of now

* add LockFreeSlidingWindowMetrics benchmark test

* performance tests moved to metrics package

* replace with port from resilience4j

* update copyrights

* format

* clean up javadocs

* clean up
   - fix incorrect javadoc
   - fix failing benchmark

* [automatic failover] Hide failover metrics implementation

 - CircuitBreakerMetrics, MetricsSnapshot - public
 - metrics implementation details stay inside io.lettuce.core.failover.metrics
 - Update CircuitBreaker to obtain its metrics via CircuitBreakerMetricsFactory.createLockFree()

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* rename createLockFree -> createDefaultMetrics

* address review comments by @atakavci

    - remove CircuitBreakerMetrics, CircuitBreakerMetricsImpl
    - rename SlidingWindowMetrics -> CircuitBreakerMetrics

* format

* Enforce min-window size of 2 buckets

  The current implementation requires a window of at least 2 buckets:
        With windowSize=1, only one node is created, with next=null.
        When updateWindow() advances the window, it sets HEAD to headNext, which is null for a single-node window.
        On the next call to updateWindow(), accessing head.next fails because head is now null, causing:
        NullPointerException: Cannot read field "next" because "head" is null

* Clean-up benchmark

   - benchmark matrix
       threads (1,4)
       window_size ("2", "30", "180")
   - performs 1_000_000 ops in simulated 5min test window
   - benchmark record events
   - benchmark record & read snapshot

* remove MetricsPerformanceTests.java

  - no reliable way to assert on performance; instead, added a basic benchmark that measures average recording and snapshot-reading times
  - GC benchmarks are available for local testing

* reset method removed

* reset circuit breaker metrics on state transition

* fix test : shouldMaintainMetricsAfterSwitch()

CB metrics are updated asynchronously on command completion, so a thread waiting on command completion might proceed before the metrics snapshot is updated.

* format

* evaluateMetrics - javadocs & make it package private

* format

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [automatic-failover] Implement weighted endpoint selection (#3519)

* -  drop RedisDatabaseConfig

- add HealthStatus

* - set default healthStatus.Healthy

* - review from @ggivo, improve readability

* review from @tishun
- rename predicate function
- add javadoc

* - format

* - fix java docs

* - format

* [automatic-failover] Integrate health checks with probing policies and retry logic (CAE-1685) (#3541)

* initial port Jedis health monitoring

* wip integrate healthchecks

* formating

* formating

* add test case plan

* Endpoints without health checks configured should return HEALTHY

Changes
 - add connection.getHealthStatus(RedisUri endpoint)
 - HEALTHY - returned for Databases without health checks configured
 - add test
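
A minimal sketch of the defaulting rule above, assuming a map keyed by endpoint (plain `String` ids stand in for `RedisURI`); the `HealthStatusRegistry` class and its methods are hypothetical, not Lettuce's API. Endpoints with a health check start as `UNKNOWN`, while lookups for endpoints without one fall back to `HEALTHY`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: endpoints are plain String ids here instead of RedisURI.
final class HealthStatusRegistry {

    enum HealthStatus { UNKNOWN, HEALTHY, UNHEALTHY }

    // Only endpoints that have a health check configured get an entry.
    private final Map<String, HealthStatus> statusByEndpoint = new ConcurrentHashMap<>();

    /** Called when a health check is attached; UNKNOWN until the first probe completes. */
    void registerHealthCheck(String endpoint) {
        statusByEndpoint.put(endpoint, HealthStatus.UNKNOWN);
    }

    /** Records a probe result; ignored for endpoints without a health check. */
    void update(String endpoint, HealthStatus status) {
        statusByEndpoint.computeIfPresent(endpoint, (key, previous) -> status);
    }

    /** Endpoints without a configured health check are always reported as HEALTHY. */
    HealthStatus getHealthStatus(String endpoint) {
        return statusByEndpoint.getOrDefault(endpoint, HealthStatus.HEALTHY);
    }
}
```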

* Create MultiDbClient with custom health check strategy supplier

Changes
 - add test to ensure health status changes from custom health checks are reflected

* faster await timeout

* add test - use different health check strategies for different endpoints

* wait for initial healthy database

* add test - configure health check interval and timeout

* add test - trigger failover when health check detects unhealthy endpoint

* add test - should not failover to unhealthy endpoints

* add test - Should trigger failover via circuit breaker even when health check returns HEALTHY

* reduce await poll interval in HealthCheckIntegrationTest

* mark un-implemented tests are disabled

* add test - Should transition from UNKNOWN to HEALTHY

* add test - Should create health check when adding new database

* fix - Should stop health check when removing a database

* add test - Should stop health check when removing a database

* add test - HealthCheckLifecycleTests
  - Should start health checks automatically when connection is created
  - Should stop health checks when connection is closed

* fix HealthCheck not stopped on StatefulRedisMultiDbConnection.close()

* remove HealthStatusListenerTests stubs, health check events, not exposed publicly

* format

* add health checks unit test

* clean up

   - rename health check thread names to lettuce-*
   - clean up warnings
   - format
   - javadocs & author updated

* address failing tests

  - Update StatefulMultiDbConnectionIntegrationTests to account for the additional test server added in MultiDbTestSupport
  - JUnit 4 @After replaced with its JUnit 5 equivalent

* package private StatusTracker

* make healthStatusManager required when creating MultiDbStatefullConnection

* remove un-implemented probing integration tests

 - covered with unit tests

* introduce isHealthy() to replace getHealthStatus()

* register listeners before adding HealthChecks

* [automatic failover] Integrate circuitbreaker into each DefaultEndpoint/PubSubEndpoint (#3543)

* - move CB creation responsibility from RedisDatabase to client

* - introduce interface for CB

* - add CircuitBreaker interface
- introduce 'CircuitBreakerGeneration' to track CB state changes and issue 'recordResult' on the correct state holder
- reject commands when the CB is not CLOSED
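
The generation idea can be sketched as follows, assuming a two-state breaker and `LongAdder` counters; `GenerationalCircuitBreaker` and its methods are hypothetical names, not the actual `CircuitBreakerGeneration` interface. A command captures the current generation when it is written and records its result against that generation, so results that complete after a state transition cannot skew the metrics of the new state.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch, not Lettuce's CircuitBreakerGeneration API.
final class GenerationalCircuitBreaker {

    enum State { CLOSED, OPEN }

    /** One CB state together with the metrics collected while that state was active. */
    static final class Generation {
        final State state;
        final LongAdder successes = new LongAdder();
        final LongAdder failures = new LongAdder();

        Generation(State state) {
            this.state = state;
        }
    }

    private final AtomicReference<Generation> current = new AtomicReference<>(new Generation(State.CLOSED));

    /** Captured when a command is written, so its result can later be recorded against it. */
    Generation currentGeneration() {
        return current.get();
    }

    /** Commands are rejected unless the breaker is CLOSED. */
    boolean allowsCommands() {
        return current.get().state == State.CLOSED;
    }

    /** Results that complete after a state change are dropped instead of polluting the new generation. */
    void recordResult(Generation generation, boolean success) {
        if (generation != current.get()) {
            return;
        }
        // A transition racing with this call can only touch the retired generation's counters.
        if (success) {
            generation.successes.increment();
        } else {
            generation.failures.increment();
        }
    }

    /** Every transition starts a fresh generation with empty metrics. */
    void transitionTo(State newState) {
        current.set(new Generation(newState));
    }
}
```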

* - fix typo

* - add metricsWindowSize to CircuitBreakerConfig
- rename DatabaseEndpoint.bind
- add javadocs
- add tests for Command.onComplete callbacks registered in DatabaseEndpoint
- introduce toxiproxy
- add circuit breaker test to verify metrics collection and failover triggers

* - fix test

* - fix failing test due to order of listeners in CB state change events

* on feedbacks from @ggivo
 - drop record functions from CB interface
- revisit exposed functions on CB impl
- handle and record exceptions in DatabaseEndpoint.write
- fix tests
- get rid of Thread.sleep calls in tests

* - remove Thread.sleep from test

* - format

* - limit visibility
- improve metrics objects for testability
- drop use of Thread.sleep in DatabaseEndpointCallbackTests

* - revisit the tests to provide the assertions they claim in comments.

* - test to check commands failing after endpoint switch

* - formatting

* - change accessibility of CircuitBreakerGeneration
- drop metricsFactory instance approach
- fix naming typo
- drop TestMetricsFactory
- improve ReflectionTestUtils

* feedback from @ggivo
- drop recordFailure/recordSuccess from CircuitBreakerImpl

* feedback from @ggivo
- revisit CircuitBreakerGeneration interface

* [automatic failover]  Implement ping health check (CAE-1687) (#3564)

* add Ping strategy

* add PingStrategyIntegrationTests

add integration test

* health checks refactored (inject DatabaseConnectionProvider instead of ClientOptions)

Inject DatabaseConnectionProvider into HealthCheckStrategySuppliers. Injecting a per-DB connection factory allows reuse of MultiDB client resources.

  - ClientOptions no longer propagated to HealthCheckStrategySupplier
  - HealthCheckStrategySupplier refactored to use DatabaseConnectionProvider

* clean up

  - renamed DatabaseConnectionProvider -> DatabaseRawConnectionFactory
  - api docs updated

* format

* Fix sporadic test failures

 - Shared TestClientResources shutdown during tests, caused subsequent test to fail.

* clean up - rename internal vars

* clean up

   - add unit test
   - remove unused HealthCheckStrategySupplier DEFAULT_WITH_PROVIDER

* [automatic failover] Builder APIs for DatabaseConfig and CircuitBreakerConfig  (CAE-1695) (#3571)

* add DatabaseConfig.Builder

* healthCheckStrategySupplier now defaults to PingStrategy.DEFAULT in the builder

 - When using the builder without setting healthCheckStrategySupplier: Health checks will use PingStrategy.DEFAULT
 - When explicitly setting to null: Health checks will be disabled (as documented)
 - When setting to a custom supplier: Uses the custom health check strategy

 Example Usage:
 // Uses PingStrategy.DEFAULT for health checks
 DatabaseConfig config1 = DatabaseConfig.builder(uri)
     .weight(1.0f)
     .build();

 // Explicitly disables health checks
 DatabaseConfig config2 = DatabaseConfig.builder(uri)
     .healthCheckStrategySupplier(null)
     .build();

 // Uses custom health check strategy
 DatabaseConfig config3 = DatabaseConfig.builder(uri)
     .healthCheckStrategySupplier(customSupplier)
     .build();

* HealthCheckStrategySupplier.NO_HEALTH_CHECK instead null

* Remove DatabaseConfig constructors

// To create DatabaseConfig use provided builder
DatabaseConfig config = DatabaseConfig.builder(redisURI)
    .weight(1.5f)
    .clientOptions(options)
    .circuitBreakerConfig(cbConfig)
    .healthCheckStrategySupplier(supplier)
    .build();

* remove redundant public modifiers

* Builder for CircuitBreakerConfig

// Minimal configuration with defaults
CircuitBreakerConfig defaultConfig = CircuitBreakerConfig.builder().build();

// Custom configuration
CircuitBreakerConfig customConfig = CircuitBreakerConfig.builder()
    .failureRateThreshold(25.0f)
    .minimumNumberOfFailures(500)
    .metricsWindowSize(5)
    .build();

// With custom tracked exceptions
Set<Class<? extends Throwable>> customExceptions = new HashSet<>();
customExceptions.add(RuntimeException.class);

CircuitBreakerConfig trackedConfig = CircuitBreakerConfig.builder()
    .failureRateThreshold(15.5f)
    .minimumNumberOfFailures(200)
    .trackedExceptions(customExceptions)
    .metricsWindowSize(3)
    .build();

* enforce min window size of 2s

* tracked exceptions should not be null

* add convenience methods for Tracked Exceptions

// Combine add and remove
CircuitBreakerConfig combinedConfig = CircuitBreakerConfig.builder()
    .addTrackedExceptions(MyCustomException.class)
    .removeTrackedExceptions(TimeoutException.class)
    .build();

// Replace all tracked exceptions
Set<Class<? extends Throwable>> customExceptions = new HashSet<>();
customExceptions.add(RuntimeException.class);
customExceptions.add(IOException.class);
CircuitBreakerConfig replacedConfig = CircuitBreakerConfig.builder()
    .trackedExceptions(customExceptions)
    .build();

* remove option to configure per database clientOptions till #3572 is resolved

* Disable health checks in test configs to isolate circuit breaker testing

Configure DB1, DB2, and DB3 with NO_HEALTH_CHECK to prevent health check
interference when testing circuit breaker failure detection.

* format

* clean up

* address review comments (Copilot)

* Remove unused redisURI parameter from PingStrategy constructors (#3573)

The redisURI parameter in PingStrategy constructors was never used in the
implementation. The actual endpoint URI is passed to doHealthCheck() method
when performing health checks, making the constructor parameter redundant.

Changes:
- Removed RedisURI parameter from both PingStrategy constructors
- Updated DEFAULT supplier to use lambda instead of method reference

* [automatic failover] Add example for automatic failover (#3568)

* add DatabaseConfig.Builder

* healthCheckStrategySupplier now defaults to PingStrategy.DEFAULT in the builder

 - When using the builder without setting healthCheckStrategySupplier: Health checks will use PingStrategy.DEFAULT
 - When explicitly setting to null: Health checks will be disabled (as documented)
 - When setting to a custom supplier: Uses the custom health check strategy

 Example Usage:
 // Uses PingStrategy.DEFAULT for health checks
 DatabaseConfig config1 = DatabaseConfig.builder(uri)
     .weight(1.0f)
     .build();

 // Explicitly disables health checks
 DatabaseConfig config2 = DatabaseConfig.builder(uri)
     .healthCheckStrategySupplier(null)
     .build();

 // Uses custom health check strategy
 DatabaseConfig config3 = DatabaseConfig.builder(uri)
     .healthCheckStrategySupplier(customSupplier)
     .build();

* HealthCheckStrategySupplier.NO_HEALTH_CHECK instead null

* Remove DatabaseConfig constructors

// To create DatabaseConfig use provided builder
DatabaseConfig config = DatabaseConfig.builder(redisURI)
    .weight(1.5f)
    .clientOptions(options)
    .circuitBreakerConfig(cbConfig)
    .healthCheckStrategySupplier(supplier)
    .build();

* remove redundant public modifiers

* Builder for CircuitBreakerConfig

// Minimal configuration with defaults
CircuitBreakerConfig defaultConfig = CircuitBreakerConfig.builder().build();

// Custom configuration
CircuitBreakerConfig customConfig = CircuitBreakerConfig.builder()
    .failureRateThreshold(25.0f)
    .minimumNumberOfFailures(500)
    .metricsWindowSize(5)
    .build();

// With custom tracked exceptions
Set<Class<? extends Throwable>> customExceptions = new HashSet<>();
customExceptions.add(RuntimeException.class);

CircuitBreakerConfig trackedConfig = CircuitBreakerConfig.builder()
    .failureRateThreshold(15.5f)
    .minimumNumberOfFailures(200)
    .trackedExceptions(customExceptions)
    .metricsWindowSize(3)
    .build();

* enforce min window size of 2s

* tracked exceptions should not be null

* add convenience methods for Tracked Exceptions

// Combine add and remove
CircuitBreakerConfig combinedConfig = CircuitBreakerConfig.builder()
    .addTrackedExceptions(MyCustomException.class)
    .removeTrackedExceptions(TimeoutException.class)
    .build();

// Replace all tracked exceptions
Set<Class<? extends Throwable>> customExceptions = new HashSet<>();
customExceptions.add(RuntimeException.class);
customExceptions.add(IOException.class);
CircuitBreakerConfig replacedConfig = CircuitBreakerConfig.builder()
    .trackedExceptions(customExceptions)
    .build();

* remove option to configure per database clientOptions till #3572 is resolved

* Disable health checks in test configs to isolate circuit breaker testing

Configure DB1, DB2, and DB3 with NO_HEALTH_CHECK to prevent health check
interference when testing circuit breaker failure detection.

* format

* clean up

* address review comments (Copilot)

* Add example for automatic failover

* Use builders

* shutdown primary instance

* remove unused imports

* Update src/test/java/io/lettuce/examples/AutomaticFailover.java

Co-authored-by: atakavci <a_takavci@yahoo.com>

* revert accidentally disabled user timeout config

---------

Co-authored-by: ggivo <ivo.gaydazhiev@redis.com>
Co-authored-by: atakavci <a_takavci@yahoo.com>

* Merge remote-tracking branch 'origin/main' into feature/automatic-failover-1 (#3575)

* add Benchmark (jmh) benchmark result for 1343845

* Bump to 8.4-GA-pre.3 (#3516)

* add Benchmark (jmh) benchmark result for e8d59fc

* Add official 8.4 to test matrix and make it default (#3520)

* Add support for XREADGROUP CLAIM arg (#3486)

* Add support for XREADGROUP CLAIM arg

* Add NOACK scenario in ITs

* Fix NOACK IT scenario. Add test.

* Implement new fields as integers. Fix tests.

* Rename values for consistency.

* Address some comments from code review

* add Benchmark (jmh) benchmark result for 295546c

* Add support CAS/CAD (#3512)

* Implement CAS/CAD commands

* Add tests

* Fix readonly commands count

* Remove not needed license comments.

* Implement msetex command (#3510)

* Implement msetex command

* Refactor to use SetArgs

* Use dedicated MSetExArgs for MSETEX command

* Fix formatting

* Keep only instant/duration API

* Rm not needed license comment.

* Fix tests

* Preserve null values when parsing SearchReplies (#3518)

EncodedComplexOutput was skipping null values instead of passing them on. Then SearchReplyParser needs to store null values as they are and not try to decode them.
This affected both RESP2 and RESP3 parsing.

Added two integration tests in RediSearchAggregateIntegrationTests to verify that nulls in JSON documents are parsed correctly.

* add Benchmark (jmh) benchmark result for 0796a4e

* Modify release notes and bump pom version. (#3525)

* add Benchmark (jmh) benchmark result for 7fefd6a

* add Benchmark (jmh) benchmark result for 838fe47

* add Benchmark (jmh) benchmark result for 73a7bab

* add Benchmark (jmh) benchmark result for 0e49f73

* SearchArgs.returnField with alias produces malformed redis command #3528 (#3530)

* add Benchmark (jmh) benchmark result for a4eab37

* fix consistency with get(int) that returns wrapped (#3464)

DelegateJsonObject/DelegateJsonArray for nested structures

Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>

* Bumping Netty to 4.2.5.Final (#3536)

* add Benchmark (jmh) benchmark result for 274af38

* add Benchmark (jmh) benchmark result for 8f2080a

* add Benchmark (jmh) benchmark result for fe79196

* add Benchmark (jmh) benchmark result for 289398b

* add Benchmark (jmh) benchmark result for 2f226a6

* add Benchmark (jmh) benchmark result for a1bb28d

* add Benchmark (jmh) benchmark result for d7e6a0a

* add Benchmark (jmh) benchmark result for 9230a17

* Add ftHybrid (#3540)

* Add ftHybrid

* rm max, withCount from SortBy

* refactor CombineArgs

* Move postprocessing inside PostProcessingArgs

* Refactor VectorSearchMethod

* Mark new files as experimental

* Format

* Fix RESP2 parsing

* Fix tests for previous versions

* Minor fixes in tests

* Format

* Add enabled on command

* Refactor scoring

* Tighten integration test with field assertions

* Rm commented loadAll

* Use keywords instead of magic strings

* Fixed Range building

* Rm defaults from javadoc

* Expose method to add upstream driver libraries to CLIENT SETINFO payload (#3542)

* Expose method to add upstream driver libraries to CLIENT SETINFO payload

* Create a separate class to hold driver name and upstream drivers information

* Fix PR comments

* Update since tag

* add Benchmark (jmh) benchmark result for be132f9

* Release 7.2.0 (#3559)

* add Benchmark (jmh) benchmark result for fdcfb74

* Fix command queue corruption on encoding failures (#3443)

* Correctly handling the encoding error for Lettuce [POC]

Summary:
Add encoding error tracking to prevent command queue corruption

  - Add markEncodingError() and hasEncodingError() methods to RedisCommand interface
  - Implement encoding error flag in Command class with volatile boolean
  - Mark commands with encoding errors in CommandEncoder on encode failures
  - Add lazy cleanup of encoding failures in CommandHandler response processing
  - Update all RedisCommand implementations to support encoding error tracking
  - Add comprehensive unit tests and integration tests for encoding error handling

Fixes issue where encoding failures could corrupt the outstanding command queue by leaving failed commands in the stack without proper cleanup, causing responses to be matched to wrong commands.
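
A simplified sketch of the lazy cleanup, assuming a single-threaded handler in which commands enter the outstanding queue before they are encoded; apart from `markEncodingError()` and `hasEncodingError()`, which the commit names, the class and method names are hypothetical. Commands that failed to encode never reach the server, so they are skipped when matching the next reply.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, single-threaded simplification of the outstanding-command queue.
final class EncodingErrorQueueSketch {

    static final class Command {
        final String name;
        String response;
        private volatile boolean encodingError;

        Command(String name) {
            this.name = name;
        }

        void markEncodingError() {
            encodingError = true;
        }

        boolean hasEncodingError() {
            return encodingError;
        }
    }

    private final Deque<Command> outstanding = new ArrayDeque<>();

    /** The command is queued before encoding; a failed encode leaves it queued but marked. */
    void write(Command command, boolean encodingFails) {
        outstanding.addLast(command);
        if (encodingFails) {
            command.markEncodingError(); // never sent, so no reply will arrive for it
        }
    }

    /** Matches a reply to the oldest sent command, lazily discarding commands that failed to encode. */
    Command onResponse(String reply) {
        Command command = outstanding.pollFirst();
        while (command != null && command.hasEncodingError()) {
            command = outstanding.pollFirst();
        }
        if (command == null) {
            throw new IllegalStateException("No outstanding command for reply: " + reply);
        }
        command.response = reply;
        return command;
    }
}
```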

Test Plan: UTs, Integration testing

Reviewers: yayang, ureview

Reviewed By: yayang

Tags: #has_java

JIRA Issues: REDIS-14050

Differential Revision: https://code.uberinternal.com/D19068147

* Fix error command handling code logic and add integration test for encoding failure

Summary: Fix error command handling code logic and add integration test for encoding failure

Test Plan: unittest, integration test

Reviewers: #ldap_storage_sre_cache, ureview, jingzhao

Reviewed By: #ldap_storage_sre_cache, jingzhao

Tags: #has_java

JIRA Issues: REDIS-14192

Differential Revision: https://code.uberinternal.com/D19271701

* latest changes

* Addressing the reactive streams issue

* Addressing the encoding issues
Addressing some general cases

* Formatting issues

* Test failures addressed

* Polishing

---------

Co-authored-by: Jing Zhao <jingzhao@uber.com>
Co-authored-by: Tihomir Mateev <tihomir.mateev@gmail.com>

* add Benchmark (jmh) benchmark result for f65b8d1

* add Benchmark (jmh) benchmark result for c6b42f0

* add Benchmark (jmh) benchmark result for 5c5f117

* add Benchmark (jmh) benchmark result for 329c39c

* add Benchmark (jmh) benchmark result for fa7e5d0

---------

Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
Co-authored-by: github-action-benchmark <github@users.noreply.github.com>
Co-authored-by: Aleksandar Todorov <a_t_todorov@yahoo.com>
Co-authored-by: Magnus Hyllander <magnus@hyllander.org>
Co-authored-by: Tihomir Krasimirov Mateev <tihomir.mateev@redis.com>
Co-authored-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
Co-authored-by: Viktoriya Kutsarova <viktoriya.kutsarova@gmail.com>
Co-authored-by: yang <43356004+yangy0000@users.noreply.github.com>
Co-authored-by: Jing Zhao <jingzhao@uber.com>
Co-authored-by: Tihomir Mateev <tihomir.mateev@gmail.com>

* [automatic failover] Mark APIs as experimental (CAE-2046) (#3574)

* Mark failover API as experimental

Mark all public classes and interfaces in the failover package as
@Experimental to indicate that this API may change in future releases.
Update @since annotations from 7.1/7.2/7.3 to 7.4.

Package-private implementation classes are not annotated as @Experimental.

* update version to 7.4.0-SNAPSHOT

* more experimental tags

 - mark classes outside the failover package as experimental

* format

* update release notes for 7.4.0.BETA1 (#3578)

* fix : HealthCheckIntegrationTests nested test classes executed as unit tests (#3579)

* [automatic failover] Record failures for each attempt of write on channel including retries (#3583)

* - introduce MultiDbOutboundAdapter handler to track retries and command results in the Netty pipeline

* - unit and integrations tests for DatabaseCommandTracker and OutboundAdapter

* - reviews from @ggivo

* - format
- remove sharable

* [automatic failover] Make ClientOptions per Database(DatabaseConfig) instead of Client level (#3587)

* - apply thread local instance for clientOptions
- fix tests according to the clientOptions changes

* - fix failing tests

* - fix missing stream collector

* - undo test leftover

* - reviews from @ggivo

* - fix flaky test

* - fix flaky test

* [automatic failover] Implement thread-safe endpoint switch in StatefulMultiDbConnection (#3598)

* - introduce immutable redisURI
- fix potential issues in switchToDatabase with listeners and concurrent health/CB state changes
- build separate public and internal switch operations at the multiDbConnection level

- format

- add copy ctor to RedisURI

- fix issues introduced with the last mistaken commit

* - add BaseRedisDatabase interface
- add some logging for failover
- Fix test timeout values

* -  add unit tests for statefulredismultidbconnectionimpl

* - refactor CircuitBreaker to use ID instead of RedisURI
- replace endpoint-based identification with string IDs.
- improve failover logic and database switching safety.
- add return value to switchTo() method.
- update tests to match new constructor signature.

* - fix failing test

* - fix impacted tests

* - polish

* - format

* - feedbacks from copilot

* - improve inline docs and comments

* - feedback from copilot

* - format

* - fix the test case

* - promote use of Db.getId
- fix incorrect logging

* - hide implementations for database and connection

* - feedback from @uglide , drop license headers

* Revert accidentally replaced jmh benchmark test includes

* [automatic failover][lettuce] Expose pluggable listeners interface for failover events (#3606)

* add DatabaseSwitchEvent

* add unit test

* add unit test

* expose getResource to BaseRedisClient interface

* update AutomaticFailover example

* clean up

* publish event outside switch exclusive lock
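
A minimal sketch of the switch-then-notify ordering, assuming databases are identified by plain `String` ids and listeners are simple callbacks; `SwitchableConnection` is a hypothetical class, not `StatefulRedisMultiDbConnection`. The active database is swapped under the write lock, and listeners run only after the lock is released, so a slow or re-entrant listener cannot block other switches or readers.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.BiConsumer;

// Hypothetical sketch; databases are plain String ids.
final class SwitchableConnection {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<BiConsumer<String, String>> listeners = new CopyOnWriteArrayList<>();
    private String active;

    SwitchableConnection(String initial) {
        this.active = initial;
    }

    void addListener(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }

    String active() {
        lock.readLock().lock();
        try {
            return active;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Returns true if the active database changed. */
    boolean switchTo(String target) {
        String previous;
        lock.writeLock().lock();
        try {
            if (target.equals(active)) {
                return false;
            }
            previous = active;
            active = target;
        } finally {
            lock.writeLock().unlock();
        }
        // Publish the switch event outside the exclusive lock.
        for (BiConsumer<String, String> listener : listeners) {
            listener.accept(previous, target);
        }
        return true;
    }
}
```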

* Add source connection to DatabaseSwitchEvent

* address review comments

* [automatic failover] Implement async creation support for StatefulRedisMultiDbConnection on MultiDbClient (#3600)

* - squash changes from safeSwitch

* - draft for async connect with multiDb

* - init connection without "all  established" requirement
- add tests for thread local ClientOptions
- add async tracking to StatusTracker

* - refactor connectAsync

* - introduce AsyncConnectionBuilder

* - add tests
- polish

* - drop license headers

* - connectAsync returns CompletableFuture
- revisit tests

* -rename test file

* - dedicated server instances

* - set port offset

* - drop connection field

* - introduce MultiDbConnectionFuture

* - fix tests

* - drop filtering healthy db on init

* - feedback from @ggivo

* [automatic failover] Implement client initialization and safe startup of MultiDbClient (#3613)

* - squash changes from safeSwitch

* - draft for async connect with multiDb

* - init connection without the "all established" requirement
- add tests for thread local ClientOptions
- add async tracking to StatusTracker

* - refactor connectAsync

* - introduce AsyncConnectionBuilder

* - add tests
- polish

* - drop license headers

* - connectAsync returns CompletableFuture
- revisit tests

* -rename test file

* - dedicated server instances

* - set port offset

* - drop connection field

* - introduce MultiDbConnectionFuture

* - fix tests

* - init with the highest-weighted healthy database
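
The selection rule can be sketched with `CompletableFuture`s, assuming each database exposes a future that completes with its first health check result; `InitialDatabaseSelector` and `Candidate` are hypothetical names, not the `MultiDbClient` internals. The result completes as soon as the highest-weighted database that resolved as healthy has no heavier database still undecided, and it fails only when every database resolved as unhealthy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of weight-aware initial database selection.
final class InitialDatabaseSelector {

    static final class Candidate {
        final String id;
        final float weight;
        // Completes with the first health check result (true = healthy).
        final CompletableFuture<Boolean> healthy;

        Candidate(String id, float weight, CompletableFuture<Boolean> healthy) {
            this.id = id;
            this.weight = weight;
            this.healthy = healthy;
        }
    }

    static CompletableFuture<String> selectInitial(List<Candidate> candidates) {
        List<Candidate> byWeight = new ArrayList<>(candidates);
        byWeight.sort(Comparator.comparingDouble((Candidate c) -> c.weight).reversed());
        CompletableFuture<String> result = new CompletableFuture<>();

        Runnable evaluate = () -> {
            for (Candidate c : byWeight) {
                if (!c.healthy.isDone()) {
                    return; // a heavier database is still undecided: keep waiting
                }
                if (isHealthy(c)) {
                    result.complete(c.id); // heaviest healthy database wins
                    return;
                }
            }
            // Every database has resolved and none is healthy.
            result.completeExceptionally(new IllegalStateException("No healthy database available"));
        };

        if (byWeight.isEmpty()) {
            evaluate.run();
        }
        // Re-evaluate whenever a health result arrives; repeated completions are no-ops.
        for (Candidate c : byWeight) {
            c.healthy.whenComplete((ignored, error) -> evaluate.run());
        }
        return result;
    }

    private static boolean isHealthy(Candidate c) {
        return !c.healthy.isCompletedExceptionally() && Boolean.TRUE.equals(c.healthy.join());
    }
}
```

In this sketch, a heavier database whose health future never completes blocks selection indefinitely; a real implementation would bound that wait with a timeout.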

* - clean/refactor sync methods

* - apply generic parameters to support connectPubSubAsync

* - use the same raw connection factory

* - improve type safety with builder

* - improve generic types

* - refactor the multi-db connection into an abstract base class with separate subclasses for regular and Pub/Sub connections

* - handle corner cases with health state transitions

* - feedback from copilot

* - update javadocs

* - fix completion issues and test cases

* - fix intermittent fails; add wait for endpoints to init

* - unit test multidbasyncbuilder

* - add integration tests
- fix test proxy setup

* - fix issue in findInitialDbCandidate
- replace toxiproxy with testAsyncConnectionBuilder
- revisit async builder unit tests

* - fix premature shutdown in test

* - fix issue in findInitialDbCandidate

* add log

* undo docker start params

* - wait on endpoints for proper testing

* - close databases properly on connection close

* [automatic failover] Simple renaming of factory classes in MultiDb (#3619)

* [automatic failover] Register StatefulRedisMultiDbConnectionImpl as closeable resource (#3622)

* - register multidb as closeable resource
- destroy resources when multiDbConnBuilder fails

* - exclude integration tagged classes with surefire runs

* - remove shutdown calls

* - polish

* - rename

* - change approach with ConnectionFuture

* -reorder operations in closeAsync

* - fix test

* [automatic failover] CAE-2220: Add minimal Netty-based HTTP client for health checks (#3620)

* CAE-2220: Add minimal Netty-based HTTP client for health checks

Introduce the initial version of a lightweight HTTP client built directly on Netty
for HTTP-based health checks used by the automatic failover mechanism.

The client supports GET requests only, uses Netty primitives exclusively,
and supports HTTPS via TLS handlers.

* Add pending request completion on connection close

Complete pending HTTP requests with IOException when connection closes unexpectedly via channelInactive handler.
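
The close handling can be sketched without Netty, assuming requests are answered in order on one connection; `PendingRequests` is a hypothetical class, and `onConnectionClosed()` plays the role of the `channelInactive` callback. Any request still waiting when the connection closes completes exceptionally with an `IOException` instead of hanging.

```java
import java.io.IOException;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch; responses are plain String bodies.
final class PendingRequests {

    private final Queue<CompletableFuture<String>> pending = new ConcurrentLinkedQueue<>();

    /** Registers a request; actually writing it to the connection is omitted here. */
    CompletableFuture<String> send(String path) {
        CompletableFuture<String> response = new CompletableFuture<>();
        pending.add(response);
        return response;
    }

    /** Completes the oldest pending request, assuming responses arrive in request order. */
    void onResponse(String body) {
        CompletableFuture<String> response = pending.poll();
        if (response != null) {
            response.complete(body);
        }
    }

    /** Counterpart of channelInactive: fail every request that is still waiting. */
    void onConnectionClosed() {
        CompletableFuture<String> response;
        while ((response = pending.poll()) != null) {
            response.completeExceptionally(new IOException("Connection closed before a response was received"));
        }
    }
}
```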

* Fix connection timeout test to validate actual timeout behavior

* added HttpConnection.closeAsync

* address review comments

- Tag NettyHttpClient integration test
- unmodifiable DefaultResponse body and headers
- Add imports to replace fully qualified class name access
- remove @Experimental from package private classes
- add port to host header
- exception handling improvements
- renamed DefaultConfig -> DefaultConnectionConfig
- Copyright fixed
- DefaultConnectionConfig validations added
- NettyHttpClient extracted constants for default ports
- NettyHttpClient shutdown with configurable timeouts

* fix tests

* address review comments

- remove getResponseBodyAsByteBuffer

* [automatic failover] CAE-2220: Provider for shared HTTP client instances (#3621)

* CAE-2220: Add minimal Netty-based HTTP client for health checks

Introduce the initial version of a lightweight HTTP client built directly on Netty
for HTTP-based health checks used by the automatic failover mechanism.

The client supports GET requests only, uses Netty primitives exclusively,
and supports HTTPS via TLS handlers.

* Add pending request completion on connection close

Complete pending HTTP requests with IOException when connection closes unexpectedly via channelInactive handler.

* Fix connection timeout test to validate actual timeout behavior

* added HttpConnection.closeAsync

* CAE-2220: Provider for shared HTTP client instances

Introduce HttpClientResources for managing shared HTTP client instances
with reference counting to reduce resource usage. Add HttpClientProvider
SPI for pluggable implementations with NettyHttpClientProvider as the
default.

* add HttpClientResources unit test

* format

* address review comments

- Tag NettyHttpClient integration test
- unmodifiable DefaultResponse body and headers
- Add imports to replace fully qualified class name access
- remove @Experimental from package private classes
- add port to host header
- exception handling improvements
- renamed DefaultConfig -> DefaultConnectionConfig
- Copyright fixed
- DefaultConnectionConfig validations added
- NettyHttpClient extracted constants for default ports
- NettyHttpClient shutdown with configurable timeouts

* fix tests

* address review comments

- remove getResponseBodyAsByteBuffer

* address review comments

- remove reference counting & locking
- add missing service provider descriptor

* fix copyrights

* [automatic failover] Add lag-aware health check strategy for failover (#3631)

* CAE-2220: Add minimal Netty-based HTTP client for health checks

Introduce the initial version of a lightweight HTTP client built directly on Netty
for HTTP-based health checks used by the automatic failover mechanism.

The client supports GET requests only, uses Netty primitives exclusively,
and supports HTTPS via TLS handlers.

* Add pending request completion on connection close

Complete pending HTTP requests with IOException when connection closes unexpectedly via channelInactive handler.

* Fix connection timeout test to validate actual timeout behavior

* added HttpConnection.closeAsync

* CAE-2220: Provider for shared HTTP client instances

Introduce HttpClientResources for managing shared HTTP client instances
with reference counting to reduce resource usage. Add HttpClientProvider
SPI for pluggable implementations with NettyHttpClientProvider as the
default.

* add HttpClientResources unit test

* format

* Add lag-aware health check strategy for failover

Implement lag-aware health check strategy for MultiDbFailover client that considers replication lag when evaluating database health. Includes async RedisRestClient for REST API health checks.

Relates to: CAE-1689
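
A sketch of the request a lag-aware check might issue, assuming the Redis Enterprise REST availability endpoint `/v1/bdbs/<uid>/availability` with the `extend_check=lag` and `availability_lag_tolerance_ms` query parameters, and assuming HTTP 200 means available; verify both against the Redis Enterprise REST API documentation. `AvailabilityRequest` is a hypothetical helper, and the `long` uid mirrors the later change of the `BdbInfo` uid to `Long`.

```java
import java.net.URI;
import java.time.Duration;

// Hypothetical helper; the path, parameters and status handling are assumptions.
final class AvailabilityRequest {

    /** Builds the availability URL for one database, including the lag tolerance in milliseconds. */
    static URI availabilityUri(URI restEndpoint, long bdbUid, Duration lagTolerance) {
        String path = String.format("/v1/bdbs/%d/availability?extend_check=lag&availability_lag_tolerance_ms=%d",
                bdbUid, lagTolerance.toMillis());
        return restEndpoint.resolve(path);
    }

    /** Assumption: 200 OK means the database is available within the lag tolerance. */
    static boolean isAvailable(int httpStatus) {
        return httpStatus == 200;
    }
}
```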

* address review comments

- Tag  NetyHttpClient integration test
- unmodifiable DefaultResponse body and headers
- Add imports instead of referencing classes by fully qualified names
- remove @experimental from package private classes
- add port to host header
- exception handling improvements
- renamed DefaultConfig -> DefaultConnectionConfig
- Copyright fixed
- DefaultConnectionConfig validations added
- NettyHttpClient extracted constants for default ports
- NettyHttpClient shutdown with configurable timeouts

* fix tests

* address review comments

- remove getResponseBodyAsByteBuffer

* address review comments

- remove reference counting & locking
- add missing service provider descriptor

* fix copyrights

* Apply changes after refactoring HttpClientResources into a lazy singleton and changing the BDB uid type to Long

- Remove reference counting (acquire/release) from HttpClientResources, use get() API
- Change BdbInfo uid from String to Long
- Use createJsonValue(String) instead of loadJsonValue(ByteBuffer) in RedisRestClient
- Improve error message when no HTTP client provider is available
- Add license header and unit test tag to RedisRestClientUnitTests
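The lazy-singleton and provider-lookup changes can be sketched with the JDK's `ServiceLoader`. This is a hypothetical outline, not the actual `HttpClientResources` code: `ClientProvider` and `SharedClientResources` are illustrative names, and the client is typed as `Object` because the real provider interface is not shown here.

```java
import java.util.ServiceLoader;

/** Hypothetical SPI standing in for the real HttpClientProvider, whose methods are not shown here. */
interface ClientProvider {
    Object createClient();
}

/** Hypothetical lazy singleton: the shared client is created on first get(), not at class load. */
final class SharedClientResources {

    private static volatile Object instance;

    private SharedClientResources() {
    }

    // Double-checked locking on a volatile field: only the first caller creates the client.
    static Object get() {
        Object result = instance;
        if (result == null) {
            synchronized (SharedClientResources.class) {
                result = instance;
                if (result == null) {
                    result = loadProvider().createClient();
                    instance = result;
                }
            }
        }
        return result;
    }

    // Uses the first provider registered under META-INF/services, otherwise fails with an actionable message.
    private static ClientProvider loadProvider() {
        return ServiceLoader.load(ClientProvider.class).findFirst()
                .orElseThrow(() -> new IllegalStateException("No " + ClientProvider.class.getName()
                        + " implementation found on the classpath; add a provider and register it in META-INF/services"));
    }
}
```

Double-checked locking is used instead of an initialization-on-demand holder so that a missing provider surfaces as a plain `IllegalStateException` on each call, rather than an `ExceptionInInitializerError` followed by `NoClassDefFoundError`.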

* Add LagAwareStrategy API docs and refactor ConfigBuilder to Builder

- Add class-level Javadoc with Redis Enterprise availability API details
- Rename ConfigBuilder to Builder for consistency
- Make restEndpoint and credentialsSupplier settable via builder methods
- Fix builder() to return usable instance instead of throwing exception

* update LagAwareStrategy API docs

* address @tishun review comments

- improve exception handling
- do not store credentials in String while preparing basic auth header
- update availability_lag_tolerance -> availabilityLagTolerance
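One way to build a Basic auth header without holding the password in a `String` is to assemble `username:password` in a `char[]`, encode it to bytes, Base64-encode the bytes, and then wipe the temporary buffers. The `BasicAuth.header` helper below is a hypothetical sketch of that idea, not the code in `RedisRestClient`. The username is still a `String`, and the resulting header value necessarily is one.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical helper: builds an HTTP Basic auth header from a char[] password. */
final class BasicAuth {

    private BasicAuth() {
    }

    // Returns "Basic base64(username:password)" without creating a String that contains the password.
    static String header(String username, char[] password) {
        char[] userPass = new char[username.length() + 1 + password.length];
        username.getChars(0, username.length(), userPass, 0);
        userPass[username.length()] = ':';
        System.arraycopy(password, 0, userPass, username.length() + 1, password.length);

        ByteBuffer encoded = StandardCharsets.UTF_8.encode(CharBuffer.wrap(userPass));
        byte[] bytes = new byte[encoded.remaining()];
        encoded.get(bytes);
        try {
            return "Basic " + Base64.getEncoder().encodeToString(bytes);
        } finally {
            // Wipe temporary plaintext copies; the caller still owns (and should clear) its own password array.
            Arrays.fill(userPass, '\0');
            Arrays.fill(bytes, (byte) 0);
            if (encoded.hasArray()) {
                Arrays.fill(encoded.array(), (byte) 0);
            }
        }
    }
}
```

For example, `BasicAuth.header("user", "pass".toCharArray())` yields `Basic dXNlcjpwYXNz`.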

* [automatic failover] Implement failback support (#3630)

* - introduce MultiDbOptions
- support failback

* - add unit + integration tests
- fix bumpy healthcheck probing test

* - reviews from copilot

* - fix flaky test

* - fix illegal port in test

* - remove assertion conflicting with logic

* - wait for endpoints

* - feedback from @tishun

* - fix assertion and failing tests

* - fix failing assertions in failback interval

* - review from @ggivo

* [automatic failover] Implement grace period on failover (#3636)

* - introduce MultiDbOptions
- support failback

* - add unit + integration tests
- fix bumpy healthcheck probing test

* - reviews from copilot

* - draft grace period implementation

* - remove grace period reset with failback task

* - fix connection init issue

* - fix tests that require nofailback config

* - fix flaky test

* - fix illegal port in test

* - remove assertion conflicting with logic

* - wait for endpoints

* - add test for grace period

* - fix tests failing due to grace period

* - fix test extension

* - feedback from @tishun

* - fix assertion and failing tests

* - fix failing assertions in failback interval

* - feedback from @ggivo and @tishun

* - improve test duration

* - trim unnecessary check

* - review from @ggivo

* [automatic failover] CAE-1692 Apply/revisit defaults for thresholds and settings in configuration (#3641)

* configurable maxFailoverAttempts

* update gracePeriod default to 60s

* Revert "configurable maxFailoverAttempts"

This reverts commit 64d5656.

* [automatic failover] Handle "no healthy database available" cases (#3642)

* - test no healthy db available case

* - polish

* - improve flaky test

* - feedback from Copilot

* [automatic failover] Tag failover types as experimental

* [automatic failover] Add support for initial db state policy configuration (#3644)

* - implement initial db states policy

* - fix AtomicReference issue in abstractmultidbconnectionbuilder
- fix failing tests due to init policy changes

* - fix failing tests

* - feedback from copilot

* - format

* - adding unit and integ tests

* - fix name typo

* [automatic failover] CAE-2351 OPTIONAL: Send notification after X unsuccessful failover attempts (#3646)

* added delayInBetweenFailoverAttempts to MultiDbOptions

* Created AllDatabasesUnhealthyEvent event

* lock-free failover retry

* tests

* mark new event experimental

* address Copilot comments

* move resetAttempt after we have successfully switched DB
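The retry-and-notify flow described by these commits (a `delayInBetweenFailoverAttempts` setting, lock-free retries, an `AllDatabasesUnhealthyEvent` after repeated failures, and a reset only after a successful switch) might look roughly like the sketch below. `FailoverAttempts` and its methods are hypothetical; the callback merely stands in for publishing the event, and the actual client may count, reset, or notify differently.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;
import java.util.function.IntConsumer;

/** Hypothetical lock-free failover retry budget; names are illustrative, not the client's API. */
final class FailoverAttempts {

    private static final long NONE = Long.MIN_VALUE;

    private final int maxAttempts;
    private final long delayBetweenAttemptsNanos;
    private final AtomicInteger failedAttempts = new AtomicInteger();
    private final AtomicLong lastAttemptNanos = new AtomicLong(NONE);

    FailoverAttempts(int maxAttempts, long delayBetweenAttemptsNanos) {
        this.maxAttempts = maxAttempts;
        this.delayBetweenAttemptsNanos = delayBetweenAttemptsNanos;
    }

    // Returns true only if this call performed an attempt and the switch succeeded.
    boolean tryFailover(long nowNanos, BooleanSupplier switchToHealthyDatabase, IntConsumer onAllDatabasesUnhealthy) {
        long last = lastAttemptNanos.get();
        // Respect the configured delay between attempts.
        if (last != NONE && nowNanos - last < delayBetweenAttemptsNanos) {
            return false;
        }
        // Only the thread that wins the CAS performs this attempt; others return without blocking.
        if (!lastAttemptNanos.compareAndSet(last, nowNanos)) {
            return false;
        }
        if (switchToHealthyDatabase.getAsBoolean()) {
            // Reset the counter only after the switch actually succeeded.
            failedAttempts.set(0);
            return true;
        }
        int failed = failedAttempts.incrementAndGet();
        // Notify once per maxAttempts consecutive failures, then start counting again.
        if (failed >= maxAttempts && failedAttempts.compareAndSet(failed, 0)) {
            onAllDatabasesUnhealthy.accept(failed);
        }
        return false;
    }

    int failedAttempts() {
        return failedAttempts.get();
    }
}
```

Passing the current time in explicitly (for example from `System.nanoTime()`) keeps the sketch deterministic to test; the compare-and-set on the last-attempt timestamp is what replaces a lock.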

* Disable maintenance events by default in MultiDB Client (#3651)

* [automatic failover] API changes in failover package (#3654)

* - move exposed types to api package

* - revisit accessibility
- fix imports

* - revisit accessibility
- fix imports

* - fix imports in tests

* - introduce convenience class for healthCheckStrategy

* - drop experimental tags from package private

* Fix formatting in pom.xml

* fix intermittent test failure in NettyHttpClientIntegrationTests (#3656)

* format netty test fail

* [automatic failover] Add failover documentation for MultiDbClient (#3653)

* Add failover documentation for MultiDbClient
- docs/failover.md: New documentation covering:

  - Basic usage with weighted database configuration
  - DatabaseConfig, MultiDbOptions, and CircuitBreakerConfig settings
  - Health check strategies (PingStrategy, LagAwareStrategy)
  - Automatic and manual failback
  - Dynamic database management
  - Database switch events and troubleshooting

- docs/advanced-usage.md: Add reference to failover documentation

* improve health checks description

* address review comments
  - correct json dependency
  - RedisURI and DatabaseConfig together for better readability

* example how to provide custom healthcheck

* example how to provide custom healthcheck using abstract base class

* update the Failback section with a more accurate description

* address review comments by @astark and @atakavci

* add Need help section

---------

Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
Co-authored-by: atakavci <a_takavci@yahoo.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Tihomir Krasimirov Mateev <tihomir.mateev@redis.com>
Co-authored-by: Igor Malinovskiy <u.glide@gmail.com>
Co-authored-by: github-action-benchmark <github@users.noreply.github.com>
Co-authored-by: Aleksandar Todorov <a_t_todorov@yahoo.com>
Co-authored-by: Magnus Hyllander <magnus@hyllander.org>
Co-authored-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
Co-authored-by: Viktoriya Kutsarova <viktoriya.kutsarova@gmail.com>
Co-authored-by: yang <43356004+yangy0000@users.noreply.github.com>
Co-authored-by: Jing Zhao <jingzhao@uber.com>
Co-authored-by: Tihomir Mateev <tihomir.mateev@gmail.com>
@atakavci atakavci deleted the failover/initHealthy branch February 27, 2026 14:46

Labels

type: feature A new feature

3 participants