Connections closed on +switch-master notification from Sentinel remains in ConnPool.idleConns forever
Expected Behavior
Idle connections with state "CLOSED" should be cleaned up from ConnPool.idleConns and new connections should be reestablished.
Current Behavior
Idle connections remains in ConnPool.idleConns forever.
Currently we use v9.17.2.
Possible Solution
On v9.16.0 it works as expected.
First, on +switch-master event for every connection in pool p.closeConn(cn) is called. The difference between versions is a new line cn.stateMachine.Transition(StateClosed) in Conn.Close() method, which changes the state to "CLOSED".
Second, ConnPool.popIdle() behavior was changed.
In v9.16.0 the following code returns cn, which later is being tested in p.isHealthyConn(cn, now). As a result closed connection is being removed from pool.
if cn.CompareAndSwapUsed(false, true) {
if cn.IsUsable() {
p.idleConnsLen.Add(-1)
break
}
cn.SetUsed(false)
}
In v9.17.2 the code above was replaced with the following:
// Hot path optimization: try IDLE → IN_USE or CREATED → IN_USE transition
// Using inline TryAcquire() method for better performance (avoids pointer dereference)
if cn.TryAcquire() {
// Successfully acquired the connection
p.idleConnsLen.Add(-1)
break
}
cn.TryAcquire() will give True if and only if the stated is in "IDLE" or "CREATED". We have state "CLOSED" here, so as a result we have cn=nil. There are no remaining available connection in pool, so we go through the branch of code where for every upcoming commands a new connection is being created.
I believe we need to return closed cn from popIdle method to be cleared in isHealthyConn as before, or remove from pool here.
Steps to Reproduce
- Run redis in sentinel mode
- Create a Failover Client with poolSize=1
client = goredis.NewFailoverClient(&goredis.FailoverOptions{
MasterName: sentinelName,
SentinelAddrs: strings.Split(sentinels, ","),
PoolSize: 1,
DisableIdentity: true,
})
- Send at least one command to fill the pool
- Trigger switch master
- To easier catch the bug we can use multi-exec
set, err := client.Set(ctx, "testRadishKeyMulti2", "5", 0).Result()
assert.NoError(t, err)
assert.Equal(t, "OK", set)
multi, err := client.Do(ctx, "MULTI").Result()
assert.NoError(t, err)
assert.Equal(t, "OK", multi.(string))
incr, err := client.Do(ctx, "INCR", "testRadishKeyMulti2").Result()
assert.NoError(t, err)
assert.Equal(t, "QUEUED", incr.(string))
decr, err := client.Do(ctx, "DECR", "testRadishKeyMulti2").Result()
assert.NoError(t, err)
assert.Equal(t, "QUEUED", decr.(string))
exec, err := client.Do(ctx, "EXEC").Result()
assert.NoError(t, err)
execArray, ok := exec.([]any)
assert.True(t, ok)
assert.EqualValues(t, int64(6), execArray[0])
assert.EqualValues(t, int64(5), execArray[1])
Context (Environment)
We moved from v9.16.0 due to that maintenance bug
Detailed Description
Possible Implementation
Connections closed on +switch-master notification from Sentinel remains in ConnPool.idleConns forever
Expected Behavior
Idle connections with state "CLOSED" should be cleaned up from ConnPool.idleConns and new connections should be reestablished.
Current Behavior
Idle connections remains in ConnPool.idleConns forever.
Currently we use v9.17.2.
Possible Solution
On v9.16.0 it works as expected.
First, on +switch-master event for every connection in pool p.closeConn(cn) is called. The difference between versions is a new line
cn.stateMachine.Transition(StateClosed)in Conn.Close() method, which changes the state to "CLOSED".Second, ConnPool.popIdle() behavior was changed.
In v9.16.0 the following code returns cn, which later is being tested in p.isHealthyConn(cn, now). As a result closed connection is being removed from pool.
In v9.17.2 the code above was replaced with the following:
cn.TryAcquire() will give True if and only if the stated is in "IDLE" or "CREATED". We have state "CLOSED" here, so as a result we have cn=nil. There are no remaining available connection in pool, so we go through the branch of code where for every upcoming commands a new connection is being created.
I believe we need to return closed cn from popIdle method to be cleared in isHealthyConn as before, or remove from pool here.
Steps to Reproduce
Context (Environment)
We moved from v9.16.0 due to that maintenance bug
Detailed Description
Possible Implementation