Expected Behavior
When redis.Options.StreamingCredentialsProvider is set on a client, the client should not retain any state for connections that have been closed and removed from the pool.
Concretely, when a connection is removed from the pool through any path (including stale-conn discard, idle/lifetime expiry, and health-check failure), registered pool hooks should observe the removal via PoolHook.OnRemove. In particular, streaming.ReAuthPoolHook.OnRemove should fire so that streaming.Manager.RemoveListener(connID) runs and the corresponding entry in streamingCredentialsManager.credentialsListeners.listeners is deleted. The map size should track the live pool size, not the cumulative number of stale-conn closures.
Current Behavior
Any client created with StreamingCredentialsProvider accumulates one orphan *ConnReAuthCredentialsListener (and the closed *pool.Conn it captures) per stale-conn close, because (*ConnPool).CloseConn does not invoke PoolHook.OnRemove.
(*ConnPool).CloseConn (internal/pool/pool.go:1449) does not call the pool hook’s ProcessOnRemove. As a result, every connection that exits the pool through CloseConn (idle expiry, lifetime expiry, health-check failure, stale-conn discard from Get) leaves a permanent entry in streaming.Manager.credentialsListeners.listeners.
removeConnInternal and the Remove* paths do invoke the hook, so the gap is specifically CloseConn. The two functions sit next to each other in the same file:
// internal/pool/pool.go (v9.19.0)
func (p *ConnPool) removeConnInternal(ctx context.Context, cn *Conn, reason error, freeTurn bool) {
hookManager := p.hookManager.Load()
if hookManager != nil {
hookManager.ProcessOnRemove(ctx, cn, reason) // ✅ fires
}
...
}
func (p *ConnPool) CloseConn(ctx context.Context, cn *Conn, reason string, fromState string) error {
// ❌ ProcessOnRemove never called
removed := p.removeConnWithLock(cn)
if removed {
p.recordConnectionMetrics(ctx, cn, reason, fromState)
}
return p.closeConn(cn)
}
Because ReAuthPoolHook.OnRemove is the only path that calls Manager.RemoveListener(connID), each leaked map entry is a *ConnReAuthCredentialsListener that holds a conn *pool.Conn field — so the closed connection’s read buffer + write buffer + TLS state stay reachable for the lifetime of the process.
Possible Solution
One-hunk fix that mirrors the existing call in removeConnInternal:
diff --git a/internal/pool/pool.go b/internal/pool/pool.go
--- a/internal/pool/pool.go
+++ b/internal/pool/pool.go
@@ -1447,6 +1447,10 @@ func (p *ConnPool) RemoveWithoutTurn(ctx context.Context, cn *Conn, reason error
// - reason: why the connection is being closed (use CloseReason* constants)
// - fromState: the metric state the connection was in (use MetricState* constants)
func (p *ConnPool) CloseConn(ctx context.Context, cn *Conn, reason string, fromState string) error {
+ if hookManager := p.hookManager.Load(); hookManager != nil {
+ hookManager.ProcessOnRemove(ctx, cn, errors.New(reason))
+ }
+
removed := p.removeConnWithLock(cn)
// Only emit UpDownCounter decrements if we actually removed the connection.
The errors.New(reason) matches the error argument expected by PoolHook.OnRemove. After the patch, internal_listener_count stays equal to the live pool size across thousands of stale-conn cycles in the repro below.
Steps to Reproduce
A minimum single file repro environment is below.
package main
import (
"context"
"fmt"
"reflect"
"runtime"
"sync"
"time"
"github.com/redis/go-redis/v9"
"github.com/redis/go-redis/v9/auth"
)
type provider struct{}
func (provider) Subscribe(auth.CredentialsListener) (auth.Credentials, auth.UnsubscribeFunc, error) {
return auth.NewBasicCredentials("", ""), func() error { return nil }, nil
}
func listeners(c *redis.Client) int {
m := reflect.ValueOf(c).Elem().FieldByName("baseClient").Elem().FieldByName("streamingCredentialsManager")
if m.IsNil() {
return -1
}
return m.Elem().FieldByName("credentialsListeners").Elem().FieldByName("listeners").Len()
}
func main() {
c := redis.NewClient(&redis.Options{
Addr: "redis:6379",
StreamingCredentialsProvider: provider{},
PoolSize: 20,
MinIdleConns: 20,
ConnMaxIdleTime: 150 * time.Millisecond,
ReadBufferSize: 32 * 1024,
WriteBufferSize: 32 * 1024,
})
defer c.Close()
ctx := context.Background()
start := time.Now()
var wg sync.WaitGroup
for i := 1; i <= 500; i++ {
wg.Add(20)
for j := 0; j < 20; j++ {
go func() {
defer wg.Done()
_ = c.Set(ctx, "k", "v", time.Minute).Err()
}()
}
wg.Wait()
time.Sleep(250 * time.Millisecond)
if i == 1 || i%25 == 0 || i == 500 {
var m runtime.MemStats
runtime.ReadMemStats(&m)
s := c.PoolStats()
fmt.Printf("iter=%d stale=%d total=%d listeners=%d heap_MB=%d secs=%.0f\n",
i, s.StaleConns, s.TotalConns, listeners(c), m.HeapInuse/1024/1024, time.Since(start).Seconds())
}
}
}
example output
repro-1 | iter=1 stale=0 total=20 listeners=20 heap_MB=3 secs=0
repro-1 | iter=25 stale=482 total=20 listeners=482 heap_MB=35 secs=6
repro-1 | iter=50 stale=982 total=20 listeners=982 heap_MB=66 secs=13
... ...
repro-1 | iter=475 stale=9365 total=20 listeners=9330 heap_MB=610 secs=120
repro-1 | iter=500 stale=9865 total=20 listeners=9830 heap_MB=644 secs=127
Context (Environment)
- go-redis version:
v9.19.0 (also master at the time of writing)
- Go runtime: go1.25.x
- Auth path: redis.Options.StreamingCredentialsProvider (we use
https://github.com/redis/go-redis-entraid actually)
- Trigger: any workload that uses streaming credentials together with stale-conn churn (short ConnMaxIdleTime, ConnMaxLifetime expiry, or MinIdleConns warming after health-check failures).
The bug is independent of #3772 / #3785. Those fixed wrappedOnClose chain growth on baseClient.onClose in v9.19.0, removing the dominant leak in the streaming-credentials path. This issue is a separate retainer:
For long-running services using streaming credentials, the residual leak is large enough that pprof + map inspection are necessary to detect it after the v9.19.0 upgrade.
Detailed Description
The streaming-credentials manager owns a credentialsListeners.listeners map keyed by connection ID, populated in Manager.Listener(...) during (*baseClient).initConn. The corresponding cleanup is wired through the pool hook contract:
*ConnPool.removeConnInternal --(hookManager.ProcessOnRemove)--> ReAuthPoolHook.OnRemove --> Manager.RemoveListener(connID)
(*ConnPool).CloseConn is the only removal path that does not invoke this hook chain, even though it's the path used by:
- stale-conn discard inside (*ConnPool).Get when
isHealthyConn returns false
- ConnMaxIdleTime expiry
- ConnMaxLifetime expiry
- TCP probe / health-check failure
This is asymmetric with the surrounding code:
removeConnInternal and RemoveWithoutTurn invoke ProcessOnRemove.
Remove, CloseStaleConns-driven removals, and pool shutdown all flow through one of the above and therefore call the hook.
- Only
CloseConn skips it.
The behavioral consequence is that hook-managed state (currently: streaming credentials listeners; in principle: any future per-connection bookkeeping that uses PoolHook) is leaked whenever CloseConn is the exit path.
Possible Implementation
I think the explanation above is already quite clear.
Expected Behavior
When redis.Options.StreamingCredentialsProvider is set on a client, the client should not retain any state for connections that have been closed and removed from the pool.
Concretely, when a connection is removed from the pool through any path (including stale-conn discard, idle/lifetime expiry, and health-check failure), registered pool hooks should observe the removal via
PoolHook.OnRemove. In particular, streaming.ReAuthPoolHook.OnRemove should fire so that streaming.Manager.RemoveListener(connID) runs and the corresponding entry in streamingCredentialsManager.credentialsListeners.listeners is deleted. The map size should track the live pool size, not the cumulative number of stale-conn closures.Current Behavior
Any client created with StreamingCredentialsProvider accumulates one orphan *ConnReAuthCredentialsListener (and the closed *pool.Conn it captures) per stale-conn close, because (*ConnPool).CloseConn does not invoke
PoolHook.OnRemove.(*ConnPool).CloseConn (internal/pool/pool.go:1449) does not call the pool hook’s
ProcessOnRemove. As a result, every connection that exits the pool throughCloseConn(idle expiry, lifetime expiry, health-check failure, stale-conn discard fromGet) leaves a permanent entry in streaming.Manager.credentialsListeners.listeners.removeConnInternaland theRemove*paths do invoke the hook, so the gap is specificallyCloseConn. The two functions sit next to each other in the same file:Because ReAuthPoolHook.OnRemove is the only path that calls Manager.RemoveListener(connID), each leaked map entry is a *ConnReAuthCredentialsListener that holds a conn *pool.Conn field — so the closed connection’s read buffer + write buffer + TLS state stay reachable for the lifetime of the process.
Possible Solution
One-hunk fix that mirrors the existing call in
removeConnInternal:The errors.New(reason) matches the
errorargument expected byPoolHook.OnRemove. After the patch,internal_listener_countstays equal to the live pool size across thousands of stale-conn cycles in the repro below.Steps to Reproduce
A minimum single file repro environment is below.
example output
Context (Environment)
v9.19.0(also master at the time of writing)https://github.com/redis/go-redis-entraidactually)The bug is independent of #3772 / #3785. Those fixed
wrappedOnClosechain growth on baseClient.onClose inv9.19.0, removing the dominant leak in the streaming-credentials path. This issue is a separate retainer:unsubscribereference per ever-created conn).CloseConn).For long-running services using streaming credentials, the residual leak is large enough that pprof + map inspection are necessary to detect it after the v9.19.0 upgrade.
Detailed Description
The streaming-credentials manager owns a credentialsListeners.listeners map keyed by connection ID, populated in Manager.Listener(...) during (*baseClient).initConn. The corresponding cleanup is wired through the pool hook contract:
(*ConnPool).CloseConn is the only removal path that does not invoke this hook chain, even though it's the path used by:
isHealthyConnreturns falseThis is asymmetric with the surrounding code:
removeConnInternalandRemoveWithoutTurninvokeProcessOnRemove.Remove,CloseStaleConns-driven removals, and pool shutdown all flow through one of the above and therefore call the hook.CloseConnskips it.The behavioral consequence is that hook-managed state (currently: streaming credentials listeners; in principle: any future per-connection bookkeeping that uses
PoolHook) is leaked wheneverCloseConnis the exit path.Possible Implementation
I think the explanation above is already quite clear.