Versions
- go-redis: v9.18.0
- Redis: 6.x (reproduced with the redis:6 Docker image; the bug is in go-redis, not Redis)
What did you do?
Used StreamingCredentialsProvider with Azure EntraID token rotation (~45–90 min lifetime).
With PoolSize=100, the service was OOM-killed every 18–22 hours in production.
What did you expect?
Heap stays flat regardless of how many token rotations occur.
What did you see instead?
Heap grows by ~600 MB per rotation until the process is OOM-killed.
Reproduction (PoolSize=10, 50 workers, 15s rotation interval):
heap= 2 MB conns=10 stale=0 rotations=0
heap= 603 MB conns=10 stale=8711 rotations=1
heap=1038 MB conns=10 stale=15082 rotations=2
heap=3343 MB conns=10 stale=47059 rotations=11 (after 3 minutes)
Root cause
In redis.go:486, inside initConn:
c.onClose = c.wrappedOnClose(unsubscribeFromCredentialsProvider) // line 486 — BUG
cn.SetOnClose(unsubscribeFromCredentialsProvider) // line 487 — correct
wrappedOnClose captures the current c.onClose in a new closure and returns a wrapper
that calls both. Because it runs on every initConn, it builds an unbounded linked list
on the shared baseClient:
initConn #1: c.onClose = wrap(unsub_1, nil)
initConn #2: c.onClose = wrap(unsub_2, wrap(unsub_1, nil))
initConn #3: c.onClose = wrap(unsub_3, wrap(unsub_2, wrap(unsub_1, nil)))
...
initConn #N: chain of N closures, each retaining a dead *pool.Conn (~70 KB TLS buffers)
This chain lives on the shared baseClient and is only walked at Close() — never trimmed
during normal operation. The leak is amplified by ReAuthPoolHook: during a token rotation
the hook rejects all pooled connections simultaneously, firing a storm of
queuedNewConn → initConn calls that rapidly extends the chain.
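The chain-building pattern can be modelled in isolation. The sketch below is a simplified, hypothetical model, not go-redis's actual code: `client`, `conn`, `initConn` and `wrappedOnClose` only mirror the shape described above. Each call wraps the previous client-wide hook, every wrapper keeps its connection (and that connection's buffer) reachable, and nothing runs or is released until the client itself closes.

```go
package main

import "fmt"

// client models the shared baseClient: one onClose hook for the whole client.
// All names here are hypothetical and only mirror the pattern in the report.
type client struct {
	onClose func() error
}

// wrappedOnClose captures the current c.onClose and returns a closure that
// calls the new hook and then the previous one, as described in the report.
func (c *client) wrappedOnClose(newHook func() error) func() error {
	prev := c.onClose
	return func() error {
		err := newHook()
		if prev != nil {
			if prevErr := prev(); prevErr != nil && err == nil {
				err = prevErr
			}
		}
		return err
	}
}

// conn stands in for *pool.Conn; buf models the per-connection buffers.
type conn struct {
	buf []byte
}

// initConn reproduces the buggy line: every new connection extends the
// client-wide chain instead of registering a hook on the connection itself.
func (c *client) initConn(cn *conn, unsubscribed *int) {
	unsubscribe := func() error {
		_ = cn.buf // the closure references cn, so cn stays reachable via the chain
		*unsubscribed++
		return nil
	}
	c.onClose = c.wrappedOnClose(unsubscribe)
}

func main() {
	c := &client{}
	unsubscribed := 0
	for i := 0; i < 1000; i++ {
		c.initConn(&conn{buf: make([]byte, 64*1024)}, &unsubscribed)
	}
	// No hook has run yet: all 1000 connections are still retained.
	_ = c.onClose() // only the client-level close walks the whole chain
	fmt.Println("unsubscribe calls at Close():", unsubscribed)
}
```

In this model the chain length equals the total number of connections ever initialised, which is why a rotation storm that recreates many connections grows it so quickly.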
cn.SetOnClose on line 487 already correctly handles per-connection unsubscription when
individual connections close. Line 486 is redundant and harmful.
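For contrast, a minimal sketch of the per-connection approach that line 487 relies on. It assumes only what the report states (a connection-level `SetOnClose` hook); the `conn` type and its `Close` method below are hypothetical stand-ins, not `pool.Conn`'s real implementation. Each hook is owned by one connection, runs when that connection closes, and is dropped afterwards, so the client accumulates nothing.

```go
package main

import "fmt"

// conn is a hypothetical stand-in for *pool.Conn with its own close hook.
type conn struct {
	onClose func() error
}

// SetOnClose stores a hook on this connection only.
func (cn *conn) SetOnClose(fn func() error) { cn.onClose = fn }

// Close runs this connection's hook once and drops the reference to it,
// so the closure (and anything it captured) becomes collectable.
func (cn *conn) Close() error {
	if cn.onClose == nil {
		return nil
	}
	fn := cn.onClose
	cn.onClose = nil
	return fn()
}

func main() {
	active := 0 // number of live credential subscriptions in this sketch
	conns := make([]*conn, 0, 3)
	for i := 0; i < 3; i++ {
		active++ // subscribe for this connection
		cn := &conn{}
		cn.SetOnClose(func() error { active--; return nil }) // unsubscribe on close
		conns = append(conns, cn)
	}
	_ = conns[0].Close()
	_ = conns[0].Close() // in this sketch, a second Close is a no-op
	fmt.Println("active subscriptions:", active)
}
```

This prints `active subscriptions: 2`: closing one connection releases exactly its own subscription, without touching any client-wide state.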
Fix
- c.onClose = c.wrappedOnClose(unsubscribeFromCredentialsProvider)
cn.SetOnClose(unsubscribeFromCredentialsProvider)
With the fix applied, heap stays flat at ~3 MB for the full 3-minute run.
Minimal reproduction
docker run --rm -p 6399:6379 redis:6 redis-server --requirepass test-token
go run main.go
Self-contained ~130-line reproduction: https://gist.github.com/fxsml/0191f9a91b62078ddecf84cd94160e9f
Related
- Used connection count spike with StreamingCredentialsProvider (same ReAuthPoolHook path)