Memory leak: wrappedOnClose builds unbounded closure chain per initConn when using StreamingCredentialsProvider #3772

@fxsml

Versions

  • go-redis: v9.18.0
  • Redis: 6.x (reproduced with redis:6 Docker image; bug is in go-redis, not Redis)

What did you do?

Used StreamingCredentialsProvider with Azure EntraID token rotation (~45–90 min lifetime).
With PoolSize=100, the service was OOM-killed every 18–22 hours in production.

What did you expect?

Heap stays flat regardless of how many token rotations occur.

What did you see instead?

Heap grows by hundreds of MB on every rotation (~600 MB on the first one in the reproduction below) until the process is OOM-killed.

Reproduction (PoolSize=10, 50 workers, 15s rotation interval):

heap=   2 MB  conns=10  stale=0      rotations=0
heap= 603 MB  conns=10  stale=8711   rotations=1
heap=1038 MB  conns=10  stale=15082  rotations=2
heap=3343 MB  conns=10  stale=47059  rotations=11   (3 minutes)

Root cause

In redis.go:486, inside initConn:

c.onClose = c.wrappedOnClose(unsubscribeFromCredentialsProvider)  // line 486 — BUG
cn.SetOnClose(unsubscribeFromCredentialsProvider)                  // line 487 — correct

wrappedOnClose captures the current c.onClose in a new closure and returns a wrapper
that calls both. Called on every initConn, this builds an unbounded linked list on the
shared baseClient:

initConn #1: c.onClose = wrap(unsub_1, nil)
initConn #2: c.onClose = wrap(unsub_2, wrap(unsub_1, nil))
initConn #3: c.onClose = wrap(unsub_3, wrap(unsub_2, wrap(unsub_1, nil)))
...
initConn #N: chain of N closures, each retaining a dead *pool.Conn (~70 KB TLS buffers)

This chain lives on the shared baseClient and is only walked at Close() — never trimmed
during normal operation. The leak is amplified by ReAuthPoolHook: during a token rotation
the hook rejects all pooled connections simultaneously, firing a storm of
queuedNewConn → initConn calls that rapidly extends the chain.

cn.SetOnClose on line 487 already correctly handles per-connection unsubscription when
individual connections close. Line 486 is redundant and harmful.

Fix

-        c.onClose = c.wrappedOnClose(unsubscribeFromCredentialsProvider)
         cn.SetOnClose(unsubscribeFromCredentialsProvider)

With the fix applied, heap stays flat at ~3 MB for the full 3-minute run.

Minimal reproduction

docker run --rm -p 6399:6379 redis:6 redis-server --requirepass test-token
go run main.go

Self-contained ~130-line reproduction: https://gist.github.com/fxsml/0191f9a91b62078ddecf84cd94160e9f
