ClusterClient.SSubscribe silently re-subscribes to a random node after reconnect — PubSub.conn() ignores c.schannels

### Expected Behavior

A `*PubSub` created via `ClusterClient.SSubscribe` should reconnect to the
node owning the shard channel's hash slot after a connection failure, the
same way it does for the initial connection.

### Current Behavior

After any connection loss (server restart, transient network blip,
`CLIENT KILL`, idle timeout), the `PubSub` auto-reconnects to a **random
cluster node** and re-issues `SSUBSCRIBE` there. On any node other than the
slot owner, Redis replies `-MOVED`. That reply is not read at resubscribe time
(the `resubscribe` path is write-only), so the failure is **silent**: the
`PubSub` looks healthy, `Ping` succeeds, `Receive`/`Channel()` keep returning,
but no messages arrive because the subscriber isn't on the shard the
publishers reach.

On an N-node cluster, assuming the random pick is uniform across nodes, the
chance of landing on the wrong node is `(N-1)/N` per reconnect. The only
recovery is to close and recreate the `PubSub` (or restart the process).

### Root Cause

[`pubsub.go:90`](https://github.com/redis/go-redis/blob/8cdff5946a35/pubsub.go#L90)
in `PubSub.conn()` builds the `channels` slice that's handed to the
`newConn` callback for node selection — but it only collects `c.channels`
(regular `SUBSCRIBE`), not `c.schannels` (sharded `SSUBSCRIBE`):

```go
channels := slices.Collect(maps.Keys(c.channels))
channels = append(channels, newChannels...)

cn, err := c.newConn(ctx, c.opt.Addr, channels)
```

For a `PubSub` that *only* has sharded subscriptions — the normal case for
`ClusterClient.SSubscribe` — `channels` is empty when reconnecting. The
`ClusterClient`'s `newConn` closure
([`osscluster.go:2152-2178`](https://github.com/redis/go-redis/blob/8cdff5946a35/osscluster.go#L2152-L2178))
then takes the `else` branch and picks a random node:

```go
if len(channels) > 0 {
    slot := hashtag.Slot(channels[0])
    ...
    node, err = c.slotMasterNode(ctx, slot)
    ...
} else {
    node, err = c.nodes.Random()       // ← reconnect lands here
    ...
}
```
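For context on why `channels[0]` decides the node: the Redis Cluster specification maps a key (or shard channel) to one of 16384 slots using CRC16 (XMODEM variant) over the key, or over the hash tag inside `{...}` when one is present. The following is a minimal, self-contained sketch of that calculation. It is not go-redis's `hashtag.Slot` implementation; `slotFor`, `hashTag`, and the channel names are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

const slotCount = 16384 // number of hash slots in a Redis Cluster

// crc16 computes CRC16/XMODEM (polynomial 0x1021, initial value 0).
func crc16(data string) uint16 {
	var crc uint16
	for i := 0; i < len(data); i++ {
		crc ^= uint16(data[i]) << 8
		for bit := 0; bit < 8; bit++ {
			if crc&0x8000 != 0 {
				crc = (crc << 1) ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// hashTag returns the part of the key that is hashed: the text between the
// first '{' and the next '}' if that text is non-empty, else the whole key.
func hashTag(key string) string {
	start := strings.IndexByte(key, '{')
	if start < 0 {
		return key
	}
	end := strings.IndexByte(key[start+1:], '}')
	if end <= 0 { // no closing brace, or an empty tag "{}"
		return key
	}
	return key[start+1 : start+1+end]
}

// slotFor maps a key or shard channel name to its hash slot.
func slotFor(key string) int {
	return int(crc16(hashTag(key))) % slotCount
}

func main() {
	for _, ch := range []string{"shard-chan", "{tenant-1}.events", "{tenant-1}.alerts"} {
		fmt.Printf("%-20s slot %d\n", ch, slotFor(ch))
	}
}
```

Two channels sharing a hash tag land in the same slot, so they share an owner. With an empty routing list there is nothing to hash, which is why the reconnect path has no basis for choosing the owner.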

`resubscribe()`
([`pubsub.go:127-128`](https://github.com/redis/go-redis/blob/8cdff5946a35/pubsub.go#L127-L128))
then writes `SSUBSCRIBE` over the new conn, but `_subscribe()` is
write-only and never reads the reply. The `-MOVED` from the wrong node is
either dropped, or eventually surfaces as an error from `ReceiveTimeout` —
but `isBadConn()`
([`error.go:204-210`](https://github.com/redis/go-redis/blob/8cdff5946a35/error.go#L204-L210))
returns `false` for a `MOVED` that points at a *different* address, so it
doesn't trigger another reconnect.

The *initial* connection isn't affected, because `PubSub.subscribe()` passes
the new channels through the `newChannels` argument to `conn()`, and they're
appended to the routing list. The bug only bites the **re**connect path,
where `newChannels` is `nil` and `c.schannels` is the only place the shard
channels live.
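The difference between the two paths can be reproduced without a cluster. The sketch below is a simplified model, not the library code: `pubSubState`, `routingList`, and `pickNode` are hypothetical stand-ins for the relevant `PubSub` fields, the list-building in `conn()`, and the branch in the cluster `newConn` closure.

```go
package main

import "fmt"

// pubSubState is a hypothetical stand-in for the PubSub fields that matter here.
type pubSubState struct {
	channels  map[string]struct{} // regular SUBSCRIBE channels
	schannels map[string]struct{} // sharded SSUBSCRIBE channels
}

// routingList models how the routing list is built today: regular channels
// plus newChannels. schannels is deliberately not consulted, as in the bug.
func (p *pubSubState) routingList(newChannels []string) []string {
	var out []string
	for ch := range p.channels {
		out = append(out, ch)
	}
	return append(out, newChannels...)
}

// pickNode models the node-selection branch: route by the first channel if
// there is one, otherwise fall back to a random node.
func pickNode(channels []string) string {
	if len(channels) > 0 {
		return "slot owner of " + channels[0]
	}
	return "random node"
}

func main() {
	p := &pubSubState{
		channels:  map[string]struct{}{},
		schannels: map[string]struct{}{},
	}

	// Initial SSubscribe: the channel reaches the routing list via newChannels.
	fmt.Println("initial:  ", pickNode(p.routingList([]string{"shard-chan"})))
	p.schannels["shard-chan"] = struct{}{} // now recorded as a sharded subscription

	// Reconnect: newChannels is nil and schannels is ignored.
	fmt.Println("reconnect:", pickNode(p.routingList(nil)))
}
```

In this model the initial call routes to the slot owner, while the reconnect call falls through to the random branch even though `schannels` still holds `shard-chan`.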

### Steps to Reproduce

1. Start a Redis cluster (≥ 2 master shards).
2. `pubsub := clusterClient.SSubscribe(ctx, "shard-chan")`.
3. Verify delivery works: `clusterClient.SPublish(ctx, "shard-chan", "x")` →
   `pubsub.ReceiveTimeout(...)` returns the message.
4. Forcibly close the PubSub connection on the slot owner — e.g.
   `CLIENT KILL TYPE pubsub` on that node — without sending `SUNSUBSCRIBE`.
5. Trigger reconnect (`pubsub.Ping(ctx)`, or just wait for the next
   `Receive` / health-check ping).
6. `SPublish` again → `ReceiveTimeout` times out, and `SPUBLISH`'s return
   value is `0` (no subscribers reached on the slot owner).

A regression test in the repo's existing ginkgo cluster harness will be
included in the PR (`osscluster_test.go`); it iterates the
kill→reconnect→publish cycle 8 times so a lucky random-node hit can't mask
the bug.
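
As a rough check on the iteration count, assume (the issue does not state this) that each buggy reconnect picks uniformly and independently among the $N$ nodes the client knows about. A buggy build passes every iteration only if all 8 reconnects happen to land on the slot owner:

$$
P(\text{bug masked}) = \left(\frac{1}{N}\right)^{8}, \qquad N = 3:\ \frac{1}{3^{8}} = \frac{1}{6561} \approx 1.5 \times 10^{-4}
$$

Even with only two nodes the figure is $1/256 \approx 3.9 \times 10^{-3}$. The node count of the test harness is not given here, so these values are illustrative.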

### Possible Solution

Include `c.schannels` when building the channel list passed to `newConn`:

```go
channels := slices.Collect(maps.Keys(c.channels))
channels = append(channels, slices.Collect(maps.Keys(c.schannels))...)
channels = append(channels, newChannels...)
```

Sharded channels are appended *after* regular channels so that for
`SSubscribe`-only `PubSub`s (the common case) `channels[0]` is a shard
channel and slot routing works, while `PubSub`s using only regular
`SUBSCRIBE`/`PSUBSCRIBE` see no behavior change.

The `channels` argument is unused for routing in `redis.Client.pubSub()`
(single node, fixed `addr`), `Ring.SSubscribe()` (shard chosen *before* the
`PubSub` is created, owned by a single `redis.Client`), and `SentinelClient`
(no `SSubscribe`), so the change only affects `ClusterClient`.

### Context (Environment)

- go-redis: v9.17.1 (also reproduced against v9.19.0 and `master` @
  `8cdff5946a35`)
- Server: Redis Cluster 7.x (sharded pub/sub requires ≥ 7.0)
- Production impact: on a 25-shard cluster, a transient connection blip
  caused the message receive rate to drop ~90% (each reconnect has a 24/25
  chance of landing on the wrong node) and never recover. The PubSub
  health-check ping kept succeeding against the wrong node, so the failure
  went undetected for ~6 hours until a process restart.

### Note on mixed regular + sharded subscriptions

A `PubSub` carrying *both* regular and sharded subscriptions on a cluster
client is already underdetermined (regular channels can be served by any
node; sharded channels must be on the slot owner; a single conn can't
satisfy both for arbitrary channel sets). This fix doesn't change behavior
for that case — `channels[0]` is still a regular channel and routing follows
it. `ClusterClient.Subscribe`/`PSubscribe`/`SSubscribe` each return a fresh
`PubSub`, so the mixed case only arises if the caller mixes
`Subscribe`/`SSubscribe` calls on the same `PubSub` deliberately.
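
The ordering argument for the proposed fix, including the mixed case, can be checked with a small model. This is a sketch under assumptions, not the patch itself: `chanSet`, `set`, and `routingList` are hypothetical helpers, plain loops stand in for `slices.Collect(maps.Keys(...))`, and each set holds at most one channel so that map iteration order does not matter.

```go
package main

import "fmt"

// chanSet is a hypothetical stand-in for the PubSub channel maps.
type chanSet map[string]struct{}

// set builds a chanSet from the given channel names.
func set(names ...string) chanSet {
	s := chanSet{}
	for _, n := range names {
		s[n] = struct{}{}
	}
	return s
}

// routingList follows the proposed ordering: regular channels first,
// then sharded channels, then any newly requested channels.
func routingList(channels, schannels chanSet, newChannels []string) []string {
	var out []string
	for ch := range channels {
		out = append(out, ch)
	}
	for ch := range schannels {
		out = append(out, ch)
	}
	return append(out, newChannels...)
}

func main() {
	cases := []struct {
		name             string
		regular, sharded chanSet
	}{
		{"sharded only", set(), set("shard-chan")},
		{"regular only", set("news"), set()},
		{"mixed", set("news"), set("shard-chan")},
	}
	for _, c := range cases {
		// nil newChannels models the reconnect path.
		list := routingList(c.regular, c.sharded, nil)
		fmt.Printf("%-13s routes by %q\n", c.name, list[0])
	}
}
```

In the sharded-only case the first entry is now the shard channel, so slot routing applies on reconnect. In the regular-only and mixed cases the first entry is still the regular channel, matching the behavior before the change.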

