fix(cluster): fall back to origin port when CLUSTER SLOTS reports port 0#3828
Merged
Merged
Conversation
In TLS-only clusters started with `--port 0 --tls-port <port>`, CLUSTER SLOTS reports the regular port as 0. newClusterState used to pass that value verbatim into clusterNode.Client.opt.Addr, which the PubSub path dials directly with no MOVED/seed fallback, producing `dial tcp <ip>:0: connection refused`. Add a replaceZeroPort helper symmetric to replaceLoopbackHost: when a slot node's port is "0", substitute the origin endpoint's port — by definition reachable, since it is the port we used to fetch the slot map. NodeAddress is left untouched so maintenance-notification matching continues to see the server-reported value. Adds TestNewClusterState_PortZero covering the bug scenario plus the non-zero passthrough and the loopback+:0 composition. Fixes #3726
🛡️ Jit Security Scan Results✅ No security findings were detected in this PR
Security scan by Jit
|
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes Redis Cluster PubSub dialing failures in TLS-only clusters where CLUSTER SLOTS reports a node port of 0, by normalizing the dial address to use the origin endpoint’s port (while preserving the raw server-reported NodeAddress for maintenance-notification matching).
Changes:
- Normalize slot node dial addresses by replacing
:0ports with the origin connection’s port when building cluster state. - Add a regression test covering the
:0port scenario, non-zero passthrough, and interaction with loopback-host substitution. - Minor cleanup (small formatting/style tweaks).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
osscluster.go |
Adds replaceZeroPort and applies it during cluster state construction to avoid dialing :0 in TLS-only clusters. |
internal_test.go |
Adds TestNewClusterState_PortZero to prevent regressions and validate expected address vs NodeAddress behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ofekshenawa
approved these changes
May 27, 2026
saschazepter
pushed a commit
to saschazepter/forgejo
that referenced
this pull request
May 29, 2026
This PR contains the following updates: | Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) | |---|---|---|---| | [github.com/redis/go-redis/v9](https://github.com/redis/go-redis) | `v9.19.0` → `v9.20.0` |  |  | --- ### Release Notes <details> <summary>redis/go-redis (github.com/redis/go-redis/v9)</summary> ### [`v9.20.0`](https://github.com/redis/go-redis/releases/tag/v9.20.0): 9.20.0 [Compare Source](redis/go-redis@v9.19.0...v9.20.0) #### 🚀 Highlights ##### Redis 8.8 Support This release adds support for **Redis 8.8**. The README's supported-versions list now includes Redis 8.8 alongside 8.0/8.2/8.4, and CI exercises the `8.8` client-libs-test image across the full suite (Makefile, build workflow, doctests, run-tests action, and docker-compose). Coverage for the new commands that ship in the 8.x line, rounded out in this release: - **`AR*` array data type** ([#​3813](redis/go-redis#3813)) — new array data structure, exposed via the `ArrayCmdable` interface (see the experimental-features highlight below). - **`INCREX`** ([#​3816](redis/go-redis#3816)) — atomic increment with expiration in a single round-trip. - **`XNACK`** ([#​3790](redis/go-redis#3790)) — explicit negative-acknowledge of pending stream entries. - **`XAUTOCLAIM` PEL deletes** ([#​3798](redis/go-redis#3798)) — `XAUTOCLAIM`/`XAUTOCLAIMJUSTID` now return the list of deleted message IDs from the pending entries list. - **`TS.RANGE` multiple aggregators** ([#​3791](redis/go-redis#3791)) — `TS.RANGE`/`TS.REVRANGE`/`TS.MRANGE`/`TS.MREVRANGE` accept multiple aggregators in a single call. - **`Z(UNION|INTER|DIFF)` `COUNT` aggregator** ([#​3802](redis/go-redis#3802)) — `COUNT` reducer for sorted-set set operations. - **`JSON.SET FPHA`** ([#​3797](redis/go-redis#3797)) — new `FPHA` argument that specifies the floating-point type for homogeneous FP arrays. CI image bump ([#​3814](redis/go-redis#3814)) by [@​ofekshenawa](https://github.com/ofekshenawa). Command coverage contributions by [@​cxljs](https://github.com/cxljs), [@​elena-kolevska](https://github.com/elena-kolevska), [@​Khukharr](https://github.com/Khukharr), [@​ndyakov](https://github.com/ndyakov), and [@​ofekshenawa](https://github.com/ofekshenawa). ##### Stable RESP3 for RediSearch (`UnstableResp3` deprecated) `FT.SEARCH`, `FT.AGGREGATE`, `FT.INFO`, `FT.SPELLCHECK`, and `FT.SYNDUMP` now parse RESP3 (map) responses into the same typed result objects as RESP2 — `Val()` and `Result()` work uniformly on both protocols, no flag required. Previously, RESP3 search responses required `UnstableResp3: true` and were returned as opaque maps accessible only via `RawResult()` / `RawVal()`. As a result, the `UnstableResp3` option is now a **no-op** across every options struct (`Options`, `ClusterOptions`, `UniversalOptions`, `FailoverOptions`, `RingOptions`) and has been marked `// Deprecated:`. The field is retained for backwards compatibility — existing code that sets `UnstableResp3: true` will continue to compile and behave identically — but it will be removed in a future release and new code should not set it. `RawResult()` / `RawVal()` continue to work for callers that prefer the raw RESP payload. ([#​3741](redis/go-redis#3741)) by [@​ndyakov](https://github.com/ndyakov) ##### Experimental Array Data Structure Commands Adds an experimental `ArrayCmdable` interface with the `AR*` command family (`ARSet`, `ARGet`, `ARGetRange`, `ARMSet`, `ARMGet`, `ARDel`, `ARDelRange`, `ARScan`, `ARSeek`, `ARNext`, `ARLastItems`, `ARGrep`, `ARGrepWithValues`, `ARInfo`/`ARInfoFull`, and typed reducers `AROpSum`/`AROpMin`/`AROpMax`/`AROpAnd`/`AROpOr`/`AROpXor`/`AROpMatch`/`AROpUsed`) for working with Redis 8.8's new array data type. **API is experimental and may change in a future release.** ([#​3813](redis/go-redis#3813)) by [@​cxljs](https://github.com/cxljs) #### ✨ New Features - **RESP3 search parser**: First-class RESP3 parsing for `FT.SEARCH`/`FT.AGGREGATE`/`FT.INFO`/`FT.SPELLCHECK`/`FT.SYNDUMP` responses with backwards compatibility for RESP2 ([#​3741](redis/go-redis#3741)) by [@​ndyakov](https://github.com/ndyakov) - **INCREX**: New `INCREX` command support — atomic increment with expiration ([#​3816](redis/go-redis#3816)) by [@​ndyakov](https://github.com/ndyakov) - **XNACK**: Client support for the `XNACK` stream command for explicitly negative-acknowledging pending entries ([#​3790](redis/go-redis#3790)) by [@​elena-kolevska](https://github.com/elena-kolevska) - **TS range multiple aggregators**: `TS.RANGE`/`TS.REVRANGE`/`TS.MRANGE`/`TS.MREVRANGE` now accept multiple aggregators in a single call ([#​3791](redis/go-redis#3791)) by [@​elena-kolevska](https://github.com/elena-kolevska) - **`XAutoClaim` deleted IDs**: `XAUTOCLAIM`/`XAUTOCLAIMJUSTID` now return the list of deleted message IDs from the PEL ([#​3798](redis/go-redis#3798)) by [@​Khukharr](https://github.com/Khukharr) - **`JSON.SET FPHA`**: `JSON.SET` accepts a new `FPHA` argument that specifies the floating-point type for homogeneous floating-point arrays ([#​3797](redis/go-redis#3797)) by [@​ndyakov](https://github.com/ndyakov) - **Sorted-set union/intersection COUNT**: `ZUNION`/`ZINTER`/`ZDIFF` aggregator now supports `COUNT` ([#​3802](redis/go-redis#3802)) by [@​ofekshenawa](https://github.com/ofekshenawa) - **`FT.HYBRID` vector validation**: Validates hybrid-search vector input types and adds proper typed vector parameters ([#​3756](redis/go-redis#3756)) by [@​DengY11](https://github.com/DengY11) - **Cluster pool wait stats**: `ClusterClient.PoolStats()` now accumulates `WaitCount` and `WaitDurationNs` across all node pools (previously always zero) ([#​3809](redis/go-redis#3809)) by [@​LINKIWI](https://github.com/LINKIWI) #### 🐛 Bug Fixes - **TLS-only Cluster PubSub**: `CLUSTER SLOTS` port-0 entries now fall back to the origin endpoint's port, fixing `dial tcp <ip>:0: connection refused` on TLS-only clusters started with `--port 0 --tls-port <port>` (fixes [#​3726](redis/go-redis#3726)) ([#​3828](redis/go-redis#3828)) by [@​ndyakov](https://github.com/ndyakov) - **Sharded PubSub reconnect routing**: `PubSub.conn()` now passes both regular (`c.channels`) and sharded (`c.schannels`) channels into the per-PubSub `newConn` closure. Previously, `ClusterClient.SSubscribe`-only PubSubs reconnected to a random node (because the routing closure saw an empty channel list), the `SSUBSCRIBE` was sent to the wrong shard, and the resulting `MOVED` reply was silently dropped ([#​3829](redis/go-redis#3829)) by [@​ndyakov](https://github.com/ndyakov) - **ClusterClient `Watch` retry**: User errors returned from a `Watch` callback are no longer subjected to cluster-retry classification; transient cluster errors still retry, but a callback returning e.g. `net.ErrClosed` short-circuits immediately ([#​3821](redis/go-redis#3821)) by [@​obiyang](https://github.com/obiyang) - **Sentinel concurrent-probe leak**: `MasterAddr`'s concurrent sentinel probe now closes the non-winning sentinel clients instead of leaking them ([#​3827](redis/go-redis#3827)) by [@​cxljs](https://github.com/cxljs) - **Sentinel rediscovery loop on master-only setups**: `replicaAddrs` no longer tears down the cached sentinel client when the replica list is empty, eliminating a continuous rediscovery loop on master-only Sentinel deployments that flooded logs and added per-operation latency ([#​3795](redis/go-redis#3795)) by [@​shahyash2609](https://github.com/shahyash2609) - **Pool `CloseConn` hooks**: `Pool.CloseConn` now triggers registered hooks, fixing a memory leak when connections are closed explicitly rather than via the normal removal path ([#​3818](redis/go-redis#3818)) by [@​ndyakov](https://github.com/ndyakov) - **Dial TCP error redirection**: Wrapped `dial tcp` errors are now correctly classified as redirectable so cluster routing can recover from a single unreachable node ([#​3810](redis/go-redis#3810)) by [@​vladisa88](https://github.com/vladisa88) - **Pool `Close` health checks**: `ConnPool.Close` now only runs health checks against idle connections, avoiding spurious activity on connections still in use ([#​3805](redis/go-redis#3805)) by [@​ndyakov](https://github.com/ndyakov) - **VLinks return type**: Fixed the return type of `VLINKS`/`VLINKSWITHSCORES` vector-set replies ([#​3820](redis/go-redis#3820)) by [@​romanpovol](https://github.com/romanpovol) #### 🧪 Testing & Infrastructure - **Flaky tests**: Stabilized several flaky tests in the sentinel and pool suites ([#​3815](redis/go-redis#3815)) by [@​ndyakov](https://github.com/ndyakov) - **Sentinel failover metric race**: Fixed a data race in the sentinel failover metric test ([#​3824](redis/go-redis#3824)) by [@​cxljs](https://github.com/cxljs) - **`waitForSentinelClusterStable` post-conditions**: The sentinel test harness now waits for replicas to be fully connected (not just present in the count) and is robust to randomized spec ordering after failover specs, eliminating an intermittent `Expected master to equal slave` flake ([#​3830](redis/go-redis#3830)) by [@​ndyakov](https://github.com/ndyakov) - **`govulncheck` workflow**: New scheduled GitHub Actions workflow runs `govulncheck` on every push, PR, and weekly, surfacing newly disclosed Go vulnerabilities even when no code changes ([#​3779](redis/go-redis#3779)) by [@​solardome](https://github.com/solardome) - **CI Redis 8.8-rc1**: CI now exercises the 8.8-rc1 Redis image ([#​3814](redis/go-redis#3814)) by [@​ofekshenawa](https://github.com/ofekshenawa) #### 🧰 Maintenance - **`Cmd.Slot()` lookup refactor**: Caches the per-command `CommandInfo` and short-circuits keyless commands before the switch dispatch, removing redundant `Peek` calls ([#​3804](redis/go-redis#3804)) by [@​retr0-kernel](https://github.com/retr0-kernel) - **stdlib `math/rand`**: Replaced `internal/rand` with `math/rand` from the standard library now that the minimum Go version is 1.24 ([#​3823](redis/go-redis#3823)) by [@​cxljs](https://github.com/cxljs) - **ConnPool queue channel**: Removed the unused queue channel from `ConnPool`, trimming the pool's footprint ([#​3826](redis/go-redis#3826)) by [@​cxljs](https://github.com/cxljs) - **Extra packages LICENSE**: Added a LICENSE file to each `extra/*` package ([#​3817](redis/go-redis#3817)) by [@​ndyakov](https://github.com/ndyakov) - **README & CI image**: Documentation refresh and bumped the default CI image tag ([#​3822](redis/go-redis#3822)) by [@​ndyakov](https://github.com/ndyakov) #### 👥 Contributors We'd like to thank all the contributors who worked on this release! [@​cxljs](https://github.com/cxljs), [@​DengY11](https://github.com/DengY11), [@​elena-kolevska](https://github.com/elena-kolevska), [@​Khukharr](https://github.com/Khukharr), [@​LINKIWI](https://github.com/LINKIWI), [@​ndyakov](https://github.com/ndyakov), [@​obiyang](https://github.com/obiyang), [@​ofekshenawa](https://github.com/ofekshenawa), [@​retr0-kernel](https://github.com/retr0-kernel), [@​romanpovol](https://github.com/romanpovol), [@​shahyash2609](https://github.com/shahyash2609), [@​solardome](https://github.com/solardome), [@​vladisa88](https://github.com/vladisa88) *** **Full Changelog**: <redis/go-redis@v9.19.0...v9.20.0> </details> --- ### Configuration 📅 **Schedule**: (UTC) - Branch creation - Between 12:00 AM and 03:59 AM (`* 0-3 * * *`) - Automerge - Between 12:00 AM and 03:59 AM (`* 0-3 * * *`) 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Mend Renovate](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My4xOTUuMSIsInVwZGF0ZWRJblZlciI6IjQzLjE5NS4xIiwidGFyZ2V0QnJhbmNoIjoiZm9yZ2VqbyIsImxhYmVscyI6WyJkZXBlbmRlbmN5LXVwZ3JhZGUiLCJ0ZXN0L25vdC1uZWVkZWQiXX0=--> Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/12804 Reviewed-by: Mathieu Fenniak <mfenniak@noreply.codeberg.org>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
In TLS-only clusters started with
--port 0 --tls-port <port>, CLUSTER SLOTS reports the regular port as 0. newClusterState used to pass that value verbatim into clusterNode.Client.opt.Addr, which the PubSub path dials directly with no MOVED/seed fallback, producingdial tcp <ip>:0: connection refused.Add a replaceZeroPort helper symmetric to replaceLoopbackHost: when a slot node's port is "0", substitute the origin endpoint's port — by definition reachable, since it is the port we used to fetch the slot map. NodeAddress is left untouched so maintenance-notification matching continues to see the server-reported value.
Adds TestNewClusterState_PortZero covering the bug scenario plus the non-zero passthrough and the loopback+:0 composition.
Fixes #3726
Note
Medium Risk
Touches cluster address normalization used by all node dials including PubSub; behavior is narrow (port 0 only) but incorrect fallback could mis-route in exotic topologies.
Overview
Cluster topology now normalizes dial addresses when
CLUSTER SLOTSreports port 0 (TLS-only setups with--port 0 --tls-port <n>). InnewClusterState, after existing loopback host substitution,replaceZeroPortrewrites:0endpoints to use the origin endpoint’s port—the same address used to fetch the slot map—so clients (including PubSub, which dialsClient.opt.Addrdirectly) no longer hitdial tcp <ip>:0.NodeAddressstays the raw server-reported value (…:0) for maintenance-notification matching; only the transformedAddrused for connections changes. Regression coverage is inTestNewClusterState_PortZero(TLS:0case, non-zero ports unchanged, loopback +:0composition).Reviewed by Cursor Bugbot for commit 38f527f. Bugbot is set up for automated code reviews on this repo. Configure here.