refactor(TODO-3803): Enhance key position retrieval with CommandInfo caching #3804
|
🛡️ Jit Security Scan Results✅ No security findings were detected in this PR
Security scan by Jit
|
ndyakov
left a comment
Left some suggestions. Keep in mind there is another configuration with the default command policies, and we may want to combine those. I will get back to you with more information soon.
I will reiterate and get back to you on these. |
|
Hi @ndyakov ,
|
361dd5b to 95ae37d
|
Sorry for the mess. I accidentally committed with my official account, so I just changed the authors. The code is still intact; if there are any issues, I will cherry-pick my changes into a new branch. |
…caching ultimately solving the TODO
…esh lock mechanism as requested
|
Hi @ndyakov, any updates on this? I was thinking of cherry-picking for a clean change. |
|
Hi @ndyakov any updates on this? |
|
Hi @ndyakov any actionable items from my side? |
|
Is this MR still ready to merge? Let me know if any changes are needed from my side, @ndyakov. |
|
@retr0-kernel I will do a full review today or tomorrow and let you know. Thank you for pinging me. |
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit dbf545a.
ndyakov
left a comment
Thanks for sticking with this, @retr0-kernel, and apologies for the delay. I went through the diff end-to-end and there are a few things to address before this can land:
- cmdsInfoCache is never populated for ring. cmdsInfoCache.Get() is only called from ClusterClient.cmdInfo (osscluster.go:2294). Nothing in ring.go ever calls Get, so Peek() always returns nil for Ring and every info argument we pass into cmdFirstKeyPosWithInfo from ring code is nil. Net effect for Ring: identical routing as before, plus an extra RLock per command. We need to either populate the ring cache lazily on first routing decision (mirroring how cluster gets it via cmdInfo) or drop the ring changes from this PR and do it as a follow-up.
- Per-loop redundant peeks in slottedKeyedCommands and executeMultiShard/createSlotSpecificCommand. See inline comments. The ring.generalProcessPipeline pattern (peek once, lookup per cmd) is what we want on the cluster side too.
- No test covers the new path. We need at least one regression test where the cached FirstKeyPos differs from the hardcoded fallback; otherwise the suite only ever exercises the cold-cache branch.
- Peek() may block - added a suggestion to edit the comment.
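For readers following along, the "peek once, lookup per cmd" pattern from the review can be sketched roughly as below. This is a minimal illustration, not the go-redis code: `infoCache`, `Warm`, `positions`, the trimmed `CommandInfo`, and the `peeks` counter are all hypothetical names introduced here, and the default of `1` for unknown commands is an assumption taken from the PR description.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// CommandInfo is a hypothetical, trimmed-down stand-in for command metadata.
type CommandInfo struct {
	FirstKeyPos int
}

// infoCache is an illustrative cache guarded by an RWMutex.
type infoCache struct {
	mu    sync.RWMutex
	cmds  map[string]*CommandInfo // nil until the cache has been warmed
	peeks atomic.Int64            // counts Peek calls, for demonstration only
}

// Warm publishes a snapshot. The map must not be mutated afterwards,
// because readers keep using it after releasing the read lock.
func (c *infoCache) Warm(cmds map[string]*CommandInfo) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.cmds = cmds
}

// Peek returns whatever is in memory without any network call. It takes the
// read lock, so it can briefly block while a writer holds the write lock.
func (c *infoCache) Peek() map[string]*CommandInfo {
	c.peeks.Add(1)
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.cmds
}

// positions resolves the first-key position for a whole batch with a single
// Peek, then does plain map lookups per command.
func positions(c *infoCache, batch []string) []int {
	snapshot := c.Peek() // one read-lock acquisition for the whole batch
	out := make([]int, len(batch))
	for i, name := range batch {
		out[i] = 1 // assumed fallback when no metadata is available
		if info, ok := snapshot[name]; ok { // lookups on a nil map are safe
			out[i] = info.FirstKeyPos
		}
	}
	return out
}

func main() {
	cache := &infoCache{}
	batch := []string{"get", "ping", "set"}
	fmt.Println("cold:", positions(cache, batch))

	cache.Warm(map[string]*CommandInfo{
		"get":  {FirstKeyPos: 1},
		"ping": {FirstKeyPos: 0},
		"set":  {FirstKeyPos: 1},
	})
	fmt.Println("warm:", positions(cache, batch))
	fmt.Println("peeks:", cache.peeks.Load())
}
```

The point of the sketch is only the shape of the loop: the lock is taken once per batch rather than once per command, and a cold cache degrades to the fallback instead of failing.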
|
Hi @ndyakov, no issues. I will get these done and update. |
…ehavior during initial population
…nfo and reducing redundant Peek calls
|
Hey @ndyakov, sorry for the slow reply. I was tied up with some other stuff this past week. I have addressed everything from your review:
- Dropped the ring changes. The cache is never warmed there since Get() is never called in the ring routing path, so Peek() always returned nil and the RLock per command was just noise. Both call sites in cmdShard and generalProcessPipeline now pass nil directly, which is identical behaviour to before. I left a TODO at both spots. Fixing it properly means threading ctx into cmdShard and calling Get() lazily on first use like ClusterClient.cmdInfo does, but that touches method signatures across the ring routing path, so it felt cleaner to leave for a separate PR.
- For slottedKeyedCommands, moved the Peek() outside the loop so it's one lock acquire for the whole batch. Added cmdSlotWithPos to take the pre-computed pos so we're not acquiring the lock twice per command. The cluster getkeysinslot/countkeysinslot special case is still there.
- For executeMultiShard/createSlotSpecificCommand, firstKeyPos is now computed once and passed down through executeMultiSlot. The independent Peek() in createSlotSpecificCommand is gone, so the inconsistency you flagged can't happen.
- Updated the Peek() comment to mention the RLock and that it can briefly block if Get or Refresh holds the write lock.
- Added TestCmdFirstKeyPosWithInfo_UsesCommandInfoWhenWarm.
Let me know if any changes or further clarification are needed. |
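To illustrate why a helper that accepts a pre-computed position is useful, here is a rough sketch of slot computation from an already-resolved key index. It is not the PR's `cmdSlotWithPos`: `slotWithPos`, `hashKey`, and the `-1` "no key" sentinel are hypothetical, and the cluster getkeysinslot/countkeysinslot special case is left out. The hashing itself follows the Redis Cluster rules (CRC-16/XMODEM modulo 16384, with hash tags).

```go
package main

import (
	"fmt"
	"strings"
)

const slotCount = 16384 // number of hash slots in Redis Cluster

// crc16 computes CRC-16/XMODEM (polynomial 0x1021, initial value 0).
func crc16(data string) uint16 {
	var crc uint16
	for i := 0; i < len(data); i++ {
		crc ^= uint16(data[i]) << 8
		for bit := 0; bit < 8; bit++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// hashKey applies the hash-tag rule: if the key contains "{...}" with a
// non-empty body, only that body is hashed.
func hashKey(key string) string {
	if open := strings.IndexByte(key, '{'); open >= 0 {
		if end := strings.IndexByte(key[open+1:], '}'); end > 0 {
			return key[open+1 : open+1+end]
		}
	}
	return key
}

// slotWithPos computes the slot for the key at a position the caller has
// already resolved, so no second metadata lookup is needed here.
// It returns -1 (a sentinel chosen for this sketch) when there is no key.
func slotWithPos(args []string, pos int) int {
	if pos <= 0 || pos >= len(args) {
		return -1
	}
	return int(crc16(hashKey(args[pos]))) % slotCount
}

func main() {
	fmt.Println(slotWithPos([]string{"get", "{user1000}.following"}, 1))
	fmt.Println(slotWithPos([]string{"get", "{user1000}.followers"}, 1))
	fmt.Println(slotWithPos([]string{"ping"}, 0))
}
```

Because the position is an argument, a caller that resolves it once per command can reuse it for both routing and slot grouping.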
ndyakov
left a comment
Looks good, will apply the godoc suggestion and proceed.
This PR contains the following updates: | Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) | |---|---|---|---| | [github.com/redis/go-redis/v9](https://github.com/redis/go-redis) | `v9.19.0` → `v9.20.0` |  |  | --- ### Release Notes <details> <summary>redis/go-redis (github.com/redis/go-redis/v9)</summary> ### [`v9.20.0`](https://github.com/redis/go-redis/releases/tag/v9.20.0): 9.20.0 [Compare Source](redis/go-redis@v9.19.0...v9.20.0) #### 🚀 Highlights ##### Redis 8.8 Support This release adds support for **Redis 8.8**. The README's supported-versions list now includes Redis 8.8 alongside 8.0/8.2/8.4, and CI exercises the `8.8` client-libs-test image across the full suite (Makefile, build workflow, doctests, run-tests action, and docker-compose). Coverage for the new commands that ship in the 8.x line, rounded out in this release: - **`AR*` array data type** ([#​3813](redis/go-redis#3813)) — new array data structure, exposed via the `ArrayCmdable` interface (see the experimental-features highlight below). - **`INCREX`** ([#​3816](redis/go-redis#3816)) — atomic increment with expiration in a single round-trip. - **`XNACK`** ([#​3790](redis/go-redis#3790)) — explicit negative-acknowledge of pending stream entries. - **`XAUTOCLAIM` PEL deletes** ([#​3798](redis/go-redis#3798)) — `XAUTOCLAIM`/`XAUTOCLAIMJUSTID` now return the list of deleted message IDs from the pending entries list. - **`TS.RANGE` multiple aggregators** ([#​3791](redis/go-redis#3791)) — `TS.RANGE`/`TS.REVRANGE`/`TS.MRANGE`/`TS.MREVRANGE` accept multiple aggregators in a single call. - **`Z(UNION|INTER|DIFF)` `COUNT` aggregator** ([#​3802](redis/go-redis#3802)) — `COUNT` reducer for sorted-set set operations. - **`JSON.SET FPHA`** ([#​3797](redis/go-redis#3797)) — new `FPHA` argument that specifies the floating-point type for homogeneous FP arrays. 
CI image bump ([#​3814](redis/go-redis#3814)) by [@​ofekshenawa](https://github.com/ofekshenawa). Command coverage contributions by [@​cxljs](https://github.com/cxljs), [@​elena-kolevska](https://github.com/elena-kolevska), [@​Khukharr](https://github.com/Khukharr), [@​ndyakov](https://github.com/ndyakov), and [@​ofekshenawa](https://github.com/ofekshenawa). ##### Stable RESP3 for RediSearch (`UnstableResp3` deprecated) `FT.SEARCH`, `FT.AGGREGATE`, `FT.INFO`, `FT.SPELLCHECK`, and `FT.SYNDUMP` now parse RESP3 (map) responses into the same typed result objects as RESP2 — `Val()` and `Result()` work uniformly on both protocols, no flag required. Previously, RESP3 search responses required `UnstableResp3: true` and were returned as opaque maps accessible only via `RawResult()` / `RawVal()`. As a result, the `UnstableResp3` option is now a **no-op** across every options struct (`Options`, `ClusterOptions`, `UniversalOptions`, `FailoverOptions`, `RingOptions`) and has been marked `// Deprecated:`. The field is retained for backwards compatibility — existing code that sets `UnstableResp3: true` will continue to compile and behave identically — but it will be removed in a future release and new code should not set it. `RawResult()` / `RawVal()` continue to work for callers that prefer the raw RESP payload. ([#​3741](redis/go-redis#3741)) by [@​ndyakov](https://github.com/ndyakov) ##### Experimental Array Data Structure Commands Adds an experimental `ArrayCmdable` interface with the `AR*` command family (`ARSet`, `ARGet`, `ARGetRange`, `ARMSet`, `ARMGet`, `ARDel`, `ARDelRange`, `ARScan`, `ARSeek`, `ARNext`, `ARLastItems`, `ARGrep`, `ARGrepWithValues`, `ARInfo`/`ARInfoFull`, and typed reducers `AROpSum`/`AROpMin`/`AROpMax`/`AROpAnd`/`AROpOr`/`AROpXor`/`AROpMatch`/`AROpUsed`) for working with Redis 8.8's new array data type. 
**API is experimental and may change in a future release.** ([#​3813](redis/go-redis#3813)) by [@​cxljs](https://github.com/cxljs) #### ✨ New Features - **RESP3 search parser**: First-class RESP3 parsing for `FT.SEARCH`/`FT.AGGREGATE`/`FT.INFO`/`FT.SPELLCHECK`/`FT.SYNDUMP` responses with backwards compatibility for RESP2 ([#​3741](redis/go-redis#3741)) by [@​ndyakov](https://github.com/ndyakov) - **INCREX**: New `INCREX` command support — atomic increment with expiration ([#​3816](redis/go-redis#3816)) by [@​ndyakov](https://github.com/ndyakov) - **XNACK**: Client support for the `XNACK` stream command for explicitly negative-acknowledging pending entries ([#​3790](redis/go-redis#3790)) by [@​elena-kolevska](https://github.com/elena-kolevska) - **TS range multiple aggregators**: `TS.RANGE`/`TS.REVRANGE`/`TS.MRANGE`/`TS.MREVRANGE` now accept multiple aggregators in a single call ([#​3791](redis/go-redis#3791)) by [@​elena-kolevska](https://github.com/elena-kolevska) - **`XAutoClaim` deleted IDs**: `XAUTOCLAIM`/`XAUTOCLAIMJUSTID` now return the list of deleted message IDs from the PEL ([#​3798](redis/go-redis#3798)) by [@​Khukharr](https://github.com/Khukharr) - **`JSON.SET FPHA`**: `JSON.SET` accepts a new `FPHA` argument that specifies the floating-point type for homogeneous floating-point arrays ([#​3797](redis/go-redis#3797)) by [@​ndyakov](https://github.com/ndyakov) - **Sorted-set union/intersection COUNT**: `ZUNION`/`ZINTER`/`ZDIFF` aggregator now supports `COUNT` ([#​3802](redis/go-redis#3802)) by [@​ofekshenawa](https://github.com/ofekshenawa) - **`FT.HYBRID` vector validation**: Validates hybrid-search vector input types and adds proper typed vector parameters ([#​3756](redis/go-redis#3756)) by [@​DengY11](https://github.com/DengY11) - **Cluster pool wait stats**: `ClusterClient.PoolStats()` now accumulates `WaitCount` and `WaitDurationNs` across all node pools (previously always zero) ([#​3809](redis/go-redis#3809)) by 
[@​LINKIWI](https://github.com/LINKIWI) #### 🐛 Bug Fixes - **TLS-only Cluster PubSub**: `CLUSTER SLOTS` port-0 entries now fall back to the origin endpoint's port, fixing `dial tcp <ip>:0: connection refused` on TLS-only clusters started with `--port 0 --tls-port <port>` (fixes [#​3726](redis/go-redis#3726)) ([#​3828](redis/go-redis#3828)) by [@​ndyakov](https://github.com/ndyakov) - **Sharded PubSub reconnect routing**: `PubSub.conn()` now passes both regular (`c.channels`) and sharded (`c.schannels`) channels into the per-PubSub `newConn` closure. Previously, `ClusterClient.SSubscribe`-only PubSubs reconnected to a random node (because the routing closure saw an empty channel list), the `SSUBSCRIBE` was sent to the wrong shard, and the resulting `MOVED` reply was silently dropped ([#​3829](redis/go-redis#3829)) by [@​ndyakov](https://github.com/ndyakov) - **ClusterClient `Watch` retry**: User errors returned from a `Watch` callback are no longer subjected to cluster-retry classification; transient cluster errors still retry, but a callback returning e.g. 
`net.ErrClosed` short-circuits immediately ([#​3821](redis/go-redis#3821)) by [@​obiyang](https://github.com/obiyang) - **Sentinel concurrent-probe leak**: `MasterAddr`'s concurrent sentinel probe now closes the non-winning sentinel clients instead of leaking them ([#​3827](redis/go-redis#3827)) by [@​cxljs](https://github.com/cxljs) - **Sentinel rediscovery loop on master-only setups**: `replicaAddrs` no longer tears down the cached sentinel client when the replica list is empty, eliminating a continuous rediscovery loop on master-only Sentinel deployments that flooded logs and added per-operation latency ([#​3795](redis/go-redis#3795)) by [@​shahyash2609](https://github.com/shahyash2609) - **Pool `CloseConn` hooks**: `Pool.CloseConn` now triggers registered hooks, fixing a memory leak when connections are closed explicitly rather than via the normal removal path ([#​3818](redis/go-redis#3818)) by [@​ndyakov](https://github.com/ndyakov) - **Dial TCP error redirection**: Wrapped `dial tcp` errors are now correctly classified as redirectable so cluster routing can recover from a single unreachable node ([#​3810](redis/go-redis#3810)) by [@​vladisa88](https://github.com/vladisa88) - **Pool `Close` health checks**: `ConnPool.Close` now only runs health checks against idle connections, avoiding spurious activity on connections still in use ([#​3805](redis/go-redis#3805)) by [@​ndyakov](https://github.com/ndyakov) - **VLinks return type**: Fixed the return type of `VLINKS`/`VLINKSWITHSCORES` vector-set replies ([#​3820](redis/go-redis#3820)) by [@​romanpovol](https://github.com/romanpovol) #### 🧪 Testing & Infrastructure - **Flaky tests**: Stabilized several flaky tests in the sentinel and pool suites ([#​3815](redis/go-redis#3815)) by [@​ndyakov](https://github.com/ndyakov) - **Sentinel failover metric race**: Fixed a data race in the sentinel failover metric test ([#​3824](redis/go-redis#3824)) by [@​cxljs](https://github.com/cxljs) - **`waitForSentinelClusterStable` 
post-conditions**: The sentinel test harness now waits for replicas to be fully connected (not just present in the count) and is robust to randomized spec ordering after failover specs, eliminating an intermittent `Expected master to equal slave` flake ([#​3830](redis/go-redis#3830)) by [@​ndyakov](https://github.com/ndyakov) - **`govulncheck` workflow**: New scheduled GitHub Actions workflow runs `govulncheck` on every push, PR, and weekly, surfacing newly disclosed Go vulnerabilities even when no code changes ([#​3779](redis/go-redis#3779)) by [@​solardome](https://github.com/solardome) - **CI Redis 8.8-rc1**: CI now exercises the 8.8-rc1 Redis image ([#​3814](redis/go-redis#3814)) by [@​ofekshenawa](https://github.com/ofekshenawa) #### 🧰 Maintenance - **`Cmd.Slot()` lookup refactor**: Caches the per-command `CommandInfo` and short-circuits keyless commands before the switch dispatch, removing redundant `Peek` calls ([#​3804](redis/go-redis#3804)) by [@​retr0-kernel](https://github.com/retr0-kernel) - **stdlib `math/rand`**: Replaced `internal/rand` with `math/rand` from the standard library now that the minimum Go version is 1.24 ([#​3823](redis/go-redis#3823)) by [@​cxljs](https://github.com/cxljs) - **ConnPool queue channel**: Removed the unused queue channel from `ConnPool`, trimming the pool's footprint ([#​3826](redis/go-redis#3826)) by [@​cxljs](https://github.com/cxljs) - **Extra packages LICENSE**: Added a LICENSE file to each `extra/*` package ([#​3817](redis/go-redis#3817)) by [@​ndyakov](https://github.com/ndyakov) - **README & CI image**: Documentation refresh and bumped the default CI image tag ([#​3822](redis/go-redis#3822)) by [@​ndyakov](https://github.com/ndyakov) #### 👥 Contributors We'd like to thank all the contributors who worked on this release! 
[@cxljs](https://github.com/cxljs), [@DengY11](https://github.com/DengY11), [@elena-kolevska](https://github.com/elena-kolevska), [@Khukharr](https://github.com/Khukharr), [@LINKIWI](https://github.com/LINKIWI), [@ndyakov](https://github.com/ndyakov), [@obiyang](https://github.com/obiyang), [@ofekshenawa](https://github.com/ofekshenawa), [@retr0-kernel](https://github.com/retr0-kernel), [@romanpovol](https://github.com/romanpovol), [@shahyash2609](https://github.com/shahyash2609), [@solardome](https://github.com/solardome), [@vladisa88](https://github.com/vladisa88) *** **Full Changelog**: <redis/go-redis@v9.19.0...v9.20.0> </details> --- This PR has been generated by [Mend Renovate](https://github.com/renovatebot/renovate). Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/12804 Reviewed-by: Mathieu Fenniak <mfenniak@noreply.codeberg.org>

Fixes #3803
There's a TODO that's been sitting in the cmdFirstKeyPos function. The function figures out which argument position holds the first key for a given Redis command, and we need this to route commands to the right node. The way it works right now is by checking a hardcoded map of around 40 keyless commands and defaulting to 1 for everything else.
The thing is we already fetch COMMAND INFO from Redis at startup and cache it, and that response has the exact first-key position for every command baked in. We just weren't using it here. So every time Redis adds a new keyless command, someone has to remember to update our map by hand. Easy to forget, and we've probably already missed some.
What I changed: I wired the cached COMMAND INFO data into the routing logic. The function now takes an optional `*CommandInfo` and uses `info.FirstKeyPos` directly when it's available. All the cluster and ring call sites pass it in via a small helper that peeks at the already-warm cache.
The round-trip concern: @ndyakov had a genuine concern that first-key resolution shouldn't add a Redis round-trip. It doesn't. To tackle that, I added a `Peek()` method on the cache that just returns whatever is already in memory, so no network call, no blocking. If the cache is cold (like still starting up or something), it returns nil and the code falls through to the old hardcoded table exactly as before. Once the cache is warm, it's just a map lookup.
One edge case: `eval`/`evalsha` still get special-cased. Their `FirstKeyPos` in CommandInfo is 3, but that's only correct when `numkeys > 0`. When `numkeys == 0` there are no key arguments at all, and you can only know that by looking at the actual runtime arguments, so no cached metadata can help there. So that logic is unchanged.
I've added comments throughout the code to make it easier to follow while reviewing. Tested locally and everything looks good, but if something seems off or I've got it wrong somewhere, just let me know and I'll fix it.
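The resolution order described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's `cmdFirstKeyPosWithInfo`: the function name `firstKeyPos`, the string-slice argument form, the two-entry `keyless` map, and the exact ordering of the checks are all hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// CommandInfo is a hypothetical, trimmed-down stand-in for cached metadata.
type CommandInfo struct {
	FirstKeyPos int
}

// keyless is an illustrative subset of a hand-maintained keyless table.
var keyless = map[string]struct{}{"ping": {}, "info": {}}

// firstKeyPos returns the argument index of the first key, or 0 if the
// command has no keys. info may be nil when the cache is cold.
func firstKeyPos(args []string, info *CommandInfo) int {
	if len(args) == 0 {
		return 0
	}
	name := strings.ToLower(args[0])

	// eval/evalsha depend on the runtime numkeys argument (args[2]),
	// so cached metadata cannot decide this case.
	if name == "eval" || name == "evalsha" {
		if len(args) > 2 {
			if n, err := strconv.Atoi(args[2]); err == nil && n > 0 {
				return 3
			}
		}
		return 0
	}

	// Warm cache: trust the server-provided position.
	if info != nil {
		return info.FirstKeyPos
	}

	// Cold cache: fall back to the hardcoded table, then to index 1.
	if _, ok := keyless[name]; ok {
		return 0
	}
	return 1
}

func main() {
	fmt.Println(firstKeyPos([]string{"get", "k"}, nil))
	fmt.Println(firstKeyPos([]string{"newkeyless"}, nil))
	fmt.Println(firstKeyPos([]string{"newkeyless"}, &CommandInfo{FirstKeyPos: 0}))
	fmt.Println(firstKeyPos([]string{"eval", "return 1", "0"}, &CommandInfo{FirstKeyPos: 3}))
}
```

The made-up `newkeyless` command shows the motivation: with a cold cache the fallback has to guess index 1, while a warm cache reports the position the server advertises.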
Note
Medium Risk
Changes cluster slot and multi-shard routing logic; wrong first-key positions could misroute commands, though cold-cache fallbacks preserve prior behavior.
Overview
First-key routing now prefers cached `COMMAND INFO` metadata instead of relying only on a hand-maintained keyless map and defaulting keyed commands to argument index `1`. `cmdFirstKeyPos` is replaced by `cmdFirstKeyPosWithInfo`, which still honors per-command `firstKeyPos`, the `keylessCommands` set, and `eval`/`evalsha` `numkeys` runtime logic, but uses `CommandInfo.FirstKeyPos` when a non-nil `*CommandInfo` is supplied. `cmdsInfoCache` gains `Peek()` (read-only, no Redis round-trip; nil when cold) and `sync.RWMutex` so concurrent peeks don't block each other after warm-up.
Cluster call sites use `cmdInfoPeek`, batch `Peek()` once in `slottedKeyedCommands`, and `cmdSlotWithPos` so slot computation can reuse a resolved key index. Multi-shard paths pass `firstKeyPos` into `createSlotSpecificCommand` so sub-commands don't re-resolve keys inconsistently. Ring still passes `nil` for `CommandInfo` (TODO to warm cache like cluster).
Unit test `TestCmdFirstKeyPosWithInfo_UsesCommandInfoWhenWarm` covers cold vs warm cache behavior.
Reviewed by Cursor Bugbot for commit 8752c63. Bugbot is set up for automated code reviews on this repo.