Skip slot cache optimization for AOF client to prevent key duplication and data corruption #3004

Merged: murphyjacob4 merged 1 commit into valkey-io:unstable from AdityaTeltia:fix-duplicate-keys-loading-AOF on Jan 10, 2026
@AdityaTeltia commented Jan 4, 2026

Contributor

When loading AOF in cluster mode, keys inside a MULTI/EXEC block could be
inserted into wrong hash slots, causing key duplication and data corruption.

The root cause was the slot caching optimization in getKeySlot(). This
optimization reuses a cached slot value to avoid recalculating the hash
for every key operation. However, when replaying AOF, a transaction may
contain commands affecting keys in different slots. The cached slot from
a previous command (e.g., SET k1) would incorrectly be used for subsequent
commands in the transaction (e.g., SET k0), causing k0 to be stored in k1's
slot.
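
The stale-cache failure described above can be modeled in a few lines of C. This is a simplified sketch, not Valkey's actual `getKeySlot()`: the `sim_client` struct, its `cached_slot` field, and all helper names are hypothetical, and hash tags (`{...}`) are ignored. The only detail taken from the cluster specification is that a key's slot is the CRC16 (XMODEM variant) of the key modulo 16384.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_SLOTS 16384

/* CRC16, XMODEM variant: polynomial 0x1021, initial value 0, no reflection. */
static uint16_t crc16_xmodem(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Slot of a key, ignoring hash tags for brevity. */
static int hash_slot(const char *key) {
    return crc16_xmodem(key, strlen(key)) % NUM_SLOTS;
}

/* Hypothetical model of a client that remembers the slot of the last command. */
typedef struct {
    int cached_slot; /* -1 means nothing is cached */
} sim_client;

/* Buggy lookup: trusts whatever slot is cached, without checking the key. */
static int key_slot_trusting_cache(const sim_client *c, const char *key) {
    if (c->cached_slot >= 0) return c->cached_slot;
    return hash_slot(key);
}

/* Replays "SET k1" then "SET k0" without refreshing the cache in between,
 * and returns the slot that k0 ends up in. */
static int replay_k1_then_k0(void) {
    sim_client c = {.cached_slot = -1};
    c.cached_slot = key_slot_trusting_cache(&c, "k1"); /* slot cached for k1 */
    return key_slot_trusting_cache(&c, "k0");          /* stale value reused */
}
```

In this model `replay_k1_then_k0()` returns the slot of `k1`, so `k0` is filed under the wrong slot. A later lookup that hashes `k0` correctly would not find it there, which is consistent with the duplication symptom in the report.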

The existing code already skipped this optimization for replicated clients
(commands from primary) using isReplicatedClient(). This change extends
that to also skip for AOF clients by using mustObeyClient() instead, which
covers both replicated clients and the AOF client.
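
The shape of the change can be sketched as a swap of one predicate for a broader one. This is an illustrative model only: the description says that `mustObeyClient()` covers both replicated clients and the AOF client, but the struct, the two boolean fields, and the snake_case function names below are assumptions made for the example, not Valkey's definitions.

```c
#include <stdbool.h>

/* Hypothetical client model; the real client struct is far larger. */
typedef struct {
    bool from_primary;  /* commands arrive over the replication link */
    bool is_aof_loader; /* client used to replay the AOF at startup */
    int cached_slot;    /* -1 when nothing is cached */
} sim_client;

/* Narrow check: true only for commands replicated from a primary. */
static bool is_replicated_client(const sim_client *c) {
    return c->from_primary;
}

/* Broader check: true for replicated commands and for AOF replay. */
static bool must_obey_client(const sim_client *c) {
    return c->from_primary || c->is_aof_loader;
}

/* Before the change: only replicated clients bypass the cached slot. */
static bool may_use_cached_slot_before(const sim_client *c) {
    return c->cached_slot >= 0 && !is_replicated_client(c);
}

/* After the change: the AOF client bypasses the cached slot as well. */
static bool may_use_cached_slot_after(const sim_client *c) {
    return c->cached_slot >= 0 && !must_obey_client(c);
}
```

With an AOF-loading client, `may_use_cached_slot_before()` returns true while `may_use_cached_slot_after()` returns false, so the slot is recomputed from the key. Ordinary clients behave the same under both versions and keep the optimization.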

Fixes #2995, introduced in #1949.

…cation (valkey-io#2995)

Signed-off-by: aditya.teltia <teltia.aditya22@gmail.com>

codecov Bot commented Jan 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.15%. Comparing base (263d9ea) to head (26c7680).
⚠️ Report is 13 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3004      +/-   ##
============================================
- Coverage     74.27%   74.15%   -0.12%     
============================================
  Files           129      129              
  Lines         70896    70896              
============================================
- Hits          52656    52573      -83     
- Misses        18240    18323      +83     

| Files with missing lines | Coverage Δ |
|---|---|
| src/db.c | 94.01% <100.00%> (ø) |

... and 26 files with indirect coverage changes

@enjoy-binbin changed the title from "fix: skip slot cache optimization for AOF client to prevent key duplication (#2995)" to "Skip slot cache optimization for AOF client to prevent key duplication and data corruption" on Jan 5, 2026

@enjoy-binbin left a comment

Member

LGTM, thanks for the fix.

Comment thread src/db.c
Comment thread tests/unit/multi.tcl
@enjoy-binbin added the release-notes label ("This issue should get a line item in the release notes") on Jan 6, 2026

@murphyjacob4 commented

Contributor

For the purpose of understanding the impact, I think the only way to get a cross-slot MULTI/EXEC in the AOF is:

  1. Load an AOF from a cluster mode disabled instance (where cross-slot transactions are permitted)
  2. Load an AOF with content generated from a script or module that executes a cross-slot transaction

But yeah definitely a bug. Good find!

@murphyjacob4 commented

Contributor

> I think the only way to get a cross-slot MULTI/EXEC in the AOF is

Sorry, I read through the issue again. It looks like the issue also occurs with a same-slot MULTI/EXEC: it basically uses the slot of the previous command execution. So yeah, not just cross-slot MULTI/EXEC.

@murphyjacob4 merged commit de00054 into valkey-io:unstable on Jan 10, 2026
58 checks passed
@github-project-automation (Bot) moved this to "To be backported" in Valkey 9.0 on Jan 10, 2026
zuiderkwast pushed a commit to zuiderkwast/valkey that referenced this pull request Jan 29, 2026
zuiderkwast pushed a commit to zuiderkwast/valkey that referenced this pull request Jan 30, 2026
@zuiderkwast moved this from "To be backported" to "9.0.2 WIP" in Valkey 9.0 on Jan 30, 2026
zuiderkwast pushed a commit that referenced this pull request Feb 3, 2026
harrylin98 pushed a commit to harrylin98/valkey_forked that referenced this pull request Feb 19, 2026
hpatro pushed a commit to hpatro/valkey that referenced this pull request Mar 5, 2026
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
lmagomes pushed a commit to lmagomes/home-services that referenced this pull request May 12, 2026
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [docker.io/valkey/valkey](https://github.com/valkey-io/valkey) | image | patch | `9.0.1` → `9.0.4` |

---

### Release Notes

<details>
<summary>valkey-io/valkey (docker.io/valkey/valkey)</summary>

### [`v9.0.4`](https://github.com/valkey-io/valkey/releases/tag/9.0.4)

[Compare Source](valkey-io/valkey@9.0.3...9.0.4)

Upgrade urgency SECURITY: This release includes security fixes we recommend you
apply as soon as possible.

##### Security fixes

- (CVE-2026-23479) Use-After-Free in unblock client flow
- (CVE-2026-25243) Invalid Memory Access in RESTORE command
- (CVE-2026-23631) Use-after-free when full sync occurs during a yielding Lua/function execution

### [`v9.0.3`](https://github.com/valkey-io/valkey/releases/tag/9.0.3)

[Compare Source](valkey-io/valkey@9.0.2...9.0.3)

##### Valkey 9.0.3

Upgrade urgency SECURITY: This release includes security fixes we recommend you
apply as soon as possible.

##### Security fixes

- (CVE-2025-67733) RESP Protocol Injection via Lua error\_reply
- (CVE-2026-21863) Remote DoS with malformed Valkey Cluster bus message
- (CVE-2026-27623) Reset request type after handling empty requests

##### Bug fixes

- Avoids crash during MODULE UNLOAD when ACL rules reference a module command and subcommand ([#3160](valkey-io/valkey#3160))
- Fix server assert on ACL LOAD when current user loses permission to channels ([#3182](valkey-io/valkey#3182))
- Fix bug causing no response flush sometimes when IO threads are busy ([#3205](valkey-io/valkey#3205))

### [`v9.0.2`](https://github.com/valkey-io/valkey/releases/tag/9.0.2)

[Compare Source](valkey-io/valkey@9.0.1...9.0.2)

Upgrade urgency HIGH: There are critical bugs that may affect a subset of users.

#### Bug fixes

- Avoid memory leak of new argv when HEXPIRE commands target only non-existing fields ([#2973](valkey-io/valkey#2973))
- Fix HINCRBY and HINCRBYFLOAT to update volatile key tracking ([#2974](valkey-io/valkey#2974))
- Avoid empty hash object when HSETEX added no fields ([#2998](valkey-io/valkey#2998))
- Fix case-sensitive check for the FNX and FXX arguments in HSETEX ([#3000](valkey-io/valkey#3000))
- Prevent assertion in active expiration job after a hash with volatile fields is overwritten ([#3003](valkey-io/valkey#3003), [#3007](valkey-io/valkey#3007))
- Fix HRANDFIELD to return null response when no field could be found ([#3022](valkey-io/valkey#3022))
- Fix HEXPIRE to not delete items when validation rules fail and expiration is in the past ([#3023](valkey-io/valkey#3023), [#3048](valkey-io/valkey#3048))
- Fix how hash is handling overriding of expired fields overwrite ([#3060](valkey-io/valkey#3060))
- HSETEX - Always issue keyspace notifications after validation ([#3001](valkey-io/valkey#3001))
- Make zero a valid TTL for hash fields during import mode and data loading ([#3006](valkey-io/valkey#3006))
- Trigger prepareCommand on argc change in module command filters ([#2945](valkey-io/valkey#2945))
- Restrict TTL from being negative and avoid crash in import-mode ([#2944](valkey-io/valkey#2944))
- Fix chained replica crash when doing dual channel replication ([#2983](valkey-io/valkey#2983))
- Skip slot cache optimization for AOF client to prevent key duplication and data corruption ([#3004](valkey-io/valkey#3004))
- Fix used\_memory\_dataset underflow due to miscalculated used\_memory\_overhead ([#3005](valkey-io/valkey#3005))
- Avoid duplicate calculations of network-bytes-out in slot stats with copy-avoidance ([#3046](valkey-io/valkey#3046))
- Fix XREAD returning error on empty stream with + ID ([#2742](valkey-io/valkey#2742))

#### Performance/Efficiency Improvements

- Track reply bytes in I/O threads if commandlog-reply-larger-than is -1 ([#3086](valkey-io/valkey#3086), [#3126](valkey-io/valkey#3126)).
  This makes it possible to mitigate a performance regression in 9.0.1 caused by the bug fix [#2652](valkey-io/valkey#2652).

**Full Changelog**: <valkey-io/valkey@9.0.1...9.0.2>

</details>

Labels

release-notes This issue should get a line item in the release notes

Projects

Status: 9.0.2

Development

Successfully merging this pull request may close these issues.

[BUG] Valkey cluster is duplicating keys when loading AOF

4 participants