Skip to content

Optimize WATCH duplicate key check from O(N) to O(1) using per-db hashtable#3360

Merged
enjoy-binbin merged 4 commits into
valkey-io:unstablefrom
enjoy-binbin:watch
Mar 31, 2026
Merged

Optimize WATCH duplicate key check from O(N) to O(1) using per-db hashtable#3360
enjoy-binbin merged 4 commits into
valkey-io:unstablefrom
enjoy-binbin:watch

Conversation

@enjoy-binbin

@enjoy-binbin enjoy-binbin commented Mar 13, 2026

Copy link
Copy Markdown
Member

Previously, watchForKey() checked for duplicate watched keys by iterating
through the client's entire watched_keys list with O(N) complexity, where
N is the total number of keys watched by the client. So the time complexity
for the WATCH command could be quite poor and become a slow command.

This commit introduces a per-db hashtable (watched_keys_by_db) in the
client's multiState structure to enable O(1) duplicate key detection.
The hashtable is lazily allocated only when the client starts watching
keys, minimizing memory overhead for clients that don't use WATCH.

The per-db hashtable stores watchedKey* directly as the hashtable entry
since it already contains the key, so no custom destructors are needed.
Memory management remains centralized in the watched_keys list.

This optimization is especially beneficial when a client watches many
keys across different databases, as the check no longer scales with
the total watched key count.

This might be a minor scenario, but there's no harm in optimizing it.

There is a test in multi.tcl, before this patch, it took 15s, and after
this patch, it only took 50ms.

        set elements {}
        for {set i 0} {$i < 50000} {incr i} {
            lappend elements key-$i
        }
        r watch {*}$elements
        r watch {*}$elements

Previously, watchForKey() checked for duplicate watched keys by iterating
through the client's entire watched_keys list with O(N) complexity, where
N is the total number of keys watched by the client. So the time complexity
for the WATCH command could be quite poor and become a slow command.

This commit introduces a per-db dictionary (watched_keys_by_db) in the
client's multiState structure to enable O(1) duplicate key detection.
The dictionary is lazily allocated only when the client starts watching
keys, minimizing memory overhead for clients that don't use WATCH.

The per-db dict stores borrowed references (keys from watchedKey->key,
values are watchedKey pointers), so no custom destructors are needed.
Memory management remains centralized in the watched_keys list.

This optimization is especially beneficial when a client watches many
keys across different databases, as the check no longer scales with
the total watched key count.

This might be a minor scenario, but there's no harm in optimizing it.

There is a test in multi.tcl, before this patch, it took 15s, and after
this patch, it only took 50ms.
```
        set elements {}
        for {set i 0} {$i < 50000} {incr i} {
            lappend elements key-$i
        }
        r watch {*}$elements
        r watch {*}$elements
```

Signed-off-by: Binbin <binloveplay1314@qq.com>

@dvkashapov dvkashapov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM but have a couple of suggestions/questions

Comment thread src/multi.c
Comment thread src/multi.c Outdated
@codecov

codecov Bot commented Mar 13, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.35%. Comparing base (ac5e44c) to head (9fe68f5).
⚠️ Report is 77 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3360      +/-   ##
============================================
- Coverage     74.53%   74.35%   -0.18%     
============================================
  Files           130      130              
  Lines         72731    72752      +21     
============================================
- Hits          54208    54096     -112     
- Misses        18523    18656     +133     
Files with missing lines Coverage Δ
src/multi.c 97.90% <100.00%> (+0.92%) ⬆️
src/server.c 89.49% <ø> (-0.06%) ⬇️
src/server.h 100.00% <ø> (ø)

... and 27 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Mar 13, 2026
@madolson

Copy link
Copy Markdown
Member

This could use the hashtable interface instead of dict to avoid the extra dictEntry allocation (24 bytes) per watched key. The watchedKey* can be stored directly as the hashtable entry since it already contains the key:

static const void *watchedKeyGetKey(const void *entry) {
    const watchedKey *wk = entry;
    return wk->key;
}

hashtableType watchedKeysHashtableType = {
    .hashFunction = dictEncObjHash,
    .entryGetKey = watchedKeyGetKey,
    .keyCompare = hashtableEncObjKeyCompare,
};

Then hashtableFind(ht, key, NULL) / hashtableAdd(ht, wk) replace the dict calls, and the watchedKeysDictType in server.c can be removed.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin changed the title Optimize WATCH duplicate key check from O(N) to O(1) using per-db dict Optimize WATCH duplicate key check from O(N) to O(1) using per-db hashtable Mar 17, 2026
Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin merged commit 9586093 into valkey-io:unstable Mar 31, 2026
66 checks passed
@github-project-automation github-project-automation Bot moved this to To be backported in Valkey 9.1 Mar 31, 2026
@enjoy-binbin enjoy-binbin deleted the watch branch March 31, 2026 02:34
Nikhil-Manglore pushed a commit to Nikhil-Manglore/valkey that referenced this pull request Apr 7, 2026
…htable (valkey-io#3360)

Previously, watchForKey() checked for duplicate watched keys by iterating
through the client's entire watched_keys list with O(N) complexity, where
N is the total number of keys watched by the client. So the time complexity
for the WATCH command could be quite poor and become a slow command.

This commit introduces a per-db hashtable (watched_keys_by_db) in the
client's multiState structure to enable O(1) duplicate key detection.
The hashtable is lazily allocated only when the client starts watching
keys, minimizing memory overhead for clients that don't use WATCH.

The per-db hashtable stores watchedKey* directly as the hashtable entry
since it already contains the key, so no custom destructors are needed.
Memory management remains centralized in the watched_keys list.

This optimization is especially beneficial when a client watches many
keys across different databases, as the check no longer scales with
the total watched key count.

This might be a minor scenario, but there's no harm in optimizing it.

There is a test in multi.tcl, before this patch, it took 15s, and after
this patch, it only took 50ms.
```
        set elements {}
        for {set i 0} {$i < 50000} {incr i} {
            lappend elements key-$i
        }
        r watch {*}$elements
        r watch {*}$elements
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Apr 16, 2026
…htable (valkey-io#3360)

Previously, watchForKey() checked for duplicate watched keys by iterating
through the client's entire watched_keys list with O(N) complexity, where
N is the total number of keys watched by the client. So the time complexity
for the WATCH command could be quite poor and become a slow command.

This commit introduces a per-db hashtable (watched_keys_by_db) in the
client's multiState structure to enable O(1) duplicate key detection.
The hashtable is lazily allocated only when the client starts watching
keys, minimizing memory overhead for clients that don't use WATCH.

The per-db hashtable stores watchedKey* directly as the hashtable entry
since it already contains the key, so no custom destructors are needed.
Memory management remains centralized in the watched_keys list.

This optimization is especially beneficial when a client watches many
keys across different databases, as the check no longer scales with
the total watched key count.

This might be a minor scenario, but there's no harm in optimizing it.

There is a test in multi.tcl, before this patch, it took 15s, and after
this patch, it only took 50ms.
```
        set elements {}
        for {set i 0} {$i < 50000} {incr i} {
            lappend elements key-$i
        }
        r watch {*}$elements
        r watch {*}$elements
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
madolson pushed a commit that referenced this pull request Apr 27, 2026
…htable (#3360)

Previously, watchForKey() checked for duplicate watched keys by iterating
through the client's entire watched_keys list with O(N) complexity, where
N is the total number of keys watched by the client. So the time complexity
for the WATCH command could be quite poor and become a slow command.

This commit introduces a per-db hashtable (watched_keys_by_db) in the
client's multiState structure to enable O(1) duplicate key detection.
The hashtable is lazily allocated only when the client starts watching
keys, minimizing memory overhead for clients that don't use WATCH.

The per-db hashtable stores watchedKey* directly as the hashtable entry
since it already contains the key, so no custom destructors are needed.
Memory management remains centralized in the watched_keys list.

This optimization is especially beneficial when a client watches many
keys across different databases, as the check no longer scales with
the total watched key count.

This might be a minor scenario, but there's no harm in optimizing it.

There is a test in multi.tcl, before this patch, it took 15s, and after
this patch, it only took 50ms.
```
        set elements {}
        for {set i 0} {$i < 50000} {incr i} {
            lappend elements key-$i
        }
        r watch {*}$elements
        r watch {*}$elements
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
@sarthakaggarwal97 sarthakaggarwal97 added the release-notes This issue should get a line item in the release notes label Apr 28, 2026
@sarthakaggarwal97 sarthakaggarwal97 moved this from To be backported to Done in Valkey 9.1 May 16, 2026
roshkhatri pushed a commit to roshkhatri/valkey that referenced this pull request May 26, 2026
…htable (valkey-io#3360)

Previously, watchForKey() checked for duplicate watched keys by iterating
through the client's entire watched_keys list with O(N) complexity, where
N is the total number of keys watched by the client. So the time complexity
for the WATCH command could be quite poor and become a slow command.

This commit introduces a per-db hashtable (watched_keys_by_db) in the
client's multiState structure to enable O(1) duplicate key detection.
The hashtable is lazily allocated only when the client starts watching
keys, minimizing memory overhead for clients that don't use WATCH.

The per-db hashtable stores watchedKey* directly as the hashtable entry
since it already contains the key, so no custom destructors are needed.
Memory management remains centralized in the watched_keys list.

This optimization is especially beneficial when a client watches many
keys across different databases, as the check no longer scales with
the total watched key count.

This might be a minor scenario, but there's no harm in optimizing it.

There is a test in multi.tcl, before this patch, it took 15s, and after
this patch, it only took 50ms.
```
        set elements {}
        for {set i 0} {$i < 50000} {incr i} {
            lappend elements key-$i
        }
        r watch {*}$elements
        r watch {*}$elements
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes This issue should get a line item in the release notes run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants