Add hotkeys detection by minchopaskal · Pull Request #14680 · redis/redis

minchopaskal · 2026-01-09T09:21:11Z

Description

Introducing a new method for identifying hotkeys inside a Redis server during a tracking period.

Hotkeys in this context are defined by two metrics:

Percentage of CPU time spent on the key out of the total CPU time during the tracking period
Percentage of network bytes (input + output) used for the key out of the total network bytes used by Redis during the tracking period
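Both metrics are a single key's share of a period-wide total. A minimal sketch of that ratio follows. The function name `share_pct` is hypothetical, and the exact counters the PR divides by (for example which CPU-time total) are not restated here.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage of `total` accounted for by `part`, e.g. one key's CPU time or
 * its network bytes (input + output) over the whole tracking period.
 * Returns 0 for an empty period. Hypothetical helper, not from the PR. */
static double share_pct(uint64_t part, uint64_t total) {
    assert(part <= total);            /* a single key cannot exceed the total */
    if (total == 0) return 0.0;
    return 100.0 * (double)part / (double)total;
}
```

For example, a key with 2048 input bytes and 6144 output bytes out of 81920 total network bytes gets `share_pct(2048 + 6144, 81920)`, i.e. 10%.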

Usage

Although the API is subject to change, the general idea is for the user to start a hotkeys tracking session that runs for some time. The keys' metrics are recorded in a probabilistic structure, after which the user can fetch the top K keys.

Current API

HOTKEYS START
            <METRICS count [CPU] [NET]>
            [COUNT k] 
            [DURATION duration]
            [SAMPLE ratio]
            [SLOTS count slot…]

HOTKEYS GET
HOTKEYS STOP
HOTKEYS RESET

HOTKEYS START

Start a tracking session if none has been started yet, or if the previous one was stopped or reset. Return an error if a session is already in progress.

METRICS count [CPU] [NET] - choose one or more metrics to track
COUNT k - track the top K keys
DURATION duration - preset how long the tracking session should last
SAMPLE ratio - a key is tracked with probability 1/ratio
SLOTS count slot... - only track a key if it is in one of the chosen slots
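The SLOTS option filters keys by cluster hash slot. Assuming the filter uses the standard Redis Cluster mapping (CRC16-CCITT/XMODEM of the key, or of its `{hash tag}`, modulo 16384), a slot check can be sketched as below. The helper names are hypothetical, and treating an empty slot list as "track every key" is an assumption based on the "selected-slots" field of HOTKEYS GET.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC16-CCITT, XMODEM variant (poly 0x1021, init 0), as used by Redis Cluster. */
static uint16_t crc16_xmodem(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Hash slot of a key; a non-empty {tag} means only the tag is hashed. */
static unsigned key_hash_slot(const char *key, size_t keylen) {
    size_t s, e;
    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;
    if (s == keylen) return crc16_xmodem(key, keylen) & 16383;   /* no '{' */
    for (e = s + 1; e < keylen; e++)
        if (key[e] == '}') break;
    if (e == keylen || e == s + 1)                /* no '}' or empty tag "{}" */
        return crc16_xmodem(key, keylen) & 16383;
    return crc16_xmodem(key + s + 1, e - s - 1) & 16383;
}

/* Hypothetical SLOTS filter: an empty selection tracks every key. */
static int slot_selected(const uint16_t *slots, size_t nslots, unsigned slot) {
    assert(slot < 16384);
    if (nslots == 0) return 1;
    for (size_t i = 0; i < nslots; i++)
        if (slots[i] == slot) return 1;
    return 0;
}
```

With `SLOTS 3 0 5 6`, a key would be recorded only when `slot_selected(sel, 3, key_hash_slot(key, len))` is true. A linear scan is enough for a short list; a 16384-bit bitmap would make the check O(1).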

HOTKEYS GET

Return an array with the tracked metrics and various other metadata, or (nil) if no tracking session was started or it was reset.

127.0.0.1:6379> hotkeys get
1) "tracking-active"
2) 1
3) "sample-ratio"
4) <ratio>
5) "selected-slots" (empty array if no slots selected)
6) 1) 0
   2) 5
   3) 6
7) "sampled-command-selected-slots-ms" (show on condition sample-ratio > 1 and selected-slots != empty-array)
8) <time-in-milliseconds>
9) "all-commands-selected-slots-ms" (show on condition selected-slots != empty-array)
10) <time-in-milliseconds>
11) "all-commands-all-slots-ms"
12) <time-in-milliseconds>
13) "net-bytes-sampled-commands-selected-slots" (show on condition sample-ratio > 1 and selected-slots != empty-array)
14) <num-bytes>
15) "net-bytes-all-commands-selected-slots" (show on condition selected-slots != empty-array)
16) <num-bytes>
17) "net-bytes-all-commands-all-slots"
18) <num-bytes>
19) "collection-start-time-unix-ms"
20) <start-time-unix-timestamp-in-ms>
21) "collection-duration-ms"
22) <duration-in-milliseconds>
23) "used-cpu-sys-ms"
24) <duration-in-millisec>
25) "used-cpu-user-ms"
26) <duration-in-millisec>
27) "total-net-bytes"
28) <num-bytes>
29) "by-cpu-time"
30) 1) key-1_1
    2) <millisec>
    ...
    19) key-10_1
    20) <millisec>
31) "by-net-bytes"
32) 1) key-1_2
    2) <num-bytes>
    ...
    19) key-10_2
    20) <num-bytes>

HOTKEYS STOP

Stop the tracking session; the user can still get the results from HOTKEYS GET.

HOTKEYS RESET

Release the resources used for hotkeys tracking; this is only allowed once tracking is stopped. Return an error if a tracking session is active.
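Taken together, START, STOP, RESET and GET describe a small state machine. A minimal sketch under stated assumptions: the names are hypothetical, STOP on a session that is not active is assumed to be an error (the description does not say), and DURATION-based expiry is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* HK_IDLE: never started, or reset. HK_STOPPED: results still readable. */
typedef enum { HK_IDLE, HK_ACTIVE, HK_STOPPED } hk_state;
typedef enum { HK_OK, HK_ERR } hk_result;

static hk_result hk_start(hk_state *s) {
    assert(s != NULL);
    if (*s == HK_ACTIVE) return HK_ERR;   /* a session is already in progress */
    *s = HK_ACTIVE;                       /* (re)initialize tracking structures */
    return HK_OK;
}

static hk_result hk_stop(hk_state *s) {
    assert(s != NULL);
    if (*s != HK_ACTIVE) return HK_ERR;   /* assumption: nothing to stop */
    *s = HK_STOPPED;                      /* keep results for HOTKEYS GET */
    return HK_OK;
}

static hk_result hk_reset(hk_state *s) {
    assert(s != NULL);
    if (*s == HK_ACTIVE) return HK_ERR;   /* must STOP before RESET */
    *s = HK_IDLE;                         /* free tracking structures */
    return HK_OK;
}

/* HOTKEYS GET: 1 if there is data to report, 0 for the (nil) reply. */
static int hk_get_has_results(hk_state s) { return s != HK_IDLE; }
```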

Additional changes

The INFO command now has a "hotkeys" section with 3 fields:

tracking_active - a boolean flag indicating whether hotkeys are currently being tracked.
used-memory - memory overhead of the structures used for hotkeys tracking.
cpu-time - time in ms spent updating the hotkeys structure.

Implementation

Independently of the API, the implementation is based on a probabilistic structure, Cuckoo Heavy Keeper (CHK), with an added min-heap that keeps track of the names of the top K hotkeys. CHK is loosely based on HeavyKeeper, which is used in RedisBloom's TopK, but has higher throughput.
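The min-heap part can be illustrated on its own. The sketch below is not CHK: it assumes some sketch already supplies an estimated count per key and only shows how a size-K min-heap keeps the current top K names, evicting the root when a new key's estimate beats the smallest one. All names are hypothetical; name lookup is a linear scan and names are truncated to 63 bytes, both simplifications.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOPK_MAX 16
#define NAME_MAX_LEN 64

typedef struct { char name[NAME_MAX_LEN]; uint64_t count; } topk_entry;
typedef struct { topk_entry e[TOPK_MAX]; int size, k; } topk_heap;

static void swap_e(topk_entry *a, topk_entry *b) { topk_entry t = *a; *a = *b; *b = t; }

/* Move e[i] down until both children have larger or equal counts. */
static void sift_down(topk_heap *h, int i) {
    for (;;) {
        int l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->size && h->e[l].count < h->e[m].count) m = l;
        if (r < h->size && h->e[r].count < h->e[m].count) m = r;
        if (m == i) return;
        swap_e(&h->e[i], &h->e[m]);
        i = m;
    }
}

/* Move e[i] up while it is smaller than its parent. */
static void sift_up(topk_heap *h, int i) {
    while (i > 0) {
        int p = (i - 1) / 2;
        if (h->e[p].count <= h->e[i].count) return;
        swap_e(&h->e[p], &h->e[i]);
        i = p;
    }
}

static void topk_init(topk_heap *h, int k) {
    assert(k > 0 && k <= TOPK_MAX);
    h->size = 0;
    h->k = k;
}

/* Report the latest estimated count for `name` (e.g. from the sketch). */
static void topk_update(topk_heap *h, const char *name, uint64_t count) {
    for (int i = 0; i < h->size; i++) {
        if (strcmp(h->e[i].name, name) == 0) {       /* already in the top K */
            uint64_t old = h->e[i].count;
            h->e[i].count = count;
            if (count > old) sift_down(h, i); else sift_up(h, i);
            return;
        }
    }
    if (h->size < h->k) {                            /* heap not full yet */
        topk_entry *e = &h->e[h->size];
        snprintf(e->name, sizeof e->name, "%s", name);
        e->count = count;
        h->size++;
        sift_up(h, h->size - 1);
    } else if (count > h->e[0].count) {              /* beats the current minimum */
        snprintf(h->e[0].name, sizeof h->e[0].name, "%s", name);
        h->e[0].count = count;
        sift_down(h, 0);
    }
}
```

Keeping a min-heap rather than a sorted list means each update costs O(log K), and the eviction candidate is always at the root.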

Sampling uses a fixed random probability, set via the SAMPLE <ratio> parameter of HOTKEYS START: each key is sampled with probability 1/ratio.
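A minimal sketch of a 1/ratio sampling decision. The xorshift64 generator is used only to keep the example deterministic and self-contained; it is an assumption, not the PR's random source. Whether sampled counts are scaled back up by `ratio` is not stated in the description, so the sketch only decides whether to record an access.

```c
#include <assert.h>
#include <stdint.h>

/* xorshift64 (shifts 13, 7, 17): small PRNG for illustration; state must be non-zero. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Return 1 if this key access should be recorded, with probability 1/ratio. */
static int should_sample(uint64_t *state, uint32_t ratio) {
    assert(ratio >= 1);
    if (ratio == 1) return 1;                  /* ratio 1: track every key */
    return xorshift64(state) % ratio == 0;     /* modulo bias is negligible here */
}
```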

Performance implications

With a low enough sample rate (controlled by HOTKEYS START SAMPLE <ratio>), the performance hit is negligible. Tracking every key, however, can incur up to a 15% hit in the worst case, based on running the tests in this bench.

shahsb

Scalability in Clusters: How does this behave in Redis Cluster? Slot filtering helps, but aggregating across nodes might need future work.

Comparison to Alternatives: How does CHK compare to other sketches like Count-Min or HyperLogLog in this context? A brief rationale would strengthen the case.

tezc · 2026-01-10T13:37:06Z

@minchopaskal I had a quick pass over the PR and left a few minor comments. I assume the API is not final yet, which is why the top comment doesn't describe the commands in more detail (e.g. what SLOTS or SAMPLE do). I assume that's also why there aren't more tests yet.

Regarding the cuckoo implementation, it may be a good idea to add comments above each function (only the important ones) describing what it does, so that people unfamiliar with the algorithm have a chance to understand what is going on.

sundb · 2026-01-13T10:00:18Z

I'm wondering whether we could add a new hotkey.c file and put both chk.c and the hotkeys commands in it. server.c doesn't seem the right place: it already has over 7000 lines and is too large.

jit-ci · 2026-01-16T13:15:21Z

❌ Security scan failed

Security scan failed: Branch hotkeys-detection does not exist in the remote repository

💡 Need to bypass this check? Comment @sera bypass to override.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

oranagra

sorry for the delay. a few suggestions for INFO

oranagra · 2026-01-18T06:11:37Z

src/server.c

+            "tracking-active:%d\r\n"
+            "used-memory:%zu\r\n"
+            "cpu-time:%lld\r\n",


the fact that these are in a "hotkeys" section isn't enough, they should have some prefix.
the sections are just used to filter / ask specific info sections.
but the result is that all fields are then mixed in one dict.

i wonder if the used-memory should be moved to the memory section.
in any case, it should probably be accounted for in the used_memory_overhead metric.
but being able to see a breakdown there (if it can be big) might be a good idea.

It can only get big if key names are really long, since we store the top-K key names in a heap. Do you think we should expect that?

I will add it to used_memory_overhead and fix the oversight of the missing prefixes. I'm a bit reserved about adding it to the memory section, though: I think it should be in either the "hotkeys" or the "memory" section (not both), and "hotkeys" seems more logical. WDYT, @oranagra?

obviously not in both.
the question is what's the trigger for someone wanting to see the memory overhead of this mechanism. is it that they're looking for hot keys, or that they're looking to explain memory issues.
e.g. the mem_replication_backlog metric is in memory, not in replication, same goes for mem_aof_buffer.
if we argue that it's very unlikely that it'll be very big, we can just sum it into the overhead, and skip creating a specific metric for it.

I see what you mean. I'll remove the metric then; I don't see a scenario in which the overhead exceeds 1 MB.

@oranagra

Add the memory overhead of the hotkeyStats structure to `used_memory_overhead`, add `hotkeys-` prefix to hotkey keys in INFO and remove `used_memory` in the hotkeys info section as it's unneeded (too little memory for us to care about). Tnx @oranagra for pointing [this](#14680 (comment)) out.

@oranagra

…s#14711) Add the memory overhead of the hotkeyStats structure to `used_memory_overhead`, add `hotkeys-` prefix to hotkey keys in INFO and remove `used_memory` in the hotkeys info section as it's unneeded (too little memory for us to care about). Tnx @oranagra for pointing [this](redis#14680 (comment)) out.

…14749) Follow-up to #14680. The reply of `HOTKEYS GET` is an unordered collection of key-value pairs, so it makes more sense for it to be a map in RESP3 instead of a flat array.
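The difference between the two reply shapes is visible on the wire. The sketch below encodes the same field/value pairs as a RESP2 flat array (`*` header counting every element) and as a RESP3 map (`%` header counting pairs). The helper and its struct are hypothetical, and values are restricted to integers for brevity; the real reply also nests arrays.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct { const char *field; long long value; } kv_pair;

/* Encode n field/value pairs into buf: a RESP3 map if resp3, else a RESP2
 * flat array. Returns bytes written (excluding NUL), or -1 if buf is too small. */
static int encode_pairs(char *buf, size_t cap, const kv_pair *p, size_t n, int resp3) {
    size_t off;
    int w = snprintf(buf, cap, resp3 ? "%%%zu\r\n" : "*%zu\r\n",
                     resp3 ? n : 2 * n);     /* map counts pairs, array counts items */
    if (w < 0 || (size_t)w >= cap) return -1;
    off = (size_t)w;
    for (size_t i = 0; i < n; i++) {
        /* Field as a bulk string, value as an integer. */
        w = snprintf(buf + off, cap - off, "$%zu\r\n%s\r\n:%lld\r\n",
                     strlen(p[i].field), p[i].field, p[i].value);
        if (w < 0 || (size_t)w >= cap - off) return -1;
        off += (size_t)w;
    }
    return (int)off;
}
```

A RESP3 client can then expose the reply directly as a dictionary instead of pairing up alternating array elements.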

minchopaskal requested review from sggeorgiev, skaslev and sundb January 9, 2026 09:23

sggeorgiev reviewed Jan 9, 2026, leaving seven comments on src/server.c and src/chk.c (all outdated and resolved).