Add hotkeys detection #14680

Merged
minchopaskal merged 39 commits into redis:unstable from minchopaskal:hotkeys-detection
Jan 16, 2026

@minchopaskal
Collaborator

@minchopaskal minchopaskal commented Jan 9, 2026

Description

Introducing a new method for identifying hotkeys inside a redis server during a tracking time period.

Hotkeys in this context are defined by two metrics:

  • Percentage of CPU time spent on the key out of the total time during the tracking period
  • Percentage of network bytes (input+output) used for the key from the total network bytes used by redis during the tracking period
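As an illustration of the two metrics above, the percentages could be computed as in the following sketch. The function name and its arguments are hypothetical and not taken from the PR; the per-key and total counters are assumed to cover the same tracking period.

```python
def hotness_percentages(key_cpu_ms, key_net_bytes, total_cpu_ms, total_net_bytes):
    """Return (cpu_pct, net_pct) for one key over a tracking period.

    Hypothetical helper: all four counters are assumed to be measured
    over the same tracking period.
    """
    # Guard against an empty tracking period to avoid division by zero.
    cpu_pct = 100.0 * key_cpu_ms / total_cpu_ms if total_cpu_ms else 0.0
    net_pct = 100.0 * key_net_bytes / total_net_bytes if total_net_bytes else 0.0
    return cpu_pct, net_pct


# A key that used 250 ms out of 1000 ms and 4096 bytes out of 16384 bytes.
cpu_pct, net_pct = hotness_percentages(250, 4096, 1000, 16384)
```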

Usage

Although the API is subject to change, the general idea is for the user to start a hotkeys tracking session that runs for some time. The keys' metrics are recorded in a probabilistic structure, and afterwards the user can fetch the top K keys.

Current API

HOTKEYS START
            <METRICS count [CPU] [NET]>
            [COUNT k] 
            [DURATION duration]
            [SAMPLE ratio]
            [SLOTS count slot…]

HOTKEYS GET
HOTKEYS STOP
HOTKEYS RESET
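
A client could assemble the HOTKEYS START arguments from the syntax above roughly as follows. This is only a sketch: the helper name is hypothetical, and it assumes that the count after METRICS and after SLOTS is the number of tokens that follow, which the description does not state explicitly. The API is subject to change.

```python
def build_hotkeys_start(metrics, count=None, duration=None, sample=None, slots=None):
    """Build the argument list for HOTKEYS START (hypothetical helper).

    Assumption: the number after METRICS / SLOTS is the number of
    metric names / slot numbers that follow it.
    """
    allowed = ("CPU", "NET")
    chosen = [m.upper() for m in metrics]
    if not chosen or any(m not in allowed for m in chosen):
        raise ValueError("metrics must be a non-empty subset of CPU, NET")

    args = ["HOTKEYS", "START", "METRICS", str(len(chosen)), *chosen]
    # Optional parameters are appended only when the caller sets them.
    if count is not None:
        args += ["COUNT", str(count)]
    if duration is not None:
        args += ["DURATION", str(duration)]
    if sample is not None:
        args += ["SAMPLE", str(sample)]
    if slots:
        args += ["SLOTS", str(len(slots)), *map(str, slots)]
    return args


args = build_hotkeys_start(["cpu", "net"], count=10, sample=100, slots=[0, 5, 6])
```

The resulting list can be passed to whatever generic "send raw command" call the client library offers.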

HOTKEYS START

Start a tracking session if either none has been started yet, or the previous one was stopped or reset. Return an error if one is in progress.

  • METRICS count [CPU] [NET] - choose one or more metrics to track
  • COUNT k - track top K keys
  • DURATION duration - preset how long the tracking session should last
  • SAMPLE ratio - a key is tracked with probability 1/ratio
  • SLOTS count slot... - Only track a key if it's in a slot amongst the chosen ones
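
To illustrate the SLOTS filter, the sketch below maps a key to its Redis Cluster hash slot (CRC16 of the key, or of its hash tag, modulo 16384) and checks it against the chosen slots. The function names are hypothetical, and treating an empty slot selection as "track every key" is an assumption based on the reply format, not something the PR code is quoted as doing.

```python
import binascii


def key_hash_slot(key: bytes) -> int:
    """Redis Cluster hash slot of a key, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # binascii.crc_hqx with initial value 0 is the CRC16 (XMODEM) variant
    # that Redis Cluster uses.
    return binascii.crc_hqx(key, 0) % 16384


def is_tracked(key: bytes, selected_slots: set) -> bool:
    """Assumption: an empty selection means no slot filtering."""
    return not selected_slots or key_hash_slot(key) in selected_slots
```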

HOTKEYS GET

Return an array with the chosen metrics and various other metadata, or (nil) if no tracking session was started or it was reset.

127.0.0.1:6379> hotkeys get
1) "tracking-active"
2) 1
3) "sample-ratio"
4) <ratio>
5) "selected-slots" (empty array if no slots selected)
6) 1) 0
   2) 5
   3) 6
7) "sampled-command-selected-slots-ms" (show on condition sample-ratio > 1 and selected-slots != empty-array)
8) <time-in-milliseconds>
9) "all-commands-selected-slots-ms" (show on condition selected-slots != empty-array)
10) <time-in-milliseconds>
11) "all-commands-all-slots-ms"
12) <time-in-milliseconds>
13) "net-bytes-sampled-commands-selected-slots" (show on condition sample-ratio > 1 and selected-slots != empty-array)
14) <num-bytes>
15) "net-bytes-all-commands-selected-slots" (show on condition selected-slots != empty-array)
16) <num-bytes>
17) "net-bytes-all-commands-all-slots"
18) <num-bytes>
19) "collection-start-time-unix-ms"
20) <start-time-unix-timestamp-in-ms>
21) "collection-duration-ms"
22) <duration-in-milliseconds>
23) "used-cpu-sys-ms"
24) <duration-in-millisec>
25) "used-cpu-user-ms"
26) <duration-in-millisec>
27) "total-net-bytes"
28) <num-bytes>
29) "by-cpu-time"
30) 1) key-1_1
    2) <millisec>
    ...
    19) key-10_1
    20) <millisec>
31) "by-net-bytes"
32) 1) key-1_2
    2) <num-bytes>
    ...
    19) key-10_2
    20) <num-bytes>
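
A flat reply like the one above can be turned into a dictionary on the client side. The following is a minimal sketch with hypothetical function names; it assumes the reply has already been decoded into Python lists and strings, and that "by-cpu-time" and "by-net-bytes" hold alternating key/value entries as in the sample output.

```python
def pairs_to_dict(flat):
    """Convert [k1, v1, k2, v2, ...] into {k1: v1, k2: v2, ...}."""
    if len(flat) % 2:
        raise ValueError("expected an even number of elements")
    return dict(zip(flat[0::2], flat[1::2]))


def parse_hotkeys_get(reply):
    """Parse a flat HOTKEYS GET reply; None stands for a (nil) reply."""
    if reply is None:
        return None
    result = pairs_to_dict(reply)
    # The per-metric rankings are themselves flat key/value arrays.
    for field in ("by-cpu-time", "by-net-bytes"):
        if field in result:
            result[field] = pairs_to_dict(result[field])
    return result


parsed = parse_hotkeys_get([
    "tracking-active", 1,
    "sample-ratio", 1,
    "selected-slots", [],
    "by-cpu-time", ["key-a", 12, "key-b", 7],
])
```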

HOTKEYS STOP

Stop the tracking session. The user can still get the results with HOTKEYS GET.

HOTKEYS RESET

Release the resources used for hotkeys tracking. This only works when tracking is stopped; an error is returned if a tracking session is active.
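
The session lifecycle described by the four subcommands can be modeled as a small state machine. This is a toy model, not the server code: the class is hypothetical, and two behaviors are assumptions the description does not cover, namely that STOP without an active session is a no-op and that a new START after STOP discards the previous results.

```python
class HotkeysSession:
    """Toy model of the HOTKEYS START / GET / STOP / RESET lifecycle."""

    def __init__(self):
        self.state = "none"   # one of "none", "active", "stopped"
        self.results = None

    def start(self):
        if self.state == "active":
            raise RuntimeError("a tracking session is already in progress")
        # Assumption: restarting after STOP drops the old results.
        self.state = "active"
        self.results = {}

    def stop(self):
        # Assumption: STOP with no active session does nothing.
        if self.state == "active":
            self.state = "stopped"

    def get(self):
        # (nil) if tracking was never started or was reset.
        return None if self.state == "none" else self.results

    def reset(self):
        if self.state == "active":
            raise RuntimeError("cannot reset while tracking is active")
        self.state = "none"
        self.results = None
```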

Additional changes

The INFO command now has a "hotkeys" section with 3 fields:

  • tracking-active - a boolean flag indicating whether hotkeys are currently being tracked.
  • used-memory - memory overhead of the structures used for hotkeys tracking.
  • cpu-time - time in ms spent updating the hotkey structure.

Implementation

Independent of the API, the implementation is based on a probabilistic structure: a Cuckoo Heavy Keeper (CHK) with an added min-heap that keeps track of the names of the top K hotkeys. CHK is loosely based on HeavyKeeper, which is used in RedisBloom's TopK, but has higher throughput.
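
For readers unfamiliar with top-K tracking, the sketch below shows the interface such a structure exposes: accumulate a weight per key, then report the K heaviest keys. It is deliberately not CHK. It swaps in exact per-key totals in a dictionary, so memory grows with the number of distinct keys, which is exactly what a probabilistic structure avoids. The class and method names are hypothetical.

```python
import heapq
from collections import defaultdict


class ExactTopK:
    """Exact stand-in for a top-K sketch (NOT Cuckoo Heavy Keeper)."""

    def __init__(self, k):
        self.k = k
        self.totals = defaultdict(int)  # key -> accumulated weight

    def add(self, key, weight):
        """Record e.g. CPU time or network bytes spent on a key."""
        self.totals[key] += weight

    def top(self):
        """Return the k heaviest (key, total) pairs, heaviest first."""
        return heapq.nlargest(self.k, self.totals.items(), key=lambda kv: kv[1])


tracker = ExactTopK(k=2)
for key, weight in [("a", 5), ("b", 3), ("a", 2), ("c", 1)]:
    tracker.add(key, weight)
```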

Keys are sampled at random with a fixed probability, set by the SAMPLE <ratio> parameter of HOTKEYS START. Each key is sampled with probability 1/ratio.
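
Fixed-probability sampling with probability 1/ratio can be sketched as below. The function is hypothetical and assumes ratio is a positive integer; it does not reproduce the random number source used in the server.

```python
import random


def should_sample(ratio, rng=random):
    """Return True with probability 1/ratio (ratio assumed to be an int >= 1)."""
    # randrange(ratio) is uniform over 0..ratio-1, so hitting 0 has
    # probability 1/ratio. A ratio of 1 samples every key.
    return ratio <= 1 or rng.randrange(ratio) == 0


# With ratio 10, roughly one key access in ten is recorded.
rng = random.Random(42)
hits = sum(should_sample(10, rng) for _ in range(100_000))
```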

Performance implications

With a low enough sample rate (controlled by the SAMPLE <ratio> parameter of HOTKEYS START) the performance hit is negligible. Tracking every key, though, can incur up to a 15% hit in the worst case, based on running the tests in this bench.

@shahsb shahsb left a comment

Scalability in Clusters: How does this behave in Redis Cluster? Slot filtering helps, but aggregating across nodes might need future work.

Comparison to Alternatives: How does CHK compare to other sketches like Count-Min or HyperLogLog in this context? A brief rationale would strengthen the case.

@tezc
Collaborator

tezc commented Jan 10, 2026

@minchopaskal I had a quick pass over the PR and left a few minor comments. I assume the API is not final yet so we don't have more details about the commands in the top comment. e.g. what is SLOTS or SAMPLE. I assume we also don't have more tests because of this.

Regarding the cuckoo impl, maybe it is a good idea to add some comments over each function (only the important ones) to describe what it is doing, so people who have no idea about the algorithm will have a chance to understand what is going on.

@sundb
Collaborator

sundb commented Jan 13, 2026

I'm thinking about if we can add a new hotkey.c file and put both the chk.c and hotkey commands in that file. It doesn't seem appropriate to put them in server.c. server.c already has over 7000 lines and is too large.

@jit-ci

jit-ci bot commented Jan 16, 2026

❌ Security scan failed

Security scan failed: Branch hotkeys-detection does not exist in the remote repository


💡 Need to bypass this check? Comment @sera bypass to override.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@minchopaskal minchopaskal added labels Jan 16, 2026:

  • state:needs-doc-pr - requires a PR to redis-doc repository
  • state:to-be-merged - The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten
  • release-notes - indication that this issue needs to be mentioned in the release notes
@minchopaskal minchopaskal merged commit c93e4a6 into redis:unstable Jan 16, 2026
19 checks passed
Member

@oranagra oranagra left a comment

sorry for the delay. a few suggestions for INFO

Comment on lines +6647 to +6649
"tracking-active:%d\r\n"
"used-memory:%zu\r\n"
"cpu-time:%lld\r\n",
Member

the fact these are in a "hotkeys" section isn't enough, they should have some prefix.
the sections are just used to filter / ask specific info sections.
but the result is that all fields are then mixed in one dict.
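
A sketch of why a prefix matters: INFO output consists of "# Section" header lines and "field:value" lines, and a typical client-side parser drops the headers and keeps one flat dictionary. The parser below is a simplified, hypothetical one, and the section and field names in the sample text are made up for illustration.

```python
def parse_info(text):
    """Parse INFO-style text into one flat dict, ignoring section headers."""
    fields = {}
    for line in text.split("\r\n"):
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# Section" headers
        name, _, value = line.partition(":")
        fields[name] = value  # a repeated name silently overwrites
    return fields


# Hypothetical output in which two sections reuse the same field name.
unprefixed = "# Other\r\ncpu-time:10\r\n# Hotkeys\r\ncpu-time:3\r\n"
prefixed = "# Other\r\ncpu-time:10\r\n# Hotkeys\r\nhotkeys-cpu-time:3\r\n"

collided = parse_info(unprefixed)   # only one "cpu-time" entry survives
distinct = parse_info(prefixed)     # both values are kept
```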

Member

i wonder if the used-memory should be moved to the memory section.
in any case, it should probably be accounted for in the used_memory_overhead metric.
but being able to see a breakdown there (if it can be big) might be a good idea.

Collaborator Author

It may be big only if for some reason key names are really big as we store the top-K's key names in a heap. Do you think we expect that?

I will add it to used_memory_overhead and fix the oversight of the missing prefixes. A bit reserved about adding it in the memory section though - I think it should be either in "hotkeys" or in "memory" section (not both) and "hotkeys" seems more logical - WDYT, @oranagra

Member

obviously not in both.
the question is what's the trigger for someone wanting to see the memory overhead of this mechanism. is it that they're looking for hot keys, or that they're looking to explain memory issues.
e.g. the mem_replication_backlog metric is in memory, not in replication, same goes for mem_aof_buffer.
if we argue that it's very unlikely that it'll be very big, we can just sum it into the overhead, and skip creating a specific metric for it.

Collaborator Author

I see what you mean. I'll remove the metric then - I don't see a scenario in which overhead becomes more than 1 MB.

@sundb sundb removed the state:to-be-merged label Jan 19, 2026
minchopaskal added a commit that referenced this pull request Jan 20, 2026
Add the memory overhead of the hotkeyStats structure to
`used_memory_overhead`, add `hotkeys-` prefix to hotkey keys in INFO and
remove `used_memory` in the hotkeys info section as it's unneeded (too
little memory for us to care about).

Tnx @oranagra for pointing
[this](#14680 (comment))
out.
minchopaskal added a commit that referenced this pull request Jan 29, 2026
…14749)

Follow #14680
Reply of `HOTKEYS GET` is an unordered collection of key-value pairs. It
is more reasonable to be a map in resp3 instead of flat array.

Labels

  • release-notes - indication that this issue needs to be mentioned in the release notes
  • state:needs-doc-pr - requires a PR to redis-doc repository

9 participants