
Large number of getClientType() calls within the reply code is costing us around 3% of CPU cycles on the LRANGE command (or any command that heavily relies on partial reply building) #10648

@filipecosta90

Description


Since the #9166 fix (due to #10020) we've largely increased the number of times we call getClientType() during a single command. LRANGE is a good example because we call getClientType() for every reply element.

As we can see below, this is visible even on small reply sizes -- on a 10-element reply we drop from 640K ops/sec on v6.2.6 to 626K ops/sec on unstable, a drop we can directly connect to the increase in CPU cycles spent on getClientType() -- ~3%.

@ShooterIT is there some logic / number of calls we can reduce to avoid this extra CPU consumption?
PS: this was caught via https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1key-list-10-elements-lrange-all-elements.yml

confirmation with profile data

v6.2.6 profile

unstable (8192625)

benchmark info

To populate the data:

"RPUSH" "list:10" "lysbgqqfqw" "mtccjerdon" "jekkafodvk" "nmgxcctxpn" "vyqqkuszzh" "pytrnqdhvs" "oguwnmniig" "gekntrykfh" "nhfnbxqgol" "cgoeihlnei"

To benchmark:

A quick check of the LRANGE command reply with a 10-element list:

v6.2.6

root@hpe10:~/redis# taskset -c 1-5 memtier_benchmark --command="LRANGE list:10 0 -1"  --hide-histogram --test-time 60 --pipeline 10
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%,  60 secs]  0 threads:    38447210 ops,  641149 (avg:  640765) ops/sec, 133.91MB/sec (avg: 133.83MB/sec),  3.11 (avg:  3.11) msec latency

4         Threads
50        Connections per thread
60        Seconds


ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Lranges    640765.27         3.11013         3.34300         4.25500         4.54300    137038.67 
Totals     640765.27         3.11013         3.34300         4.25500         4.54300    137038.67 

unstable

root@hpe10:~/redis# taskset -c 1-5 memtier_benchmark --command="LRANGE list:10 0 -1"  --hide-histogram --test-time 60 --pipeline 10
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 100%,  60 secs]  0 threads:    37609260 ops,  627934 (avg:  626798) ops/sec, 131.15MB/sec (avg: 130.91MB/sec),  3.17 (avg:  3.18) msec latency

4         Threads
50        Connections per thread
60        Seconds


ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Lranges    626801.72         3.17964         3.42300         4.35100         4.60700    134052.32 
Totals     626801.72         3.17964         3.42300         4.35100         4.60700    134052.32 
