Moving client flags to a more cache friendly position within client struct #10697
Conversation
oranagra left a comment
@filipecosta90 maybe we wanna use this opportunity to also check what happens when we move the reply buffers and query buffers to the top?
will check it. 👍 |
|
How about setting the option |
good idea.
I think this way can also effectively reduce the number of calls to |
@uvletter if we get the following counters while doing the benchmark, we can observe that the improvement indeed comes from a reduction of stall cycles when compared to the total uops executed:
As you can check below, the stall cycles drop is where it matters most (stalls_mem_any). Furthermore, we now also execute more uops, as expected, given we're reducing the expensive waits on memory. Full details below, for unstable 2bcd890 (merge-base) and for this PR: |
|
@filipecosta90 are you gonna try some other combinations (like moving the reply/response buffer variables to the top too)? |
@oranagra I'll confirm with per-LOC perf data that this is the best scenario. Give me until tomorrow EOD to confirm whether this is already the ideal layout. |
|
i didn't mean to move the flags somewhere else, i meant to take querybuf, qb_pos, querybuf_peak, reply, buf, bufpos, buf_usable_size, etc. all to the top |
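For illustration, a minimal sketch of the kind of reordering being suggested here, with the hot per-command fields grouped at the top of the struct (field names are taken from the comment above; the types and surrounding fields are simplified stand-ins, not the exact Redis definitions):

```c
#include <stdint.h>
#include <stddef.h>

typedef char *sds;        /* simplified stand-in for the real sds type */
typedef struct list list; /* opaque stand-in for Redis's adlist */

/* Sketch only: fields touched on nearly every command grouped first,
 * so they share the struct's first cache line(s). */
typedef struct client {
    uint64_t id;            /* client id, read on most code paths */
    uint64_t flags;         /* read by getClientType() on hot paths */
    sds querybuf;           /* input buffer, touched on every read */
    size_t qb_pos;          /* parse position within querybuf */
    size_t querybuf_peak;   /* recent peak size of querybuf */
    char *buf;              /* static output buffer */
    size_t bufpos;          /* bytes used in buf */
    size_t buf_usable_size; /* allocated size of buf */
    list *reply;            /* overflow reply list */
    /* ... colder fields (name, db, auth state, watched keys, ...) ... */
} client;
```

The point of the grouping is that the cache line fill triggered when a client is first touched during a command then serves the reads and writes of all of these fields for the rest of that command.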
|
@oranagra Following your suggestion re moving stuff around in the client struct, I decided to run Cachegrind on the specific |
|
I used cachegrind with the benchmark specified above and saw the following:
I did some dirty modifications to the code, both to reorder the fields in the client struct and to use stack-based iterators, and ran the benchmark locally. Then in cachegrind I couldn't find much more to improve. Running the benchmark again showed consistently better results after the optimization, although they are pretty mild: 3-4%. @filipecosta90 maybe you'd like to try the patch below and run the benchmark on your setup?

I fear keeping the client struct cache friendly will be a pain; whatever optimization we do now will just break when we change the struct. I'm not sure. @oranagra WDYT?

Regarding using stack-based iterators, I think we should transition to this across all the code; we can use what's in the patch below as a guideline. There are many places internally in quicklist.c that use dynamic allocations for the iterators that need to be changed. https://gist.github.com/yoav-steinberg/c881868efa58098fa472a901cd28566c |
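A minimal sketch of the stack-based iterator idea, under the assumption that an in-place initializer is added alongside the existing API (the `quicklistInitIterator` helper below is hypothetical, named only for illustration; `quicklistGetIterator`, `quicklistNext`, and `quicklistReleaseIterator` are the real quicklist.c entry points):

```c
#include "quicklist.h" /* Redis internal header */

/* Heap-allocating pattern used throughout quicklist.c at the time:
 * one allocation and one free per iteration, even on hot paths. */
static long count_entries_heap(quicklist *ql) {
    quicklistEntry entry;
    long n = 0;
    quicklistIter *iter = quicklistGetIterator(ql, AL_START_HEAD);
    while (quicklistNext(iter, &entry)) n++;
    quicklistReleaseIterator(iter);
    return n;
}

/* Stack-based variant: the caller owns the iterator's storage, so the
 * loop allocates nothing. quicklistInitIterator() is a hypothetical
 * in-place initializer, shown for illustration only. */
static long count_entries_stack(quicklist *ql) {
    quicklistEntry entry;
    long n = 0;
    quicklistIter iter;
    quicklistInitIterator(&iter, ql, AL_START_HEAD); /* hypothetical */
    while (quicklistNext(&iter, &entry)) n++;
    return n; /* nothing to release */
}
```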
|
i think we have no choice but to periodically re-optimize the cache efficiency of the client struct. |
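One way to make the layout partly self-policing, sketched under the assumption of 64-byte cache lines: compile-time guards on the offsets of the hot fields, so a later reshuffle that pushes them down the struct fails the build. The 128-byte threshold and the guarded fields are illustrative choices, not anything in this PR:

```c
#include <stddef.h>
#include "server.h" /* where the client struct is defined */

/* Sketch: insist the hottest fields stay within the first two
 * 64-byte cache lines of the client struct. */
_Static_assert(offsetof(struct client, flags) < 128,
               "client.flags drifted out of the first two cache lines");
_Static_assert(offsetof(struct client, querybuf) < 128,
               "client.querybuf drifted out of the first two cache lines");
```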
|
i'm merging this as a step forward, keeping #10460 open for now and posting a reference to the above comment by yoav. |
Moving client flags to a more cache friendly position within client struct (redis#10697): we regain the lost 2% of CPU cycles since v6.2 (from 630532.57 to 647449.80 ops/sec), which were due to a higher rate of calls to getClientType after the changes in redis#9166 and redis#10020.
Fixes #10648.
Following @oranagra's recommendation to move the client flags to a more cache friendly position within the client struct, we indeed regain the lost 2% of CPU cycles since v6.2 (from 630532.57 to 647449.80 ops/sec).
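For context, getClientType() boils down to a few tests against c->flags, so every call reads that field; it has roughly the following shape (a simplified sketch from memory, see networking.c for the real definition):

```c
/* Rough shape of getClientType(): each call is a read of c->flags,
 * which is why that field's cache-line placement shows up directly
 * in the profiles discussed above. */
int getClientType(client *c) {
    if (c->flags & CLIENT_MASTER) return CLIENT_TYPE_MASTER;
    /* MONITOR clients carry the replica flag but count as normal. */
    if ((c->flags & CLIENT_SLAVE) && !(c->flags & CLIENT_MONITOR))
        return CLIENT_TYPE_SLAVE;
    if (c->flags & CLIENT_PUBSUB) return CLIENT_TYPE_PUBSUB;
    return CLIENT_TYPE_NORMAL;
}
```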
To reproduce:
Benchmark results: unstable 2bcd890 (merge-base) vs. this PR.