Skip to content

Optimize sdscatrepr by batch processing printable characters#1342

Merged
madolson merged 1 commit into
valkey-io:unstablefrom
RayaCoo:optim-sdscatrepr-func
Nov 25, 2024
Merged

Optimize sdscatrepr by batch processing printable characters#1342
madolson merged 1 commit into
valkey-io:unstablefrom
RayaCoo:optim-sdscatrepr-func

Conversation

@RayaCoo

@RayaCoo RayaCoo commented Nov 22, 2024

Copy link
Copy Markdown
Contributor

#11725 optimize sdscatrepr by reducing realloc calls, furthermore, we can reduce memcpy calls by batch processing of consecutive printable characters.

Signed-off-by: Ray Cao <zisong.cw@alibaba.com>
@madolson

Copy link
Copy Markdown
Member

I wonder if we should do something like #1230, which removes all the reallocation and just does a single initial allocation. Checking through the code, this doesn't seem to be anywhere on the hot path though, so that is probably over optimizing.

@codecov

codecov Bot commented Nov 22, 2024

Copy link
Copy Markdown

Codecov Report

Attention: Patch coverage is 61.11111% with 7 lines in your changes missing coverage. Please review.

Project coverage is 70.79%. Comparing base (377ed22) to head (8b8b7fe).
Report is 7 commits behind head on unstable.

Files with missing lines Patch % Lines
src/sds.c 61.11% 7 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1342      +/-   ##
============================================
+ Coverage     70.77%   70.79%   +0.01%     
============================================
  Files           116      116              
  Lines         63280    63285       +5     
============================================
+ Hits          44787    44802      +15     
+ Misses        18493    18483      -10     
Files with missing lines Coverage Δ
src/sds.c 86.78% <61.11%> (-0.23%) ⬇️

... and 9 files with indirect coverage changes

---- 🚨 Try these New Features:

@RayaCoo

RayaCoo commented Nov 22, 2024

Copy link
Copy Markdown
Contributor Author

As @artikell said in #11725, If there is a monitor, replicationFeedMonitors will
call sdscatrepr frequently. So I use valkey-benchmark --threads 2 -P 10 -r 100 -n 10000000 getset __rand_int__ __rand_int__ to finish the test as @artikell did before.

unstable (no monitor)

Summary:
  throughput summary: 714081.69 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.654     0.112     0.687     0.847     0.879     3.247

unstable (1 monitor)

Summary:
  throughput summary: 332967.06 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.449     0.384     1.647     1.855     1.927     4.263

PR (no monitor)

Summary:
  throughput summary: 714030.69 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.645     0.128     0.671     0.839     0.871     2.839

PR (1 monitor)

Summary:
  throughput summary: 395116.38 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.221     0.248     1.367     1.575     1.631     3.735
  • The test results show about an 18% improvement with one monitor.

@RayaCoo

RayaCoo commented Nov 22, 2024

Copy link
Copy Markdown
Contributor Author

I wonder if we should do something like #1230, which removes all the reallocation and just does a single initial allocation. Checking through the code, this doesn't seem to be anywhere on the hot path though, so that is probably over optimizing.

@madolson
First, I don't think sdscatrepr can "just does a single initial allocation" , as we don't know the final length of the SDS in advance unless we first iterate through p to calculate it. what #11725 does is pre-allocate memory to avoid frequent reallocations.
Secondly, sdscatrepr is frequently called in our product for auditing purposes. I have also seen some community discussions about audit-related issues, like Improved Redis auditing support, If we consider extending the monitor command in the future, sdscatrepr might become a hot path.
Additionally, I believe the implementation of sdscatrepr is not efficient enough. Printable characters typically appear consecutively, and processing one printable character at a time with frequent calls to memcpy is unnecessary.

@madolson

Copy link
Copy Markdown
Member

Additionally, I believe the implementation of sdscatrepr is not efficient enough. Printable characters typically appear consecutively, and processing one printable character at a time with frequent calls to memcpy is unnecessary.

I agree with optimizing the code, I was wondering we should spend more effort to optimize is than just what you are proposing in this PR.

First, I don't think sdscatrepr can "just does a single initial allocation" , as we don't know the final length of the SDS in advance unless we first iterate through p to calculate it. what redis/redis#11725 does is pre-allocate memory to avoid frequent reallocations.

Yeah, I was suggesting that we could do two passes. Two passes seems like it would be slower, but the work I did with sdscat showed that it was still much faster both because sdscat* functions are slow and because if we have to realloc that is also really slow. If you aren't interesting in trying any alternatives, I think this is worth merging as is though. (I'll take a look at the actual code shortly)

@soloestoy

Copy link
Copy Markdown
Member

Currently, in sdscatrepr, we first use sdsMakeRoomFor to pre-allocate memory. sdsMakeRoomFor is greedy by default, I think that in most scenarios, there is already enough redundant space, and there's no need to use realloc again.

And in my understanding, the use cases for sdscatrepr are usually to generate temporary data that won't persist long-term in the server. Or do you think it's necessary for us to use sdsMakeRoomForNonGreedy instead of sdsMakeRoomFor? At that point, getting the exact length is essential.

@madolson

Copy link
Copy Markdown
Member

And in my understanding, the use cases for sdscatrepr are usually to generate temporary data that won't persist long-term in the server. Or do you think it's necessary for us to use sdsMakeRoomForNonGreedy instead of sdsMakeRoomFor? At that point, getting the exact length is essential.

I'm not as concerned about extra memory usage. I was more interested if we could improve the performance if we avoided sds functions, which tend to be a bit slow compared to direct character manipulation. I'm fine with the PR as is though.

@madolson madolson merged commit cf1a1e0 into valkey-io:unstable Nov 25, 2024
@madolson madolson added performance release-notes This issue should get a line item in the release notes labels Nov 25, 2024
vudiep411 pushed a commit to vudiep411/valkey that referenced this pull request Nov 26, 2024
…io#1342)

Optimize sdscatrepr by reducing realloc calls, furthermore, we can reduce memcpy calls by
batch processing of consecutive printable characters.

Signed-off-by: Ray Cao <zisong.cw@alibaba.com>
Co-authored-by: Ray Cao <zisong.cw@alibaba.com>
Signed-off-by: vudiep411 <vdiep@amazon.com>
vudiep411 pushed a commit to vudiep411/valkey that referenced this pull request Nov 26, 2024
…io#1342)

Optimize sdscatrepr by reducing realloc calls, furthermore, we can reduce memcpy calls by
batch processing of consecutive printable characters.

Signed-off-by: Ray Cao <zisong.cw@alibaba.com>
Co-authored-by: Ray Cao <zisong.cw@alibaba.com>
Signed-off-by: vudiep411 <vdiep@amazon.com>
vudiep411 pushed a commit to vudiep411/valkey that referenced this pull request Nov 26, 2024
…io#1342)

Optimize sdscatrepr by reducing realloc calls, furthermore, we can reduce memcpy calls by
batch processing of consecutive printable characters.

Signed-off-by: Ray Cao <zisong.cw@alibaba.com>
Co-authored-by: Ray Cao <zisong.cw@alibaba.com>
Signed-off-by: vudiep411 <vdiep@amazon.com>
@RayaCoo RayaCoo deleted the optim-sdscatrepr-func branch January 13, 2025 14:09
zuiderkwast pushed a commit that referenced this pull request May 2, 2025
Fixes #2035, a bug introduced
in #1342

---------

Signed-off-by: Fusl <fusl@meo.ws>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
rainsupreme pushed a commit to rainsupreme/valkey that referenced this pull request May 14, 2025
…2036)

Fixes valkey-io#2035, a bug introduced
in valkey-io#1342

---------

Signed-off-by: Fusl <fusl@meo.ws>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
hpatro pushed a commit to hpatro/valkey that referenced this pull request Jun 4, 2025
…2036)

Fixes valkey-io#2035, a bug introduced
in valkey-io#1342

---------

Signed-off-by: Fusl <fusl@meo.ws>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
hpatro pushed a commit to hpatro/valkey that referenced this pull request Jun 4, 2025
…2036)

Fixes valkey-io#2035, a bug introduced
in valkey-io#1342

---------

Signed-off-by: Fusl <fusl@meo.ws>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
hpatro pushed a commit that referenced this pull request Jun 9, 2025
Fixes #2035, a bug introduced
in #1342

---------

Signed-off-by: Fusl <fusl@meo.ws>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
hpatro pushed a commit that referenced this pull request Jun 11, 2025
Fixes #2035, a bug introduced
in #1342

---------

Signed-off-by: Fusl <fusl@meo.ws>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance release-notes This issue should get a line item in the release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants