Optimize sdscatrepr by batch processing printable characters by RayaCoo · Pull Request #1342 · valkey-io/valkey

RayaCoo · 2024-11-22T05:36:41Z

#11725 optimize sdscatrepr by reducing realloc calls, furthermore, we can reduce memcpy calls by batch processing of consecutive printable characters.

Signed-off-by: Ray Cao <zisong.cw@alibaba.com>

madolson · 2024-11-22T05:47:05Z

I wonder if we should do something like #1230, which removes all the reallocation and just does a single initial allocation. Checking through the code, this doesn't seem to be anywhere on the hot path though, so that is probably over optimizing.

codecov · 2024-11-22T05:51:45Z

Codecov Report

Attention: Patch coverage is 61.11111% with 7 lines in your changes missing coverage. Please review.

Project coverage is 70.79%. Comparing base (377ed22) to head (8b8b7fe).
Report is 7 commits behind head on unstable.

Files with missing lines	Patch %	Lines
src/sds.c	61.11%	7 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #1342      +/-   ##
============================================
+ Coverage     70.77%   70.79%   +0.01%     
============================================
  Files           116      116              
  Lines         63280    63285       +5     
============================================
+ Hits          44787    44802      +15     
+ Misses        18493    18483      -10

Files with missing lines	Coverage Δ
src/sds.c	`86.78% <61.11%> (-0.23%)`	⬇️

... and 9 files with indirect coverage changes

---- 🚨 Try these New Features:

Flaky Tests Detection - Detect and resolve failed and flaky tests
JS Bundle Analysis - Avoid shipping oversized bundles

RayaCoo · 2024-11-22T06:56:27Z

As @artikell said in #11725, If there is a monitor, replicationFeedMonitors will
call sdscatrepr frequently. So I use valkey-benchmark --threads 2 -P 10 -r 100 -n 10000000 getset __rand_int__ __rand_int__ to finish the test as @artikell did before.

unstable (no monitor)

Summary:
  throughput summary: 714081.69 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.654     0.112     0.687     0.847     0.879     3.247

unstable (1 monitor)

Summary:
  throughput summary: 332967.06 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.449     0.384     1.647     1.855     1.927     4.263

PR (no monitor)

Summary:
  throughput summary: 714030.69 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.645     0.128     0.671     0.839     0.871     2.839

PR (1 monitor)

Summary:
  throughput summary: 395116.38 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        1.221     0.248     1.367     1.575     1.631     3.735

The test results show about an 18% improvement with one monitor.

RayaCoo · 2024-11-22T09:01:48Z

I wonder if we should do something like #1230, which removes all the reallocation and just does a single initial allocation. Checking through the code, this doesn't seem to be anywhere on the hot path though, so that is probably over optimizing.

@madolson
First, I don't think sdscatrepr can "just does a single initial allocation" , as we don't know the final length of the SDS in advance unless we first iterate through p to calculate it. what #11725 does is pre-allocate memory to avoid frequent reallocations.
Secondly, sdscatrepr is frequently called in our product for auditing purposes. I have also seen some community discussions about audit-related issues, like Improved Redis auditing support, If we consider extending the monitor command in the future, sdscatrepr might become a hot path.
Additionally, I believe the implementation of sdscatrepr is not efficient enough. Printable characters typically appear consecutively, and processing one printable character at a time with frequent calls to memcpy is unnecessary.

madolson · 2024-11-22T16:28:49Z

Additionally, I believe the implementation of sdscatrepr is not efficient enough. Printable characters typically appear consecutively, and processing one printable character at a time with frequent calls to memcpy is unnecessary.

I agree with optimizing the code, I was wondering we should spend more effort to optimize is than just what you are proposing in this PR.

First, I don't think sdscatrepr can "just does a single initial allocation" , as we don't know the final length of the SDS in advance unless we first iterate through p to calculate it. what redis/redis#11725 does is pre-allocate memory to avoid frequent reallocations.

Yeah, I was suggesting that we could do two passes. Two passes seems like it would be slower, but the work I did with sdscat showed that it was still much faster both because sdscat* functions are slow and because if we have to realloc that is also really slow. If you aren't interesting in trying any alternatives, I think this is worth merging as is though. (I'll take a look at the actual code shortly)

soloestoy · 2024-11-25T07:26:27Z

Currently, in sdscatrepr, we first use sdsMakeRoomFor to pre-allocate memory. sdsMakeRoomFor is greedy by default, I think that in most scenarios, there is already enough redundant space, and there's no need to use realloc again.

And in my understanding, the use cases for sdscatrepr are usually to generate temporary data that won't persist long-term in the server. Or do you think it's necessary for us to use sdsMakeRoomForNonGreedy instead of sdsMakeRoomFor? At that point, getting the exact length is essential.

madolson · 2024-11-25T15:12:23Z

And in my understanding, the use cases for sdscatrepr are usually to generate temporary data that won't persist long-term in the server. Or do you think it's necessary for us to use sdsMakeRoomForNonGreedy instead of sdsMakeRoomFor? At that point, getting the exact length is essential.

I'm not as concerned about extra memory usage. I was more interested if we could improve the performance if we avoided sds functions, which tend to be a bit slow compared to direct character manipulation. I'm fine with the PR as is though.

…io#1342) Optimize sdscatrepr by reducing realloc calls, furthermore, we can reduce memcpy calls by batch processing of consecutive printable characters. Signed-off-by: Ray Cao <zisong.cw@alibaba.com> Co-authored-by: Ray Cao <zisong.cw@alibaba.com> Signed-off-by: vudiep411 <vdiep@amazon.com>

Fixes #2035, a bug introduced in #1342 --------- Signed-off-by: Fusl <fusl@meo.ws> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>

…2036) Fixes valkey-io#2035, a bug introduced in valkey-io#1342 --------- Signed-off-by: Fusl <fusl@meo.ws> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>

…2036) Fixes valkey-io#2035, a bug introduced in valkey-io#1342 --------- Signed-off-by: Fusl <fusl@meo.ws> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

Fixes #2035, a bug introduced in #1342 --------- Signed-off-by: Fusl <fusl@meo.ws> Signed-off-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>

Optimize sdscatrepr by batch processing printable characters

8b8b7fe

Signed-off-by: Ray Cao <zisong.cw@alibaba.com>

madolson approved these changes Nov 25, 2024

View reviewed changes

madolson merged commit cf1a1e0 into valkey-io:unstable Nov 25, 2024

madolson added performance release-notes This issue should get a line item in the release notes labels Nov 25, 2024

RayaCoo deleted the optim-sdscatrepr-func branch January 13, 2025 14:09

Fusl mentioned this pull request May 1, 2025

Properly escape double quotes and backslash in sdscatrepr #2036

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimize sdscatrepr by batch processing printable characters#1342

Optimize sdscatrepr by batch processing printable characters#1342
madolson merged 1 commit into
valkey-io:unstablefrom
RayaCoo:optim-sdscatrepr-func

RayaCoo commented Nov 22, 2024

Uh oh!

madolson commented Nov 22, 2024

Uh oh!

codecov Bot commented Nov 22, 2024 •

edited

Loading

Uh oh!

RayaCoo commented Nov 22, 2024

Uh oh!

RayaCoo commented Nov 22, 2024 •

edited

Loading

Uh oh!

madolson commented Nov 22, 2024

Uh oh!

soloestoy commented Nov 25, 2024

Uh oh!

madolson commented Nov 25, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

RayaCoo commented Nov 22, 2024

Uh oh!

madolson commented Nov 22, 2024

Uh oh!

codecov Bot commented Nov 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

RayaCoo commented Nov 22, 2024

unstable (no monitor)

unstable (1 monitor)

PR (no monitor)

PR (1 monitor)

Uh oh!

RayaCoo commented Nov 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

madolson commented Nov 22, 2024

Uh oh!

soloestoy commented Nov 25, 2024

Uh oh!

madolson commented Nov 25, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov Bot commented Nov 22, 2024 •

edited

Loading

RayaCoo commented Nov 22, 2024 •

edited

Loading