
Conversation

@enjoy-binbin (Contributor) commented Feb 20, 2024

Currently (following #11695 and #12822), the keys kvstore and the expires
kvstore are both flagged with ON_DEMAND, which means that a cluster node will
only allocate a dict when the slot is assigned to it and populated; on the
other hand, when the slot is unassigned, the dict remains allocated.

We considered releasing the dict when the slot is unassigned, but it
causes complications on replicas. On the other hand, benchmarks we conducted
suggest that the performance impact of releasing the dict when it becomes
empty and re-allocating it when a key is added again isn't huge.

This PR adds KVSTORE_FREE_EMPTY_DICTS to the cluster mode keys / expires
kvstores.

The impact is about a 2% performance drop, for this hopefully
uncommon scenario.
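
For context, a minimal sketch of how the two flags could be combined when the per-database kvstores are created, assuming the kvstoreCreate(type, num_dicts_bits, flags) signature and constants such as CLUSTER_SLOT_MASK_BITS, dbDictType and dbExpiresDictType; the exact call site in server.c may differ:

/* Sketch: pick kvstore flags at database-initialization time. In cluster mode
 * each database is split into one dict per slot, so empty dicts are also
 * freed; in standalone mode there is a single dict and freeing it on
 * emptiness would only add churn. */
int slot_count_bits = 0;
int flags = KVSTORE_ALLOCATE_DICTS_ON_DEMAND;
if (server.cluster_enabled) {
    slot_count_bits = CLUSTER_SLOT_MASK_BITS;   /* 14 bits -> 16384 per-slot dicts */
    flags |= KVSTORE_FREE_EMPTY_DICTS;
}
server.db[j].keys = kvstoreCreate(&dbDictType, slot_count_bits, flags);
server.db[j].expires = kvstoreCreate(&dbExpiresDictType, slot_count_bits, flags);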

The original version of this PR instead added a kvstore interface to explicitly
release a specific empty dict, calling it when a slot is unassigned.
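
(For illustration only: a hypothetical sketch of what such an explicit-release call could look like; the function name kvstoreReleaseDict and its internals are assumptions, not code from this PR, which ultimately went with the flag instead.)

/* Hypothetical interface: free the per-slot dict when the slot is unassigned.
 * Assumes the caller has already emptied the slot. */
void kvstoreReleaseDict(kvstore *kvs, int didx) {
    dict *d = kvs->dicts[didx];
    if (d == NULL) return;            /* never allocated (ON_DEMAND) */
    assert(dictSize(d) == 0);         /* slot must already be empty */
    dictRelease(d);
    kvs->dicts[didx] = NULL;          /* kvstore metadata updates omitted */
}
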
@oranagra (Member) left a comment

@madolson please review the cluster code.

Co-authored-by: Oran Agra <oran@redislabs.com>
@madolson (Contributor) left a comment

Actually, looking at how weird this code looks, I would probably be in favor of just marking the main store as KVSTORE_FREE_EMPTY_DICTS. I see that in the previous CR @oranagra proposed "but on the other hand, we'd rather not de-allocate a dict every time it becomes empty, and re-allocate it when a key is added again". I would rather keep a simpler API that just handles both cases without explicit API calls. It seems like not that likely of a case that you would be using cluster with so few keys; that was one of the core assumptions we said we were making when we implemented per-slot dictionaries anyway.

@oranagra (Member)

I would rather keep a simpler API that just handles both cases without explicit API calls.

i don't understand what you mean by that.

It seems like not that likely of a case you would be using cluster with so few keys. That was one of the core assumptions we said were were making when we implemented per-slot dictionaries anyways.

when we implemented the per-slot dicts we made that compromise based on the reasoning that the memory benefit for clusters densely populated with keys outweighs the memory damage it does to sparsely populated ones (with just a few keys in each slot, or many empty slots).
i.e. we had an idea and we could either accept it or reject it, and we had to accept its downsides (arguing they're less important than the upsides).
i'm not sure we should base any further decisions on that.

besides that, we didn't even bother to allocate only the dicts of the assigned slots, so a cluster with many shards (regardless of how many keys it has per slot) will suffer a completely unnecessary memory hit. that's something we later "solved" with KVSTORE_ALLOCATE_DICTS_ON_DEMAND, but it was only partially solved, since it'll be messed up by resharding, which is what this PR comes to solve.

i'd argue that KVSTORE_ALLOCATE_DICTS_ON_DEMAND was the wrong (or lazy) approach; instead of leaving assigned slots without a dict until a key is added, we could have allocated dicts for all the assigned slots right away.
this isn't that bad as long as we don't keep releasing and re-allocating them when they become empty / non-empty, i.e. that churn would add performance overheads on top of the memory overheads.

i'd rather just keep the approach in this PR, aiming to have dicts allocated for all assigned slots (even when empty), and free them when they're unassigned.

@madolson (Contributor) commented Mar 1, 2024

i don't understand what you mean by that.

I don't like that we have to explicitly indicate to the underlying datastore to clean up a dict at a given index. It should just handle empty dictionaries.

i'd argue that KVSTORE_ALLOCATE_DICTS_ON_DEMAND was the wrong (or lazy) approach, instead of leaving assigned lots without a dict until a key is added, we could have allocated dicts for all the assigned slots right away.

I don't agree with this. I would prefer that kvstore handle cleanup if it wants to, but I don't think it should be "told" to do the cleanup. Your point isn't self-consistent anyway, as there could be an assigned slot (one that was assigned during migration) that is populated lazily instead of on creation.

i'd rather just keep the approach in this PR, aiming to have dicts allocated for all assigned slots (even when empty), and free them when they're unassigned.

The current PR is wrong, since replicas won't be freeing the slots correctly. Replicas don't strictly follow their primaries, since replication can lag, so they might still have data in a slot even though their primaries have cleared the slot; we might "delete it" and then recreate it. This change also adds more complexity to keep track of for the new and old cluster implementations. The code is in cluster_legacy, which means cluster v2 will need its own change and API as well to indicate a slot is empty. I think this adds a lot of complexity which I don't think has been justified.

@oranagra (Member) commented Mar 2, 2024

@madolson so what do you suggest? just add KVSTORE_FREE_EMPTY_DICTS? i don't like it.
i'm aiming to achieve two things:

  1. don't waste memory on dicts that are always gonna be empty (specifically considering the cluster topology)
  2. avoid releasing and re-allocating the dict if it becomes empty and then non-empty again rapidly.

@madolson (Contributor) commented Mar 5, 2024

so what do you suggest? just add KVSTORE_FREE_EMPTY_DICTS? i don't like it.

Yes, this is my vote.

don't waste memory on dicts that are always gonna be empty (specifically considering the cluster topology)

I agree.

avoid releasing and re-allocating the dict if it becomes empty and then non-empty again rapidly.

@oranagra I don't care about this, and I don't know why you care about it. Maybe we can discuss it in our sync up?

@oranagra (Member) commented Mar 5, 2024

avoid releasing and re-allocating the dict if it becomes empty and then non-empty again rapidly.

@oranagra I don't care about this. I don't know why you care about it though. Maybe we discuss it in our sync up?

maybe we can conduct a quick benchmark to see what the worst-case effect of such a change is?

@enjoy-binbin (Contributor Author) commented Mar 5, 2024

maybe we can conduct a quick benchmark to see what's the worse case effect of such a change?

./src/redis-benchmark -n 1000000 -P 32 eval "redis.call('incr', KEYS[1]) redis.call('del', KEYS[1])" 1 key

unstable:

Summary:
  throughput summary: 277700.62 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        5.646     0.808     5.951     6.367     7.087    10.479

Summary:
  throughput summary: 279095.72 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        5.598     0.800     5.911     6.247     6.407    12.255

Summary:
  throughput summary: 281928.41 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        5.560     0.808     5.879     6.191     6.359     7.207

adding KVSTORE_FREE_EMPTY_DICTS:

Summary:
  throughput summary: 263365.81 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        5.942     2.616     6.247     6.647     6.903     9.879

Summary:
  throughput summary: 266951.41 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        5.881     0.792     6.231     6.551     6.719    11.887

Summary:
  throughput summary: 268240.34 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        5.851     0.840     6.199     6.511     6.639     7.095

@oranagra (Member) commented Mar 5, 2024

that looks quite surprising to me, so just to be sure we're not jumping to the wrong conclusions,

  1. did redis get to nearly 100% CPU consumption? i.e. there's no bottleneck elsewhere.
  2. maybe it's something around Lua overheads being even greater than the dict allocation and release overheads?

to check the second point we can either modify redis-benchmark to send the sequence of commands directly (without Lua), or maybe change the script to perform multiple iterations of insertion and deletions inside each call.

@enjoy-binbin (Contributor Author)

yes, redis-server was indeed using 100% of the CPU. oh, i see my -P pipeline was wrong, i updated the comment.

./src/redis-benchmark -n 100000 -P 32 eval "for i=1,100 do redis.call('incr', KEYS[1]) redis.call('del', KEYS[1]) end" 1 key

unstable:

Summary:
  throughput summary: 8092.58 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
      196.618     4.048   199.807   208.767   223.359   234.239

Summary:
  throughput summary: 8097.17 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
      196.505     4.088   199.679   211.839   214.271   228.223

Summary:
  throughput summary: 8216.93 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
      193.654     4.016   198.399   202.623   205.951   229.887

adding KVSTORE_FREE_EMPTY_DICTS:

Summary:
  throughput summary: 7493.44 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
      212.393     4.664   218.367   224.511   233.343   253.951

Summary:
  throughput summary: 7537.50 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
      211.079     4.344   216.319   221.823   223.999   258.303

Summary:
  throughput summary: 7534.09 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
      211.144     4.384   215.679   223.487   224.639   256.255

@oranagra (Member) commented Mar 5, 2024

i still find it hard to explain; INCR and DEL (for a plain string) are so slim that usually every small overhead causes a noticeable regression.
maybe someone can point me to some fact i'm missing? or is the overhead of releasing and re-allocating the dict struct and its LUT not that significant?

@madolson (Contributor) commented Mar 7, 2024

i still find it hard to explain, INCR and DEL (for a plain string) are so slim, that usually every small overhead causes a noticeable regression.

Should we run a perf breakdown (or generate a flamegraph) to break down where all the time is really being spent?

@sundb (Collaborator) commented Mar 8, 2024

using a modified version (incr+del) of the benchmark, the time is mostly spent on dictRelease in dbGenericDelete() and dictCreate in kvstoreDictAddRaw().

marking the main store as KVSTORE_FREE_EMPTY_DICTS:
(flamegraph: flamegraph_bad)

this PR:
(flamegraph: flamegraph_good)
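
For context, a rough sketch of the two kvstore.c helpers those flamegraphs point at (simplified; the real versions also maintain key-count and rehashing metadata). With ON_DEMAND, the first insert into an empty slot pays a dictCreate(); with FREE_EMPTY_DICTS, every delete that empties a slot pays a dictRelease(), so a tight incr/del loop on one key hits both on every pair:

/* Simplified sketch of the kvstore.c helpers seen in the flamegraphs. */
static void createDictIfNeeded(kvstore *kvs, int didx) {
    if (!(kvs->flags & KVSTORE_ALLOCATE_DICTS_ON_DEMAND)) return;
    if (kvs->dicts[didx] == NULL)
        kvs->dicts[didx] = dictCreate(kvs->dtype);   /* paid on first insert */
}

static void freeDictIfNeeded(kvstore *kvs, int didx) {
    if (!(kvs->flags & KVSTORE_FREE_EMPTY_DICTS)) return;
    dict *d = kvs->dicts[didx];
    if (d == NULL || dictSize(d) != 0) return;
    dictRelease(d);                                  /* paid when the slot empties */
    kvs->dicts[didx] = NULL;
}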

@oranagra (Member) commented Mar 10, 2024

so when you modified redis-benchmark you also weren't able to see any significant performance drop between the two approaches?
what was the pipeline argument you used?

@sundb (Collaborator) commented Mar 11, 2024

so when you modified redis-benchmark you also weren't able to see any significant performance drop between the two approaches? what was the pipeline argument you used?

yes, the result is very close to binbin's; i used the same -P 32 as binbin.

@oranagra (Member) commented Mar 11, 2024

ok, so let's switch to KVSTORE_FREE_EMPTY_DICTS (maybe only in cluster-mode?)

@enjoy-binbin enjoy-binbin changed the title Free the empty dicts when a slot is unassigned Add KVSTORE_FREE_EMPTY_DICTS to cluster mode keys / expires kvstore Mar 11, 2024
@enjoy-binbin (Contributor Author)

ok, so let's switch to KVSTORE_FREE_EMPTY_DICTS (maybe only in cluster-mode?)

ok, now i have changed it to use KVSTORE_FREE_EMPTY_DICTS. i haven't followed the comments very carefully before, so please let me know if i missed anything. thanks to sundb for helping me analyze the benchmark results (i was too busy to follow up at the time)

@oranagra (Member)

i was so busy looking for a more significant performance drop (expecting some 30%) that i didn't bother to calculate the measured one.
it seems it's about 8%, is that right?

@enjoy-binbin (Contributor Author)

yes, i did a new test, it is 6%-8%

./src/redis-benchmark -p 30001 -n 1000000 -P 32 eval "redis.call('incr', KEYS[1]) redis.call('del', KEYS[1])" 1 {b}__rand_int__

unstable:

Summary:
  throughput summary: 157010.52 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       10.055     0.928    10.815    11.639    12.839    24.911

Summary:
  throughput summary: 157282.17 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       10.051     1.016    10.895    11.591    13.991    23.007

Summary:
  throughput summary: 157903.05 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       10.017     1.904    10.671    11.343    13.407    42.111

this PR:

Summary:
  throughput summary: 145624.00 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       10.860     0.936    11.495    13.087    20.751    37.567

Summary:
  throughput summary: 146520.16 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       10.789     2.288    11.439    12.111    19.935    53.631

Summary:
  throughput summary: 147275.41 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
       10.733     0.976    11.551    12.279    14.303    24.735

@sundb (Collaborator) commented Mar 11, 2024

@oranagra i need to update my benchmark report, there was something wrong with the previous benchmark change.

redis-benchmark -n 50000000 -P 32 incr k

changes to redis-benchmark.c

diff --git a/src/redis-benchmark.c b/src/redis-benchmark.c
index 05d054de1..83b2e23b9 100644
--- a/src/redis-benchmark.c
+++ b/src/redis-benchmark.c
@@ -744,8 +744,13 @@ static client createClient(char *cmd, size_t len, client from, int thread_id) {
             from->obuf+from->prefixlen,
             sdslen(from->obuf)-from->prefixlen);
     } else {
-        for (j = 0; j < config.pipeline; j++)
+        char *buf = NULL;
+        int len1 = redisFormatCommand(&buf, "del k");
+        for (j = 0; j < config.pipeline / 2; j++) {          <- previously forgot the /2
             c->obuf = sdscatlen(c->obuf,cmd,len);
+            c->obuf = sdscatlen(c->obuf, buf, len1);
+        }
+        free(buf);
     }

benchmarks based on 9a3748e (#13072):
almost a 2.x% performance loss

without KVSTORE_FREE_EMPTY_DICTS for keys

Summary:
  throughput summary: 2911208.25 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.480     0.160     0.455     0.655     0.735     1.175

with KVSTORE_FREE_EMPTY_DICTS for keys

Summary:
  throughput summary: 2846894.00 requests per second
  latency summary (msec):
          avg       min       p50       p95       p99       max
        0.496     0.144     0.471     0.671     0.743     1.791

@oranagra (Member)

do you mean that KVSTORE_FREE_EMPTY_DICTS is 2% faster?
that's a little bit odd

@sundb (Collaborator) commented Mar 11, 2024

@oranagra sorry, i wrote them backwards.

@oranagra (Member)

ok. 2% is better than 8%

@madolson (Contributor) left a comment

Given the 2% degradation, this approach still makes much more sense to me since this shouldn't be a common case for Redis cluster.

@oranagra (Member) commented Mar 13, 2024

this was discussed and approved in a core-team meeting. we're ok with the 2% degradation on a rapid deletion and insertion of the only key in a slot.

@oranagra oranagra merged commit 3b3d16f into redis:unstable Mar 13, 2024
@enjoy-binbin enjoy-binbin deleted the free_empty_dict branch March 13, 2024 06:33
@enjoy-binbin (Contributor Author)

there is a use-after-free error in the `expire scan should skip dictionaries with lot's of empty buckets` test:

==5002==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000004bb0 at pc 0x5562b393a49e bp 0x7fff2f7a41c0 sp 0x7fff2f7a41b0
READ of size 2 at 0x606000004bb0 thread T0
    #0 0x5562b393a49d in dictScanDefrag /home/runner/work/redis/redis/src/dict.c:1447
    #1 0x5562b3949eed in kvstoreScan /home/runner/work/redis/redis/src/kvstore.c:393
    #2 0x5562b3c0d338 in activeExpireCycle /home/runner/work/redis/redis/src/expire.c:298
    #3 0x5562b39537c7 in databasesCron /home/runner/work/redis/redis/src/server.c:1060
    #4 0x5562b3962711 in serverCron /home/runner/work/redis/redis/src/server.c:1387
    #5 0x5562b3930b1d in processTimeEvents /home/runner/work/redis/redis/src/ae.c:333
    #6 0x5562b3933ac9 in aeProcessEvents /home/runner/work/redis/redis/src/ae.c:468
    #7 0x5562b3933ac9 in aeMain /home/runner/work/redis/redis/src/ae.c:498
    #8 0x5562b390b2ed in main /home/runner/work/redis/redis/src/server.c:7189
    #9 0x7f3457629d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)
    #10 0x7f3457629e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f)
    #11 0x5562b390d374 in _start (/home/runner/work/redis/redis/src/redis-server+0x130374)

0x606000004bb0 is located 48 bytes inside of 64-byte region [0x606000004b80,0x606000004bc0)
freed by thread T0 here:
    #0 0x7f3457ab4537 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
    #1 0x5562b394beef in freeDictIfNeeded /home/runner/work/redis/redis/src/kvstore.c:175
    #2 0x5562b394beef in kvstoreDictDelete /home/runner/work/redis/redis/src/kvstore.c:831
    #3 0x5562b3a101cd in dbGenericDelete /home/runner/work/redis/redis/src/db.c:410
    #4 0x5562b3a16b08 in deleteExpiredKeyAndPropagate /home/runner/work/redis/redis/src/db.c:1730
    #5 0x5562b3c0cb49 in activeExpireCycleTryExpire /home/runner/work/redis/redis/src/expire.c:64
    #6 0x5562b3c0cd8a in activeExpireCycleTryExpire /home/runner/work/redis/redis/src/expire.c:144
    #7 0x5562b3c0cd8a in expireScanCallback /home/runner/work/redis/redis/src/expire.c:133
    #8 0x5562b393a259 in dictScanDefrag /home/runner/work/redis/redis/src/dict.c:1386
    #9 0x5562b3949eed in kvstoreScan /home/runner/work/redis/redis/src/kvstore.c:393
    #10 0x5562b3c0d338 in activeExpireCycle /home/runner/work/redis/redis/src/expire.c:298
    #11 0x5562b39537c7 in databasesCron /home/runner/work/redis/redis/src/server.c:1060
    #12 0x5562b3962711 in serverCron /home/runner/work/redis/redis/src/server.c:1387
    #13 0x5562b3930b1d in processTimeEvents /home/runner/work/redis/redis/src/ae.c:333
    #14 0x5562b3933ac9 in aeProcessEvents /home/runner/work/redis/redis/src/ae.c:468
    #15 0x5562b3933ac9 in aeMain /home/runner/work/redis/redis/src/ae.c:498
    #16 0x5562b390b2ed in main /home/runner/work/redis/redis/src/server.c:7189
    #17 0x7f3457629d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)

previously allocated by thread T0 here:
    #0 0x7f3457ab4887 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x5562b399181e in ztrymalloc_usable_internal /home/runner/work/redis/redis/src/zmalloc.c:116
    #2 0x5562b399181e in zmalloc /home/runner/work/redis/redis/src/zmalloc.c:145
    #3 0x5562b393a5d8 in dictCreate /home/runner/work/redis/redis/src/dict.c:189
    #4 0x5562b394b9f5 in createDictIfNeeded /home/runner/work/redis/redis/src/kvstore.c:165
    #5 0x5562b394b9f5 in kvstoreDictAddRaw /home/runner/work/redis/redis/src/kvstore.c:793
    #6 0x5562b3a1627d in setExpire /home/runner/work/redis/redis/src/db.c:1703
    #7 0x5562b3a64e2d in setGenericCommand /home/runner/work/redis/redis/src/t_string.c:117
    #8 0x5562b3a64e2d in psetexCommand /home/runner/work/redis/redis/src/t_string.c:320
    #9 0x5562b395c481 in call /home/runner/work/redis/redis/src/server.c:3565
    #10 0x5562b395ee17 in processCommand /home/runner/work/redis/redis/src/server.c:4195
    #11 0x5562b39e473f in processCommandAndResetClient /home/runner/work/redis/redis/src/networking.c:2489
    #12 0x5562b39e473f in processInputBuffer /home/runner/work/redis/redis/src/networking.c:2597
    #13 0x5562b39e64df in readQueryFromClient /home/runner/work/redis/redis/src/networking.c:2743
    #14 0x5562b3c913b0 in callHandler /home/runner/work/redis/redis/src/connhelpers.h:79
    #15 0x5562b3c913b0 in connSocketEventHandler /home/runner/work/redis/redis/src/socket.c:298
    #16 0x5562b39337e1 in aeProcessEvents /home/runner/work/redis/redis/src/ae.c:438
    #17 0x5562b39337e1 in aeMain /home/runner/work/redis/redis/src/ae.c:498
    #18 0x5562b390b2ed in main /home/runner/work/redis/redis/src/server.c:7189
    #19 0x7f3457629d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)

the reason is that in expireScanCallback we delete the dict, and then dictScan continues to use the dict, e.g. it does `dictResumeRehashing(d)` at the end; this caused a use-after-free.

here is one way by checking the scan_cb return: b8213b9

not sure if this diff is worth it, just a quick try
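
To make the failure mode concrete, a tiny self-contained illustration of the bug shape (toy types, not the real dict code): the scan pauses rehashing before running the callback and resumes it afterwards, so a callback that ends up freeing the dict leaves the resume step touching freed memory.

#include <stdlib.h>

typedef struct toy_dict { int pauserehash; long size; } toy_dict;
typedef void (*toy_scan_cb)(void *privdata, toy_dict *d);

/* Stand-in for dictScanDefrag(): pause, run callback, resume. */
static void toy_scan(toy_dict *d, toy_scan_cb cb, void *privdata) {
    d->pauserehash++;      /* dictPauseRehashing() */
    cb(privdata, d);       /* may delete the last key and free d ...  */
    d->pauserehash--;      /* ... making this a heap-use-after-free   */
}

/* Stand-in for expireScanCallback -> dbGenericDelete -> freeDictIfNeeded. */
static void toy_expire_cb(void *privdata, toy_dict *d) {
    (void)privdata;
    if (--d->size == 0) free(d);   /* dict became empty: freed mid-scan */
}

int main(void) {
    toy_dict *d = malloc(sizeof(*d));
    d->pauserehash = 0; d->size = 1;
    toy_scan(d, toy_expire_cb, NULL);  /* ASan flags the write after free */
    return 0;
}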

@oranagra (Member) commented Mar 13, 2024

how about relying on the fact that dictScan calls dictPauseRehashing?
i.e. if pauserehash is set, don't delete the dict yet, and then when the scan returns, try to delete it again.
or implement a new protection flag in kvstore for that instead of relying on the one scan sets in dict...
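
A minimal sketch of that first idea (assuming a dictIsRehashingPaused()-style check on the dict's pauserehash counter; the actual fix landed in #13135):

/* Sketch: defer freeing while a scan/iteration still has rehashing paused. */
static void freeDictIfNeeded(kvstore *kvs, int didx) {
    if (!(kvs->flags & KVSTORE_FREE_EMPTY_DICTS)) return;
    dict *d = kvs->dicts[didx];
    if (d == NULL || dictSize(d) != 0) return;
    if (dictIsRehashingPaused(d)) return;   /* a dictScan() is still inside it */
    dictRelease(d);
    kvs->dicts[didx] = NULL;
}

/* ...and kvstoreScan() retries the free once the scan of that dict returns:
 *     freeDictIfNeeded(kvs, didx);
 */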

enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Mar 13, 2024
After redis#13072, there is a use-after-free error. In expireScanCallback we
delete the dict, and then in dictScan we continue to use the dict, e.g. we do
`dictResumeRehashing(d)` at the end; this caused an error.

In this PR, in freeDictIfNeeded, if the dict's pauserehash is set, we don't
delete the dict yet; then when the scan returns, we try to delete it again.
@enjoy-binbin (Contributor Author)

It feels like it would be better to add a new protection flag to kvstore, but it seems a bit overkill at this time. As for pauserehash, the diff is small and we can do this easily in kvstoreScan, so i went this way for now: #13135

@enjoy-binbin (Contributor Author)

btw, i now realize that i forgot to make the same change in initTempDb and such; i will handle it all together in #13135

oranagra pushed a commit that referenced this pull request Mar 18, 2024
…to respect EMPTY flag (#13135)

After #13072, there is a use-after-free error. In expireScanCallback we
delete the dict, and then in dictScan we continue to use the dict, e.g. we do
`dictResumeRehashing(d)` at the end; this caused an error.

In this PR, in freeDictIfNeeded, if the dict's pauserehash is set, we don't
delete the dict yet; then when the scan returns, we try to delete it again.

At the same time, we noticed that there would be similar problems with the
iterator. We may also delete elements during the iteration process, causing
the dict to be deleted, so the iterator-related parts of the PR have also been
modified. dictResetIterator was also missing from the previous
kvstoreIteratorNextDict; we currently have no scenario in which elements are
deleted during the kvstoreIterator process, but we deal with it together to
avoid future problems. Added some simple tests to verify the changes.

In addition, the modification in #13072 omitted initTempDb and emptyDbAsync,
and they were also added. This PR also removes the slow flag from the expire
test (it consumes 1.3s) so that problems can be found in CI in the future.