Add KVSTORE_FREE_EMPTY_DICTS to cluster mode keys / expires kvstore #13072
Conversation
Currently, the keys kvstore and the expires kvstore are both flagged with ON_DEMAND, which means that a cluster node will only allocate a dict when the slot is assigned to it and populated; on the other hand, when the slot is unassigned, the dict remains allocated. This PR adds a kvstore interface to explicitly release a specific empty dict, and calls it when a slot is unassigned.
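To make this concrete, here is a rough sketch of what such an explicit-release interface could look like. The names (kvstoreFreeDictIfEmpty, clusterSlotUnassigned) are illustrative, not the PR's actual code, and the kvstore's internal bookkeeping (dict counters, metadata) is omitted.

```c
/* Illustrative sketch only, not the actual PR code: release a slot's dict
 * if it exists and is empty, so an unassigned slot does not keep an
 * allocated (empty) dict around. */
static void kvstoreFreeDictIfEmpty(kvstore *kvs, int didx) {
    dict *d = kvs->dicts[didx];
    if (d == NULL || dictSize(d) != 0) return;
    dictRelease(d);
    kvs->dicts[didx] = NULL;   /* back to the "not allocated" ON_DEMAND state */
}

/* Hypothetical hook in the cluster code, called when `slot` stops being
 * served by this node (cluster mode only uses db 0). */
void clusterSlotUnassigned(int slot) {
    kvstoreFreeDictIfEmpty(server.db[0].keys, slot);
    kvstoreFreeDictIfEmpty(server.db[0].expires, slot);
}
```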
oranagra
left a comment
@madolson please review the cluster code.
Co-authored-by: Oran Agra <oran@redislabs.com>
Actually, looking at how weird this code looks, I would probably be in favor of just marking the main store as KVSTORE_FREE_EMPTY_DICTS. I see that in the previous CR @oranagra proposed it but, on the other hand, we'd rather not de-allocate a dict every time it becomes empty and re-allocate it when a key is added again. I would rather keep a simpler API that just handles both cases without explicit API calls. It seems unlikely that you would be using cluster with so few keys; that was one of the core assumptions we said we were making when we implemented per-slot dictionaries anyway.
i don't understand what you mean by that.
when we implemented the per-slot dicts we made that compromise based on the reasoning that the memory benefit for clusters densely populated with keys outweighs the memory damage it does for sparsely populated ones (with just a few keys in each slot, or many empty slots). besides that, we didn't even bother to allocate only the dicts of the assigned slots, so a cluster with many shards (regardless of how many keys it has per slot) will suffer a completely unnecessary memory hit, and that's something we later "solved" with KVSTORE_ALLOCATE_DICTS_ON_DEMAND, but it was only partially solved since it'll be messed up by resharding, which is what this PR comes to solve. i'd argue that KVSTORE_ALLOCATE_DICTS_ON_DEMAND was the wrong (or lazy) approach; instead of leaving assigned slots without a dict until a key is added, we could have allocated dicts for all the assigned slots right away. i'd rather just keep the approach in this PR, aiming to have dicts allocated for all assigned slots (even when empty), and free them when they're unassigned.
I don't like that we have to explicitly indicate to the underlying datastore to clean up a dict at a given index. It should just handle empty dictionaries.
I don't agree with this. I would prefer
The current PR is wrong, since replicas won't be freeing the slots correctly. Replicas don't strictly follow their primaries; since replication can lag, they might still have data in a slot even though their primaries have cleared it, so we might "delete it" and then recreate it. This change also adds more complexity to keep track of for the new and old cluster implementations. The code is in cluster_legacy, which means cluster v2 will need its own change and API as well to indicate a slot is empty. I think this adds a lot of complexity which I don't think has been justified.
@madolson so what do you suggest? just add KVSTORE_FREE_EMPTY_DICTS? i don't like it.
Yes, this is my vote.
I agree.
@oranagra I don't care about this. I don't know why you care about it though. Maybe we can discuss it in our sync-up?
maybe we can conduct a quick benchmark to see what's the worst-case effect of such a change?
unstable:
adding KVSTORE_FREE_EMPTY_DICTS:
that looks quite surprising to me, so just to be sure we're not jumping to the wrong conclusions:
to check the second point we can either modify redis-benchmark to send the sequence of commands directly (without Lua), or maybe change the script to perform multiple iterations of insertions and deletions inside each call.
yes, redis-server is indeed using 100% of the CPU. ohh, i see my -P pipeline is wrong, i updated the comment.
unstable:
adding KVSTORE_FREE_EMPTY_DICTS:
i still find it hard to explain; INCR and DEL (for a plain string) are so slim that usually every small overhead causes a noticeable regression.
Should we run a perf breakdown (or generate a flamegraph) to break down where all the time is really being spent?
so when you modified redis-benchmark you also weren't able to see any significant performance drop between the two approaches?
yes, the result is very close to binbin's, i use the same
ok, so let's switch to KVSTORE_FREE_EMPTY_DICTS (maybe only in cluster-mode?)
ok, i have now changed it to use KVSTORE_FREE_EMPTY_DICTS. i hadn't followed up on the comments very carefully before, so please let me know if i have missed anything. thanks to sundb for helping me analyze the benchmark results (i was too busy to follow up at that time)
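For context, what the flag amounts to inside the kvstore can be sketched roughly as follows; this is a simplified sketch, not the exact kvstore.c code, and bookkeeping such as the kvstore's dict counters is omitted.

```c
/* Simplified sketch of what KVSTORE_FREE_EMPTY_DICTS means (not the exact
 * kvstore.c code): once the last entry of a slot's dict is deleted, the dict
 * itself is released; the existing ON_DEMAND behavior then re-allocates it
 * the next time a key is added to that slot. */
static void freeDictIfNeeded(kvstore *kvs, int didx) {
    if (!(kvs->flags & KVSTORE_FREE_EMPTY_DICTS)) return;
    dict *d = kvs->dicts[didx];
    if (d == NULL || dictSize(d) != 0) return;
    dictRelease(d);
    kvs->dicts[didx] = NULL;
}
```

This is also what makes the INCR+DEL benchmark above a worst case: each DEL empties the slot's only dict and frees it, and the next INCR has to allocate it again.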
i was so busy looking for a more significant performance drop (expecting some 30%) that i didn't bother to calculate the measured one.
yes, i did a new test, it is 6%-8%
unstable:
this PR:
@oranagra i need to update my benchmark report, there was something wrong with the previous benchmark change.

changes to redis-benchmark.c:

```diff
diff --git a/src/redis-benchmark.c b/src/redis-benchmark.c
index 05d054de1..83b2e23b9 100644
--- a/src/redis-benchmark.c
+++ b/src/redis-benchmark.c
@@ -744,8 +744,13 @@ static client createClient(char *cmd, size_t len, client from, int thread_id) {
                 from->obuf+from->prefixlen,
                 sdslen(from->obuf)-from->prefixlen);
         } else {
-            for (j = 0; j < config.pipeline; j++)
+            char *buf = NULL;
+            int len1 = redisFormatCommand(&buf, "del k");
+            for (j = 0; j < config.pipeline / 2; j++) {   <- forgot to /2 before
                 c->obuf = sdscatlen(c->obuf,cmd,len);
+                c->obuf = sdscatlen(c->obuf, buf, len1);
+            }
+            free(buf);
         }
```

benchmarks based on this:

without KVSTORE_FREE_EMPTY_DICTS for keys:
with KVSTORE_FREE_EMPTY_DICTS for keys:
do you mean that KVSTORE_FREE_EMPTY_DICTS is 2% faster?
@oranagra sorry, i wrote them backwards.
ok. 2% is better than 8%
madolson
left a comment
Given the 2% degradation, this approach still makes much more sense to me since this shouldn't be a common case for Redis cluster.
this was discussed and approved in a core-team meeting. we're ok with the 2% degradation on a rapid deletion and insertion of the only key in a slot.
there is a use-after-free error. the reason is that in expireScanCallback we will delete the dict, and then in dictScan we will continue to use the dict (e.g. we will still do `dictResumeRehashing(d)` at the end). here is one way to handle it, by checking the scan_cb return: b8213b9. not sure if this diff is worth it, just a quick try
how about relying on the fact that dictScan calls dictPauseRehashing?
After redis#13072, there is a use-after-free error. In expireScanCallback, we will delete the dict, and then in dictScan we will continue to use the dict, e.g. we will still do `dictResumeRehashing(d)` at the end, and this caused an error. In this PR, in freeDictIfNeeded, if the dict's pauserehash is set, don't delete the dict yet; then, when the scan returns, try to delete it again.
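Building on the earlier freeDictIfNeeded sketch, the fix described above amounts to a pauserehash guard plus a deferred retry after the scan. Again this is a sketch under the same assumptions (kvs->dicts, kvs->flags), not the exact #13135 code, and the scan wrapper below is a simplified stand-in rather than the real kvstore scan signature.

```c
/* Sketch, not the exact #13135 code: dictScan() pauses rehashing while it
 * runs, so pauserehash > 0 means a scan (or iterator) still holds the dict.
 * In that case the free is postponed rather than pulling the dict out from
 * under dictScan(). */
static void freeDictIfNeeded(kvstore *kvs, int didx) {
    dict *d = kvs->dicts[didx];
    if (!(kvs->flags & KVSTORE_FREE_EMPTY_DICTS)) return;
    if (d == NULL || dictSize(d) != 0) return;
    if (dictIsRehashingPaused(d)) return;   /* deferred: still being scanned */
    dictRelease(d);
    kvs->dicts[didx] = NULL;
}

/* Simplified stand-in for the kvstore scan path (the real kvstore scan API
 * has a different signature): retry the deferred free once dictScan() is
 * done with the dict. */
static unsigned long scanOneDict(kvstore *kvs, int didx, unsigned long cursor,
                                 dictScanFunction *scan_cb, void *privdata) {
    dict *d = kvs->dicts[didx];
    if (d == NULL) return 0;
    cursor = dictScan(d, cursor, scan_cb, privdata);
    /* The callback (e.g. expireScanCallback) may have emptied the dict;
     * now that dictScan() no longer uses it, try to free it again. */
    freeDictIfNeeded(kvs, didx);
    return cursor;
}
```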
It feels like it would be better to add a new protection flag to kvstore, but it seems a bit overkill at this time. about the |
btw, i now realize that i forgot to make the same change in initTempDb and such, i will handle it all together in #13135 |
…to respect EMPTY flag (#13135)
After #13072, there is a use-after-free error. In expireScanCallback, we delete the dict, and then in dictScan we continue to use the dict, e.g. we still do `dictResumeRehashing(d)` at the end, and this caused an error. In this PR, in freeDictIfNeeded, if the dict's pauserehash is set, we don't delete the dict yet; then, when the scan returns, we try to delete it again.
At the same time, we noticed that there will be similar problems with iterators: we may also delete elements during the iteration process, causing the dict to be deleted, so the iterator-related parts of the PR were also modified. dictResetIterator was also missing from the previous kvstoreIteratorNextDict; we currently have no scenario in which elements are deleted during a kvstoreIterator pass, but we deal with it together to avoid future problems. Added some simple tests to verify the changes.
In addition, the modification in #13072 omitted initTempDb and emptyDbAsync, and they were also handled. This PR also removes the slow flag from the expire test (which consumes 1.3s) so that problems can be found in CI in the future.
Currently (following #11695 and #12822), the keys kvstore and the expires kvstore are both flagged with ON_DEMAND, which means that a cluster node will only allocate a dict when the slot is assigned to it and populated, but on the other hand, when the slot is unassigned, the dict will remain allocated.

We considered releasing the dict when the slot is unassigned, but it causes complications on replicas. On the other hand, from the benchmarks we conducted, it looks like the performance impact of releasing the dict when it becomes empty and re-allocating it when a key is added again isn't huge.

This PR adds KVSTORE_FREE_EMPTY_DICTS to the cluster mode keys / expires kvstore.

The impact is about a 2% performance drop, for this hopefully uncommon scenario.
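For reference, the creation-time side of this can be sketched as follows. This is a minimal sketch, assuming the names used here (kvstoreCreate, CLUSTER_SLOT_MASK_BITS, dbDictType, dbExpiresDictType) match the Redis source; the exact code in initServer may differ in details.

```c
/* Sketch of how the flags end up being requested for the keyspace kvstores
 * (simplified; the real setup lives in initServer() and may differ).
 * In cluster mode there is one dict per slot, allocated on demand and now
 * also freed when it becomes empty; in standalone mode a single dict is
 * used and the EMPTY flag has no practical effect. */
static void createDbKvstores(void) {
    int flags = KVSTORE_ALLOCATE_DICTS_ON_DEMAND;
    int slot_count_bits = 0;
    if (server.cluster_enabled) {
        slot_count_bits = CLUSTER_SLOT_MASK_BITS;   /* 2^14 = 16384 slots */
        flags |= KVSTORE_FREE_EMPTY_DICTS;
    }
    for (int j = 0; j < server.dbnum; j++) {
        server.db[j].keys = kvstoreCreate(&dbDictType, slot_count_bits, flags);
        server.db[j].expires = kvstoreCreate(&dbExpiresDictType, slot_count_bits, flags);
    }
}
```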