Make dictEntry opaque #11465
Conversation
oranagra
left a comment
i like the direction here, but i'm hesitant to merge it without actually using it (knowing that the next PR that relies on it is in final stages).
also, let's revisit this after #9406 is merged.
src/db.c
this is replacing one ugly hack with another (accessing the dict->type pointer).
maybe just do dictGetVal and then decrRefcount?
your approach is better since it doesn't assume what's the main dict's destructor function, but the access to type is ugly.
maybe we can add a wrapper (like we have for dictFreeUnlinkedEntry)
Making dict opaque would prevent accessing dict->type. Making dictType opaque would prevent accessing type->valDestructor. Those are two separate pieces of work that would be useful.
Here I focused on the minimal changes to allow making dictEntry opaque.
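The opaque-pointer pattern under discussion can be sketched as follows; this is a simplified illustration with hypothetical helper names (dictCreateEntry in particular), not the actual dict API:

```c
#include <stdlib.h>
#include <string.h>

/* In the header (dict.h), only a forward declaration is exposed, so
 * callers cannot touch the fields directly: */
typedef struct dictEntry dictEntry;
void *dictGetKey(const dictEntry *de);
void *dictGetVal(const dictEntry *de);
void dictSetVal(dictEntry *de, void *val);
dictEntry *dictCreateEntry(void *key, void *val);

/* In dict.c, the full definition stays private: */
struct dictEntry {
    void *key;
    void *val;
    struct dictEntry *next;
};

dictEntry *dictCreateEntry(void *key, void *val) {
    dictEntry *de = malloc(sizeof(*de));
    de->key = key;
    de->val = val;
    de->next = NULL;
    return de;
}
void *dictGetKey(const dictEntry *de) { return de->key; }
void *dictGetVal(const dictEntry *de) { return de->val; }
void dictSetVal(dictEntry *de, void *val) { de->val = val; }
```

Because only dict.c can see the layout, the struct can later be rearranged (or bits encoded in its pointer) without touching any caller.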
src/dict.c
it could be that now that it's not a macro, we lose some performance benefits.
let's make some simple benchmark to try to rule that out.
Good point. Perhaps with -O3 this will be inlined? We could check the assembly...
We compile with -flto (link-time optimization), which turns on optimizations such as inlining across compilation units, so this function is most likely inlined.
I tried adding an assert(0) in it to see the stack trace. The function dictSetKey was not part of the stack trace, meaning it was inlined.
Yeah, I don't think we should try to micro optimize this. We did some brief internal testing at AWS, and found that moving away from #defining sometimes improved performance as we got better branch prediction. Let's just trust the compiler.
IMHO, adding an explicit inline keyword, like inline void dictSetNext(), to those short functions should benefit performance.
Shall we do this, or just make inlining a compile option?
Fine with me. So far, the only inlines we have are static inline. Not sure how much compilers care about inline. Does anyone know?
I will stand by my earlier statement: let's just trust the compiler to do the inlining. We have -O3 and -flto enabled. However, I would also be okay with moving these to the header file and making them static inline. They are all so short, it will probably help readability to have them in the header.
@madolson If we move them to the header file, we can't make the struct opaque.
Excellent point, sort of defeats the purpose of the whole endeavor.
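A sketch of why the two goals conflict: a static inline accessor in the header forces the complete struct definition to be visible to every includer (simplified, hypothetical code):

```c
/* dict.h -- if the accessor lives in the header as static inline ... */
typedef struct dictEntry {
    void *key;
    void *val;
    struct dictEntry *next;
} dictEntry;

/* ... then the full layout must be visible to every file that includes
 * dict.h, so the type is no longer opaque: any caller can still write
 * de->key directly, and the layout can't change without recompiling
 * (and re-auditing) all callers. */
static inline void *dictGetKey(const dictEntry *de) { return de->key; }
```

With a forward declaration only, the compiler rejects any direct field access outside dict.c, which is exactly the guarantee opacity is meant to give.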
i agree we want to leave that to the compiler.. just thought it's a good idea to make a minimal effort to benchmark that, so that we don't find a regression later on and start looking for the cause.
hopefully no action will be needed, but if there is, we can discuss it.
That's exactly why I had this lying around for a year without opening a PR. The branch I never finished was about encoding something in the dictEntry pointer, in this case a 'has next' flag, so we can allocate 8 bytes less for the dictEntry if it doesn't have a next entry. This PR enables encoding various stuff in the pointer, since nobody can access the fields directly. The pointer manipulation can be plugged into the accessor functions.
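The pointer-encoding idea can be sketched with low-bit tagging: heap pointers are aligned to at least 8 bytes, so the low bits are always zero and free to carry flags. This is an illustrative sketch with hypothetical names, not Redis code:

```c
#include <stdint.h>

/* Low bit of a dictEntry pointer used as a 'has next' flag.
 * Valid because malloc'd pointers are at least 8-byte aligned. */
#define ENTRY_PTR_HAS_NEXT 1UL

static void *entryEncode(void *de, int has_next) {
    return (void *)((uintptr_t)de | (has_next ? ENTRY_PTR_HAS_NEXT : 0));
}
static int entryHasNext(void *ptr) {
    return ((uintptr_t)ptr & ENTRY_PTR_HAS_NEXT) != 0;
}
static void *entryDecode(void *ptr) {
    /* Strip the flag bits before dereferencing. */
    return (void *)((uintptr_t)ptr & ~ENTRY_PTR_HAS_NEXT);
}
```

Since callers only ever see the pointer through accessor functions, the encode/decode steps can live entirely inside dict.c.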
Recently, I was thinking of mixing the
src/dict.c
@judeng Sure, I think it's possible. If an entry has both a next pointer and an expire time, the expire time would have to be stored somewhere else? Another difficulty is how to structure this nicely: expire time only applies to the db main dict, and the dict structure is not aware of this feature. Dicts are used in many places (hash, set, zset, the expire dict and many other places in redis) and every dict can have a next entry, but only some dicts have an expire time. (Yet another thing is the LRU, which is now on the redisObject.)
A few scenarios I can think of:
I haven't thought about whether other dicts should do similar optimizations, and the benefits may not be obvious. In my user scenario, redis is mostly used as a cache, and all of the keys have an expiration time, which brings a lot of memory and performance overhead. Considering that there are few bucket collisions in the db, reusing the next pointer could save 20% memory and give a performance boost.
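The memory argument can be made concrete with a rough sizing sketch (the layouts below are hypothetical stand-ins, not the actual Redis structs):

```c
#include <stddef.h>

/* A chained entry carries key, value and next pointers:
 * three machine words per element. */
struct entryChained { void *key; void *val; struct entryChained *next; };

/* If the next pointer can be reused for the expire time (or dropped
 * when there is no bucket collision), one word per entry is saved. */
struct entryNoNext { void *key; void *val; };
```

On a 64-bit build that is 8 of 24 bytes of entry overhead, which is where savings of the order quoted above come from when collisions are rare.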
oranagra
left a comment
WOW.. ~500 lines, that came out much bigger than i expected.
i still think we should keep it. not because it's a cleanup, but because it actually solves a problem for which we didn't have a solution (incrementing the defragged in a few cases).
but since it's so big, i'd like to introduce it in a separate commit, which means i'm not gonna squash-merge this PR, and instead i'll ask to squash other commits in this PR into groups according to topic.
src/defrag.c
Now we could actually just use the dict * as privdata and cut a few more lines. Similar for other scan data. Should I ...?
WOW.. ~500 lines, that came out much bigger than i expected.
Actually this commit is 3 files changed, 118 insertions(+), 175 deletions(-). But you thought it's "some 30 lines" 😆 I think the code got much cleaner without passing around the counter and I don't think we should avoid making simplifications just for the git blame.
yes, i only counted the lines doing the increments, not the ones passing the pointer and the function declarations.
apparently when looking at a commit, the GH line stats still show the whole PR.
not sure we have to proceed cleaning the struct. maybe we'll need them someday soon. your call.
either way since we have several different topics in this PR, each with quite a lot of changed lines, i'd like to separate that change from others, so anyone reading the diff can focus on the important things (like the other changes we did in defrag.c and dict.c)
Ok, but it can be hard to reorder some of the commits since they depend on each other, especially when I've merged master.
Perhaps I can factor out this defrag hits cleanup and put it first, then opaque dict after that, so two commits in total?
we can probably do some git squashing and rebasing effort when that PR is ready.
the other alternative is to start a new branch, and copy changes from the old one in stages, but that's gonna take some work to find them.
note that i don't think we have to make sure each commit stands for itself and compiles.
i want it mainly for the purpose of code review (now and in the future), so it's easier to focus on important changes by topic without running into a huge refactoring diff.
taking it into a separate PR is also a possibility. but maybe there would still be other things in this PR that are worth splitting?
FYI, some examples where we did such things (on bigger projects): #9320, #9780, #7807
OK, I squashed them down to 6 commits. I think this is ready, unless you have other comments.
I guess you don't want to merge it until there is a use for it, so I'll start working on some kind of dict improvement based on this. :-)
Force-pushed from 344f236 to 2795515
src/defrag.c
stat_active_defrag_hits may not be accurate. See the function defragSet in defrag.c:
```c
long defragSet(redisDb *db, dictEntry *kde) {
    long defragged = 0;
    robj *ob = dictGetVal(kde);
    dict *d, *newd;
    serverAssert(ob->type == OBJ_SET && ob->encoding == OBJ_ENCODING_HT);
    d = ob->ptr;
    if (dictSize(d) > server.active_defrag_max_scan_fields)
        defragLater(db, kde);
    else
        defragged += activeDefragSdsDict(d, DEFRAG_SDS_DICT_NO_VAL);
    /* handle the dict struct */
    if ((newd = activeDefragAlloc(ob->ptr)))
        defragged++, ob->ptr = newd;
    /* defrag the dict tables */
    defragged += dictDefragTables(ob->ptr);
    return defragged;
}
```
why?
p.s. note that the main purpose of that field is not just statistics, but also the defrag loop itself (having a sense of how much work was done, and when it's time to call mstime and check the exit conditions).
madolson
left a comment
LGTM now, I guess we're just waiting on the perf to validate?
@madolson I can't see any regression. The label-triggered benchmark results are here: https://benchmarksredisio.grafana.net/d/1fWbtb7nz/experimental-oss-spec-benchmarks. I think Oran is waiting for another PR actually making use of this before merging.
so what's the next step?
I'm experimenting with some ideas ATM. Multiple dictEntries in one allocation (bulk chaining), dictEntries without next, encoding stuff in the LSB of the bucket pointer...
This seems hard to do. I don't know about the API and how to fall back to a regular dictEntry. Possibly as a pluggable bucket implementation... (bucket callbacks in the dictType)
Yes, probably this. When there are no values and no hash conflict, the key pointer can be stored directly in the hashtable without a dictEntry, saving 24 bytes per set element.
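The "store the key pointer directly in the bucket" idea can be sketched with pointer tagging to distinguish a bare key from an allocated entry (names and layouts below are hypothetical, not the actual implementation):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical set entry: key plus chain pointer, no value slot. */
typedef struct setEntry { void *key; struct setEntry *next; } setEntry;

/* Low bit set => the bucket holds a tagged bare key pointer and no
 * entry was allocated; low bit clear => it holds a setEntry pointer. */
#define TAG_BARE_KEY 1UL

static void *bucketKey(void *bucket) {
    if ((uintptr_t)bucket & TAG_BARE_KEY)
        return (void *)((uintptr_t)bucket & ~TAG_BARE_KEY); /* no entry */
    return bucket ? ((setEntry *)bucket)->key : NULL;       /* chained */
}
```

In the common no-collision case the element costs nothing beyond the bucket slot itself, which is where the quoted 24-byte saving per set element comes from; on collision the bucket falls back to a chain of real entries.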
Sounds good! 😄
it may mean that the interface isn't abstracted enough. i.e. making dictEntry opaque still means we need to return something that's unique but not specifically allocated for the caller (since we didn't require the caller to release it). but where does it actually cause a problem? (IIRC we never use the
or maybe i misunderstood you and you're talking of another complication?
Sure, we could return a pointer to the middle of a listpack, but then things like dictSetValue(de) need a changed API; it would need a pointer to the beginning of the listpack. The various string types (sds, robj and others) are another complication. Currently, a dict key is just a pointer, and dictType callbacks decide how it's interpreted. We'd need to change all of that, or add things like dictType->getKeyLength(key) and dictType->getKeyChars(key) and then reconstruct the key/value as an sds (etc.) in dictGetKey and dictGetValue. It needs some amount of API design.
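A rough sketch of the kind of dictType extension being floated here; getKeyLength/getKeyChars are the hypothetical callbacks from the comment above, not an existing API:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical dictType extension: callbacks describing how to read a
 * key, so the dict never assumes keys are sds strings (they could be
 * robj, or bytes inside a listpack). */
typedef struct dictTypeSketch {
    size_t (*getKeyLength)(const void *key);
    const char *(*getKeyChars)(const void *key);
} dictTypeSketch;

/* Example instantiation: plain C strings as keys. */
static size_t cstrLen(const void *key) { return strlen((const char *)key); }
static const char *cstrChars(const void *key) { return (const char *)key; }

static const dictTypeSketch cstrType = { cstrLen, cstrChars };
```

With this indirection, dictGetKey could reconstruct whatever representation the caller wants from (chars, length), at the cost of an extra callback per access.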
ohh, right. i forgot about the problem with listpack values being non "sds compatible". the other comment you made about dictSetValue needing a reference to the listpack is maybe an indication that the interface isn't abstract enough. maybe the way around it is to introduce an alternative interface in dict.h (one that can work side by side with the current one), and then convert just t_set.c, and then t_hash.c etc. to use it. but actually that's not that different from just creating an alternative dict2.c. i guess that either way, the sds problem is a huge blocker to proceeding in that direction.
There's value in not duplicating code though. The good bits to keep are scan and incremental rehash. We'll see the path of least resistance when we start implementing. 👁️
Yes, I'm doing dicts without values first, for sets (and potentially useful for the keyspace dict in the future, if we move the db key to robj). No embedded keys, just pointer bit tricks.
Force-pushed from 2795515 to e87d354
Rebased (trivial merge conflicts) |
Use functions for all accesses to dictEntry (except in dict.c). Dict abuses, e.g. in defrag.c, have been replaced by support functions provided by dict.
Also delete unused function activeDefragSdsListAndDict
…canDefrag

This change deletes the dictGetNext and dictGetNextRef functions, so the dict API doesn't expose the next field at all. The bucket function in dictScan is deleted. A separate dictScanDefrag function is added which takes a defrag alloc function to defrag-reallocate the dict entries. "Dirty" code accessing the dict internals in active defrag is removed. An 'afterReplaceEntry' callback is added to dictType, which allows the dict user to keep the dictEntry metadata up to date after reallocation/defrag/move.

Additionally, for updating the cluster slot-to-key mapping after a dictEntry has been reallocated, we need to know which db a dict belongs to, so we store a pointer to the db in a new metadata section in the dict struct, a new mechanism similar to dictEntry metadata. This adds some complexity but provides better isolation.
instead of passing it around to every defrag function
Force-pushed from e87d354 to 2bbc891
Fix #13612. This bug was introduced by #11465, which added incremental defragmentation for db->expires. If the defragmentation cycle's time limit has not been reached, the loop should not exit unless both cursor and expires_cursor are 0, meaning both keys and expires have been fully defragmented. However, the expires_cursor check has been overlooked, so we exit immediately even though the defragmentation cycle hasn't reached its end time, which makes defragmentation very inefficient. Note that this bug was already fixed in version 7.4 (#13058).
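The shape of the bug can be sketched as a pair of exit predicates (variable names follow the description above, not the exact defrag.c code):

```c
/* The defrag cycle should only stop early when BOTH scans are done.
 * The buggy form overlooked expires_cursor and exited as soon as the
 * keyspace cursor reached 0, abandoning the expires scan. */
static int shouldExitBuggy(unsigned long cursor, unsigned long expires_cursor) {
    (void)expires_cursor;   /* overlooked */
    return cursor == 0;
}

static int shouldExitFixed(unsigned long cursor, unsigned long expires_cursor) {
    return cursor == 0 && expires_cursor == 0;
}
```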
This PR refactors the abstraction of the dictEntry by making it opaque. This enables future optimizations of the dict implementation without affecting the code using it.
The PR contains 5 commits. More detailed commit messages are found in each commit.