add 24 bytes tiny size class to jemalloc#2510
Conversation
|
I love this, thanks @oranagra, do you have information about its safeness? Are we likely to trigger some jemalloc bug given that we have a non standard size class with this patch, or it is a widespread feature? Thanks. |
|
we've been using this patch for quite some time now, and it seems safe. I consulted the jemalloc mailing list about it, and they suggested a different approach (with command line arguments that changes LG_QUANTUM), but i like mine better especially since it fixes what seems to be a minor bug in that script, and i don't think changing the platform quantum is the right thing to do (i rather add more tiny classes instead). bottom line is that if you look at the code this script generates, it just adds one more bin size to a table that the code works with. i don't see any risk in that, and a lot of testing over time already proved it works good. |
|
Thanks for the clarifications, not enough background to follow your reasoning, no idea what jemalloc |
|
⭐ |
|
Hello again @oranagra, I did my homework studying the Jemalloc script and understanding the LG_QUANTUM macro. I'm a bit biased towards the solution of changing LG_QUANTUM to 3, instead of the script, for a couple of reasons:
That said apparently the 16 bytes alignment is only used for SSE, and otherwise the larger C type is 8 bytes so we are unlikely to have problems anyway. That said, I guess that the following change should do for now? diff --git a/deps/jemalloc/include/jemalloc/internal/jemalloc_internal.h.in b/dep
index 574bbb1..df266ab 100644
--- a/deps/jemalloc/include/jemalloc/internal/jemalloc_internal.h.in
+++ b/deps/jemalloc/include/jemalloc/internal/jemalloc_internal.h.in
@@ -242,7 +242,7 @@ static const bool config_ivsalloc =
*/
#ifndef LG_QUANTUM
# if (defined(__i386__) || defined(_M_IX86))
-# define LG_QUANTUM 4
+# define LG_QUANTUM 3
# endif
# ifdef __ia64__
# define LG_QUANTUM 4
@@ -254,7 +254,7 @@ static const bool config_ivsalloc =
# define LG_QUANTUM 4
# endif
# if (defined(__amd64__) || defined(__x86_64__) || defined(_M_X64))
-# define LG_QUANTUM 4
+# define LG_QUANTUM 3
# endif
# ifdef __arm__
# define LG_QUANTUM 3I would just add a few ifdefs in order to make sure only Linux get this change. I'm going to test in a few data sets the difference it makes in practice. Feedbacks welcomed. |
|
p.s. actually |
|
Hi,
If we consider the jemalloc mailinglist reply trustworthy, then i guess it doesn't have any memory alignment side effects. but i didn't test it myself.
Regarding your concerns about alignment. My view regarding applying it only for Linux is that we shouldn't do that out of fear. anyway, my bottom line is that i don't really care this way or the other.. these were my 2 cents.... |
|
Hello Oran, Yep LG_QUANTUM affects more than just the size class, however if you add a 24 size class you are effectively just violated LG_QUANTUM, since jemalloc now returns non 16 bytes aligned stuff for certain allocations. So the risky part is equivalent (but read later about the risk). About moving more classes into tiny compared to quantum, I don't think this is going to have side effects since there are already platforms using LG_QUANTUM set to 3 (apparently ARM is one of those if I read it correctly in I agree about the fact your change fixes the script, but IMHO this should be sent as a pull request to jemalloc for them to fix this, instead of patching this ourselves. I believe it is better if we just depend on the things are standard and tested for jemalloc (even if broken! Like the script). We know there are people running with LG_QUANTUM set to 3 and they are even going to add it to Moreover I really believe that if we go for more fine grained size classes, there is no reason to don't get the 40 size class, and the non 16 bytes aligned others. About the safety of all this, as you said unless a structure is not packed we are safe, moreover... we are using our internal jemalloc only for the goal of Redis. This is very different than targeting a system wide jemalloc, so probably we are completely safe both on Linux and other OSes, since even if Linux has workarounds for the SEE case or others, we just don't care, we don't use any 16 bytes aligned data type. So I agree with you that after all there is no reason to protect with Linux specific defines... Without to mention that Jemalloc itself is compiled by default only on Linux. I'm making the change and committing in a a few. Update ASAP. Thanks! |
|
What about that? Unstable original memory usage (DEBUG POPULATE 1000000): 115.57M, after SDS changes 92.69M, after jemalloc new classes: 77.43M 💃 All tests passing. We can close this as well, thanks @oranagra! |
|
@antirez How about allocating performance? saving memory is amazing :) |
This gives us a 24 bytes size class which is dict.c dictEntry size, thus improving the memory efficiency of Redis significantly. Moreover other non 16 bytes aligned tiny classes are added that further reduce the fragmentation of the allocator. Technically speaking LG_QUANTUM should be 4 on i386 / AMD64 because of SSE types and other 16 bytes types, however we don't use those, and our jemalloc only targets Redis. New versions of Jemalloc will have an explicit configure switch in order to specify the quantum value for a platform without requiring any change to the Jemalloc source code: we'll switch to this system when available. This change was originally proposed by Oran Agra (@oranagra) as a change to the Jemalloc script to generate the size classes define. We ended doing it differently by changing LG_QUANTUM since it is apparently the supported Jemalloc method to obtain a 24 bytes size class, moreover it also provides us other potentially useful size classes. Related to issue #2510.
…de as primary (redis#2510) Picked from valkey-io/valkey#2510 Fixes: redis#2460 With this change we avoid divergence in cluster and replication layer view. I've observed node can be marked as primary in cluster while it can be marked as replica in replication layer view and have a active replication link. Without this change, we used to end up in a invalid replica chain in replication layer.
dictEntry struct, listNode struct, and maybe others, consume 24 bytes each.
but since jemalloc doesn't have a pool for 24 bytes allocations they are taken from the 32 bytes pool, and produce about 33% internal fragmentation overhead.
This patch adds a 24 bytes pool size class to jemalloc.
test (with relatively short strings)
debug populate 10000000
original code's used_memory: 1254709872
with patch used_memory: 1094714048
memory optimization: 14%