Add memory utilization analytics to mallctl by yinan1048576 · Pull Request #1463 · jemalloc/jemalloc

yinan1048576 · 2019-03-18T18:26:42Z

The analytics tool is put under experimental.utilization namespace in
mallctl. Input is one pointer or an array of pointers and the output
is a list of memory utilization statistics.

interwq · 2019-03-18T20:28:07Z

+static int
+experimental_frag_ctl(tsd_t *tsd, const size_t *mib, size_t miblen,
+    void *oldp, size_t *oldlenp, void *newp, size_t newlen) {
+	assert(oldp != NULL && newp != NULL);


We need to handle these cases (return EINVAL).

interwq · 2019-03-18T20:28:49Z

+    void *oldp, size_t *oldlenp, void *newp, size_t newlen) {
+	assert(oldp != NULL && newp != NULL);
+
+	/* trim *oldlenp and newlen so that:


Check the coding style here: https://github.com/jemalloc/jemalloc/wiki/Coding-Style

interwq · 2019-03-18T20:30:03Z

+	 * (b) both are trimmed to a multiple of the unit size (if not yet)
+	 * rely on compiler to upgrade multiplication / division to shifts
+	 */
+	if (*oldlenp / (2 * sizeof(uint16_t)) < newlen / sizeof(void *)) {


As mentioned in person, let's use 32-bit.

interwq · 2019-03-18T20:31:26Z

+	 * resides in, where the first one is the number of free regions and
+	 * the second one is the total number of regions
+	 */
+	void * const newend = newp + newlen;


It looks like this is causing issues in the Windows CI: https://ci.appveyor.com/project/jasone/jemalloc/builds/23165922/job/q69f6p9pi9ijulq1

interwq · 2019-03-18T20:32:38Z

+	for (; newp < newend; newp += sizeof(void *)) {
+		const void *ptr = *(const void **)newp;
+		const extent_t *ext = iealloc(tsd_tsdn(tsd), ptr);
+		if (extent_slab_get(ext)) {


nit: assert(ext != NULL);

interwq · 2019-03-18T20:34:00Z

+	 */
+	void * const newend = newp + newlen;
+	for (; newp < newend; newp += sizeof(void *)) {
+		const void *ptr = *(const void **)newp;


The core logic (the lines in the loop) should live in the extent.c.

davidtgoldblatt · 2019-03-18T20:59:24Z

I don't think it's the most important thing in the world while this is experimental, but I also have doubts that this is the right API. I think better would be to return some token that maps to an extent, and then have ways to access extent information given a token (I suspect that in the future we won't just care about fragmentation information, but also size, sampled status, etc. etc.). We should chat more generally tomorrow.

yinan1048576 · 2019-03-20T17:23:04Z

Revised the change based on our offline discussions.

Return three instead of two pieces of information for each input pointer: the number of free regions, the total number of regions, and the total size of the corresponding extent. The application can then apply whatever criteria it wants to decide whether defragmentation is needed. I did not go with returning handles to the corresponding extents, because we don't want to have to guarantee that the underlying extents stay alive until the caller queries the other information through those handles.
The type of the extent total size is size_t, so I bumped up the size of all three fields to size_t so that it's convenient for the application to fetch them.
I explicitly terminate and raise an error whenever I see improper input sizes, rather than silently adjusting them as I did previously.
I'm not exactly sure when the extent_t returned by iealloc can be NULL. In theory the input pointers can be anything, so I assume iealloc may return NULL. In such cases, rather than asserting, I zero the output, continue the computation, and report an error to the caller at the end. The effect is that if the caller made a mistake when supplying the input pointers, most likely through carelessness, they know afterwards exactly whether and where the mistakes are, and all the other, correct pointers still get their output.
The issue reported by the Windows compiler was a legitimate one that gcc should have caught. I made the correction, and hopefully the Windows compiler is happy this time...
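
For illustration, the explicit size check described above could look roughly like the sketch below. The helper name `batch_query_count` and its out-parameter are hypothetical; only the three `size_t` fields per pointer and the EINVAL result come from this thread, and rejecting an empty input is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Number of size_t fields written per queried pointer (nfree, nregs, size). */
#define FIELDS_PER_PTR 3

/*
 * Hypothetical helper: validate the byte lengths of the input (pointer array)
 * and output (stats array) buffers.  On success, store the number of pointers
 * in *count and return 0; otherwise return EINVAL and leave *count untouched.
 */
static int
batch_query_count(size_t in_bytes, size_t out_bytes, size_t *count) {
    const size_t record_bytes = FIELDS_PER_PTR * sizeof(size_t);

    /* The input must be a non-empty, whole number of pointers. */
    if (in_bytes == 0 || in_bytes % sizeof(void *) != 0) {
        return EINVAL;
    }
    /* The output must hold exactly one record per pointer. */
    if (out_bytes % record_bytes != 0
        || out_bytes / record_bytes != in_bytes / sizeof(void *)) {
        return EINVAL;
    }
    *count = in_bytes / sizeof(void *);
    return 0;
}
```

Comparing quotients rather than multiplying the count by the record size keeps the check free of overflow for very large lengths.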

interwq

We can start adding unit tests for the new API.

interwq · 2019-03-20T17:32:20Z

+		const extent_t *ext = iealloc(
+		    tsd_tsdn(tsd), *(const void **)newp);
+		if (unlikely(ext == NULL)) {
+			memset(oldp, 0, sizeof(size_t) * 3);


No real need to zero it out. As long as we return an error code, the user is not supposed to look into the values anyway.

interwq · 2019-03-20T17:33:33Z

+	 */
+	void * const newend = (const void **)newp + newlen;
+	for (; newp < newend; newp = (const void **)newp + 1) {
+		const extent_t *ext = iealloc(


Let’s move the lookup into the extent info get function as well, doesn’t look like we need the extent_t here other than as the input to that function.

interwq · 2019-03-20T17:35:29Z

+			extent_frag_get(ext, (size_t *)oldp,
+			    (size_t *)oldp + 1, (size_t *)oldp + 2);
+		}
+		oldp = (size_t *)oldp + 3;


Let’s make a size_t *ptr and adjust that. Better keep the oldp intact.

interwq · 2019-03-20T17:36:34Z

 }

+static inline void
+extent_frag_get(const extent_t *extent,


Nit: not sure about the function name

yinan1048576 · 2019-03-21T23:58:27Z

Made a couple of modifications according to feedback and offline discussions.

I still decided to zero out the output fields for invalid input pointers; I've put the reasoning in a comment. One more remark on that comment: valid pointers (i.e. those returned from previous malloc calls) will surely give a non-NULL extent, but invalid pointers may or may not give a NULL extent depending on the underlying implementation, and some "maliciously" invalid pointers may also lead directly to a segfault. In my reasoning, for simplicity, I'm assuming: (a) the caller has good intentions and provides pointers they truly want fragmentation info about, and (b) a NULL extent is a necessary and sufficient indication of an invalid pointer.
Encapsulated the extent lookup and the extent reading into a single function and put it in extent.c, with the declaration in extent_extern.h. Again, I'm not sure whether the function name is ideal. I'm calling it extent_frag_stats, which sounds slightly better than extent_frag to me.
Define additional local variables to make the code more intuitive and readable.
Added unit tests. I didn't add unit tests for testing NULL extent though, because it's still a mystery to me what the sufficient and necessary conditions for triggering it would be. In my local micro benchmark, I query for pointers to both the metadata and the content of std::string (where the content may or may not be an independent dynamic memory allocation), and I sometimes bump into EFAULT, but still no clue on the triggering condition.

interwq · 2019-03-22T05:24:03Z

 bool extent_boot(void);

+int
+extent_frag_stats(tsdn_t *tsdn, const void *ptr,


nit: put "int" on same line, use up to 80 chars per line.

re: naming -- maybe, extent_util_stats_get? We already use the term utilization in the stats world.

interwq · 2019-03-22T05:26:36Z

+		ret = EINVAL;
+		goto label_return;
+	}
+	newlen /= sizeof(const void *);


nit: Similarly, keep the original input args, feel free to define a new one with explicit naming.

interwq · 2019-03-22T05:30:17Z

 label_return:
 	return ret;
 }
+


Suggestion: add some example of the return format / layout, something like.

/*
 * input[0]: pointer_to_query | output[0]: extent_n_free_items
 *                            | output[1]: ....
 ...

Also a simple description regarding what the 4 args do.
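
From the caller's side, the layout being asked for could be read back as in the sketch below. It assumes three consecutive `size_t` values per queried pointer (free regions, total regions, extent size), as described earlier in this thread; the type and function names are hypothetical, and treating an all-zero record as "pointer not recognized" follows the zeroing behaviour discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed per-pointer record: three consecutive size_t values. */
typedef struct {
    size_t nfree;   /* free regions in the extent */
    size_t nregs;   /* total regions in the extent */
    size_t size;    /* extent size in bytes */
} util_record_t;

/* Copy the i-th record out of a flat size_t output buffer. */
static util_record_t
util_record_read(const size_t *out, size_t i) {
    util_record_t r = { out[3 * i], out[3 * i + 1], out[3 * i + 2] };
    return r;
}

/*
 * Percentage of regions in use, rounded down.  Returns -1 for a zeroed
 * record, which this sketch takes to mean the pointer was not recognized.
 */
static int
util_record_used_pct(util_record_t r) {
    if (r.nregs == 0) {
        return -1;
    }
    return (int)(((r.nregs - r.nfree) * 100) / r.nregs);
}
```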

interwq · 2019-03-22T05:33:10Z

+	const extent_t *extent = iealloc(tsdn, ptr);
+	if (unlikely(extent == NULL)) {
+		*nfree = *nregs = *size = 0U;
+		return EFAULT;


nit: this can simply return true; the EFAULT etc is for user-facing return codes. Internal functions don't really need to use them.

interwq · 2019-03-22T05:34:05Z

+			*nregs = 1U;
+		}
+		*size = extent_size_get(extent);
+		return 0;


nit: add assert(*size >= nfree * nregs);

interwq · 2019-03-22T05:35:20Z

 static const ctl_named_node_t experimental_node[] = {
-	{NAME("hooks"),		CHILD(named, hooks)}
+	{NAME("hooks"),		CHILD(named, hooks)},
+	{NAME("frag"),		CTL(experimental_frag)},


Naming is hard --frag feels a bit strange. Let's always use fragmentation?

yinan1048576 · 2019-03-22T18:06:32Z

Revised according to feedback.

Improve formatting / naming / readability.
Do not return EFAULT when extent is NULL. It doesn't affect the caller's ability to detect it at all because the output fields are zeroed out.
Improve comments.

yinan1048576 · 2019-03-22T21:10:12Z

Added two assertions:

assert(*nfree <= *nregs);
assert(*nfree * extent_usize_get(extent) <= *size);

As discussed offline, the two <= should always be <, but we allow == since it might not be a big issue even if it somehow appears.

interwq · 2019-03-22T21:51:21Z

+ * decisions are not known to the caller.  Therefore, we permit pointers to
+ * memory usages that may not be returned by previous malloc calls, and we
+ * provide the caller a convenient way to identify such cases.
+ */


nit: let's also add a simple example for the actual defrag work, e.g. disable tcache; free / alloc; re-enable tcache.

interwq · 2019-03-22T21:53:28Z

+ * provide the caller a convenient way to identify such cases.
+ */
+static int
+experimental_fragmentation_ctl(tsd_t *tsd, const size_t *mib, size_t miblen,


how about experimental_fragmentation_query, or maybe utilization instead of fragmentation?

interwq · 2019-03-22T21:54:32Z

+
+	const void **ptrs = (const void **)newp;
+	const void ** const ptrs_end = ptrs + len;
+	size_t *frag_stats = (size_t *)oldp;


nit: let's name this util_stats, to match the extent_util call below.

interwq · 2019-03-22T21:55:52Z

+	out_sz_ref = out_sz /= 2;
+	in_sz /= 2;
+	assert_d_eq(mallctl(
+			"experimental.fragmentation", out, &out_sz, in, in_sz),


nit: spacing is off. Let's always do 4 spaces for second line, regardless of the nested calls.

yinan1048576 · 2019-03-26T17:56:13Z

Revised according to feedback.

yinan1048576 · 2019-03-26T18:25:03Z

Also changed macro naming in tests.

interwq · 2019-03-30T22:54:54Z

+ *
+ * A typical workflow would be composed of the following steps:
+ *
+ * (1) flush and disable tcache using mallctl("thread.tcache.enabled", ...)


nit: explain why tcache needs to be disabled. Also, do the experimental.utilization call first, and we should only disable tcache if defrag is considered as necessary.

David mentioned that using explicitly controlled manual tcache might work too; however I just realized we don't provide a way to replace auto tcache (embedded in TSD) with manual tcache. So disabling / re-enabling tcache is probably the easiest way.
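
A rough sketch of the disable / move / re-enable sequence follows. To keep it standalone, the control function is passed in as a pointer with the same shape as `mallctl()`, and a small stand-in (`fake_mallctl`) models only `thread.tcache.enabled`; `with_tcache_disabled`, `fake_mallctl` and `record_state` are hypothetical names, not part of jemalloc. The idea is that with the thread cache off, a freed region cannot simply be handed straight back by the next allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same shape as jemalloc's mallctl(); injected so the sketch links alone. */
typedef int (*mallctl_fn)(const char *name, void *oldp, size_t *oldlenp,
    void *newp, size_t newlen);

/*
 * Hypothetical helper: run move_fn(arg) with the calling thread's tcache
 * disabled, then restore the previous setting.  Returns 0 on success or the
 * first nonzero result from the control function.
 */
static int
with_tcache_disabled(mallctl_fn ctl, void (*move_fn)(void *), void *arg) {
    bool was_enabled;
    size_t sz = sizeof(was_enabled);
    bool off = false;

    /* Read the old setting and write the new one in a single call. */
    int err = ctl("thread.tcache.enabled", &was_enabled, &sz, &off,
        sizeof(off));
    if (err != 0) {
        return err;
    }
    move_fn(arg);   /* e.g. allocate a copy, then free the original */
    return ctl("thread.tcache.enabled", NULL, NULL, &was_enabled,
        sizeof(was_enabled));
}

/* Stand-in for mallctl() that only models "thread.tcache.enabled". */
static bool fake_tcache_enabled = true;

static int
fake_mallctl(const char *name, void *oldp, size_t *oldlenp, void *newp,
    size_t newlen) {
    if (strcmp(name, "thread.tcache.enabled") != 0) {
        return ENOENT;
    }
    if (oldp != NULL && oldlenp != NULL && *oldlenp == sizeof(bool)) {
        *(bool *)oldp = fake_tcache_enabled;
    }
    if (newp != NULL && newlen == sizeof(bool)) {
        fake_tcache_enabled = *(bool *)newp;
    }
    return 0;
}

/* Example move_fn: records whether the tcache was enabled while it ran. */
static void
record_state(void *arg) {
    *(bool *)arg = fake_tcache_enabled;
}
```

In a real program the first argument would be `mallctl` itself, and `move_fn` would do the reallocation of the objects selected by the utilization query.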

interwq · 2019-03-30T22:58:17Z

+ * A typical workflow would be composed of the following steps:
+ *
+ * (1) flush and disable tcache using mallctl("thread.tcache.enabled", ...)
+ * (2) initialize temporary arrays for input/output


nit: maybe a bit more detailed, e.g. fill the input array with pointers to query fragmentation.

interwq · 2019-03-30T23:20:40Z

+		goto label_return;
+	}
+
+	const void **ptrs = (const void **)newp;


nit: I slightly prefer having an integer index for the loop here, i.e.

typedef struct {
	size_t nfree;
	...
} extent_util_stats_t;

extent_util_stats_t *output_stats = (extent_util_stats_t *)oldp;
void **input_ptrs = (void **)newp;
for (unsigned i = 0; i < len; i++) {
	extent_util_stats_get(tsd, &input_ptrs[i], &output_stats[i].nfree, ...);
}
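
The indexed loop suggested above can be made self-contained roughly as follows. The per-pointer lookup is injected as a callback standing in for the real extent query, and `batch_fill`, `stats_lookup_fn` and `stub_lookup` are hypothetical names used only for this sketch; the point is that `oldp` and `newp` stay untouched while typed views are indexed.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t nfree;
    size_t nregs;
    size_t size;
} extent_util_stats_t;

/* Hypothetical per-pointer lookup, standing in for the real extent query. */
typedef void (*stats_lookup_fn)(const void *ptr, extent_util_stats_t *stats);

/* Fill one output record per input pointer without modifying oldp/newp. */
static void
batch_fill(void *oldp, void *newp, size_t len, stats_lookup_fn lookup) {
    extent_util_stats_t *output_stats = (extent_util_stats_t *)oldp;
    const void **input_ptrs = (const void **)newp;

    for (size_t i = 0; i < len; i++) {
        lookup(input_ptrs[i], &output_stats[i]);
    }
}

/* Stub lookup: pretends every non-NULL pointer is a one-region extent. */
static void
stub_lookup(const void *ptr, extent_util_stats_t *stats) {
    stats->nfree = 0;
    stats->nregs = (ptr != NULL) ? 1 : 0;
    stats->size = (ptr != NULL) ? 4096 : 0;
}
```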

yinan1048576 · 2019-04-02T17:57:07Z

Revised the change -

Split the mallctl API into two: experimental.utilization.query for single pointer query, and experimental.utilization.batch_query for batch querying.
For the single-pointer query, return an additional field at the beginning: the address of slabcur, which is where a potential reallocation would go. It's up to the application how to use it to determine fragmentation. In the implementation from Redis, fragmentation is considered true whenever this address of potential reallocation is not equal to the extent the input pointer resides in, which might be an overly strict criterion. If really desired, the application can achieve the same effect by examining whether the deviation of the input pointer from our returned slabcur_addr is below a very low threshold (i.e. the size of a single extent), but it's also possible to define less strict criteria by loosening the deviation threshold.
Do not return slabcur_addr for batch querying, because it'd be quickly changing throughout the defragmentation process and thus returning it for all pointers at query time may not make that much sense for determining defragmentation. The nfree field can also change throughout the defragmentation process, but it is relatively less volatile since it's always local to the extent the pointer resides in, and it has a nicer property: in the middle of the defragmentation process, at the time a particular pointer is defragmented, (a) if nfree decreased from what it was at querying time, then it means the defragmentation process up till this moment favored reallocating to this extent, implying that it hasn't been a problem earlier; (b) if nfree increased, then that means the defragmentation process up till this moment favored reallocating away from this extent, implying that it has already been a problem earlier. In both cases the defragmentation determination at the querying time wouldn't be affected even if we had relied on the nfree at this later time.
Explicitly define structs holding the output fields so as to improve readability. The structs are not exposed out and are only for internal use in src/ctl.c. I didn't make use of the structs in the unit tests because I'd like to mimic what application would do.
Improve the comment and code readability.
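
As an illustration of the deviation-threshold idea, an application-side check might look like the sketch below. The function name and the choice to treat a NULL slabcur address as "no information" are assumptions of the sketch, not part of the API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical criterion: flag ptr as a defragmentation candidate when it
 * lies more than `threshold` bytes away from slabcur_addr.  A NULL
 * slabcur_addr never flags the pointer.
 */
static bool
deviates_from_slabcur(const void *ptr, const void *slabcur_addr,
    size_t threshold) {
    if (slabcur_addr == NULL) {
        return false;
    }
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t s = (uintptr_t)slabcur_addr;
    uintptr_t dist = (p > s) ? p - s : s - p;
    return dist > threshold;
}
```

A threshold of one extent size approximates the strict "not in slabcur" test; a larger threshold loosens it.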

interwq · 2019-04-03T16:16:47Z


+static const ctl_named_node_t utilization_node[] = {
+	{NAME("query"),		CTL(experimental_utilization_query)},
+	{NAME("batch_query"),	CTL(experimental_utilization_batch_query)},


nit: no comma at the end.

interwq · 2019-04-03T16:17:10Z

 static const ctl_named_node_t experimental_node[] = {
-	{NAME("hooks"),		CHILD(named, hooks)}
+	{NAME("hooks"),		CHILD(named, hooks)},
+	{NAME("utilization"),	CHILD(named, utilization)},


nit: comma too

interwq · 2019-04-03T16:21:22Z

+ *
+ * It's up to the application how to determine the significance of
+ * fragmentation relying on the statistics returned.  Possible choices are:
+ * (a) if input pointer deviates a lot from potential reallocation address,


nit: I'd put (a) to the last, as I imagine it won't be a popular choice.

interwq · 2019-04-03T16:30:15Z

+ * It can be beneficial to define the following macros to make it easier to read
+ * the returned statistics:
+ *
+ * #define SLABCUR *(void **)oldp


nit: for better readability, defining the offsets based on sizeof(size_t) would be better, e.g. NFREE_OFFSET would be sizeof(size_t), with NFREE_READ(output) as *((size_t *)((size_t)output + NFREE_OFFSET))
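
One way to spell out offset-based readers is sketched below. It covers only the four fields discussed at this point in the review (slabcur address, nfree, nregs, size), derives the first offset from `sizeof(void *)`, and reads through `memcpy` to sidestep alignment and aliasing questions; the macro and helper names are illustrative rather than the ones that ended up in the source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout: one pointer followed by three size_t counters. */
#define SLABCUR_OFFSET  0
#define NFREE_OFFSET    (sizeof(void *))
#define NREGS_OFFSET    (NFREE_OFFSET + sizeof(size_t))
#define SIZE_OFFSET     (NREGS_OFFSET + sizeof(size_t))
#define QUERY_OUT_BYTES (SIZE_OFFSET + sizeof(size_t))

/* Read a size_t stored `offset` bytes into the output buffer. */
static size_t
read_size_at(const void *out, size_t offset) {
    size_t v;
    memcpy(&v, (const char *)out + offset, sizeof(v));
    return v;
}

/* Read the pointer stored at the start of the output buffer. */
static void *
read_slabcur(const void *out) {
    void *p;
    memcpy(&p, (const char *)out + SLABCUR_OFFSET, sizeof(p));
    return p;
}

#define SLABCUR_READ(out) read_slabcur(out)
#define NFREE_READ(out)   read_size_at((out), NFREE_OFFSET)
#define NREGS_READ(out)   read_size_at((out), NREGS_OFFSET)
#define SIZE_READ(out)    read_size_at((out), SIZE_OFFSET)
```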

interwq · 2019-04-03T16:39:01Z

+ *
+ * A typical workflow would be composed of the following steps:
+ *
+ * (1) flush tcache: mallctl("thread.tcache.flush", ...)


nit: let's not duplicate the workflow example. In the single query case, say refer to the other place is fine. We should try to clean up the comments for the single query, to focus on the difference from the batch case.

interwq · 2019-04-03T17:11:53Z

+		bin_t *bin = &arena->bins[szind].bin_shards[binshard];
+
+		malloc_mutex_lock(tsdn, &bin->lock);
+		const extent_t *cur = bin->slabcur;


nit: no need for the cur var.

interwq · 2019-04-03T17:21:19Z

+void
+extent_util_stats_get(tsdn_t *tsdn, const void *ptr, size_t *nfree,
+    size_t *nregs, size_t *size, void **slabcur_addr) {
+	*nfree = *nregs = *size = 0;


nit: We try avoiding double initialization, mainly for the purpose of explicitly setting the values, instead of relying on the default value to cover multiple cases (more error prone). Let's remove this line and the branch below, and see the comments below for changes.

interwq · 2019-04-03T17:22:03Z

+
+	const extent_t *extent = iealloc(tsdn, ptr);
+	if (unlikely(extent == NULL)) {
+		return;


... add *nfree = *nregs = *size = 0;, then goto label_return;

interwq · 2019-04-03T17:22:28Z

+	*size = extent_size_get(extent);
+	if (!extent_slab_get(extent)) {
+		*nregs = 1;
+		return;


... add *nfree = 0; and then goto label_return.
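
Put together, the three review comments above describe a shape like the following sketch, where every output is assigned explicitly on each path instead of relying on an up-front default. `fake_extent_t` is a hypothetical, simplified stand-in for `extent_t` so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for extent_t. */
typedef struct {
    bool slab;      /* true if the extent is used as a slab */
    size_t nfree;   /* free regions (slab only) */
    size_t nregs;   /* total regions (slab only) */
    size_t size;    /* extent size in bytes */
} fake_extent_t;

static void
util_stats_get(const fake_extent_t *extent, size_t *nfree, size_t *nregs,
    size_t *size) {
    if (extent == NULL) {
        /* Unknown pointer: zero every field explicitly. */
        *nfree = *nregs = *size = 0;
        goto label_return;
    }
    *size = extent->size;
    if (!extent->slab) {
        /* Large allocation: a single region, and it is in use. */
        *nfree = 0;
        *nregs = 1;
        goto label_return;
    }
    /* Small allocation: report the slab's own counters. */
    *nfree = extent->nfree;
    *nregs = extent->nregs;
    assert(*nfree <= *nregs);
label_return:
    return;
}
```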

yinan1048576 · 2019-04-04T00:35:51Z

Revised the following:

Added two additional output fields bin_nfree and bin_nregs for single pointer query. This is to ensure that this new API is fully compatible with redis usage case.
Improved code quality & readability, comment and unit tests.

FYI, in the latest Redis usage (see https://github.com/antirez/redis/blob/167705519b5f21b2aa3954527be15dc653131221/deps/jemalloc/include/jemalloc/internal/jemalloc_internal_inlines_c.h#L219 for the implementation and https://github.com/antirez/redis/blob/22e9321c3ee238f498980cf18781e10caaa01589/src/defrag.c#L57 for the caller), defragmentation is triggered under the following condition:

je_get_defrag_hint(ptr, &bin_util, &run_util)
    && run_util <= bin_util
    && run_util < 1<<16

which is equivalent to:

ptr >= SLABCUR_READ(out) && ptr < SLABCUR_READ(out) + SIZE_READ(out)
    && NFREE_READ(out) * BIN_NREGS_READ(out) >= NREGS_READ(out) * BIN_NFREE_READ(out)
    && NFREE_READ(out) > 0

assuming out points to the mallctl output.
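
The utilization part of that condition can be sketched without any division. Reading run_util and bin_util as used-region ratios (an assumption consistent with the equivalence stated above), run_util <= bin_util becomes (nregs - nfree) / nregs <= (bin_nregs - bin_nfree) / bin_nregs, which cross-multiplies to nfree * bin_nregs >= nregs * bin_nfree. The function name below is hypothetical, and the slab-address clause is deliberately left out of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True when the slab has at least one free region and its utilization is at
 * most the bin-wide utilization.  Cross-multiplication avoids dividing by
 * nregs or bin_nregs; the products are assumed not to overflow size_t.
 */
static bool
slab_util_at_most_bin_util(size_t nfree, size_t nregs, size_t bin_nfree,
    size_t bin_nregs) {
    return nfree > 0 && nfree * bin_nregs >= nregs * bin_nfree;
}
```

Because nothing is divided, the sketch stays well defined even when both bin counters are zero; in that case the comparison reduces to 0 >= 0 and only the nfree > 0 test decides.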

yinan1048576 · 2019-04-04T01:02:43Z

Tests fail when --disable-stats is set, which is expected. Fixing it...

yinan1048576 · 2019-04-04T16:16:49Z

Fixed the failure. It also makes me realize that the redis implementation assumed stats are turned on, and might crash otherwise since there can be a "divide by zero".

interwq · 2019-04-04T17:28:53Z

+ * In case of large class allocations, "(a)" will be NULL, and "(e)" and "(f)"
+ * will be zero.  The other three fields will be properly set though the values
+ * are trivial: "(b)" will be 0, "(c)" will be 1, and "(d)" will be the original
+ * allocation size rounded up to the nearest size class.


nit: the term here is usable size.

interwq · 2019-04-04T17:31:49Z

+static int
+experimental_utilization_query_ctl(tsd_t *tsd, const size_t *mib,
+    size_t miblen, void *oldp, size_t *oldlenp, void *newp, size_t newlen) {
+	int ret = 0;


nit: similarly, try avoiding double initialization. Here we only need to define ret. And on the success path, do ret = 0 before label_return.

interwq · 2019-04-04T17:33:00Z

+experimental_utilization_query_ctl(tsd_t *tsd, const size_t *mib,
+    size_t miblen, void *oldp, size_t *oldlenp, void *newp, size_t newlen) {
+	int ret = 0;
+	void *ptr = NULL;


nit: no need to initialize.

interwq · 2019-04-04T17:35:40Z

+	assert(sizeof(extent_util_stats_verbose_t)
+	    == sizeof(void *) + sizeof(size_t) * 5);
+
+	WRITE(ptr, void *);


nit: let's move this to after the input check below; and also check for newp in the branch.

interwq · 2019-04-04T17:37:00Z

+ * This API is mainly intended for small class allocations, where extents are
+ * used as slab.  In case of large class allocations, the outputs are trivial:
+ * "(a)" will be 0, "(b)" will be 1, and "(c)" will be the original allocation
+ * size rounded up to the nearest size class.


nit: term usable size too.

interwq · 2019-04-04T17:39:14Z

+static int
+experimental_utilization_batch_query_ctl(tsd_t *tsd, const size_t *mib,
+    size_t miblen, void *oldp, size_t *oldlenp, void *newp, size_t newlen) {
+	int ret = 0;


similarly, not double initialize ret.

yinan1048576 · 2019-04-04T18:10:38Z

Revised according to feedback.

interwq · 2019-04-04T20:48:05Z

 	    test_hooks,
-	    test_hooks_exhaustion);
+	    test_hooks_exhaustion,
+	    test_utilization_query,


The util query tests deserve their own tests I feel. Let's move them to a util_query.c in a follow up diff.

zvi-code · 2024-06-10T07:59:03Z

I plan to open a PR [link TBD] against Valkey that replaces Valkey's existing jemalloc patch with this API. One piece of information that the current algorithm uses and that is not included in the API is the number of full slabs in the bin. The motivation for having the number of full slabs is to improve the convergence of defrag (the full slabs shift the average and may cause the defrag process to stagnate).
What is the best alternative way to fetch this information with one of the existing APIs?

Probably this API will do the work:

stats.arenas.<i>.bins.<j>.curslabs (size_t) r- [--enable-stats]
stats.arenas.<i>.bins.<j>.nonfull_slabs (size_t) r- [--enable-stats]
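
Both names take an arena index and a bin index, so a caller would typically format them before handing the string to mallctl. A minimal sketch of that formatting step, with a hypothetical helper name, could be:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "stats.arenas.<arena>.bins.<bin>.<leaf>" into buf.
 * Returns 0 on success, -1 if the name does not fit.
 */
static int
bin_stat_name(char *buf, size_t buflen, unsigned arena, unsigned bin,
    const char *leaf) {
    int n = snprintf(buf, buflen, "stats.arenas.%u.bins.%u.%s", arena, bin,
        leaf);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

The resulting string would be read with a `size_t` output buffer, and only on builds configured with statistics enabled.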

zuiderkwast · 2024-06-26T08:58:38Z

It's interesting that there are no links between this PR and #566. Until today I was not aware of this PR.

### Summary of the change

This is a base PR for refactoring defrag. It moves the defrag logic to rely on jemalloc [native api](jemalloc/jemalloc#1463 (comment)) instead of relying on custom code changes made by valkey in the jemalloc ([je_defrag_hint](https://github.com/valkey-io/valkey/blob/9f8185f5c80bc98bdbc631b90ccf13929d6a0cbc/deps/jemalloc/include/jemalloc/internal/jemalloc_internal_inlines_c.h#L382)) library. This enables valkey to use the latest vanilla jemalloc without the need to maintain code changes across jemalloc versions.

This change requires some modifications because the new api provides only the information, not a yes\no defrag decision. The logic needs to be implemented in valkey code. Additionally, the api does not provide, within a single call, all the information needed to make a decision; this information is available through an additional api call. To reduce the calls to jemalloc, in this PR the required information is collected during `computeDefragCycles` and not for every single ptr; this way we avoid the additional api call.

Followup work will utilize the new options that are now open and will further improve the defrag decision and process.

### Added files:

`allocator_defrag.c` / `allocator_defrag.h` - These files implement the allocator specific knowledge for making the defrag decision. The knowledge about slabs, allocation logic and so on all goes into this file. This improves the separation between jemalloc specific code and other possible implementations.

### Moved functions:

[`zmalloc_no_tcache` , `zfree_no_tcache` ](https://github.com/valkey-io/valkey/blob/4593dc2f059661e1c4eb43bba025f68948344228/src/zmalloc.c#L215) - these are very jemalloc specific logic assumptions, and are very specific to how we defrag with jemalloc. This is also with the vision that from a performance perspective we should consider using tcache; we only need to make sure we don't recycle entries without going through the arena [for example: we can use private tcache, one for free and one for alloc].

`frag_smallbins_bytes` - the logic and implementation moved to the new file

### Existing API:

* [once a second + when completed full cycle] [`computeDefragCycles`](https://github.com/valkey-io/valkey/blob/4593dc2f059661e1c4eb43bba025f68948344228/src/defrag.c#L916)
  * `zmalloc_get_allocator_info` : gets from jemalloc _allocated, active, resident, retained, muzzy_, `frag_smallbins_bytes`
  * [`frag_smallbins_bytes`](https://github.com/valkey-io/valkey/blob/4593dc2f059661e1c4eb43bba025f68948344228/src/zmalloc.c#L690) : for each bin; gets from jemalloc bin_info, `curr_regs`, `cur_slabs`
* [during defrag, for each pointer]
  * `je_defrag_hint` gets a memory pointer and returns {0,1} . [Internally it uses](https://github.com/valkey-io/valkey/blob/4593dc2f059661e1c4eb43bba025f68948344228/deps/jemalloc/include/jemalloc/internal/jemalloc_internal_inlines_c.h#L368) these information points:
    * #`nonfull_slabs`
    * #`total_slabs`
    * #free regs in the ptr slab

## Jemalloc API (via ctl interface)

[BATCH][`experimental_utilization_batch_query_ctl`](https://github.com/valkey-io/valkey/blob/4593dc2f059661e1c4eb43bba025f68948344228/deps/jemalloc/src/ctl.c#L4114) : gets an array of pointers, returns for each pointer 3 values,

* number of free regions in the extent
* number of regions in the extent
* size of the extent in terms of bytes

[EXTENDED][`experimental_utilization_query_ctl`](https://github.com/valkey-io/valkey/blob/4593dc2f059661e1c4eb43bba025f68948344228/deps/jemalloc/src/ctl.c#L3989) :

* memory address of the extent a potential reallocation would go into
* number of free regions in the extent
* number of regions in the extent
* size of the extent in terms of bytes
* [stats-enabled] total number of free regions in the bin the extent belongs to
* [stats-enabled] total number of regions in the bin the extent belongs to

### `experimental_utilization_batch_query_ctl` vs valkey `je_defrag_hint`?

[good]

- We can query pointers in a batch, reducing the overall overhead
- The per ptr decision algorithm is not within the jemalloc api; jemalloc only provides information, valkey can tune\configure\optimize easily

[bad]

- In the batch API we only know the utilization of the slab (of that memory ptr), we don’t get the data about #`nonfull_slabs` and total allocated regs.

## New functions:

1. `defrag_jemalloc_init`: Reducing the cost of calls to je_ctl: use the [MIB interface](https://jemalloc.net/jemalloc.3.html) to get faster calls. See this quote from the jemalloc documentation: The mallctlnametomib() function provides a way to avoid repeated name lookups for applications that repeatedly query the same portion of the namespace, by translating a name to a “Management Information Base” (MIB) that can be passed repeatedly to mallctlbymib().
6. `jemalloc_sz2binind_lgq*` : this api supports a reverse map between bin size and its info without lookup. This mapping depends on the number of size classes we have, which are derived from [`lg_quantum`](https://github.com/valkey-io/valkey/blob/4593dc2f059661e1c4eb43bba025f68948344228/deps/Makefile#L115)
7. `defrag_jemalloc_get_frag_smallbins` : This function replaces `frag_smallbins_bytes`; the logic moved to the new file allocator_defrag. `defrag_jemalloc_should_defrag_multi` → `handle_results` - unpacks the results
8. `should_defrag` : implements the same logic as the existing implementation [inside](https://github.com/valkey-io/valkey/blob/9f8185f5c80bc98bdbc631b90ccf13929d6a0cbc/deps/jemalloc/include/jemalloc/internal/jemalloc_internal_inlines_c.h#L382) je_defrag_hint
9. `defrag_jemalloc_should_defrag_multi` : implements the hint for an array of pointers, utilizing the new batch api. Currently only 1 pointer is passed.

### Logical differences:

In order to get the information about #`nonfull_slabs` and #`regs`, we use the query cycle to collect the information per size class. In order to find the index of the bin information given the bin size, in O(1), we use `jemalloc_sz2binind_lgq*` .

## Testing

This is the first draft. I did some initial testing that basically creates fragmentation by reducing max memory and then waiting for defrag to reach the desired level. The test only serves as a sanity check that defrag succeeds eventually; no data is provided here regarding efficiency and performance.

### Test:

1. disable `activedefrag`
2. run valkey benchmark on overlapping address ranges with different block sizes
3. wait until `used_memory` reaches 10GB
4. set `maxmemory` to 5GB and `maxmemory-policy` to `allkeys-lru`
5. stop load
6. wait for `mem_fragmentation_ratio` to reach 2
7. enable `activedefrag` - start test timer
8. wait until `mem_fragmentation_ratio` reaches 1.1

#### Results*:

(With this PR) Test results: ` 56 sec`
(Without this PR) Test results: `67 sec`

*both runs perform the same "work": number of buffers moved to reach the fragmentation target

Next benchmarking is to compare to:

- DONE // existing `je_get_defrag_hint`
- compare with naive defrag all: `int defrag_hint() {return 1;}`

---------

Signed-off-by: Zvi Schneider <ezvisch@amazon.com>
Signed-off-by: Zvi Schneider <zvi.schneider22@gmail.com>
Signed-off-by: zvi-code <54795925+zvi-code@users.noreply.github.com>
Co-authored-by: Zvi Schneider <ezvisch@amazon.com>
Co-authored-by: Zvi Schneider <zvi.schneider22@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>

interwq changed the title ~~add fragmentation analytics to mallctl [test only]~~ [not for review yet] add fragmentation analytics to mallctl [test only] Mar 18, 2019

interwq reviewed Mar 18, 2019

yinan1048576 changed the title ~~[not for review yet] add fragmentation analytics to mallctl [test only]~~ Add fragmentation analytics to mallctl Mar 20, 2019

interwq reviewed Mar 20, 2019

interwq reviewed Mar 22, 2019

interwq reviewed Mar 30, 2019

interwq reviewed Apr 3, 2019

yinan1048576 changed the title ~~Add fragmentation analytics to mallctl~~ Add memory utilization analytics to mallctl Apr 4, 2019

interwq reviewed Apr 4, 2019

Add memory utilization analytics to mallctl

c1134b6

The analytics tool is put under experimental.utilization namespace in mallctl. Input is one pointer or an array of pointers and the output is a list of memory utilization statistics.

interwq reviewed Apr 4, 2019

interwq approved these changes Apr 4, 2019

interwq merged commit 9aab3f2 into jemalloc:dev Apr 4, 2019

interwq mentioned this pull request Feb 20, 2020

Increasing memory consumption - Expectation for fragmentation of arenas #1740

Open

zvi-code mentioned this pull request Jun 25, 2024

defrag: replace je_get_defrag_hint with jemalloc native interface and remove valkey specific changes in jemalloc source code valkey-io/valkey#692

Closed

zvi-code mentioned this pull request Nov 5, 2024

Remove valkey specific changes in jemalloc source code valkey-io/valkey#1266

Merged
