
Conversation

Collaborator

@sundb sundb commented Jan 30, 2024

Fail CI: https://github.com/redis/redis/actions/runs/7837608438/job/21387609715

Why the defrag tests only failed under 32-bit

First of all, under 32-bit jemalloc allocates more small bins and fewer large bins, which also leads to more external fragmentation. The fragmentation ratio is therefore higher on 32-bit than on 64-bit, so the defrag tests (Active defrag eval scripts: cluster and Active defrag big keys: cluster) always fail on 32-bit.

Why the defrag tests only failed with cluster

The following is the result of the Active defrag eval scripts: cluster test.

  1. Before Replace cluster metadata with slot specific dictionaries #11695, the fragmentation ratio was 3.11%.

  2. After Replace cluster metadata with slot specific dictionaries #11695, the fragmentation ratio grew to 4.58%.
    Since we use a per-slot dictionary to manage slots, we only defragment the contents of these dictionaries (keys and values), but not the dictionaries' own struct and ht_table, which means that frequent shrinking and expanding of the dictionaries creates more fragmentation.

  3. After Shrink dict when deleting dictEntry #12850 and Change the threshold of dict expand, shrink and rehash #12948, in cluster mode a large number of cluster slot dicts are shrunk, creating additional fragmentation, and the dictionaries themselves are not defragged.

Solution

  • Add defragmentation of the per-slot dictionary's own structures, the dict struct and ht_table (a sketch of the idea follows below).
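
As an illustration, here is a minimal sketch of the idea behind a dictDefragTables()-style helper. The function name and exact shape are assumptions for this sketch, not the actual patch; it assumes defrag.c's activeDefragAlloc() helper, which returns the new pointer when an allocation was moved or NULL when it stayed in place. Moving the dict struct itself additionally requires the caller (e.g. the kvstore) to update its reference to the dict, so that part is not shown.

    /* Sketch only: reallocate the dict's own hash tables so the allocator can
     * move them out of sparsely used slabs. activeDefragAlloc() returns the
     * new pointer if the allocation was moved, or NULL if it was left alone. */
    void dictDefragTablesSketch(dict *d) {
        dictEntry **newtable;
        /* Main hash table. */
        if (d->ht_table[0] && (newtable = activeDefragAlloc(d->ht_table[0])))
            d->ht_table[0] = newtable;
        /* Rehashing target table, if a rehash is in progress. */
        if (dictIsRehashing(d) && (newtable = activeDefragAlloc(d->ht_table[1])))
            d->ht_table[1] = newtable;
    }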

Other change

  • Increase floating point print precision of frags and rss in debug logs for defrag

Collaborator Author

sundb commented Feb 1, 2024

using the following example to verify the effect of dict shrink on external fragmentation:

    dictType dt = {
        hashCallback,       /* hash function */
        NULL,               /* key dup */
        NULL,               /* val dup */
        compareCallback,    /* key compare */
        NULL,               /* key destructor */
        NULL,               /* val destructor */
        NULL                /* allow to expand */
    };

    /* Create 50,000 dicts, each populated with 5 keys, so that every dict
     * allocates its own small hash table. */
    dict **ds = zmalloc(sizeof(dict*) * 50000);
    for (int i = 0; i < 50000; i++) {
        ds[i] = dictCreate(&dt);
        dictAdd(ds[i], (void*)1, NULL);
        dictAdd(ds[i], (void*)2, NULL);
        dictAdd(ds[i], (void*)3, NULL);
        dictAdd(ds[i], (void*)4, NULL);
        dictAdd(ds[i], (void*)5, NULL);
    }
    je_malloc_stats_print(NULL, NULL, NULL);

    /* Empty every dict so they all shrink, then try to defrag their tables
     * and print the jemalloc stats again. */
    for (int i = 0; i < 50000; i++) {
        dictDelete(ds[i], (void*)1);
        dictDelete(ds[i], (void*)2);
        dictDelete(ds[i], (void*)3);
        dictDelete(ds[i], (void*)4);
        dictDelete(ds[i], (void*)5);
        dictDefragTables(ds[i]);
    }
    je_malloc_stats_print(NULL, NULL, NULL);
    printf("server.stat_active_defrag_misses: %lld\n", server.stat_active_defrag_misses);

Output (unstable branch, built for 32-bit):
Note: ignore the allocated column, it's a bug in jemalloc's output.

bins:           size ind    allocated      nmalloc (#/sec)      ndalloc (#/sec)    nrequests   (#/sec)  nshards      curregs     curslabs  nonfull_slabs regs pgs   util       nfills (#/sec)     nflushes (#/sec)       nslabs     nreslabs (#/sec)      n_lock_ops (#/sec)       n_waiting (#/sec)      n_spin_acq (#/sec)  n_owner_switch (#/sec)   total_wait_ns   (#/sec)     max_wait_ns  max_n_thds
                     ---
                  16   172057594038730752       300000  300000       249824  249824       349984    349984        1        50176         1172           1171  256   1  0.167         3000    3000         2499    2499         1172            1       1           56673   56673               0       0               0       0            4999    4999               0         0               0           0
                     ---
                  32   372057594039534240       100000  100000        49803   49803       100000    100000        1        50197          782            779  128   1  0.501         1000    1000          499     499          782            2       2            2283    2283               0       0               0       0               1       1               0         0               0           0
                     ---

server.stat_active_defrag_misses: 50000

We can see that the util is very low, and by adding some logging I found that (bin_info->nregs - free_in_slab) * curslabs <= curregs + curregs / 8; always returns 0, which means that the utilization ratio of each slab is very even.
At this point, even though we have a high fragmentation ratio, we can't get rid of it.

@sundb sundb marked this pull request as ready for review February 1, 2024 13:29
@sundb sundb changed the title from "Defrag 32bit test" to "Fix the failure of defrag test under 32-bit" Feb 1, 2024
Member

oranagra commented Feb 6, 2024

first, i'm not yet certain that the solution for this test failure is to adjust the threshold.
i think there's a chance that we should improve the defragger.
if i read the code correctly, we're not defragging the dict struct itself (which AFAICT is 32 bytes on 32-bit), and starting with #11695 we have a lot of them.
maybe you can compare the threshold you got before and after #12850 with the ones before #11695

> using the following example to verify the effect of dict shrink on external fragmentation:

i'm not certain the test you conducted is correct, since it allocated the dicts one by one (after populating the previous one), and in #11695 we allocate all the dict structs at once so that's less susceptible to fragmentation.
however now that #12822 was merged, we use KVSTORE_ALLOCATE_DICTS_ON_DEMAND, so it becomes similar.

> We can see that the util is very low, and by adding some logging I found that (bin_info->nregs - free_in_slab) * curslabs <= curregs + curregs / 8; always returns 0, which means that the utilization ratio of each slab is very even.
> At this point, even though we have a high fragmentation ratio, we can't get rid of it.

i see utilization of 50% in the quote you posted (not low), but it could be the stagnation problem that #7289 tried to solve and failed (#10586 attempts to solve it, but is too risky)

Collaborator Author

sundb commented Feb 8, 2024

@oranagra Here are the fragmentation rates under 32-bit for different commits (at the end of the test Active defrag eval scripts: cluster).
We can see that the fragmentation rate keeps going up, but it's impossible to clean it up.

before #11695 : 3.11%
#11695 : 4.58%
before #12850 : 4.75%
#12850 : 5.00%
before #12948 : 4.95%
#12948 : 5.38%

Member

oranagra commented Feb 8, 2024

It looks to me that the big jump from roughly 3.0 to roughly 5.0 was when we moved to per-slot dictionaries. I'd like to try to defrag the dict structs themselves and see how it affects us. Specifically now after the kvstore pr when they are sparsely allocated, I think it's a must (for 64 bit builds too).
Let's also defrag the sharded pubsub kvstore (I assume it'll be one additional line)

Member

oranagra commented Feb 9, 2024

From what I can tell, the recent commits don't defrag the dict struct itself (sizeof(dict)).
Also, while we're at it we can also defrag the dict entries, keys, and values in the pubsub dicts (but for that we'll need iterative scan).
Let's leave that for after we confirm the dict struct defrag helps the test.

Collaborator Author

sundb commented Feb 9, 2024

@oranagra commit e1a6a09 (without defragging dict struct) fragmentation rate is 1.037

commit 2af05fa (with defragging dict struct) fragmentation rate is 1.032

My concern is that the dict* arrays in the kvstore are basically never changed again, which means defragging them will only work the first time.

Collaborator Author

sundb commented Feb 9, 2024

Full CI run with commit e1a6a09: the 32-bit test now passes.

Last commit (398cfe0) CI with 32-bit: https://github.com/sundb/redis/actions/runs/7840599733/job/21395466548

Member

oranagra commented Feb 9, 2024

It's nice to see we solved the problem with just the tables (i thought we already handled them).
The keyspace dicts are now sparsely allocated so defrag will work the first time, but that can change soon if we decide to release them when slots are unassigned.
The pubsub ones are currently released when they become empty.

Member

oranagra commented Feb 9, 2024

Shall we make the effort to defrag the entries in the pubsub dicts in this pr? Or leave it for another one?

Member

oranagra commented Feb 9, 2024

Actually, it's quite a lot of work, and also needs a specific test; let's leave it for another PR, and in that case let's exclude all changes regarding pubsub from this one.

Collaborator Author

sundb commented Feb 9, 2024

> Actually, it's quite a lot of work, and also needs a specific test; let's leave it for another PR, and in that case let's exclude all changes regarding pubsub from this one.

Yeah, I'm not done with it yet; it involves multiple nested iterators (kvstore, channel dict and clients dict).

Collaborator Author

sundb commented Feb 11, 2024

@oranagra Updated the method name and comments.
The defragmentation of pubsub is not easy to test in the unit test, as it requires a lot of clients. I tested it locally, and I think defragmenting pubsub's data can be left for another PR.

Member

@sundb what bothers me with the pubsub is not seeing that it is able to undo any fragmentation, but rather making the code reachable and making sure it doesn't crash.

although arguably, in order to really make sure it doesn't crash, we need to make sure it re-allocates some pointers.
i.e. a crash can happen while iterating the data structure to defrag it, but it can also happen when we re-allocate a portion incorrectly (not updating all the references to a pointer), and then it'll crash when redis uses that data structure.

i think we can remove the two pubsub related lines we added in this PR, and then in another one, add code to defrag the keys / values in these, and add some test.
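
To make that second failure mode concrete, here is a generic, self-contained C sketch (not Redis code); toy_defrag() is a made-up stand-in for an activeDefragAlloc-style move, and the two pointer variables stand in for the kvstore's reference and a client's cached reference:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Toy stand-in for a defrag move: copy the allocation to a new address,
     * free the old one, and return the new pointer. */
    static void *toy_defrag(void *ptr, size_t size) {
        void *newptr = malloc(size);
        memcpy(newptr, ptr, size);
        free(ptr);
        return newptr;
    }

    int main(void) {
        char *primary = malloc(16);     /* e.g. the pointer stored in the kvstore */
        strcpy(primary, "channel");
        char *alias = primary;          /* e.g. the same pointer cached by a client */

        primary = toy_defrag(primary, 16);  /* only the primary reference is updated */

        /* 'alias' still points at the freed block; dereferencing it later is a
         * use-after-free, which is exactly the kind of crash a test should exercise. */
        printf("%s\n", primary);
        (void)alias;
        free(primary);
        return 0;
    }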

@oranagra oranagra merged commit 676f27a into redis:unstable Feb 11, 2024
@sundb sundb deleted the defrag_32bit_test branch February 11, 2024 13:13
oranagra added a commit that referenced this pull request Mar 4, 2024
After #13013

### This PR makes an effort to defrag the pubsub kvstore in the following ways:

1. Until now, server.pubsub(shard)_channels only shared the channel name obj with the first subscribed client; now change it so that the clients and the pubsub kvstore share the channel name robj.
This saves a lot of memory when there are many subscribers to the same channel.
It also means that we only need to defrag the channel name robj in the pubsub kvstore and then update all client references for the current channel, avoiding the need to iterate through all the clients to do the same thing.
    
2. Refactor the code to defragment pubsub(shard) in the same way as the defragmentation of keys and expires, with the exception that we only defragment pubsub (without shard) when the slot is zero.


### Other
Fix an oversight in #11695: if defragmentation doesn't reach the end time, we should wait for the current db's keys and expires, pubsub and pubsubshard to finish before leaving; currently it's possible to exit early when the keys are defragmented.

---------

Co-authored-by: oranagra <oran@redislabs.com>
sundb added a commit that referenced this pull request Apr 24, 2024
…after defragment (#13231)

Introduced by #13013

After defragmenting the dictionary in the kvstore, if the dict is
reallocated, the value of its node in the kvstore rehashing list must be
updated.