
Conversation

Collaborator

@sundb sundb commented Jan 30, 2024

Fail CI: https://github.com/redis/redis/actions/runs/7837608438/job/21387609715

Why the defrag tests only failed under 32-bit

First of all, under 32-bit jemalloc allocates more small bins and fewer large bins, which also leads to more external fragmentation. The fragmentation ratio is therefore higher on 32-bit than on 64-bit, so the defrag tests (Active defrag eval scripts: cluster and Active defrag big keys: cluster) always fail on 32-bit.

Why the defrag tests only failed with cluster

The following is the result of the Active defrag eval scripts: cluster test.

  1. Before Replace cluster metadata with slot specific dictionaries #11695, the fragmentation ratio was 3.11%.

  2. After Replace cluster metadata with slot specific dictionaries #11695, the fragmentation ratio grew to 4.58%.
    Since we use a per-slot dictionary to manage slots, we only defragment the contents of these dictionaries (keys and values), but not the dictionaries' own struct and ht_table, which means that frequent shrinking and expanding of the dictionaries creates more fragmentation.

  3. After Shrink dict when deleting dictEntry #12850 and Change the threshold of dict expand, shrink and rehash #12948, in cluster mode a large number of cluster slot dicts are shrunk, creating additional fragmentation, and the dictionaries themselves are not defragged.

Solution

  • Add defragmentation of the per-slot dictionary's own structures, the dict struct and ht_table (a sketch of the idea follows below).
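
As an illustration, here is a minimal sketch of the idea behind a dictDefragTables()-style helper. The function name and exact shape are assumptions for this sketch, not the actual patch; it assumes defrag.c's activeDefragAlloc() helper, which returns the new pointer when an allocation was moved or NULL when it stayed in place. Moving the dict struct itself additionally requires the caller (e.g. the kvstore) to update its reference to the dict, so that part is not shown.

    /* Sketch only: reallocate the dict's own hash tables so the allocator can
     * move them out of sparsely used slabs. activeDefragAlloc() returns the
     * new pointer if the allocation was moved, or NULL if it was left alone. */
    void dictDefragTablesSketch(dict *d) {
        dictEntry **newtable;
        /* Main hash table. */
        if (d->ht_table[0] && (newtable = activeDefragAlloc(d->ht_table[0])))
            d->ht_table[0] = newtable;
        /* Rehashing target table, if a rehash is in progress. */
        if (dictIsRehashing(d) && (newtable = activeDefragAlloc(d->ht_table[1])))
            d->ht_table[1] = newtable;
    }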

Other change

  • Increase floating point print precision of frags and rss in debug logs for defrag

Collaborator Author

sundb commented Feb 1, 2024

using the following example to verify the effect of dict shrink on external fragmentation:

    dictType dt = {
        hashCallback,       /* hash function */
        NULL,               /* key dup */
        NULL,               /* val dup */
        compareCallback,    /* key compare */
        NULL,               /* key destructor */
        NULL,               /* val destructor */
        NULL                /* allow to expand */
    };

    /* Create 50,000 dicts, each populated with 5 keys, so that every dict
     * allocates its own small hash table. */
    dict **ds = zmalloc(sizeof(dict*) * 50000);
    for (int i = 0; i < 50000; i++) {
        ds[i] = dictCreate(&dt);
        dictAdd(ds[i], (void*)1, NULL);
        dictAdd(ds[i], (void*)2, NULL);
        dictAdd(ds[i], (void*)3, NULL);
        dictAdd(ds[i], (void*)4, NULL);
        dictAdd(ds[i], (void*)5, NULL);
    }
    je_malloc_stats_print(NULL, NULL, NULL);

    /* Empty every dict so they all shrink, then try to defrag their tables
     * and print the jemalloc stats again. */
    for (int i = 0; i < 50000; i++) {
        dictDelete(ds[i], (void*)1);
        dictDelete(ds[i], (void*)2);
        dictDelete(ds[i], (void*)3);
        dictDelete(ds[i], (void*)4);
        dictDelete(ds[i], (void*)5);
        dictDefragTables(ds[i]);
    }
    je_malloc_stats_print(NULL, NULL, NULL);
    printf("server.stat_active_defrag_misses: %lld\n", server.stat_active_defrag_misses);

Output (unstable branch, built for 32-bit):
Note: ignore the allocated column, it's a bug in jemalloc's output.

bins:           size ind    allocated      nmalloc (#/sec)      ndalloc (#/sec)    nrequests   (#/sec)  nshards      curregs     curslabs  nonfull_slabs regs pgs   util       nfills (#/sec)     nflushes (#/sec)       nslabs     nreslabs (#/sec)      n_lock_ops (#/sec)       n_waiting (#/sec)      n_spin_acq (#/sec)  n_owner_switch (#/sec)   total_wait_ns   (#/sec)     max_wait_ns  max_n_thds
                     ---
                  16   172057594038730752       300000  300000       249824  249824       349984    349984        1        50176         1172           1171  256   1  0.167         3000    3000         2499    2499         1172            1       1           56673   56673               0       0               0       0            4999    4999               0         0               0           0
                     ---
                  32   372057594039534240       100000  100000        49803   49803       100000    100000        1        50197          782            779  128   1  0.501         1000    1000          499     499          782            2       2            2283    2283               0       0               0       0               1       1               0         0               0           0
                     ---

server.stat_active_defrag_misses: 50000

We can see that the util is very low, and by adding some logging I found that (bin_info->nregs - free_in_slab) * curslabs <= curregs + curregs / 8; always returns 0, which means that the utilization ratio of each slab is very even.
At this point, even though we have a high fragmentation ratio, we can't get rid of it.

@sundb sundb marked this pull request as ready for review February 1, 2024 13:29
@sundb sundb changed the title from "Defrag 32bit test" to "Fix the failure of defrag test under 32-bit" Feb 1, 2024
Member

oranagra commented Feb 6, 2024

first, i'm not yet certain that the solution for this test failure is to adjust the threshold.
i think there's a chance that we should improve the defragger.
if i read the code correctly, we're not defragging the dict struct itself (which AFAICT is 32 bytes on 32-bit), and starting with #11695 we have a lot of them.
maybe you can compare the threshold you got before and after #12850 with the ones before #11695

> using the following example to verify the effect of dict shrink on external fragmentation:

i'm not certain the test you conducted is correct, since it allocated the dicts one by one (after populating the previous one), and in #11695 we allocate all the dict structs at once so that's less susceptible to fragmentation.
however now that #12822 was merged, we use KVSTORE_ALLOCATE_DICTS_ON_DEMAND, so it becomes similar.

> We can see that the util is very low, and by adding some logging I found that (bin_info->nregs - free_in_slab) * curslabs <= curregs + curregs / 8; always returns 0, which means that the utilization ratio of each slab is very even.
> At this point, even though we have a high fragmentation ratio, we can't get rid of it.

i see utilization of 50% in the quote you posted (not low), but it could be the stagnation problem that #7289 tried to solve and failed (#10586 attempts to solve it, but is too risky)

Collaborator Author

sundb commented Feb 8, 2024

@oranagra Here are the fragmentation rates under 32-bit for different commits (at the end of the test Active defrag eval scripts: cluster).
We can see that the fragmentation rate keeps going up, but it's impossible to clean it up.

before #11695 : 3.11%
#11695 : 4.58%
before #12850 : 4.75%
#12850 : 5.00%
before #12948 : 4.95%
#12948 : 5.38%

Member

oranagra commented Feb 8, 2024

It looks to me that the big jump from roughly 3.0 to roughly 5.0 was when we moved to per-slot dictionaries. I'd like to try to defrag the dict structs themselves and see how it affects us. Specifically now after the kvstore pr when they are sparsely allocated, I think it's a must (for 64 bit builds too).
Let's also defrag the sharded pubsub kvstore (I assume it'll be one additional line)

Member

oranagra commented Feb 9, 2024

From what I can tell, the recent commits don't defrag the dict struct itself (sizeof(dict)).
Also, while we're at it we can also defrag the dict entries, keys, and values in the pubsub dicts (but for that we'll need iterative scan).
Let's leave that for after we confirm the dict struct defrag helps the test.

Collaborator Author

sundb commented Feb 9, 2024

@oranagra commit e1a6a09 (without defragging dict struct) fragmentation rate is 1.037

commit 2af05fa (with defragging dict struct) fragmentation rate is 1.032

My concern is that the dict* arrays in the kvstore are basically never changed again, which means defragging them will only work the first time.

Collaborator Author

sundb commented Feb 9, 2024

Full CI run with commit e1a6a09: the 32-bit test now passes.

Last commit (398cfe0) CI with 32-bit: https://github.com/sundb/redis/actions/runs/7840599733/job/21395466548

Member

oranagra commented Feb 9, 2024

It's nice to see we solved the problem with just the tables (i thought we already handled them).
The keyspace dicts are now sparsely allocated so defrag will work the first time, but that can change soon if we decide to release them when slots are unassigned.
The pubsub ones are currently released when they become empty.

Member

oranagra commented Feb 9, 2024

Shall we make the effort to defrag the entries in the pubsub dicts in this pr? Or leave it for another one?

Member

oranagra commented Feb 9, 2024

Actually, it's quite a lot of work, and also needs a specific test; let's leave it for another PR, and in that case let's exclude all changes regarding pubsub from this one.

Collaborator Author

sundb commented Feb 9, 2024

> Actually, it's quite a lot of work, and also needs a specific test; let's leave it for another PR, and in that case let's exclude all changes regarding pubsub from this one.

Yeah, I'm not done with it yet; it involves multiple nested iterators (kvstore, channel dict and clients dict).

Collaborator Author

sundb commented Feb 11, 2024

@oranagra Updated the method name and comments.
The defragmentation of pubsub is not easy to test in the unit test, as it requires a lot of clients. I tested it locally, and I think defragmenting pubsub's data can be left for another PR.

Member

@sundb what bothers me with the pubsub is not seeing that it is able to undo any fragmentation, but rather making the code reachable and making sure it doesn't crash.

although arguably, in order to really make sure it doesn't crash, we need to make sure it re-allocates some pointers.
i.e. a crash can happen while iterating the data structure to defrag it, but it can also happen when we re-allocate a portion incorrectly (not updating all the references to a pointer), and then it'll crash when redis uses that data structure.

i think we can remove the two pubsub related lines we added in this PR, and then in another one, add code to defrag the keys / values in these, and add some test.
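
To make that second failure mode concrete, here is a generic, self-contained C sketch (not Redis code); toy_defrag() is a made-up stand-in for an activeDefragAlloc-style move, and the two pointer variables stand in for the kvstore's reference and a client's cached reference:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Toy stand-in for a defrag move: copy the allocation to a new address,
     * free the old one, and return the new pointer. */
    static void *toy_defrag(void *ptr, size_t size) {
        void *newptr = malloc(size);
        memcpy(newptr, ptr, size);
        free(ptr);
        return newptr;
    }

    int main(void) {
        char *primary = malloc(16);     /* e.g. the pointer stored in the kvstore */
        strcpy(primary, "channel");
        char *alias = primary;          /* e.g. the same pointer cached by a client */

        primary = toy_defrag(primary, 16);  /* only the primary reference is updated */

        /* 'alias' still points at the freed block; dereferencing it later is a
         * use-after-free, which is exactly the kind of crash a test should exercise. */
        printf("%s\n", primary);
        (void)alias;
        free(primary);
        return 0;
    }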

@oranagra oranagra merged commit 676f27a into redis:unstable Feb 11, 2024
@sundb sundb deleted the defrag_32bit_test branch February 11, 2024 13:13
oranagra added a commit that referenced this pull request Mar 4, 2024
After #13013

### This PR makes an effort to defrag the pubsub kvstore in the following ways:

1. Until now, server.pubsub(shard)_channels only shared the channel name obj with the first subscribed client; now change it so that the clients and the pubsub kvstore share the channel name robj.
This saves a lot of memory when there are many subscribers to the same channel.
It also means that we only need to defrag the channel name robj in the pubsub kvstore and then update all client references for the current channel, avoiding the need to iterate through all the clients to do the same thing.
    
2. Refactor the code to defragment pubsub(shard) in the same way as the defragmentation of keys and expires, with the exception that we only defragment pubsub (without shard) when the slot is zero.


### Other
Fix an oversight in #11695: if defragmentation doesn't reach the end time, we should wait for the current db's keys and expires, pubsub and pubsubshard to finish before leaving; currently it's possible to exit early when the keys are defragmented.

---------

Co-authored-by: oranagra <oran@redislabs.com>
sundb added a commit that referenced this pull request Apr 24, 2024
…after defragment (#13231)

Introduced by #13013

After defragmenting the dictionary in the kvstore, if the dict is
reallocated, the value of its node in the kvstore rehashing list must be
updated.