Skip to content

Conversation

@yoav-steinberg
Copy link
Contributor

Test started failing consistently in 32bit builds after upgrading to jemalloc 5.2.1 (#9623).

Test started failing consistently in 32bit builds after upgrading to jemalloc 5.2.1 (redis#9623).
@oranagra oranagra merged commit 81095b1 into redis:unstable Oct 18, 2021
oranagra added a commit that referenced this pull request Nov 21, 2021
Background:
Following the upgrade to jemalloc 5.2, there was a test that used to be flaky and
started failing consistently (on 32bit), so we disabled it ​(see #9645).

This is a test that i introduced in #7289 when i attempted to solve a rare stagnation
problem, and it later turned out i failed to solve it, ans what's more i added a test that
caused it to be not so rare, and as i mentioned, now in jemalloc 5.2 it became consistent on 32bit.

Stagnation can happen when all the slabs of the bin are equally utilized, so the decision
to move an allocation from a relatively empty slab to a relatively full one, will never
happen, and in that test all the slabs are at 50% utilization, so the defragger could just
keep scanning the keyspace and not move anything.

What this PR changes:
* First, finally in jemalloc 5.2 we have the count of non-full slabs, so when we compare
  the utilization of the current slab, we can compare it to the average utilization of the non-full
  slabs in our bin, instead of the total average of our bin. this takes the full slabs out of the game,
  since they're not candidates for migration (neither source nor target).
* Secondly, We add some 12% (100/8) to the decision to defrag an allocation, this is the part
  that aims to avoid stagnation, and it's especially important since the above mentioned change
  can get us closer to stagnation.
* Thirdly, since jemalloc 5.2 adds sharded bins, we take into account all shards (something
  that's missing from the original PR that merged it), this isn't expected to make any difference
  since anyway there should be just one shard.

How this was benchmarked.
What i did was run the memefficiency test unit with `--verbose` and compare the defragger hits
and misses the tests reported.
At first, when i took into consideration only the non-full slabs, it got a lot worse (i got into
stagnation, or just got a lot of misses and a lot of hits), but when i added the 10% i got back
to results that were slightly better than the ones of the jemalloc 5.1 branch. i.e. full defragmentation
was achieved with fewer hits (relocations), and fewer misses (keyspace scans).
hwware pushed a commit to hwware/redis that referenced this pull request Dec 20, 2021
Background:
Following the upgrade to jemalloc 5.2, there was a test that used to be flaky and
started failing consistently (on 32bit), so we disabled it ​(see redis#9645).

This is a test that i introduced in redis#7289 when i attempted to solve a rare stagnation
problem, and it later turned out i failed to solve it, ans what's more i added a test that
caused it to be not so rare, and as i mentioned, now in jemalloc 5.2 it became consistent on 32bit.

Stagnation can happen when all the slabs of the bin are equally utilized, so the decision
to move an allocation from a relatively empty slab to a relatively full one, will never
happen, and in that test all the slabs are at 50% utilization, so the defragger could just
keep scanning the keyspace and not move anything.

What this PR changes:
* First, finally in jemalloc 5.2 we have the count of non-full slabs, so when we compare
  the utilization of the current slab, we can compare it to the average utilization of the non-full
  slabs in our bin, instead of the total average of our bin. this takes the full slabs out of the game,
  since they're not candidates for migration (neither source nor target).
* Secondly, We add some 12% (100/8) to the decision to defrag an allocation, this is the part
  that aims to avoid stagnation, and it's especially important since the above mentioned change
  can get us closer to stagnation.
* Thirdly, since jemalloc 5.2 adds sharded bins, we take into account all shards (something
  that's missing from the original PR that merged it), this isn't expected to make any difference
  since anyway there should be just one shard.

How this was benchmarked.
What i did was run the memefficiency test unit with `--verbose` and compare the defragger hits
and misses the tests reported.
At first, when i took into consideration only the non-full slabs, it got a lot worse (i got into
stagnation, or just got a lot of misses and a lot of hits), but when i added the 10% i got back
to results that were slightly better than the ones of the jemalloc 5.1 branch. i.e. full defragmentation
was achieved with fewer hits (relocations), and fewer misses (keyspace scans).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants