Skip to content

Conversation

@yoav-steinberg
Copy link
Contributor

@yoav-steinberg yoav-steinberg commented Oct 10, 2021

Upgraded to jemalloc 5.2.1 from 5.1.0. Cherry picked all relevant fixes (by diffing our 5.1.0 to upstream 5.10 and finding relevant commits). Details of what was done:

[cherry-picked] fd7d51c 2021-05-03 Resolve nonsense static analysis warnings (Oran Agra)
[cherry-picked] 448c435 2020-09-29 Fix compilation warnings in Lua and jemalloc dependencies (#7785) (YoongHM)
[skipped - already in upstream] 9216b96 2020-09-21 Fix compilation warning in jemalloc's malloc_vsnprintf (#7789) (YoongHM)
[cherry-picked] 88d71f4 2020-05-20 fix a rare active defrag edge case bug leading to stagnation (Oran Agra)
[skipped - already in upstream] 2fec7d9 2019-05-30 Jemalloc: Avoid blocking on background thread lock for stats.
[cherry-picked] 920158e 2018-07-11 Active defrag fixes for 32bit builds (again) (Oran Agra)
[cherry-picked] e8099ca 2018-06-26 add defrag hint support into jemalloc 5 (Oran Agra)
[re-done] 4e729fc 2018-05-24 Generate configure for Jemalloc. (antirez)

Additionally had to do this:
7727cc248 2021-10-10 Fix defrag to support sharded bins in arena (added in v5.2.1) (Yoav Steinberg)

When reviewing please look at all except the first commit which is just replacing 5.1.0 with 5.2.1 sources.
Also I think we should merge this without squashing to preserve the changes we did to to jemalloc.

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've reviewed all commits except for the first 3, and all modifications outside the jemalloc folder.

i've reviewed the jemalloc ChangeLog, and the only thing that i want to look into is this:

other than that, @yoav-steinberg please confirm that following the procedure you described you are able to upgrade jemalloc in the future without going though these cherry picks again.

oranagra and others added 4 commits October 12, 2021 12:55
There's a rare case which leads to stagnation in the defragger, causing
it to keep scanning the keyspace and do nothing (not moving any
allocation), this happens when all the allocator slabs of a certain bin
have the same % utilization, but the slab from which new allocations are
made have a lower utilization.

this commit fixes it by removing the current slab from the overall
average utilization of the bin, and also eliminate any precision loss in
the utilization calculation and move the decision about the defrag to
reside inside jemalloc.

and also add a test that consistently reproduce this issue.
- The argument `u` in for `ar` is ignored (and generates warnings since `D` became the default.
  All it does is avoid updating unchanged objects (shouldn't have any impact on our build)
- Enable `LUA_USE_MKSTEMP` to force the use of `mkstemp()` instead of `tmpname()` (which is dead
  code in redis anyway).
- Remove unused variable `c` in `f_parser()`
- Removed misleadingly indented space in `luaL_loadfile()` and ``addfield()`

Co-authored-by: Oran Agra <oran@redislabs.com>
@yoav-steinberg
Copy link
Contributor Author

yoav-steinberg commented Oct 12, 2021

i've reviewed the jemalloc ChangeLog, and the only thing that i want to look into is this:

* [jemalloc/jemalloc#1421](https://github.com/jemalloc/jemalloc/pull/1421) - maybe we can now undo #6145 ref: [jemalloc/jemalloc#521](https://github.com/jemalloc/jemalloc/issues/521) (change that was added in jemalloc 5.0)

The background jemalloc (#6145) thread does two things with unused pages:

  1. First do MADV_FREE (after muzzy state timeout)
  2. Then do MADV_DONTNEED and reclaim RSS (after dirty state timeout)

The new jemalloc essentially disables the muzzy state. But there might still be pages that are unused but no one called MADV_DONTNEED yet on them. In practice our tests showed that when freeing up lots of memory we still end up with some RSS which is dirty and not reclaimed by the OS if we disable the background thread (even though much less than in the previous jemalloc). So bottom line is that we still want the background thread.

other than that, @yoav-steinberg please confirm that following the procedure you described you are able to upgrade jemalloc in the future without going though these cherry picks again.

Confirming.

@oranagra oranagra added approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus labels Oct 12, 2021
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@redis/core-team please approve

@oranagra
Copy link
Member

oranagra commented Oct 18, 2021

triggered daily CI:
https://github.com/redis/redis/actions/runs/1353491321
another partial one on a branch with merge from recent unstable:
https://github.com/redis/redis/actions/runs/1353655888

@oranagra
Copy link
Member

looks like we have an issue with active-defrag on 32 bit

@oranagra
Copy link
Member

i've tested it locally and i can reproduce the problem.
the Active defrag edge case test was never very stable (it turns out i didn't succeed in solving the problem for which i added this test).
but now it seems that it consistently fails on 32bit with the new jemalloc, so we have to solve this (or disable the test) before merging.

@oranagra oranagra merged commit c4b4b6c into redis:unstable Oct 18, 2021
yoav-steinberg added a commit to yoav-steinberg/redis that referenced this pull request Oct 18, 2021
Test started failing consistently in 32bit builds after upgrading to jemalloc 5.2.1 (redis#9623).
oranagra pushed a commit that referenced this pull request Oct 18, 2021
Test started failing consistently in 32bit builds after upgrading to jemalloc 5.2.1 (#9623).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

6 participants