
Conversation

@oranagra
Member

oranagra commented Jun 2, 2019

jemalloc 5 doesn't immediately release memory back to the OS; instead it uses a decaying mechanism, which doesn't work when there's no traffic (no allocations).
This is most evident when there's no traffic after FLUSHDB: the RSS will remain high.

  1. enable jemalloc background purging
  2. explicitly purge in flushdb (see the mallctl sketch after this list)
  3. bugfix in jemalloc in which a background thread could have blocked our serverCron
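
For reference, a minimal sketch of what items 1 and 2 amount to in terms of jemalloc's public mallctl API. This is not the Redis code itself; Redis wraps these calls in its allocator layer, and the function names below are purely illustrative:

```c
#include <stdbool.h>
#include <stdio.h>
#include <jemalloc/jemalloc.h>

/* Illustrative wrappers, not the actual Redis functions. */

/* 1) Ask jemalloc to spawn its background threads, so dirty pages keep
 *    decaying even when the application makes no new allocations. */
static void enable_background_purging(void) {
    bool enable = true;
    mallctl("background_thread", NULL, NULL, &enable, sizeof(enable));
}

/* 2) Force an immediate purge of all arenas, e.g. right after FLUSHDB,
 *    so RSS drops without waiting for the decay timers. */
static void purge_all_arenas(void) {
    char cmd[64];
    snprintf(cmd, sizeof(cmd), "arena.%d.purge", MALLCTL_ARENAS_ALL);
    mallctl(cmd, NULL, NULL, NULL, 0);
}
```

(When jemalloc is built with a symbol prefix, as Redis does, the entry point is je_mallctl rather than mallctl.)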

P.S. I did many tests to check for regressions (which is how I found item 3 above).
When we use a non-async FLUSHDB it already blocks Redis for so long that purging (which is what would happen with jemalloc 4) doesn't matter.

With FLUSHDB ASYNC, after populating a large db and running traffic, I couldn't observe any performance regression.
The most I've seen the jemalloc thread use is about 50% CPU, while Redis and the lazy free thread each ate 99%.

oranagra added 3 commits June 2, 2019 15:27
Background threads may run for a long time, especially when the # of dirty pages
is high.  Avoid blocking stats calls because of this (which may cause latency
spikes).

see jemalloc/jemalloc#1502

cherry picked from commit 1a71533511027dbe3f9d989659efeec446915d6b
purge jemalloc after flush, and enable background purging thread

jemalloc 5 doesn't immediately release memory back to the OS; instead it uses a decaying
mechanism, which doesn't work when there's no traffic (no allocations).
This is most evident when there's no traffic after flushdb: the RSS will remain high.

1) enable jemalloc background purging
2) explicitly purge in flushdb
It seems that since I added the creation of the jemalloc thread, Redis
sometimes fails to start with the following error:

Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!

This seems to be due to a race bug in ld.so, in which TLS creation on the
new thread collides with dlopen.

Move the creation of BIO and jemalloc threads to after modules are loaded.

Plus a small bugfix for when trying to disable the jemalloc background thread at runtime.
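
A minimal, self-contained illustration of the ordering principle behind this fix (not the actual Redis startup code; the module path is a placeholder): finish every dlopen() before creating any threads, so thread TLS initialization cannot race with module loading inside ld.so.

```c
/* Build with: cc example.c -ldl -lpthread */
#include <dlfcn.h>
#include <pthread.h>
#include <stdio.h>

static void *background_worker(void *arg) {
    (void)arg;
    /* ... BIO-style or jemalloc-style background work would go here ... */
    return NULL;
}

int main(void) {
    /* Step 1: load all modules first (placeholder path). */
    void *module = dlopen("./example_module.so", RTLD_NOW | RTLD_LOCAL);
    if (!module) fprintf(stderr, "dlopen: %s\n", dlerror());

    /* Step 2: only after loading is done, create background threads. */
    pthread_t tid;
    pthread_create(&tid, NULL, background_worker, NULL);
    pthread_join(tid, NULL);

    if (module) dlclose(module);
    return 0;
}
```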
@antirez
Contributor

antirez commented Oct 4, 2019

@oranagra thanks Oran, the PR no longer applies cleanly, but it looks good to me. I wanted to ask a few things.

  1. I see the fix to background_thread_stats_read(). You say that it blocks serverCron(); I wonder, do we call some jemalloc stats gathering function from serverCron()? Like zmalloc_get_allocator_info() and so forth.
  2. Do you know if they'll merge such a fix into upstream jemalloc?

I also wonder if it could be safer to have the default configuration tuned to gather fewer stats from the allocator and expose less memory information to the user; otherwise we may always be a bit at risk of introducing unexpected latencies.

Thanks.

@oranagra
Member Author

oranagra commented Oct 4, 2019

@antirez the fix in jemalloc is not mine: I reported a problem to them and they fixed it (already released in jemalloc 5.2.1); I just cherry-picked this fix into our repo (the commit comment has all the details).

The slowdown it caused to serverCron happened when calling je_mallctl("epoch", ...) in zmalloc_get_allocator_info.

I don't think it's possible to gather fewer stats; the epoch mechanism computes them all, it's all or nothing (see the sketch below).
It shouldn't cause any problems now AFAIK; it was just a bug they had with the new background thread mechanism.
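
For context, here is a standalone sketch of what the "epoch" call does (plain jemalloc API usage, not the Redis zmalloc_get_allocator_info code): writing to "epoch" makes jemalloc aggregate all of its statistics in one go, and only then can the individual counters be read consistently.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <jemalloc/jemalloc.h>

int main(void) {
    /* Bump the epoch: jemalloc refreshes *all* stats in this one call,
     * which is why there is no cheaper "partial stats" variant. */
    uint64_t epoch = 1;
    size_t sz = sizeof(epoch);
    mallctl("epoch", &epoch, &sz, &epoch, sizeof(epoch));

    /* Counters are consistent as of the last epoch bump. */
    size_t allocated, active, resident;
    sz = sizeof(size_t);
    mallctl("stats.allocated", &allocated, &sz, NULL, 0);
    mallctl("stats.active", &active, &sz, NULL, 0);
    mallctl("stats.resident", &resident, &sz, NULL, 0);
    printf("allocated=%zu active=%zu resident=%zu\n",
           allocated, active, resident);
    return 0;
}
```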

You can argue that this background thread is largely unused and untested, though, and that enabling it by default is risky. But I did some testing and benchmarks and it looks good to me. Without it, as I mentioned, there are cases where memory won't be returned to the OS.

@antirez
Contributor

antirez commented Oct 4, 2019

@oranagra thanks Oran, I guess it's safe to merge. Please could you rebase? Thank you.

@antirez
Contributor

antirez commented Oct 4, 2019

Oh actually the conflict is trivial, I can take care.

@oranagra
Member Author

oranagra commented Oct 4, 2019

While you're at it: I noticed I forgot to trim the old purge code from object.c, please trim it... (or wait a few minutes, I'll rebase and update the PR)

@oranagra
Member Author

oranagra commented Oct 4, 2019

I was already halfway there, so I pushed it now...

@antirez
Contributor

antirez commented Oct 4, 2019 via email

antirez merged commit 14a9da0 into redis:unstable Oct 10, 2019
@antirez
Contributor

antirez commented Oct 10, 2019

Thanks @oranagra, merged.

@soloestoy
Contributor

Will this PR be backported to 5.0?

JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Dec 24, 2019
purge jemalloc after flush, and enable background purging thread
oranagra deleted the jemalloc_purge_bg branch October 4, 2021 18:20