
Implement opt.background_thread #714

Merged
interwq merged 6 commits into jemalloc:dev from interwq:background_threads
May 23, 2017

Conversation


@interwq interwq commented Mar 24, 2017

Added option "background_thread" to enable asynchronous background threads, which currently handle purging.

This resolves #149.

@interwq interwq force-pushed the background_threads branch 2 times, most recently from 08cd1f1 to 4da2fe7 Compare March 24, 2017 20:02

interwq commented Mar 27, 2017

I will be leaving this for a while as this depends on #501.

I'll work on #701 and #598 first and come back to this later.


interwq commented Apr 6, 2017

I'll keep working on the fast path improvements for the next few days. Will come back and re-open this next week.

@interwq interwq closed this Apr 6, 2017

interwq commented Apr 13, 2017

Reopening this as I'm switching back to this part.

@interwq interwq reopened this Apr 13, 2017
@interwq interwq force-pushed the background_threads branch 2 times, most recently from 36e48e0 to 63b9457 Compare April 13, 2017 04:12
@interwq interwq force-pushed the background_threads branch 11 times, most recently from 26370ce to a17e515 Compare May 1, 2017 20:58
@interwq interwq changed the title [Not ready for review] Prototype of background worker threads. Implement opt.background_thread May 1, 2017
@interwq interwq force-pushed the background_threads branch from a17e515 to e3020dc Compare May 1, 2017 21:18

interwq commented May 1, 2017

Still missing stats, tests and documentation. But the design is pretty much complete. I feel we should discuss a bit before going further.

Split into two commits:

  1. "First step of implementing background threads." includes basic support for enabling background threads with a fixed interval.
    The main thing worth mentioning here is the epoch used for synchronizing thread creation and lifetime, which supports toggling opt.background_thread on and off (the mallctl is rw). A global epoch indicates the state (an even number means enabled), and each background thread copies the current global epoch into its own index epoch at the beginning of its execution. Whenever a background thread detects a mismatch between its index epoch and the global epoch, it terminates itself.

  2. "Implement dynamic interval for background threads."
    The first change is that we keep the arena tick running (to advance the epoch) even when background threads are enabled, but we do not attempt actual purging in the decay_tick case.
    The main change is that we read the backlog and compute how many dirty pages we expect to purge after the next N decay intervals. We find the value of N through an optimized binary search.
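The epoch scheme in the first commit can be illustrated with a minimal sketch (all names and the starting value are invented for illustration, not jemalloc's actual identifiers):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: an even global epoch means background threads
 * are enabled; toggling opt.background_thread bumps the epoch, and any
 * thread whose cached epoch no longer matches terminates itself.
 */
static _Atomic uint64_t global_epoch; /* starts at 0: enabled */

static bool
background_threads_enabled(uint64_t epoch) {
	return (epoch & 1) == 0;
}

/* Called when the rw mallctl toggles opt.background_thread. */
static uint64_t
toggle_background_threads(void) {
	return atomic_fetch_add(&global_epoch, 1) + 1;
}

/*
 * Each background thread records the global epoch at startup and checks
 * it on each iteration of its work loop; a mismatch means terminate.
 */
static bool
thread_should_terminate(uint64_t my_epoch) {
	return my_epoch != atomic_load(&global_epoch);
}
```

Toggling flips enabled/disabled in a single atomic increment, and a stale cached epoch is each thread's signal to exit.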

Any input? Or should I proceed with the remaining details (adding stats, tests and doc)?
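The dynamic-interval computation in the second commit amounts to a bisection over the number of decay intervals. A minimal sketch, with invented names and the backlog modeled as a simple linear purge rate (the real code derives expected purge counts from the decay backlog and smoothstep table):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: expected pages purged after n decay intervals.
 * Stand-in for reading the real backlog.
 */
static size_t
npurge_after(size_t n, size_t pages_per_interval) {
	return n * pages_per_interval;
}

/*
 * Bisect for the smallest interval count n at which the expected purge
 * count reaches the threshold, terminating early once the bracket is
 * tight: the expected counts at ub and lb differ by less than the
 * threshold (the stopping condition this review converged on).
 */
static size_t
compute_sleep_intervals(size_t pages_per_interval, size_t lb, size_t ub,
    size_t threshold) {
	while (ub > lb + 1 && npurge_after(ub, pages_per_interval) -
	    npurge_after(lb, pages_per_interval) >= threshold) {
		size_t mid = lb + (ub - lb) / 2;
		if (npurge_after(mid, pages_per_interval) >= threshold) {
			ub = mid;
		} else {
			lb = mid;
		}
	}
	return ub;
}
```

The early-termination condition trades a little precision in the chosen wakeup time for fewer backlog evaluations.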

@interwq interwq force-pushed the background_threads branch from e3020dc to e510258 Compare May 2, 2017 21:39

interwq commented May 2, 2017

Still updating. Will ping again when ready for review.

@interwq interwq force-pushed the background_threads branch 2 times, most recently from 432011d to f83f2e4 Compare May 4, 2017 00:46

interwq commented May 4, 2017

@jasone : this should be ready for more reviews. Some changes regarding the feedback: 1) I simplified the interval search algorithm to try 1 point at a time instead of 3; and 2) I ended up not abbreviating "background_thread". All other feedback was addressed accordingly. Also, I squashed the 2 commits into 1.

From checking the purging progress and intervals / predictions locally with t-test1, I believe this is working as expected.

Still missing tests, doc and some stats I'd like to add.

@interwq interwq force-pushed the background_threads branch from f83f2e4 to d843493 Compare May 4, 2017 00:58
@interwq interwq force-pushed the background_threads branch 3 times, most recently from c0ad80e to 3dbbeec Compare May 22, 2017 08:14

jasone commented May 22, 2017

Having not yet reviewed the latest version, I don't understand why this approach should be vulnerable to inactive arenas ending up with unpurged memory. In our earlier discussion we agreed that if an application frees memory that generates unused dirty memory, and that dirty memory would potentially go unpurged for too long due to the background thread's scheduled wakeup time being too far in the future, we could signal the thread so it could recalculate and shorten its sleep time. Unless there were some threshold below which we don't bother signaling, we would never sleep forever with unpurged dirty memory.


interwq commented May 22, 2017

@jasone : this could only happen if the epoch stops advancing due to inactivity. As long as we advance the epoch, you are right that we signal the threads when reaching the threshold. But in a scenario where an arena frees a bunch of pages without advancing the epoch, and there are no active calls afterwards, the issue will occur.


jasone commented May 22, 2017

Ah, I see. I was suggesting that we do the check that decides whether to signal the background thread whenever we accrue unused dirty pages, rather than only at epoch advances.


interwq commented May 22, 2017

Yeah, that would partly solve the problem. The issue is that in that case, even if we wake up the thread, it won't really be able to "see" the newly freed pages, because the epoch hasn't passed yet, so they aren't recorded in the backlog. Also, checking the wakeup time of the background thread requires the thread mutex, and I'm a bit concerned about taking that mutex eagerly on deallocation.

decay_purge_interval_search(decay, lb, ub, &next_lb, &next_ub);
lb = next_lb;
ub = next_ub;
assert(++n_search <= 5);
Member:

Can we make this generic, rather than hard-coded, as some function related to lg_floor(SMOOTHSTEP_NSTEPS) (and decay_time)?

Contributor Author:

Done.

/* With the current setting, bisect 5 times reaches ~6 intervals. */
unsigned n_search = 0;
size_t next_lb, next_ub;
while (ub > lb + 7) {
Member:

What is the intent of the magic number 7 here? If the intent is to terminate the binary search early to reduce computation, I'd argue that the termination condition should be based on how many elements fall within an acceptable time range, which differs according to both SMOOTHSTEP_NSTEPS and decay_time.

Contributor Author:

Just updated to remove the arbitrary number here. Please see new comments for the changes.


/* Check if we need to signal the background thread early. */
void
background_thread_interval_check(tsdn_t *tsdn, arena_t *arena,
Member:

This function appears to operate as I'd expect, but IMO it should be called whenever the number of unused dirty pages accrued since the last epoch advance or signal reaches a new high water mark.

Contributor Author:

Changed.


jasone commented May 22, 2017

Can we make the wakeup time atomic, so that no mutex acquisition is necessary?

Re: the current epoch being in progress, for generality we should consider any associated unused dirty pages (i.e. the newly created ones) as being at the beginning of their decay. As long as we treat them as such, we never have the potential for sleeping forever with unpurged pages.


interwq commented May 22, 2017

re: dirty pages now treated as in the current epoch. This sounds good to me.

re: the mutex -- we can make the wakeup time atomic, but we may still hit a race condition where we read a non-indefinite sleep time and decide not to wake the thread, while the background thread is in the process of deciding to sleep forever. The synchronization feels pretty fragile with only atomics (right now it needs both the decay and thread mutexes). Also, we could miss the signal if we don't hold the mutex while sending it.

Do you think we should still check the background thread interval eagerly? My concern is either mutex overhead, or racy logic without mutexes.


interwq commented May 22, 2017

I think we can try this:
Add a simpler version of the current interval check to be called eagerly; this simple function does:

if (sleep_time == INDEFINITE) { 
    check_interval(...); // this one does the current full logic
}

So this handles the inactivity issue lock-free, and takes locks only when necessary. Does this sound reasonable to you @jasone ?
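That split can be sketched as follows (names invented for illustration; the real locked path also recomputes the sleep interval and signals the thread):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: an atomic flag lets the deallocation path skip
 * the background thread's mutex entirely unless the thread is known to
 * be sleeping indefinitely.
 */
typedef struct {
	atomic_bool indefinite_sleep;
} bg_thread_info_t;

static bool full_check_ran; /* stands in for the locked slow path */

/*
 * Full check: real code takes the decay and thread mutexes, recomputes
 * the sleep interval, and signals the background thread if needed.
 */
static void
interval_check_locked(bg_thread_info_t *info) {
	(void)info;
	full_check_ran = true;
}

/* Called eagerly on the deallocation path. */
static void
dalloc_interval_check(bg_thread_info_t *info) {
	/* Lock-free fast path: nothing to do unless sleeping forever. */
	if (atomic_load_explicit(&info->indefinite_sleep,
	    memory_order_acquire)) {
		interval_check_locked(info);
	}
}
```

In the common case (a finite sleep already scheduled) the check is a single atomic load, so deallocation pays no mutex cost.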

@interwq interwq force-pushed the background_threads branch 3 times, most recently from abcfa8f to 14f2961 Compare May 23, 2017 03:37

interwq commented May 23, 2017

@jasone: just updated as we discussed. Added an atomic boolean to indicate whether a wakeup is scheduled (the wakeup time is 64-bit, which would require 64-bit atomics if we switched to reading it directly). This is added in a separate commit "Check for background thread inactivity on extents_dalloc."

Now I believe the inactivity issue is taken care of: either the application thread will see that the background thread is sleeping forever, or the background thread will see the dirty pages updated by the application threads.

@jasone jasone left a comment (Member)

I think this is ready to go other than addressing the comments I made earlier today in arena_decay_compute_purge_interval_impl(). Thanks for putting a ton of effort into polishing this, even after some of my design suggestions were total dead ends.

background_thread_wakeup_time_set(tsdn_t *tsdn, background_thread_info_t *info,
uint64_t wakeup_time) {
malloc_mutex_assert_owner(tsdn, &info->mtx);
if (wakeup_time == BACKGROUND_THREAD_INDEFINITE_SLEEP) {
Member:

Easier to read?:

atomic_store_b(&info->indefinite_sleep,
    wakeup_time == BACKGROUND_THREAD_INDEFINITE_SLEEP, ATOMIC_RELEASE);

Contributor Author:

Yup. Changed.

if (info->npages_to_purge_new > BACKGROUND_THREAD_NPAGES_THRESHOLD) {
should_signal = true;
} else if (unlikely(background_thread_indefinite_sleep(info))) {
if (extents_npages_get(&arena->extents_dirty) > 0 ||
Member:

Maybe combine this conditional with the enclosing one? I don't have strong feelings either way.

Contributor Author:

I did start with this branch combined, but it looked pretty strange and hard to interpret. Plus, background_thread_indefinite_sleep is kind of a special case, so I feel the current way is a bit better.


interwq commented May 23, 2017

Thanks for the quick response and all the feedback @jasone!

Yeah I'm still tweaking the search algorithm in arena_decay_compute_purge_interval_impl() to incorporate the feedback. I'll let you take another look after I update it there.

@interwq interwq force-pushed the background_threads branch 2 times, most recently from 93033e0 to 38c58d4 Compare May 23, 2017 07:16

interwq commented May 23, 2017

@jasone : Just updated the diff again. Main changes --

  1. to address the feedback on arena_decay_compute_purge_interval_impl(): this function is redesigned and refactored. Please take another look at the function (changed in the first commit). The search stopping condition is now npurge_upper_bound - npurge_lower_bound < NPAGES_THRESHOLD.
  2. fixed arena_destroy and arena_reset: now, before doing the destructive work, we terminate the corresponding background thread (a single thread) and re-create it afterwards. This seems to be the safest way, and it also avoids some locking trickery.
    Now all tests pass even with background_thread on by default.

@interwq interwq force-pushed the background_threads branch from 38c58d4 to 2648ed0 Compare May 23, 2017 07:28

size_t lb = BACKGROUND_THREAD_MIN_INTERVAL_NS / decay_interval_ns;
size_t ub = SMOOTHSTEP_NSTEPS;
/* Decay deadline offset is up to 2 decay intervals. */
Member:

I don't understand this comment, nor the reason for setting lb to at least 2. Is this another case of a magic number that only makes sense under default conditions, or perhaps only for limited numbers of dirty pages?

Contributor Author:

Oh, this is because we set the epoch deadline to be 1 to 2 intervals here:
https://github.com/jemalloc/jemalloc/blob/dev/src/arena.c#L598

If we don't set it to a minimum of 2 intervals, we may not advance the epoch by the next wakeup, in which case there won't be much useful work to do.

How about we change the comment to

/* Minimal 2 intervals to ensure reaching next epoch deadline. */

Member:

Ahh, got it. The comment update looks good to me.

Contributor Author:

Cool. I'll make a couple small tweaks and land this soon.

interwq added 6 commits May 23, 2017 11:05
Added opt.background_thread to enable background threads, which currently handle purging. When enabled, decay ticks will not trigger purging (which is left to the background threads). We limit the max number of threads to the number of CPUs. When percpu arena is enabled, we set CPU affinity for the background threads as well.

The sleep interval of background threads is dynamic, determined by computing the number of pages expected to be purged in the future (based on the backlog).
To avoid background threads sleeping forever with idle arenas, we eagerly check background threads' sleep time after extents_dalloc, and signal the thread if necessary.
@interwq interwq force-pushed the background_threads branch from 2648ed0 to 5bd0bad Compare May 23, 2017 18:14


Development

Successfully merging this pull request may close these issues.

Implement opt-in per arena background asynchronous worker threads.

3 participants