Conversation
Force-pushed from 08cd1f1 to 4da2fe7.
I'll keep working on the fast path improvements for the next few days. Will come back and re-open this next week.
Reopening this as I'm switching back to this part.
Force-pushed from 36e48e0 to 63b9457.
Force-pushed from 26370ce to a17e515.
Force-pushed from a17e515 to e3020dc.
Still missing stats, tests and documentation, but the design is pretty much complete. I feel we should discuss a bit before going further. Split into two commits:
Any input? Or should I proceed with the remaining details (adding stats, tests and doc)?
Force-pushed from e3020dc to e510258.
Still updating. Will ping again when ready for review.
Force-pushed from 432011d to f83f2e4.
@jasone : this should be ready for more reviews. Some changes regarding the feedback: 1) I simplified the interval search algorithm to try 1 point at a time, instead of 3; and 2) I ended up not abbreviating "background_thread". Other feedback was all addressed accordingly. Also, I squashed the 2 commits into 1. From checking the purging progress and intervals / predictions locally with t-test1, I believe this is working as expected. Still missing tests, doc and some stats I'd like to add.
Force-pushed from f83f2e4 to d843493.
Force-pushed from c0ad80e to 3dbbeec.
Having not yet reviewed the latest version, I don't understand why this approach should be vulnerable to inactive arenas ending up with unpurged memory. In our earlier discussion we agreed that if an application frees memory that generates unused dirty memory, and that dirty memory would potentially go unpurged for too long due to the background thread's scheduled wakeup time being too far in the future, we could signal the thread so it could recalculate and shorten its sleep time. Unless there were some threshold below which we don't bother signaling, we would never sleep forever with unpurged dirty memory.
@jasone : this could only happen if the epoch stops advancing due to inactivity. As long as we advance the epoch, you are right that we signal the threads when reaching the threshold. But in a scenario where an arena frees a bunch of pages without advancing the epoch, and there are no active calls afterwards, it will cause the issue.
Ah, I see. I was suggesting that we do the check that decides whether to signal the background thread whenever we accrue unused dirty pages, rather than only at epoch advances.
Yeah that would partly solve the problem. The issue is that in that case, even if we wake up the thread, it won't really be able to "see" the newly freed pages, because the epoch hasn't passed yet so they're not recorded in the backlog. Also, checking the wakeup time of the background thread requires the thread mutex; I'm a bit concerned about taking the mutex eagerly on deallocation.
src/background_thread.c (outdated):

    decay_purge_interval_search(decay, lb, ub, &next_lb, &next_ub);
    lb = next_lb;
    ub = next_ub;
    assert(++n_search <= 5);
Can we make this generic, rather than hard-coded, as some function related to lg_floor(SMOOTHSTEP_NSTEPS) (and decay_time)?
src/background_thread.c (outdated):

    /* With the current setting, bisect 5 times reaches ~6 intervals. */
    unsigned n_search = 0;
    size_t next_lb, next_ub;
    while (ub > lb + 7) {
What is the intent of the magic number 7 here? If the intent is to terminate the binary search early to reduce computation, I'd argue that the termination condition should be based on how many elements fall within an acceptable time range, which differs according to both SMOOTHSTEP_NSTEPS and decay_time.
Just updated to remove the arbitrary number here. Please see new comments for the changes.
    /* Check if we need to signal the background thread early. */
    void
    background_thread_interval_check(tsdn_t *tsdn, arena_t *arena,
This function appears to operate as I'd expect, but IMO it should be called whenever the number of unused dirty pages accrued since the last epoch advance or signal reaches a new high water mark.
Can we make the wakeup time atomic, so that no mutex acquisition is necessary? Re: the current epoch being in progress, for generality we should consider any associated unused dirty pages (i.e. the newly created ones) as being at the beginning of their decay. As long as we treat them as such, we never have the potential for sleeping forever with unpurged pages.
re: dirty pages now treated as in the current epoch, this sounds good to me. re: the mutex, we can make the wakeup time atomic, but we may still hit a race condition where we read a non-indefinite sleep time and decide not to wake the thread, while the background thread is in the process of deciding to sleep forever. The synchronization feels pretty vulnerable with only atomics (right now it needs both the decay and thread mutexes). Also, we could miss the signal without holding the mutex while sending. Do you think we should still check the background thread's interval eagerly? My concern is either mutex overhead, or racy logic without mutexes.
I think we can try this: So this handles the inactive-arena issue lock-free, and takes the lock only when necessary. Does this sound reasonable to you @jasone ?
Force-pushed from abcfa8f to 14f2961.
@jasone: just updated as we discussed. Added an atomic boolean to indicate whether there is a wakeup scheduled (the wakeup time is 64-bit, which would require a 64-bit atomic if we switched that instead). This is added in a separate commit "Check for background thread inactivity on extents_dalloc." Now I believe the inactivity issue is taken care of: either the application thread will see that the background thread is sleeping forever, or the background thread will see the dirty pages updated by the application threads.
jasone left a comment:
I think this is ready to go other than addressing the comments I made earlier today in arena_decay_compute_purge_interval_impl(). Thanks for putting a ton of effort into polishing this, even after some of my design suggestions were total dead ends.
    background_thread_wakeup_time_set(tsdn_t *tsdn, background_thread_info_t *info,
        uint64_t wakeup_time) {
            malloc_mutex_assert_owner(tsdn, &info->mtx);
            if (wakeup_time == BACKGROUND_THREAD_INDEFINITE_SLEEP) {
Easier to read?:

    atomic_store_b(&info->indefinite_sleep,
        wakeup_time == BACKGROUND_THREAD_INDEFINITE_SLEEP, ATOMIC_RELEASE);
src/background_thread.c (outdated):

    if (info->npages_to_purge_new > BACKGROUND_THREAD_NPAGES_THRESHOLD) {
        should_signal = true;
    } else if (unlikely(background_thread_indefinite_sleep(info))) {
        if (extents_npages_get(&arena->extents_dirty) > 0 ||
Maybe combine this conditional with the enclosing one? I don't have strong feelings either way.
I did start with this branch combined, but it looked pretty strange and hard to interpret. Plus background_thread_indefinite_sleep is kind of more special, so I feel the current way is a bit better.
Thanks for the quick response and all the feedback @jasone! Yeah I'm still tweaking the search algorithm in
Force-pushed from 93033e0 to 38c58d4.
@jasone : Just updated the diff again. Main changes --
Force-pushed from 38c58d4 to 2648ed0.
src/background_thread.c (outdated):

    size_t lb = BACKGROUND_THREAD_MIN_INTERVAL_NS / decay_interval_ns;
    size_t ub = SMOOTHSTEP_NSTEPS;
    /* Decay deadline offset is up to 2 decay intervals. */
I don't understand this comment, nor the reason for setting lb to at least 2. Is this another case of a magic number that only makes sense under default conditions, or perhaps only for limited numbers of dirty pages?
Oh, this is because we set the epoch deadline to be 1 to 2 intervals ahead here:
https://github.com/jemalloc/jemalloc/blob/dev/src/arena.c#L598
If we don't set the minimum to 2 intervals, we may not advance the epoch by the next wakeup, in which case there won't be much useful work to do.
How about we change the comment to:

    /* Minimal 2 intervals to ensure reaching next epoch deadline. */
Ahh, got it. The comment update looks good to me.
Cool. I'll make a couple small tweaks and land this soon.
Added opt.background_thread to enable background threads, which currently handle purging. When enabled, decay ticks will not trigger purging (which is left to the background threads). We limit the max number of threads to the number of CPUs. When percpu arena is enabled, we set CPU affinity for the background threads as well. The sleep interval of background threads is dynamic, determined by computing the number of pages to purge in the future (based on the backlog).
To avoid background threads sleeping forever with idle arenas, we eagerly check background threads' sleep time after extents_dalloc, and signal the thread if necessary.
Force-pushed from 2648ed0 to 5bd0bad.
Added option "background_thread" to enable background async threads, which currently handle purging.
This resolves #149.