
Prevent idle workers in the work stealing policy#1047

Merged
robgjansen merged 3 commits into shadow:master from robgjansen:event-sched-bug on Dec 11, 2020

Conversation

@robgjansen (Member) commented on Dec 9, 2020:

This fixes two main issues:

  • Ensures that the very first round is synchronized properly by waiting until the workers finish setting up all of the host objects before advancing the master clock to scheduling round 1
  • Ensures that during every round, no thread can steal from another thread until the other thread has reset its "unprocessed" hosts queue

Closes #1046

@robgjansen added the labels Priority: Critical (Major issue that requires immediate attention), Type: Bug (Error or flaw producing unexpected results), Component: Main (Composing the core Shadow executable), and Status: In Progress (Work is currently underway) on Dec 9, 2020
@robgjansen robgjansen self-assigned this Dec 9, 2020
@robgjansen robgjansen marked this pull request as ready for review December 9, 2020 22:24
@robgjansen (Member, Author) commented:

I'm running a before/after Tor simulation on this to understand the performance implications, but the code could be reviewed in the meantime.

@robgjansen robgjansen requested a review from sporksmith December 9, 2020 22:25
@sporksmith (Contributor) left a comment:

Approving since this looks correct AFAICT, though I have some suggestions for clarity/maintainability.

if (scheduler->policyType != SP_SERIAL_GLOBAL) {
/* when everyone finishes starting the hosts, the main thread can prepare
* round 1 */
countdownlatch_countDownAwait(scheduler->finishBarrier);
@sporksmith (Contributor) commented:

I was a bit confused both about reusing finishBarrier here, and why wait on prepareRoundBarrier below doesn't already accomplish what you want.

I see now that prepareRoundBarrier is actually waited at the end of a round in scheduler_continueNextRound. Would moving it to the start of a round solve the bug without having to (re)use another barrier here, as well as being more consistent with its name?

@robgjansen (Member, Author) replied:

I had first added a new barrier, but then decided to re-use the finishBarrier since it wasn't needed at the start. But I can see how that could lead to issues later, so I will go back to using a separate one.

The prepareRoundBarrier needs to be at the end, because the state that is updated while the workers are waiting at the barrier (i.e., the clock) is what the workers use to break out of their running while loop. If I moved prepareRoundBarrier to the beginning, then we would need an extra check and break statement---which is certainly possible but also more lines of code. (Perhaps transitionRoundBarrier is a better name.)

Ultimately, I'm not really trying to refactor this admittedly ugly code right now, just trying to minimize the changes needed to get to a correct state. That being said, I do want to rewrite the scheduler in Rust (in the dev branch) at some point, because (i) I want to implement a continuous round scheduler to replace the current discrete round scheduler, and (ii) I think we can design a more efficient scheduler/scheduling policy now that we don't have to worry about migrating plugin thread-local storage anymore. But this is all for another month :)

g_rw_lock_reader_unlock(&data->lock);

/* make sure the workload has been updated for this round */
while (!__atomic_load_n(&stolenTdata->isStealable, __ATOMIC_ACQUIRE)) {
@sporksmith (Contributor) commented:

I think this looks correct, though it could potentially spin more than we'd like, and in general, using atomic operations outside of higher-level sync primitives is a bit obscure. Maybe a barrier could work instead of a raw atomic?

@sporksmith (Contributor) added:

(Or if spinning actually is what you want here, maybe a spin lock?)

@robgjansen (Member, Author) replied:

I don't think a pthread_barrier will work here, because I want the bool functionality - I only want the threads waiting for it to turn true to wait, but not the thread that sets it to true. I think pthread_barrier requires all threads that decrement the count to wait until it reaches 0.

I was planning to use a CountDownLatch but then got excited when I realized I could do it with an atomic with hopefully less synchronization overhead. (The countdown latch uses both a mutex and a condition variable.) I'm using the atomic to prevent a race condition so I don't expect the wait will be very long, so I think spinning is OK.

pthread_spin_lock could work, but it's a bit kludgy. If we are using N workers then it's possible that N-1 of them reach this point. I just want the N-1 workers to check/wait for the boolean to become true. If I use a spin lock, I would need to use the unlocked state to represent true, meaning that the N-1 workers would acquire and immediately release the lock every time they want to check if they can steal from another worker. For N workers, this happens N^2 times per round. The atomic seems simpler.

Thoughts?

@sporksmith (Contributor) replied:

> I don't think a pthread_barrier will work here
> ...
> I was planning to use a CountDownLatch

Oops, yeah CountDownLatch was what I meant.

> but then got excited when I realized I could do it with an atomic with hopefully less synchronization overhead. (The countdown latch uses both a mutex and a condition variable.) I'm using the atomic to prevent a race condition so I don't expect the wait will be very long, so I think spinning is OK.

I think the fast path without contention would be the ~same? But yeah, the main practical benefit would be avoiding a potential infinite spin. (re?)reading your comment, it sounds like spinning is probably ok.

> pthread_spin_lock could work, but it's a bit kludgy

Hmm, I see what you mean.

Maybe a semaphore (or even Ryan's spinning semaphore) would be a more appropriate abstraction?

Anyway, I'm fine with leaving it as-is. Directly using these low-level atomic functions makes the code a little more opaque, but it's not a big deal.

@robgjansen (Member, Author) replied:

OK, I'm leaving as-is (Ryan's spinning semaphore is a great idea but only available in dev branch atm).

@robgjansen (Member, Author) commented on Dec 10, 2020:

In a small 1% Tor network workload, Shadow's run time is nearly identical before and after this PR (though it might help more on other workloads, like Ryan's CPU-bound workloads).

run_time.pdf

@robgjansen robgjansen merged commit 21d5432 into shadow:master Dec 11, 2020
@robgjansen robgjansen deleted the event-sched-bug branch December 11, 2020 17:13
