Skip to content

time: sim-time thread safety and move guard-dog fully into abstract time.#6369

Merged
jmarantz merged 15 commits intoenvoyproxy:masterfrom
jmarantz:guardog-timer
Mar 27, 2019
Merged

time: sim-time thread safety and move guard-dog fully into abstract time.#6369
jmarantz merged 15 commits intoenvoyproxy:masterfrom
jmarantz:guardog-timer

Conversation

@jmarantz
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz commented Mar 23, 2019

Description: This PR addresses a TODO left behind earlier in guarddog_impl.cc which was bypassing the time-source and working directly with real time. Instead we use timers and a dispatcher to run the guard dog.

There was a race previously in guarddog_impl_test.cc which had to be resolved to integrate this (inserting a sleep at the current impl: #6239 (comment)), and another race in the interlock mechanism due to a race between the test sleep()ing (moving time forward) and the impl reading time to compute misses and megamisses. This was resolved by incorporating time into the interlock mechanism.

Note: SimulatedTimeSystem had threading issues when mutating alarms on one thread while executing them on another. The thread-annotation needed overrides due to the challenges the compiler faces proving that time_system_->alarm_->time_system == time_system, and we want to use a single mutex for the time-system and its alarms.

Risk Level: medium -- guarddog runs all the time. OTOH a lot of test iterations show this appears to be robust both in real-time and system-time.
Testing: //test/...
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
… wind up using.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
… but not in body).

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ia. Both work with tsan, runs_per_test=1000.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…s comments.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ed case).

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
#include "server/guarddog_impl.h"

#include <chrono>
#include <iostream>
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is removed already in a commit I haven't pushed yet because I don't want to waste CI :) Will push when addressing other comments.

// directly calling condvar waitFor works is that it doesn't advance
// simulated time, which the test is carefully controlling.
//
// One alternative approach that would be easier to test is to use a private
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this is exactly what I did in this PR.

@jmarantz jmarantz marked this pull request as ready for review March 25, 2019 12:15
@jmarantz
Copy link
Copy Markdown
Contributor Author

jmarantz commented Mar 25, 2019

Note: this PR does not resolve #6239 but my plan is to use the same pattern to address that one.

@jmarantz jmarantz changed the title WiP time: sim-time thread safety and move guard-dog fully into abstract time. time: sim-time thread safety and move guard-dog fully into abstract time. Mar 25, 2019
Signed-off-by: Joshua Marantz <jmarantz@google.com>
// advancement of time, and thus be more robust. Another variation would be
// to run this watchdog on the main-thread dispatcher, though such an approach
// could not detect when the main-thread was stuck.
exit_event_.waitFor(exit_lock_, loop_interval_); // NO_CHECK_FORMAT(real_time)
Copy link
Copy Markdown
Contributor Author

@jmarantz jmarantz Mar 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth pointing out here that this condvar-wait is entirely replaced by libevent in the new impl, via the use of a new private dispatcher owned by the guard-dog. We still need a condvar to avoid spinning over dispatcher-run while waiting for loop_interval_, but it's an untimed condvar wait, signaled by the firing of a timer.

It occurs to me that I could alternatively implement the Event::Scheduler interface with a new class CondvarScheduler which would the boil down to the exact same system-call we have here in the current version of guarddog_impl.cc. The benefit of this would be that we'd avoid carrying a dispatcher and a libevent base-ptr in the guarddog when there are no network or file-events to watch, simplifying the logic in operation. The cost is that implementing CondvarScheduler in general would add complexity to the PR, probably refactoring some of the SimulatedTimeSystem implementation of the alarm-set into something that would be used at runtime. As this functionality is already in libevent it seems simpler to re-use that.

Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this makes sense. Some minor comments but overall looks good

Signed-off-by: Joshua Marantz <jmarantz@google.com>
snowp
snowp previously approved these changes Mar 26, 2019
Copy link
Copy Markdown
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, before reviewing in detail I have a high level question.

/wait-any

…emove condvar and simplify code.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this is way easier to review now. :) Few small comments.

/wait

base_scheduler_.blockingLoop();
switch (type) {
case RunType::NonBlock:
base_scheduler_.runActivatedEvents();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on just passing the enum through to a run method on the scheduler? Seems like it would simplify things a bit in intermediate sites?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this simplified anything (moved a switch from one file to another) but I'm fine with it this way. One minor drawback is libevent_scheduler_lib now has to depend on the dispatcher interface, which feels a little awkward to me but doesn't cause any great problems.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you prefer it the other way nbd.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@jmarantz
Copy link
Copy Markdown
Contributor Author

I'm fine with it this way. Thanks for the review and pointing out the great simplification using libevent's new feature. The 1-hour timer also would've been a good hack but the new feature is nicer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants