time: sim-time thread safety and move guard-dog fully into abstract time.#6369
time: sim-time thread safety and move guard-dog fully into abstract time.#6369jmarantz merged 15 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
… wind up using. Signed-off-by: Joshua Marantz <jmarantz@google.com>
… but not in body). Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ia. Both work with tsan, runs_per_test=1000. Signed-off-by: Joshua Marantz <jmarantz@google.com>
…s comments. Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ed case). Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
source/server/guarddog_impl.cc
Outdated
| #include "server/guarddog_impl.h" | ||
|
|
||
| #include <chrono> | ||
| #include <iostream> |
There was a problem hiding this comment.
this is removed already in a commit I haven't pushed yet because I don't want to waste CI :) Will push when addressing other comments.
| // directly calling condvar waitFor works is that it doesn't advance | ||
| // simulated time, which the test is carefully controlling. | ||
| // | ||
| // One alternative approach that would be easier to test is to use a private |
There was a problem hiding this comment.
note: this is exactly what I did in this PR.
|
Note: this PR does not resolve #6239 but my plan is to use the same pattern to address that one. |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
| // advancement of time, and thus be more robust. Another variation would be | ||
| // to run this watchdog on the main-thread dispatcher, though such an approach | ||
| // could not detect when the main-thread was stuck. | ||
| exit_event_.waitFor(exit_lock_, loop_interval_); // NO_CHECK_FORMAT(real_time) |
There was a problem hiding this comment.
It's probably worth pointing out here that this condvar-wait is entirely replaced by libevent in the new impl, via the use of a new private dispatcher owned by the guard-dog. We still need a condvar to avoid spinning over dispatcher-run while waiting for loop_interval_, but it's an untimed condvar wait, signaled by the firing of a timer.
It occurs to me that I could alternatively implement the Event::Scheduler interface with a new class CondvarScheduler which would the boil down to the exact same system-call we have here in the current version of guarddog_impl.cc. The benefit of this would be that we'd avoid carrying a dispatcher and a libevent base-ptr in the guarddog when there are no network or file-events to watch, simplifying the logic in operation. The cost is that implementing CondvarScheduler in general would add complexity to the PR, probably refactoring some of the SimulatedTimeSystem implementation of the alarm-set into something that would be used at runtime. As this functionality is already in libevent it seems simpler to re-use that.
snowp
left a comment
There was a problem hiding this comment.
Thanks, this makes sense. Some minor comments but overall looks good
Signed-off-by: Joshua Marantz <jmarantz@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks, before reviewing in detail I have a high level question.
/wait-any
…emove condvar and simplify code. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks this is way easier to review now. :) Few small comments.
/wait
| base_scheduler_.blockingLoop(); | ||
| switch (type) { | ||
| case RunType::NonBlock: | ||
| base_scheduler_.runActivatedEvents(); |
There was a problem hiding this comment.
Thoughts on just passing the enum through to a run method on the scheduler? Seems like it would simplify things a bit in intermediate sites?
There was a problem hiding this comment.
I'm not sure if this simplified anything (moved a switch from one file to another) but I'm fine with it this way. One minor drawback is libevent_scheduler_lib now has to depend on the dispatcher interface, which feels a little awkward to me but doesn't cause any great problems.
There was a problem hiding this comment.
If you prefer it the other way nbd.
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
I'm fine with it this way. Thanks for the review and pointing out the great simplification using libevent's new feature. The 1-hour timer also would've been a good hack but the new feature is nicer. |
Description: This PR addresses a TODO left behind earlier in guarddog_impl.cc which was bypassing the time-source and working directly with real time. Instead we use timers and a dispatcher to run the guard dog.
There was a race previously in guarddog_impl_test.cc which had to be resolved to integrate this (inserting a sleep at the current impl: #6239 (comment)), and another race in the interlock mechanism due to a race between the test sleep()ing (moving time forward) and the impl reading time to compute misses and megamisses. This was resolved by incorporating time into the interlock mechanism.
Note: SimulatedTimeSystem had threading issues when mutating alarms on one thread while executing them on another. The thread-annotation needed overrides due to the challenges the compiler faces proving that time_system_->alarm_->time_system == time_system, and we want to use a single mutex for the time-system and its alarms.
Risk Level: medium -- guarddog runs all the time. OTOH a lot of test iterations show this appears to be robust both in real-time and system-time.
Testing: //test/...
Docs Changes: n/a
Release Notes: n/a