test: track simulated timers per dispatcher to simplify thread interactions and locking.#12609
Conversation
… cross thread interactions. Signed-off-by: Antonio Vicente <avd@google.com>
…tcher thread to avoid data race found by tsan. Signed-off-by: Antonio Vicente <avd@google.com>
jmarantz
left a comment
There was a problem hiding this comment.
flushing comments as I'll be in meetings for quite a bit starting now.
This looks great so far.
|
|
||
| bool isEnabled(Alarm& alarm); | ||
| void enableAlarm(Alarm& alarm, const std::chrono::microseconds& duration); | ||
| void disableAlarm(Alarm& alarm) { |
There was a problem hiding this comment.
nit: ABSL_LOCKS_EXCLUDED(mutex_) here and similar APIs on this object.
|
|
||
| class AlarmSet { | ||
| public: | ||
| bool empty() { return alarms_.empty(); } |
| public: | ||
| bool empty() { return alarms_.empty(); } | ||
|
|
||
| const AlarmRegistration& next() { return *alarms_.begin(); } |
| } | ||
|
|
||
| private: | ||
| std::set<AlarmRegistration> alarms_; |
There was a problem hiding this comment.
nit (this was pre-existing): call this ordered_alarms_, just for clarity at use?
There was a problem hiding this comment.
changed to sorted_alarms_
Signed-off-by: Joshua Marantz <jmarantz@google.com>
jmarantz
left a comment
There was a problem hiding this comment.
flushing more comments.
| MonotonicTime current_monotonic_time_ ABSL_GUARDED_BY(mutex_); | ||
| SystemTime current_system_time_ ABSL_GUARDED_BY(mutex_); | ||
|
|
||
| MonotonicTime next_monotonic_time_ ABSL_GUARDED_BY(mutex_); |
There was a problem hiding this comment.
what this means is the timestamp of the next scheduled event on this scheduler? Add comment?
There was a problem hiding this comment.
I found that the two fields were not needed after all. Changed to just monotonic_time_ and system_time_
| if (armed_) { | ||
| disableTimer(); | ||
| if (!registered_alarms_.empty()) { | ||
| return registered_alarms_.next().time_; |
There was a problem hiding this comment.
is it possible that a registered but not-yet-triggered alarm will be earlier than a triggered alarm?
There was a problem hiding this comment.
Fairly sure the answer was no, but I'm removing this method since it is no longer needed after Matt's awesome WaitFor refactor.
| time_system_.decPendingLockHeld(); | ||
|
|
||
| if (duration.count() == 0) { | ||
| if (Runtime::runtimeFeatureEnabled( |
There was a problem hiding this comment.
should we cache this bool in the helper class? Or do we expect it to change during the course of a test method?
There was a problem hiding this comment.
I can see someone setting runtime features after creating the SimulatedScheduler but before creating the first alarm. Does it hurt to check the runtime features this often?
|
Given @antoniovicente is OOO for the rest of the week, let's continue in #12614. |
…r_dispatcher Signed-off-by: Antonio Vicente <avd@google.com>
…ving draft status from the PR. Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
|
please fix format so CI runs. |
Signed-off-by: Antonio Vicente <avd@google.com>
| SchedulableCallbackPtr run_alarms_cb_; | ||
|
|
||
| absl::Mutex mutex_; | ||
| bool not_running_cbs_ ABSL_GUARDED_BY(mutex_) = true; |
There was a problem hiding this comment.
comment that this bool has negative sense to make it easier to use with absl::Condition.
There was a problem hiding this comment.
Changed implementation to avoid the double negatives.
| } | ||
|
|
||
| private: | ||
| std::set<AlarmRegistration> sorted_alarms_; |
There was a problem hiding this comment.
should we have thread annotations on this member var and the methods above? or if it is to be left unguarded due to a threading discipline can you call that out in comments?
There was a problem hiding this comment.
All synchronization for this object is external. See:
AlarmSet registered_alarms_ ABSL_GUARDED_BY(mutex_);
AlarmSet triggered_alarms_ ABSL_GUARDED_BY(mutex_);
Signed-off-by: Antonio Vicente <avd@google.com>
|
|
||
| envoy_cc_test( | ||
| name = "guarddog_impl_test", | ||
| size = "small", |
There was a problem hiding this comment.
Explanation of this change:
Before other changes in this PR this test occasionally times out due to flaky alarm behavior as seen in #12638
This test usually runs in about 5s to 10s in standard confirgurations, and 20 seconds under clang-tsan. It may make sense to reduce the configured timeout for this test to small which has an associated timeout of 60 secs in order to get faster feedback when this test fails due to timeout.
| } | ||
|
|
||
| private: | ||
| std::set<AlarmRegistration> sorted_alarms_; |
There was a problem hiding this comment.
All synchronization for this object is external. See:
AlarmSet registered_alarms_ ABSL_GUARDED_BY(mutex_);
AlarmSet triggered_alarms_ ABSL_GUARDED_BY(mutex_);
| SchedulableCallbackPtr run_alarms_cb_; | ||
|
|
||
| absl::Mutex mutex_; | ||
| bool not_running_cbs_ ABSL_GUARDED_BY(mutex_) = true; |
There was a problem hiding this comment.
Changed implementation to avoid the double negatives.
mattklein123
left a comment
There was a problem hiding this comment.
Awesome, this makes sense at a high level. I didn't analyze all of the locking, etc. deeply. One API thought for consideration. Thank you for working on this!
/wait
| } | ||
|
|
||
| template <typename DurationType> void advanceTimeAndLoop(DurationType duration) { | ||
| time_system_.advanceTimeAsync(duration); |
There was a problem hiding this comment.
WDYT about having advanceTimeAsync take a dispatcher and whether to run it blocking or non-blocking? It seems like it would be almost always broken to not do this pattern when using this API? If there are some cases I guess the dispatcher could be optional in the API but it would at least make test writers think about it?
There was a problem hiding this comment.
@jmarantz @antoniovicente ping on this one. I see a new commit but I'm unsure of the plan here. If this is not needed we can merge but wanted to check.
/wait-any
There was a problem hiding this comment.
Sorry, was doing some research in order to provide a complete reply to your comment.
As you noticed I rolled back this file since the changes to it are no longer required once I moved to updating monotonic time directly instead of updating next_monotonic_time and propagating it to event loops next time they run.
The idea of taking a dispatcher on advanceTimeAsync is an interesting suggestion. I found that most uses of this function happen in cases where there is a single active Dispatcher, but there's 1 exception to this in ServerStatsTest.FlushStats. Still it makes sense to have a SimulatedTimeSystem function that moves time forward and then runs the dispatcher event loop. Preferences for doing that change in this PR or a followup cleanup?
There was a problem hiding this comment.
Preferences for doing that change in this PR or a followup cleanup?
Up to you, I'm fine either way. Thank you!
There was a problem hiding this comment.
I'll do the cleanup as a followup.
Signed-off-by: Antonio Vicente <avd@google.com>
Commit Message: test: track simulated timers per dispatcher to simplify thread interactions and locking.
Additional Description:
Risk Level: low, test-only
Testing: unit
Docs Changes:
Release Notes:
Issue #12585 #12638