testing: fix multiple race conditions in simulated time tests#12527
mattklein123 merged 14 commits into master
cc @jmarantz @wrowe @sunjayBhatia This is not done and I'm still working through various issues, but I wanted to let you see my current progress. I think the idea here is sound; however, see my comment at: https://github.com/envoyproxy/envoy/pull/12527/files#diff-f2c85459672519c620a47b880b9b0d20R317-R319
Force-pushed from b8e85f7 to 3b90839.
@jmarantz I'm about to quit for today, but this passes except for 2 tests that don't compile, which should not be difficult to fix. From a test perspective this is a pretty scary change, but overall I think it makes everything much simpler to reason about and cleans up a bunch of stuff. Feel free to start reviewing and helping me fix things.
Running TSAN and seeing some errors. None of them look too hard to fix, so I will work on that next.
Force-pushed from 3b90839 to 9add2b7.
No good deed goes unpunished. The TSAN issues are internal to abseil. I think they were recently fixed with: However, when I pull current abseil, there are TSAN errors even without any other changes; see abseil/abseil-cpp#760.
Force-pushed from 9add2b7 to 3d947c9.
@jmarantz this now passes all tests for me locally under fastbuild and TSAN. I have hit some flakes; it's unclear whether they are new or pre-existing and exacerbated by the alternate TSAN lock implementation we are now using. It would be better to merge the other PR with the abseil bump first and see how that goes.
Force-pushed from 3d947c9 to 30c6ef4.
Force-pushed from 30c6ef4 to af6e1fd.
Signed-off-by: Matt Klein <mklein@lyft.com>
@jmarantz this is passing all tests on fastbuild, and I think it is ready for real review. I'm going to start looking for flakes.
jmarantz left a comment:
flushing comments; mostly nits
jmarantz left a comment:
looks great; just a few nits, mostly about clarity and comments.
@jmarantz updated. Great suggestion about the time bounds class. Much cleaner! |
The ARM flake is a known, unrelated issue: #12638.
jmarantz left a comment:
Awesome, thank you for finally cleaning this up!
Up to you if you want to apply the syntactic tweaks or just leave that for next time.
auto thread = Thread::threadFactoryForTest().createThread([this, &mutex, &done]() {
  for (;;) {
    {
      absl::MutexLock lock(&mutex);
Taste test, since this syntax works now (C++17 if-with-initializer, for you golang fans):
for (;;) {
  if (absl::MutexLock lock(&mutex); done) {
    return;
  }
  base_scheduler_.run(Dispatcher::RunType::Block);
}
Looking at this code is the first time it occurred to me to use it in C++.
Oh yeah that's good. I will fix that in a follow up. I want to get this merged so we can see how we are doing with flakes.
auto thread = Thread::threadFactoryForTest().createThread([this, &mutex, &done]() {
  for (;;) {
    {
      absl::MutexLock lock(&mutex);
Same golang-style syntax here, if you like.
public:
  template <class D>
  RealTimeBound(const D& duration)
      : end_time_(std::chrono::steady_clock::now() + duration) // NO_CHECK_FORMAT(real_time)
I'm guessing you were swayed to use this style by the convenience of not having to pass timeSystem() into the constructor here.
Regardless, this turned out really well. Thanks!
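The `RealTimeBound` constructor above captures a deadline on `std::chrono::steady_clock`, so the bound is measured in real (monotonic) time regardless of which time system a test is driving. A minimal sketch of how such a class might be completed and used, assuming a hypothetical `withinBound()` accessor and a hypothetical `waitFor()` polling helper (neither is taken from the PR):

```cpp
#include <chrono>
#include <thread>

// Hypothetical sketch: only the constructor shape comes from the review.
// Records a deadline on the steady (monotonic) clock at construction.
class RealTimeBound {
public:
  template <class D>
  explicit RealTimeBound(const D& duration)
      : end_time_(std::chrono::steady_clock::now() + duration) {}

  // Assumed accessor: true while the real-time budget has not been exhausted.
  bool withinBound() const { return std::chrono::steady_clock::now() < end_time_; }

private:
  std::chrono::steady_clock::time_point end_time_;
};

// Hypothetical helper: polls `pred` until it holds or the real-time budget
// runs out, independent of any simulated clock the test may be advancing.
template <class Predicate>
bool waitFor(Predicate pred, std::chrono::milliseconds budget) {
  RealTimeBound bound(budget);
  while (bound.withinBound()) {
    if (pred()) {
      return true;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return pred(); // One final check after the deadline passes.
}
```

Keeping the bound on the real clock means a test cannot hang forever just because simulated time was never advanced.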
* master: (67 commits)
  logger: support log control in admin interface and command line option for Fancy Logger (envoyproxy#12369)
  test: fix http_timeout_integration_test flake (envoyproxy#12654)
  [fuzz]added an input check in writefilter fuzzer and added test cases (envoyproxy#12628)
  add 'explicit' restriction. (envoyproxy#12643)
  scoped_rds_integration_test migrate from api v2 to api v3. (envoyproxy#12633)
  fuzz: added fuzz test for listener filter tls_inspector (envoyproxy#12617)
  testing: fix multiple race conditions in simulated time tests (envoyproxy#12527)
  [tls] Move handshaking behavior into SslSocketInfo. (envoyproxy#12571)
  header: getting rid of exception-throwing behaviors in header files [the rest] (envoyproxy#12611)
  router: add new ratelimited retry backoff strategy (envoyproxy#12202)
  [redis_proxy] added a constraint for route.prefix().size() (envoyproxy#12637)
  network: add tcp listener backlog config (envoyproxy#12625)
  runtime: debug log that condition is always true when fractionalPercent numerator > denominator (envoyproxy#12068)
  WatchDog Extension hook (envoyproxy#12416)
  router: add dynamic metadata header formatter (envoyproxy#11858)
  statsd: revert visibility to public (envoyproxy#12621)
  Fix regression of /build_* in gitignore (envoyproxy#12630)
  Added a missing extension point to documentation. (envoyproxy#12620)
  Reverts proxy protocol test on windows (envoyproxy#12619)
  caching: Improved the tests and coverage of the CacheFilter tree (envoyproxy#12544)
  ...
Signed-off-by: Michael Puncel <mpuncel@squareup.com>
This PR fixes multiple race conditions in tests. The summary is:
all time systems. This means that all network operations are now
"instantaneous" and makes all time advances for alarms explicit. This
required fixes in a few tests but should make simulated time much easier
to reason about.
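The core contract described in the summary is that simulated time only moves when a test moves it, so alarms fire only on an explicit advance. A minimal, single-threaded sketch of that contract follows; the `SimulatedClock` class and its `scheduleAlarm`/`advanceTime` methods are hypothetical and do not reflect Envoy's actual simulated time system API or its cross-thread synchronization.

```cpp
#include <chrono>
#include <functional>
#include <map>
#include <utility>

// Hypothetical sketch: the clock never moves on its own, and alarms fire
// only when a test explicitly advances time past their deadlines.
class SimulatedClock {
public:
  using Duration = std::chrono::milliseconds;
  using TimePoint = std::chrono::time_point<std::chrono::steady_clock, Duration>;

  TimePoint now() const { return now_; }

  // Registers `cb` to run once simulated time reaches now() + delay.
  void scheduleAlarm(Duration delay, std::function<void()> cb) {
    alarms_.emplace(now_ + delay, std::move(cb));
  }

  // Moves time forward by `delta`, firing every due alarm in deadline order.
  // Alarms scheduled by a callback also fire if they fall within the window.
  void advanceTime(Duration delta) {
    const TimePoint target = now_ + delta;
    while (!alarms_.empty() && alarms_.begin()->first <= target) {
      auto it = alarms_.begin();
      now_ = it->first; // Time jumps to each alarm's deadline as it fires.
      std::function<void()> cb = std::move(it->second);
      alarms_.erase(it);
      cb();
    }
    now_ = target;
  }

private:
  TimePoint now_{};
  std::multimap<TimePoint, std::function<void()>> alarms_;
};
```

Because nothing fires until `advanceTime()` is called, a test's outcome no longer depends on how quickly background work happens to run in real time.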
Fixes #12480
Fixes #10568
Risk Level: None for prod code, high for tests
Testing: Existing and fixed tests
Docs Changes: N/A
Release Notes: N/A