
Service: address back to back execution race #118

Merged
htuch merged 13 commits into envoyproxy:master from oschaaf:service-back-to-back-execution-race-fix
Aug 29, 2019

Conversation

@oschaaf
Member

@oschaaf oschaaf commented Aug 19, 2019

Address a race discovered via repeated parallel execution of
`ServiceTest.BackToBackExecution` by adding a short timeout
before concluding that a previous future is still active and handling
the earlier test.

Theoretically it's still possible to run into this, but in practice I think this
is fair enough for actual usage, and it is sufficient to deflake the
test which detected it on my machine.

Thanks @htuch for discovering the issue.

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>

@oschaaf oschaaf requested a review from htuch August 19, 2019 21:11
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@oschaaf
Member Author

oschaaf commented Aug 19, 2019

Updated to eliminate the race altogether. Ready for another look!

/assign htuch

@htuch htuch mentioned this pull request Aug 19, 2019
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Member Author

@oschaaf oschaaf left a comment


Ready for another pass
/assign htuch

```cpp
// ServiceTest.BackToBackExecution.
if (busy_) {
if (future_.valid() &&
    future_.wait_for(std::chrono::seconds(0)) != std::future_status::ready && busy_) {
```
Member


Looks good, thanks. One last question: why do we still need busy_? Isn't knowing that the future isn't ready sufficient to know that it's still in-flight?

Member Author


I think the thing is that the future may have sent the reply and still not be ready when the next request comes in. So we may even receive the next request before the future has been torn down.
To illustrate: the print I added in the following diff shows up in the output when reproducing the failure in parallel/repeated tests (note that the busy flag is no longer checked):

```diff
diff --git a/source/client/service_impl.cc b/source/client/service_impl.cc
index fca8290..853cb77 100644
--- a/source/client/service_impl.cc
+++ b/source/client/service_impl.cc
@@ -75,7 +75,8 @@ void ServiceImpl::writeResponse(const nighthawk::client::ExecutionResponse& resp
       // future has progressed in a state where we can do another one. This avoids the odd flake in
       // ServiceTest.BackToBackExecution.
       if (future_.valid() &&
-          future_.wait_for(std::chrono::seconds(0)) != std::future_status::ready && busy_) {
+          future_.wait_for(std::chrono::seconds(0)) != std::future_status::ready) {
+        std::cerr << "expected to not be busy" << std::endl;
         return finishGrpcStream(false, "Only a single benchmark session is allowed at a time.");
       } else {
         busy_ = true;
```

Member


Yes, that seems entirely possible. I think busy_ is OK to unblock the import, but it's also super icky IMHO, since we're doing manual concurrency control via atomics. Ideally we would have something more like a giant select on the message loop, rather than a gRPC Read. Combining futures and gRPC APIs is kind of messy. Did we explore maybe using something like https://en.cppreference.com/w/cpp/thread/mutex/try_lock and a mutex?

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
- Uses mutexes and a condvar, removing the atomic bool.
- Adds handling for declining concurrent clients.
- Adds a test for concurrent clients.

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@oschaaf
Member Author

oschaaf commented Aug 28, 2019

To tackle this in a clean way, I investigated the async APIs for the gRPC server, but unfortunately I wasn't able to find any examples covering bidirectional streaming.

So instead, I updated the approach here to rely on CondVar & MutexBasicLockable (with a few comments to describe the workings and gotchas).

Also added handling and a test for concurrent clients. This is ready for another look!

/assign htuch

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Member

@htuch htuch left a comment


Looks good, but more complicated than it would ideally be; I really wish gRPC had better event loop or std::future interop.

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@oschaaf
Member Author

oschaaf commented Aug 28, 2019

@htuch re: complexity
One idea: maybe someday we could use Envoy's gRPC client implementation and run it on NH's main dispatcher to connect back to a new internal gRPC service method.
That client would then be running on our own event loop, and the gRPC service would slim down to just passing messages back and forth.

@oschaaf
Member Author

oschaaf commented Aug 28, 2019

Oh, also, this is ready for another look :)

```cpp
Envoy::Thread::MutexBasicLockable log_lock_;
::grpc::ServerReaderWriter<::nighthawk::client::ExecutionResponse,
                           ::nighthawk::client::ExecutionRequest>* stream_;
static ::grpc::ServerReaderWriter<::nighthawk::client::ExecutionResponse,
```
Member


Why is this one static?

Member Author


It's static so we can detect when multiple service instances are created because of multiple clients connecting; we assert when that happens. Any suggestions?

Member

@htuch htuch left a comment


LGTM except for the static stream. Re: using Envoy's gRPC implementation, that's a very definite option, you might want to file an issue. OTOH, if this works, we can leave as is for now.

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Member

@htuch htuch left a comment


Thanks!

@htuch htuch merged commit 62fc3cf into envoyproxy:master Aug 29, 2019
