grpc: per-silo shared completion queues for Google gRPC client library. #2527

Merged
htuch merged 20 commits into envoyproxy:master from htuch:grpc-tls
Feb 12, 2018

Conversation


@htuch htuch commented Feb 4, 2018

Previously, we had one thread per stream, which is terrible for
performance but simple. In this patch, a per-TLS thread completion
thread is setup by the AsyncClientManager and shared amongst all clients
(and streams). There is now somewhat complex cross-thread shared state
(including some narrow locked sections) to deal with stream shutdown
during inflight operations.

Risk Level: Low (only Google gRPC client impacted, not in prod use).
Testing: existing tests, with --runs_per_test=1000 for {grpc_client_integration_test, ads_integration_test, metrics_service_integration_test}. Also added some new unit tests for Google gRPC client call creation in google_async_client_impl_test and client destruction in grpc_client_integration_test.

Signed-off-by: Harvey Tuch <htuch@google.com>


htuch commented Feb 4, 2018

@mattklein123 this is WiP, I need to sort out some teardown issues first (will consult @vjpai). Meanwhile, can you take a look at the locking that's now happening between handleOpCompletion() on the silo threads and completionThread and LMK how sad it makes you.

I would ideally like to have this lock-free, and we can probably do that with atomics (and probably memory barriers), but wanted to start with something that would allow us to reason about correctness easily.

BTW, I plan to add the missing gRPC coverage in this PR also when I get around to writing new tests once the basic approach is finalized.

ENVOY_LOG(debug, "completionThread running");
void* tag;
bool ok;
while (cq_.Next(&tag, &ok)) {
@htuch htuch Feb 4, 2018
@vjpai this is the one thread per core (approximately) addition to the Envoy Google gRPC client. I've hit some issues that I'd like to get your thoughts on. As implemented, when I execute with --runs_per_test=1000, I get occasional crashes in cq_.Next().

Previously, each stream had its own completion thread, which was simple but slow. When the stream was destroyed, it would join the completion thread.

Now, we have one completion thread that is shared by many streams. When a stream is destroyed, it sets a draining_cq_ flag to prevent further completions and issues a TryCancel() on its client context. However, it seems that some in-flight ops for the stream may still be occurring in Next, since TryCancel is only best effort. These depend on state in the stream's AsyncReaderWriter, hence we have a use-after-free.

The underlying issue here is that when an op is issued via AsyncReaderWriter, it's unclear (to me at least) what will happen following a TryCancel(). Will we always receive a completion via Next (success or failure) for every pending read/write/final op? In that case, we could reference count pending ops per stream and solve this by deferring the stream deletion until this count is zero. Anecdotally, looking at logs, it seemed that this is not the case, hence the current implementation, where we expect some in-flight ops to disappear and some to come back as success/fail via Next.

Any clarification on the above would be helpful, as would recommendations on the best practices for destruction of stream objects that have their completion queue outlive them.

Contributor
On the first issue, our contract is that every tag in leads to a tag out, so even if a ctx gets TryCancel'ed, all of its tags are still supposed to post on Next. If this isn't happening, please let us know so that we can call it a bug.

Contributor

On the more general issue, I think I see an issue that I should have raised previously. For many classes of multithreaded code (essentially any code where the CQ thread could be different from the op-invoking thread), you shouldn't use stub_->Call. We really consider that a deprecated method since it is error prone. The problem is that it can cause things to start happening before rw_ is fully constructed. If a tag for this struct posts, then you might be in trouble interpreting this. Even a hero programmer (and I can tell you a name offline) found the need to use CVs to protect against this.

Instead, you should use rw_ = stub_->PrepareCall(....) with all the arguments except for the tag and then in the next line do rw_->StartCall(TAG).
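The suggested two-step pattern, as a comment-only pseudocode sketch (identifiers taken from the discussion above, not the exact Envoy code):

```cpp
// Pseudocode sketch. With stub_->Call(), the returned reader-writer
// starts immediately, so a completion can post on the CQ thread before
// rw_ is even assigned. PrepareCall() splits construction from start:
//
//   // 1. Construct the reader-writer without starting any operation.
//   rw_ = stub_->PrepareCall(&ctx_, method_, &cq_);
//   // 2. Only now put a tag in flight; rw_ is fully assigned first.
//   rw_->StartCall(&init_tag_);
//
// Nothing can complete before StartCall(), so no condition variable is
// needed to guard the rw_ assignment.
```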

@mattklein123 mattklein123 left a comment

LMK how sad it makes you.

This whole situation makes me very sad, but I'm not sure there is much we can do about that right now. Seems fine to start out with.

ThreadLocal::Instance& tls)
: cm_(cm), tls_(tls) {}
: cm_(cm), tls_(tls) {
#ifdef ENVOY_GOOGLE_GRPC
Member

I think we should probably split into two different concrete implementations at this point. Not super crazy about the ifdefs and things like google_tls_slot, etc. I think as the logic gets more and more complicated it will be easier to reason about if we just split them now?

Member Author

The problem there is that we can dynamically switch between using the two different gRPC client libraries in the API. Right now, the AsyncClientManager hides this, you just feed it config. So, if we want to split implementation, we'd need to export out from ClusterManager a getter for both of them and we'll still need a function to figure out from the config which one to use. I can do this, but it feels a bit cleaner to encapsulate all the multi-client logic in the single module here.

Member

Ah OK. Alright that's fine.

// posting, it might post a new completion after cleanup() is invoked. So, we
// need to share state here to allow the completion thread to avoid doing
// this.
bool draining_cq_{}; // Guarded by cq_lock.
Member

Can we use the clang thread annotations potentially?

Member Author

I'd like to do this. Do you know if the Clang Mac OS X issues that we previously hit are still a thing?

Member

I think the only reason I had an issue was due to using the C++14 R/W lock. I think you should probably be fine.



htuch commented Feb 8, 2018

@vjpai I've incorporated your feedback, PTAL for gRPC library sanity if you get a chance. Thanks.

// Note, we expect that we're not in a dispatcher loop here, and participating
// in global teardown.
if (!streams_.empty()) {
dispatcher_.run(Event::Dispatcher::RunType::NonBlock);
Member Author

@mattklein123 I couldn't find a better way to do this off the top of my head. Basically, we have this situation:

  1. Per-silo TLS for GoogleAsyncClient is being torn down during server exit, as the slot gets blown away. We're on a given silo and issue resetStream() to the various streams still in existence.
  2. Streams may have in-flight ops on their CQ thread that is buddy to the silo thread.
  3. We shut down the CQ thread and join it. There may still be in-flight completions that the CQ thread has posted back to the silo thread.
  4. To avoid leaking memory (which gets picked up in tests) and to bound the lifetime of stream cleanup, we need to execute some dispatcher events on the loop to clean up these in-flight completions.

The solution here is a bit weird: we let the dispatcher run a bit. However, this is likely happening after the usual dispatcher main loop finishes for a worker, so it seems like there might be dragons, and it's kind of special snowflake behavior.

We can maybe chat today when you visit Google a bit about this.

Member

Harvey and I just talked about this IRL. I think the plan is to not use dispatcher post for this but to implement posting functionality directly in this code which will make cleanup easier.

htuch added 9 commits February 8, 2018 17:12

htuch commented Feb 9, 2018

@mattklein123 @vjpai this is now ready for final review.


htuch commented Feb 9, 2018

@mattklein123 @zuercher FWIW, GUARDED_BY is still broken on OS X, see https://circleci.com/gh/envoyproxy/envoy/25147?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link. TF also had this issue, tensorflow/tensorflow#955, I don't think the warnings caused failures for them (?).

This reverts commit 0bd36f9.

Signed-off-by: Harvey Tuch <htuch@google.com>
@vjpai vjpai left a comment

Looks good from a gRPC usage pov. I am interested in the particular mechanism choice for delivering completed ops and will have to ping you offline for context on this; I found that quite interesting.


htuch commented Feb 9, 2018

@dnoe can you take a pass on this?


dnoe commented Feb 9, 2018

Will take a look.


zuercher commented Feb 9, 2018

I think the reason GUARDED_BY didn't fail the TF builds is that they don't have -Werror set. So they got a lot of warnings.

I think if you had

LOCKABLE std::mutex completed_ops_lock_;

then GUARDED_BY should work. You'll also need to tag locks on the mutex with SCOPED_LOCKABLE, e.g.:

SCOPED_LOCKABLE std::unique_lock<std::mutex> lock(stream.completed_ops_lock_);

These could be simplified with some typedefs for the mutex and unique_lock.

@mattklein123

Another option if we can't get it to work on OSX is to just have a private include for thread_guards.h which defines some things and then makes them nothing for OSX.

@mattklein123 mattklein123 left a comment

I didn't go through in a super high level of detail, but generally looks sane to me. I left a few random comments.


GoogleAsyncStreamImpl::~GoogleAsyncStreamImpl() { ASSERT(rw_ == nullptr); }
GoogleAsyncStreamImpl::~GoogleAsyncStreamImpl() {
ENVOY_LOG(debug, "GoogleAsyncStreamImpl destruct");
Member

nit: not sure if it's worth leaving this debug print here or not. Up to you.

Member Author

It's been fairly useful when chasing down races, on balance I think it's better to leave it.

// Ignore op completions while CQ is shutting down.
if (cq_shutdown_in_progress_) {
void GoogleAsyncStreamImpl::onCompletedOp() {
std::unique_lock<std::mutex> lock(completed_ops_lock_);
Member

Do you want to drop the lock when calling handleOpCompletion()? Or does it not matter that much for contention?

Member Author

Yeah, good point, should release it.

ENVOY_LOG(trace, "completionThread CQ event {} {}", op, ok);
std::unique_lock<std::mutex> lock(stream.completed_ops_lock_);
stream.completed_ops_.emplace_back(op, ok);
stream.dispatcher_.post([&stream] { stream.onCompletedOp(); });
Member

For perf reasons, you might want to only post if the number of completed ops is going from 0 -> 1, though from below it seems like you only want to process 1 at a time. I'm not exactly sure why that is, but might be worth throwing in a TODO here for future perf follow ups. (You could just keep track of number of per silo pending events, and post into that queue at the thread level).

Member Author

This was actually the original design. Unfortunately, if you batch process completed_ops_, rather than having a 1:1 association between onCompletedOp posts and completed_ops_ entries, you get into a race: the stream may have processed the batched ops and then destroyed itself (when in-flight ops hit zero), and then a dangling dispatcher post is executed and tries to access the freed stream.

@htuch htuch Feb 9, 2018

Actually, if we maintain the invariant that there is at most one post pending when the queue is non-empty, I think this could be safe. Will see if this holds in a bit.

Contributor

I think what's written here is safe, while not the most efficient. I'd prefer a TODO now rather than complicating this PR further. It would be easier to reason about the improved behavior separately.


htuch commented Feb 9, 2018

@zuercher @mattklein123 thanks for the ideas, let's handle the thread annotation issue separately in #2571, since I don't want to conflate with this already large PR.


htuch commented Feb 9, 2018

This is ready for another pass. I didn't remove the locking in handleOpCompletion, since I wanted to ensure we atomically handle the entire set of op completions from one post before another one is possible, per the invariant.

// This is also required to satisfy the contract that once Shutdown is called,
// streams no longer queue any additional tags.
for (auto it = streams_.begin(); it != streams_.end();) {
(*it++)->resetStream();
Contributor

Any particular reason why you haven't written this as (*it)->resetStream() and done the iterator increment in the for loop params?

Member Author

Added comment; it has to do with resetStream() performing a potential erase(), which would invalidate the current iterator.

Contributor

Awesome. I've fixed at least one bug of that sort in Envoy, so I appreciate both that you got it right and the comment.


void GoogleAsyncStreamImpl::deferredDelete() {
ENVOY_LOG(debug, "Deferred delete");
tls_.unregisterStream(this);
dispatcher_.deferredDelete(std::unique_ptr<GoogleAsyncStreamImpl>(this));
Contributor

std::unique_ptr<FooType>(this) looks really wrong. Are we confident nothing else can hold a pointer or reference to this object? A comment here would make it look less shocking.

@dnoe dnoe Feb 9, 2018

There seem to be cases in the Envoy code base where there is already a unique_ptr<GoogleAsyncStreamImpl>. Am I missing something or couldn't this cause a double free?

Member Author

Added comment; we only get here after cleanup(), which hands the object self-ownership of memory. After this method completes, nothing should be invoked on the stream, it's dead.

// It's an invariant that there must only be one pending post for arbitrary
// length completed_ops_, otherwise we can race in stream destruction, where
// we process multiple events in onCompletedOps() but have only partially
// consumed the posts on the dispatcher.
Contributor

The only problem I see with this approach is that it isn't compatible with limiting the number of completed ops processed in the dispatcher post before relinquishing control back to the dispatcher, which could lead to starving the silo thread from processing other events if there are a huge number of completed ops. I think that's OK for now.

Member Author

Yeah, fair point. I've added a TODO, we can be smarter about bounding completion run sizes in the future if this does manifest as an issue.

GoogleAsyncClientImpl& parent_;
GoogleAsyncClientThreadLocal& tls_;
// Latch our own version of this reference, so that completionThread() doesn't
// try and access via parent_, which might not exist in teardown.
Contributor

During tear down is the dispatcher itself guaranteed to still be valid? What protects against this case?

Member Author

Added comment; dispatchers outlive threads in the server loop.

uint32_t inflight_tags_{};
// Queue of completed (op, ok) passed from completionThread() to
// handleOpCompletion().
std::list<std::pair<GoogleAsyncTag::Operation, bool>> completed_ops_;
Contributor

Did you consider an std::deque for this?


@htuch htuch merged commit 104219c into envoyproxy:master Feb 12, 2018
@htuch htuch deleted the grpc-tls branch February 12, 2018 14:49
Shikugawa pushed a commit to Shikugawa/envoy that referenced this pull request Mar 28, 2020
* add response flag

* update stats plugin

* comment and format

* response flag

* license
jpsim added a commit that referenced this pull request Nov 28, 2022
Follow-up on envoyproxy/envoy-mobile#2526

Some instances were missed.

Signed-off-by: JP Simard <jp@jpsim.com>
jpsim added a commit that referenced this pull request Nov 29, 2022
Follow-up on envoyproxy/envoy-mobile#2526

Some instances were missed.

Signed-off-by: JP Simard <jp@jpsim.com>