Skip to content

core: expose functionality for flushing stats manually#707

Merged
rebello95 merged 4 commits intomasterfrom
expose-flush
Mar 3, 2020
Merged

core: expose functionality for flushing stats manually#707
rebello95 merged 4 commits intomasterfrom
expose-flush

Conversation

@rebello95
Copy link
Copy Markdown
Contributor

@rebello95 rebello95 commented Feb 27, 2020

Adds functionality built on top of the upstream Envoy changes in envoyproxy/envoy#10097 which will allow the bridging layers to manually flush stats based on lifecycle changes in the mobile application.

Note: Stats must be flushed from the main Envoy thread, as validated by the following assertion failure that occurs when attempting to flush from another thread:

[2020-02-27 14:38:43.181][4716580][critical][assert] [external/envoy/source/common/thread_local/thread_local_impl.cc:190] assert failure: std::this_thread::get_id() == main_thread_id_.

Follow-up PRs will call these functions from Java/Objective-C.

Part of #573.

Signed-off-by: Michael Rebello me@michaelrebello.com

Adds functionality built on top of the upstream Envoy changes in envoyproxy/envoy#10097 which will allow the bridging layers to manually flush stats based on lifecycle changes in the mobile application.

Follow-up PRs will call these functions from Java/Objective-C.

Part of #573.

Signed-off-by: Michael Rebello <me@michaelrebello.com>
@rebello95 rebello95 requested a review from junr03 February 27, 2020 22:02
Signed-off-by: Michael Rebello <me@michaelrebello.com>
rebello95 added a commit that referenced this pull request Feb 27, 2020
Builds on #707 by consuming this interface to flush stats when the app is backgrounded or terminated. This will help ensure that stats are sent to the server when possible before the app potentially loses its in-memory stats cache.

Part of #573.

Signed-off-by: Michael Rebello <me@michaelrebello.com>
Copy link
Copy Markdown
Member

@junr03 junr03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. Just a couple comments. Also this is definitely a place where we are lacking tests :) will add to our list

Signed-off-by: Michael Rebello <me@michaelrebello.com>
@rebello95 rebello95 requested a review from junr03 February 28, 2020 01:06
@rebello95
Copy link
Copy Markdown
Contributor Author

Thanks for the review @junr03! Updated.

@rebello95 rebello95 requested a review from goaway March 2, 2020 22:25
Envoy::Server::ServerLifecycleNotifier::HandlePtr postinit_callback_handler_;
std::unique_ptr<Http::Dispatcher> http_dispatcher_;
std::unique_ptr<MainCommon> main_common_ GUARDED_BY(mutex_);
Server::Instance* server_;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I do think this will be racy. The raw pointer is set and read from different threads, so there's at least a window where threads calling flush_stats may not see the server even though it's been set. The only time I think it would crash though is potentially on teardown. However, we've spent some time chasing down shutdown crashes, so I don't know if we really want to add to that risk.

I would suggest leveraging the http_dispatcher and its mutex for accessing Envoy's event_dispatcher instead for now, since I think this is a risky pattern to promulgate without further protections. Additionally, I think we should limit the surface area of cross-thread interactions just to make it easier to reason about logic and improve safety.

Long-term I'd like to split the business logic of encoding/decoding HTTP requests and handling dispatch into two separate entities, instead of having a single HTTPDispatcher.

Copy link
Copy Markdown
Member

@junr03 junr03 Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the long term vision of splitting concerns in the http::dispatcher.

And with the fact that we might not get a flush due to visibility ordering of the server pointer (although, not sure if we should be concerned about a stronger guarantee there?)

But I don't think I agree with tacking more on to the http dispatcher. We still need a reference to the Server::Instance somewhere because it is through that interface that we flush stats. I am not sure it makes sense to make the http::dispatcher do that. Because at that point we are really making the http::dispatcher be a kitchen sync.

I wholeheartedly agree with Additionally, I think we should limit the surface area of cross-thread interactions just to make it easier to reason about logic and improve safety. And I would like to think how we can do that but without just accumulating everything in the same class.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the kitchen sink argument and I covered a little bit more in a slack thread, but I think that routing things through the dispatcher right now will actually make it easier to port things over to a dedicated cross-thread dispatch mechanism later, since everything that needs to be ported in the same place.

I think my main two concerns remain:

  1. I think there are probably circumstances this can crash.
  2. I think there's risk in needing to track down unguarded multi-threaded access later when we do clean things up.

With respect to 2), right now every multi-threaded access does occur right now through a safe mechanism. Exceptional cases like preferred_network are atomic, and primary cases use the core mutexes inside the Dispatcher. This will be the first case of unguarded access to a raw pointer, which I think is something to be avoided in and of itself.

I'm a bit less worried about (temporary) cruft in the dispatcher, because everything we can clean up once dispatch is split out will be located in one place.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think the risks both perspectives cite are probably not that big. I still think there's a chance we could crash due to unsafe access here, but hey, also we might not. Since you're not here to discuss in person @junr03 you get to win this one by default. ;)

I'm going to file an issue about splitting out a generalized interface for cross-thread access, and this server_ ref can be flagged with that issue.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a good discussion to have 😃 I went ahead and linked the issue inline in the code so we can update after

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I disappeared, it was late here!

re: 1 and 2 above.

  1. I don't think there is a risk (famous last words) because of the a) weak_ptr semantics that you introduced @goaway, and b) the engine holding ownership of main_common, which in turn holds ownership of the server object. This means that as long as we are able to lock the engine, we will have a valid server pointer. Right?

  2. Yep, I think we have probably reached a point with out accesses that we need to straighten things out in a more digestible way. Thanks for filing http and threading: split Http::Dispatcher monolith into parts #720, looking forward to chatting about that when I am back.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for linking the issue inline @rebello95

Signed-off-by: Michael Rebello <me@michaelrebello.com>
@rebello95 rebello95 merged commit 9b25734 into master Mar 3, 2020
@rebello95 rebello95 deleted the expose-flush branch March 3, 2020 00:48
rebello95 added a commit that referenced this pull request Mar 3, 2020
Adds functionality built on top of the upstream Envoy changes in envoyproxy/envoy#10097 which will allow the bridging layers to manually flush stats based on lifecycle changes in the mobile application.

Note: Stats must be flushed from the main Envoy thread, as validated by the following assertion failure that occurs when attempting to flush from another thread:

```
[2020-02-27 14:38:43.181][4716580][critical][assert] [external/envoy/source/common/thread_local/thread_local_impl.cc:190] assert failure: std::this_thread::get_id() == main_thread_id_.
```

Follow-up PRs will call these functions from Java/Objective-C.

Part of #573.

Signed-off-by: Michael Rebello <me@michaelrebello.com>
rebello95 added a commit that referenced this pull request Mar 3, 2020
Adds functionality built on top of the upstream Envoy changes in envoyproxy/envoy#10097 which will allow the bridging layers to manually flush stats based on lifecycle changes in the mobile application.

Note: Stats must be flushed from the main Envoy thread, as validated by the following assertion failure that occurs when attempting to flush from another thread:

```
[2020-02-27 14:38:43.181][4716580][critical][assert] [external/envoy/source/common/thread_local/thread_local_impl.cc:190] assert failure: std::this_thread::get_id() == main_thread_id_.
```

Follow-up PRs will call these functions from Java/Objective-C.

Part of #573.

Signed-off-by: Michael Rebello <me@michaelrebello.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants