Avoid blocking on channel close on network thread by Tim-Brooks · Pull Request #25521 · elastic/elasticsearch

Tim-Brooks · 2017-07-03T15:02:43Z

Currently when we close a channel in Netty4Utils.closeChannels we
block until the closing is complete. This introduces the possibility
that a network selector thread will block while waiting until a
separate network selector thread closes a channel.

For instance: T1 closes channel 1 (which is assigned to a T1 selector).
Channel 1's close listener executes the closing of the node. That
means that T1 now tries to close channel 2. However, channel 2 is
assigned to a selector that is running on T2. T1 now must wait until T2
closes that channel at some point in the future.

This commit addresses this by adding a boolean to closeChannels
indicating if we should block on close. We only set this boolean to true
if we are closing down the server channels at shutdown. This call is
never made from a network thread. When we call the closeChannels method
with that boolean set to false, we do not block on close.

Currently when we close a channel in `Netty4Utils.closeChannels` we block until the closing is complete. This introduces the possibility that a network selector thread will block while waiting until a separate network selector thread closes a channel. For instance: T1 closes channel 1 (which is assigned to a T1 selector). Channel 1's close listener executes the closing of the node. That means that T1 now tries to close channel 2. However, channel 2 is assigned to a selector that is running on T2. T1 now must wait until T2 closes that channel at some point in the future. This commit addresses this by dispatching the "node disconnect" to the generic thread pool.
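
For reference, a minimal sketch of the "dispatch to the generic thread pool" idea described above. It uses plain `java.util.concurrent` types as stand-ins: `DisconnectDispatchSketch`, `onClose`, and the `genericPool` parameter are hypothetical names for illustration, not the Elasticsearch or Netty API. The point is only that the close listener hands the follow-up work to another executor instead of running it on the thread that completed the close.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical illustration; CompletableFuture stands in for a channel close future.
final class DisconnectDispatchSketch {

    // Registers a listener on `channelClosed` that runs `disconnectFromNode`
    // on `genericPool`, not on the thread that completes the close future.
    static CompletableFuture<Void> onClose(CompletableFuture<Void> channelClosed,
                                           Executor genericPool,
                                           Runnable disconnectFromNode) {
        return channelClosed.handleAsync((ignored, failure) -> {
            // Runs whether the close succeeded or failed.
            disconnectFromNode.run();
            return null;
        }, genericPool);
    }
}
```

With this shape, a network thread that completes the close future returns to its selector loop right away, and any blocking done while disconnecting the node happens on a pool thread.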

Tim-Brooks · 2017-07-03T15:05:41Z

We have some assertions that check we never dereference our futures on network threads. But obviously those protections don't catch netty futures.

Another approach would be to not "wait" on the future completion when closing channels. Just close and move on. We could attach a listener to log on exception, which is what we do now when we wait. If we went with that approach, we might need a different code path at shutdown where we do wait on the channel being closed (to ensure that everything is stopped).
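
A rough sketch of the "close and move on" variant, assuming a simplified channel type. `SketchChannel`, `NonBlockingClose`, and the `warnings` list are hypothetical stand-ins (the list replaces a real logger so the example stays self-contained); none of this is the Netty or Elasticsearch API.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for a transport channel.
final class SketchChannel {
    private final CompletableFuture<Void> closeFuture = new CompletableFuture<>();

    // Requests the close and returns immediately with the close future.
    CompletableFuture<Void> closeAsync() {
        return closeFuture;
    }

    // Called by whichever thread owns the channel once the close finished or failed.
    void completeClose(Throwable failure) {
        if (failure == null) {
            closeFuture.complete(null);
        } else {
            closeFuture.completeExceptionally(failure);
        }
    }
}

final class NonBlockingClose {
    // Stand-in for a logger: failures are recorded here instead of being logged.
    static final List<String> warnings = new CopyOnWriteArrayList<>();

    // Never waits on a close future; a failure is only reported by the listener.
    static void closeChannels(List<SketchChannel> channels) {
        for (SketchChannel channel : channels) {
            channel.closeAsync().whenComplete((ignored, failure) -> {
                if (failure != null) {
                    warnings.add("failed to close channel: " + failure.getMessage());
                }
            });
        }
    }
}
```

The caller never dereferences a future, so it is safe to invoke from a selector thread. The trade-off is that the method returning says nothing about whether the channels are actually closed yet.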

s1monw · 2017-07-05T09:37:39Z

What test triggered this behavior? I just removed this (IMO overengineered) behavior, and we don't have any failures, do we?

s1monw · 2017-07-05T18:04:38Z

@tbrooks8 can we maybe instead try to not block on the close future in the implementations and instead register each future in a concurrent set such that we can wait for them when we close the transport? to maintain the set we can also register a listener and clean it up once it's closed? does this all make sense?
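
One way the suggestion above could look, as a sketch only: close futures are registered in a concurrent set, a listener removes each one when it completes, and transport shutdown waits for whatever is left. `PendingCloses` and its methods are hypothetical names, and `CompletableFuture` stands in for the transport's real close futures.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper; not part of the Elasticsearch code base.
final class PendingCloses {
    private final Set<CompletableFuture<Void>> pending = ConcurrentHashMap.newKeySet();

    // Registers a close future and removes it again once it completes,
    // whether normally or exceptionally.
    void track(CompletableFuture<Void> closeFuture) {
        pending.add(closeFuture);
        closeFuture.whenComplete((ignored, failure) -> pending.remove(closeFuture));
    }

    int pendingCount() {
        return pending.size();
    }

    // Intended for transport shutdown only, never for a network thread:
    // blocks until every close that is still pending has finished.
    void awaitAll() {
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]))
            .exceptionally(failure -> null) // close failures must not abort shutdown
            .join();
    }
}
```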

Tim-Brooks · 2017-07-05T22:11:44Z

Yes. I can make some changes. I'll let you know when I'm ready for another review.

Tim-Brooks · 2017-07-10T02:43:11Z

@s1monw This is ready for another review. Instead of managing a set, I just added a boolean indicating if we wanted the close to be synchronous. This is valuable because at shutdown, we want the closing of the server sockets to be complete before the method returns.

In terms of the non-server channels at shutdown, both the Netty4Transport and NioTransport independently ensure that all the channels are closed during shutdown (by blocking on futures).
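
A simplified sketch of the boolean-flag approach described in this comment. The types are stand-ins: each `Supplier` represents "initiate the close on one channel and return its close future", and `BlockingFlagSketch` is a hypothetical name. The parameter is called `blocking` here purely for illustration; the real `closeChannels` signature may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

// Hypothetical illustration; not the Netty4Utils implementation.
final class BlockingFlagSketch {

    static void closeChannels(List<Supplier<CompletableFuture<Void>>> channels, boolean blocking) {
        // Phase 1: initiate every close without waiting on any of them.
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (Supplier<CompletableFuture<Void>> channel : channels) {
            futures.add(channel.get());
        }
        if (blocking == false) {
            // Safe on a network thread: nothing is dereferenced.
            return;
        }
        // Phase 2: shutdown path only; wait until each close has completed.
        for (CompletableFuture<Void> future : futures) {
            try {
                future.join();
            } catch (CompletionException | CancellationException e) {
                // A real implementation would log the failure here.
            }
        }
    }
}
```

Starting all closes before waiting on any means that a blocking call waits roughly as long as the slowest channel, not the sum of all of them.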

s1monw

great solution! LGTM

s1monw · 2017-07-10T07:48:31Z

core/src/main/java/org/elasticsearch/transport/TcpTransport.java

+     * thread.
+     *
+     * @param channels the channels to close
+     * @param synchronous whether the channels should be closed synchronously


It's a matter of taste, so I will leave it to you, but my suggestion is to call it `blocking` instead. Again, up to you!

…ubble up and disconnect the node #25521 changed channel closing to be handled async on anything but transport stop. This means it may take a while between calling `connection.close()` and the node being removed from the `connectedNodes` list (but the connection is immediately unusable). Fixes #25686

This commit adapts the Netty 3 transport implementation to changes to avoid blocking on channel close on a network thread. These changes were backported from master yet not adapted for Netty 3 during the backport so this handles that. Relates #25521

* 5.6:
  Adapt Netty 3 to avoid blocking on channel close
  Avoid blocking on channel close on network thread (#25521)
  Scripting: Deprecate scripts.max_compilation_per_minute setting (#26420)
  Revert "[Docs] Update Java Low-Level documentation to reflect shaded deps (#25882)"
  [DOCS] Updates 5.6.0 release notes
  [DOCS] Updates 5.5.3 release notes
  Revert "Scripting: Deprecate scripts.max_compilation_per_minute setting (#26402)"
  Scripting: Deprecate scripts.max_compilation_per_minute setting (#26402)

Tim-Brooks added :Distributed/Network Http and internode communication implementations >non-issue review v6.0.0 labels Jul 3, 2017

Tim-Brooks requested review from jasontedor and s1monw July 3, 2017 15:02

Close channels async

80dcadb

Tim-Brooks added 3 commits July 9, 2017 12:52

Merge remote-tracking branch 'upstream/master' into dispatch_disconnect_from_node

60fbb54

Add an indicator if closing should be asynchronous

8132919

Fix compile issue

a3fe1ff

s1monw approved these changes Jul 10, 2017

Rename parameter

b5a7876

Tim-Brooks changed the title ~~Dispatch to generic pool for node disconnect~~ Avoid blocking on channel close on network thread Jul 10, 2017

Tim-Brooks merged commit b22bbf9 into elastic:master Jul 10, 2017

colings86 added v6.0.0-beta1 and removed v6.0.0 labels Jul 31, 2017

Tim-Brooks deleted the dispatch_disconnect_from_node branch November 14, 2018 14:49

Tim-Brooks merged 6 commits into elastic:master from Tim-Brooks:dispatch_disconnect_from_node

3 participants
