Skip to content

Avoid blocking on channel close on network thread#25521

Merged
Tim-Brooks merged 6 commits intoelastic:masterfrom
Tim-Brooks:dispatch_disconnect_from_node
Jul 10, 2017
Merged

Avoid blocking on channel close on network thread#25521
Tim-Brooks merged 6 commits intoelastic:masterfrom
Tim-Brooks:dispatch_disconnect_from_node

Conversation

@Tim-Brooks
Copy link
Copy Markdown
Contributor

@Tim-Brooks Tim-Brooks commented Jul 3, 2017

Currently when we close a channel in Netty4Utils.closeChannels we
block until the closing is complete. This introduces the possibility
that a network selector thread will block while waiting until a
separate network selector thread closes a channel.

For instance: T1 closes channel 1 (which is assigned to a T1 selector).
Channel 1's close listener executes the closing of the node. That
means that T1 now tries to close channel 2. However, channel 2 is
assigned to a selector that is running on T2. T1 now must wait until T2
closes that channel at some point in the future.

This commit addresses this by adding a boolean to closeChannels
indicating if we should block on close. We only set this boolean to true
if we are closing down the server channels at shutdown. This call is
never made from a network thread. When we call the closeChannels method
with that boolean set to false, we do not block on close.

Currently when we close a channel in `Netty4Utils.closeChannels` we
block until the closing is complete. This introduces the possibility
that a network selector thread will block while waiting until a
separate network selector thread closes a channel.

For instance: T1 closes channel 1 (which is assigned to a T1 selector).
Channel 1's close listener executes the closing of the node. That
means that T1 now tries to close channel 2. However, channel 2 is
assigned to a selector that is running on T2. T1 now must wait until T2
closes that channel at some point in the future.

This commit addresses this by dispatching the "node disconnect" to the
generic thread pool.
@Tim-Brooks Tim-Brooks added :Distributed/Network Http and internode communication implementations >non-issue review v6.0.0 labels Jul 3, 2017
@Tim-Brooks Tim-Brooks requested review from jasontedor and s1monw July 3, 2017 15:02
@Tim-Brooks
Copy link
Copy Markdown
Contributor Author

Tim-Brooks commented Jul 3, 2017

We have some assertions that check we never dereference our futures on network threads. But obviously those protections don't catch netty futures.

Another approach would be to not "wait" on the future completion when closing channels. Just close and move on. We could attach a listener to log on exception which we do now when we wait. If we went with that approach, we would might need a different code path at shutdown where we do wait on the channel being closed (to ensure that everything is stopped).

@s1monw
Copy link
Copy Markdown
Contributor

s1monw commented Jul 5, 2017

what test triggered this behavior? I just removed this, IMO overengineered, behavior and we don't have any failures do we?

@s1monw
Copy link
Copy Markdown
Contributor

s1monw commented Jul 5, 2017

@tbrooks8 can we maybe instead try to not block on the close future in the implementations and instead register each future in a concurrent set such that we can wait for them when we close the transport? to maintain the set we can also register a listener and clean it up once it's closed? does this all make sense?

@Tim-Brooks
Copy link
Copy Markdown
Contributor Author

Yes. I can make some changes. I'll let you now when I'm ready for another review.

@Tim-Brooks
Copy link
Copy Markdown
Contributor Author

Tim-Brooks commented Jul 10, 2017

@s1monw This is ready for another review. Instead of managing a set, I just added a boolean indicating if we wanted the close to be synchronous. This is valuable because at shutdown, we want the closing of the server sockets to be complete before the method returns.

In terms of the non-server channels at shutdown, both the Netty4Transport and NioTransport independently ensure that all the channels are closed during shutdown (by blocking on futures).

Copy link
Copy Markdown
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great solution! LGTM

* thread.
*
* @param channels the channels to close
* @param synchronous whether the channels should be closed synchronously
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a matter of taste so I will leave it to you my suggestion is to call it blocking instead. but again up to you!

@Tim-Brooks Tim-Brooks changed the title Dispatch to generic pool for node disconnect Avoid blocking on channel close on network thread Jul 10, 2017
@Tim-Brooks Tim-Brooks merged commit b22bbf9 into elastic:master Jul 10, 2017
bleskes added a commit that referenced this pull request Jul 15, 2017
…ubble up and disconnect the node

#25521 changed channel closing to be handled async on anything but transport stop. This means it may take a while before
calling `connection.close()` and the node being removed from the `connectedNodes` list (but the connection is immediately unusuable).

Fixes #25686
jasontedor pushed a commit that referenced this pull request Aug 29, 2017
Currently when we close a channel in Netty4Utils.closeChannels we
block until the closing is complete. This introduces the possibility
that a network selector thread will block while waiting until a
separate network selector thread closes a channel.

For instance: T1 closes channel 1 (which is assigned to a T1 selector).
Channel 1's close listener executes the closing of the node. That
means that T1 now tries to close channel 2. However, channel 2 is
assigned to a selector that is running on T2. T1 now must wait until T2
closes that channel at some point in the future.

This commit addresses this by adding a boolean to closeChannels
indicating if we should block on close. We only set this boolean to true
if we are closing down the server channels at shutdown. This call is
never made from a network thread. When we call the closeChannels method
with that boolean set to false, we do not block on close.
jasontedor added a commit that referenced this pull request Aug 29, 2017
This commit adapts the Netty 3 transport implementation to changes to
avoid blocking on channel close on a network thread. These changes were
backported from master yet not adapted for Netty 3 during the backport
so this handles that.

Relates #25521
jasontedor added a commit that referenced this pull request Aug 29, 2017
* 5.6:
  Adapt Netty 3 to avoid blocking on channel close
  Avoid blocking on channel close on network thread (#25521)
  Scripting: Deprecate scripts.max_compilation_per_minute setting (#26420)
  Revert "[Docs] Update Java Low-Level documentation to reflect shaded deps (#25882)"
  [DOCS] Updates 5.6.0 release notes
  [DOCS] Updates 5.5.3 release notes
  Revert "Scripting: Deprecate scripts.max_compilation_per_minute setting (#26402)"
  Scripting: Deprecate scripts.max_compilation_per_minute setting (#26402)
@Tim-Brooks Tim-Brooks deleted the dispatch_disconnect_from_node branch November 14, 2018 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Network Http and internode communication implementations >non-issue v6.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants