Avoid blocking on channel close on network thread#25521
Avoid blocking on channel close on network thread#25521Tim-Brooks merged 6 commits intoelastic:masterfrom
Conversation
Currently when we close a channel in `Netty4Utils.closeChannels` we block until the closing is complete. This introduces the possibility that a network selector thread will block while waiting until a separate network selector thread closes a channel. For instance: T1 closes channel 1 (which is assigned to a T1 selector). Channel 1's close listener executes the closing of the node. That means that T1 now tries to close channel 2. However, channel 2 is assigned to a selector that is running on T2. T1 now must wait until T2 closes that channel at some point in the future. This commit addresses this by dispatching the "node disconnect" to the generic thread pool.
|
We have some assertions that check we never dereference our futures on network threads. But obviously those protections don't catch netty futures. Another approach would be to not "wait" on the future completion when closing channels. Just close and move on. We could attach a listener to log on exception which we do now when we wait. If we went with that approach, we would might need a different code path at shutdown where we do wait on the channel being closed (to ensure that everything is stopped). |
|
what test triggered this behavior? I just removed this, IMO overengineered, behavior and we don't have any failures do we? |
|
@tbrooks8 can we maybe instead try to not block on the close future in the implementations and instead register each future in a concurrent set such that we can wait for them when we close the transport? to maintain the set we can also register a listener and clean it up once it's closed? does this all make sense? |
|
Yes. I can make some changes. I'll let you now when I'm ready for another review. |
|
@s1monw This is ready for another review. Instead of managing a set, I just added a boolean indicating if we wanted the close to be synchronous. This is valuable because at shutdown, we want the closing of the server sockets to be complete before the method returns. In terms of the non-server channels at shutdown, both the |
| * thread. | ||
| * | ||
| * @param channels the channels to close | ||
| * @param synchronous whether the channels should be closed synchronously |
There was a problem hiding this comment.
it's a matter of taste so I will leave it to you my suggestion is to call it blocking instead. but again up to you!
…ubble up and disconnect the node #25521 changed channel closing to be handled async on anything but transport stop. This means it may take a while before calling `connection.close()` and the node being removed from the `connectedNodes` list (but the connection is immediately unusuable). Fixes #25686
Currently when we close a channel in Netty4Utils.closeChannels we block until the closing is complete. This introduces the possibility that a network selector thread will block while waiting until a separate network selector thread closes a channel. For instance: T1 closes channel 1 (which is assigned to a T1 selector). Channel 1's close listener executes the closing of the node. That means that T1 now tries to close channel 2. However, channel 2 is assigned to a selector that is running on T2. T1 now must wait until T2 closes that channel at some point in the future. This commit addresses this by adding a boolean to closeChannels indicating if we should block on close. We only set this boolean to true if we are closing down the server channels at shutdown. This call is never made from a network thread. When we call the closeChannels method with that boolean set to false, we do not block on close.
This commit adapts the Netty 3 transport implementation to changes to avoid blocking on channel close on a network thread. These changes were backported from master yet not adapted for Netty 3 during the backport so this handles that. Relates #25521
* 5.6: Adapt Netty 3 to avoid blocking on channel close Avoid blocking on channel close on network thread (#25521) Scripting: Deprecate scripts.max_compilation_per_minute setting (#26420) Revert "[Docs] Update Java Low-Level documentation to reflect shaded deps (#25882)" [DOCS] Updates 5.6.0 release notes [DOCS] Updates 5.5.3 release notes Revert "Scripting: Deprecate scripts.max_compilation_per_minute setting (#26402)" Scripting: Deprecate scripts.max_compilation_per_minute setting (#26402)
Currently when we close a channel in Netty4Utils.closeChannels we
block until the closing is complete. This introduces the possibility
that a network selector thread will block while waiting until a
separate network selector thread closes a channel.
For instance: T1 closes channel 1 (which is assigned to a T1 selector).
Channel 1's close listener executes the closing of the node. That
means that T1 now tries to close channel 2. However, channel 2 is
assigned to a selector that is running on T2. T1 now must wait until T2
closes that channel at some point in the future.
This commit addresses this by adding a boolean to
closeChannelsindicating if we should block on close. We only set this boolean to true
if we are closing down the server channels at shutdown. This call is
never made from a network thread. When we call the closeChannels method
with that boolean set to false, we do not block on close.