Await all pending activity in testConnectAndDisconnect #40037
Merged
DaveCTurner merged 4 commits into elastic:master on Mar 15, 2019
Conversation
It looks like this test can finish while there are still asynchronous connection attempts in flight, causing test failures as reported in elastic#40030. This change ensures that the test waits for everything to settle down before the end of the test. NB I was unable to reproduce this failure, although it's affected quite a number of CI runs. This is my best guess. This also unmutes `testOnlyBlocksOnConnectionsToNewNodes`: the only failures of _this_ test that I could see were related to earlier failures. Closes elastic#40030.
Collaborator
Pinging @elastic/es-distributed
We call `ensureConnections()` to undo the effects of a disruption. However, it is possible that one or more targets are currently CONNECTING and have been since the disruption was active, and that the connection attempt was thwarted by a concurrent disruption to the connection. If so, we cannot simply add our listener to the queue because it will be notified when this CONNECTING activity completes even though it was disrupted. We must therefore wait for all the current activity to finish and then go through and reconnect to any missing nodes. Closes elastic#40030.
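The "wait for current activity, then reconnect what is missing" order described above can be sketched as follows. This is a minimal illustration only: `SettlingConnector`, `beginConnecting`, and the simplified `ensureConnections` signature are hypothetical stand-ins, not the Elasticsearch `NodeConnectionsService` API, and the sketch assumes that a fresh attempt made after the disruption always succeeds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a connection service; NOT the real Elasticsearch class.
class SettlingConnector {
    // One "settled" stage per in-flight attempt, keyed by node name.
    private final Map<String, CompletableFuture<Void>> inFlight = new ConcurrentHashMap<>();
    // Nodes that currently have an established connection.
    private final Set<String> connected = ConcurrentHashMap.newKeySet();

    // Starts tracking an attempt. The caller completes the returned future with
    // true (connected) or false (thwarted by a disruption).
    CompletableFuture<Boolean> beginConnecting(String node) {
        CompletableFuture<Boolean> attempt = new CompletableFuture<>();
        // The stored stage completes only after the outcome has been recorded.
        CompletableFuture<Void> settled = attempt.handle((ok, err) -> {
            if (err == null && Boolean.TRUE.equals(ok)) {
                connected.add(node);
            }
            return null;
        });
        inFlight.put(node, settled);
        return attempt;
    }

    // Waits for all current activity to finish, then reconnects any target that
    // is still missing. Returns the nodes that needed a fresh attempt.
    List<String> ensureConnections(Set<String> targets) {
        List<CompletableFuture<Void>> pending = new ArrayList<>(inFlight.values());
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        inFlight.values().removeAll(pending);

        List<String> reconnected = new ArrayList<>();
        for (String node : targets) {
            if (!connected.contains(node)) {
                connected.add(node); // assumption: the disruption is over, so this attempt succeeds
                reconnected.add(node);
            }
        }
        return reconnected;
    }

    boolean isConnected(String node) {
        return connected.contains(node);
    }

    public static void main(String[] args) {
        SettlingConnector connector = new SettlingConnector();
        CompletableFuture<Boolean> disrupted = connector.beginConnecting("node-1");
        CompletableFuture<Boolean> healthy = connector.beginConnecting("node-2");
        // Finish both attempts on another thread while the caller is waiting.
        CompletableFuture.runAsync(() -> {
            disrupted.complete(false);
            healthy.complete(true);
        });
        System.out.println(connector.ensureConnections(Set.of("node-1", "node-2")));
    }
}
```

Because the sketch joins on every pending attempt before inspecting the connected set, an attempt that was thwarted mid-flight is not mistaken for a success; only the node whose attempt failed gets a fresh connection.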
henningandersen
approved these changes
Mar 14, 2019
Contributor
henningandersen
left a comment
LGTM.
I added one small comment.
    } finally {
        assertTrue(stopReconnecting.compareAndSet(false, true));
        reconnectionThread.join();
        ensureConnections(service);
Contributor
I would prefer to put `ensureConnections` outside the `finally` block so that, if the test fails, we see the original error rather than a possible error from `ensureConnections`.
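The masking effect behind this suggestion can be shown with a small, self-contained sketch. The class and method names here (`FinallyPlacement`, `verifyInsideFinally`, `verifyAfterFinally`) are hypothetical and only illustrate Java's exception semantics; they are not taken from the test under review.

```java
// Hypothetical illustration of where a post-test check is placed relative to finally.
class FinallyPlacement {
    // The check runs inside finally: if it throws, its exception replaces
    // whatever the body threw.
    static Throwable verifyInsideFinally(Runnable body, Runnable cleanup, Runnable verify) {
        try {
            try {
                body.run();
            } finally {
                cleanup.run();
                verify.run();
            }
            return null;
        } catch (Throwable t) {
            return t;
        }
    }

    // The check runs after the try/finally: it is skipped when the body throws,
    // so the original failure propagates unchanged.
    static Throwable verifyAfterFinally(Runnable body, Runnable cleanup, Runnable verify) {
        try {
            try {
                body.run();
            } finally {
                cleanup.run();
            }
            verify.run();
            return null;
        } catch (Throwable t) {
            return t;
        }
    }

    public static void main(String[] args) {
        Runnable body = () -> { throw new AssertionError("original test failure"); };
        Runnable cleanup = () -> { };
        Runnable verify = () -> { throw new IllegalStateException("secondary failure"); };
        System.out.println(verifyInsideFinally(body, cleanup, verify));
        System.out.println(verifyAfterFinally(body, cleanup, verify));
    }
}
```

With the check inside `finally`, the caller sees the `IllegalStateException`; with the check after the block, the caller sees the original `AssertionError`. Cleanup that must always run, such as stopping a background thread, still belongs in `finally`.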
DaveCTurner added a commit that referenced this pull request on Mar 15, 2019
We call `ensureConnections()` to undo the effects of a disruption. However, it is possible that one or more targets are currently CONNECTING and have been since the disruption was active, and that the connection attempt was thwarted by a concurrent disruption to the connection. If so, we cannot simply add our listener to the queue because it will be notified when this CONNECTING activity completes even though it was disrupted. We must therefore wait for all the current activity to finish and then go through and reconnect to any missing nodes. Closes #40030.
Member
Author
Relates #39629