NETWORKING: MockTransportService Wait for Close#35038
NETWORKING: MockTransportService Wait for Close#35038original-brownbear merged 8 commits intoelastic:masterfrom original-brownbear:34990
Conversation
* Make `MockTransportService` wait `30s` for close listeners to run before failing the assertion * Closes #34990
|
Pinging @elastic/es-distributed |
Tim-Brooks
left a comment
There was a problem hiding this comment.
Can we just do:
try {
ESTestCase.assertBusy(() -> {
assert openConnections.size() == 0 : "still open connections: " + openConnections;
});
} catch (AssertionFailure e) {
throw e;
} catch (Exception e) {
throw new IOException(e);
}
The wait / notify semantics are kind of tricky to handle (for example I think you should have the notify call be in a synchronized block).
You could also add an assertBusyUnchecked the takes a Runnable opposed to a CheckedRunnable if you want to avoid the weird exception handling
|
@tbrooks8 the We would also need to do: ESTestCase.assertBusy(() -> {
synchronized(openConnections) {
assert openConnections.size() == 0 : "still open connections: " + openConnections;
}
});here to release to lock on the |
|
I am not 100% convinced that this is the right fix. #34990 seems to affect freshly-opened connections, which to me makes it feel like a race condition between opening connections and closing the transport down, and waiting for some arbitrary timeout makes me a little uneasy. The synchronisation in Would it be sufficient to call |
|
@DaveCTurner I think you're 100% right, this def. another possible race. Your suggested sounds just fine too => added it in 7f5cd24 |
|
With the extra synchronisation, do we need the wait/notify as well? |
|
@DaveCTurner yea because the close callback may be executed in a different thread and hence could run after the assertion if we don't wait for the map to empty. |
|
Do you mean that the call to This doesn't look true - I can see a couple of places where we modify |
|
@DaveCTurner no this is my bad (sorry for the confusion). I mixed up the connection and the channel close. The connection close callback is invoked in the same thread that invokes. Simplified the PR in fb843f5. |
DaveCTurner
left a comment
There was a problem hiding this comment.
LGTM but I'd like @tbrooks8's opinion too.
|
@DaveCTurner I think the two concurrency spots in gives us the close lock for And for: we should be fine too. The iterator from a CHM should be consistent with concurrent removal from the CHM ( |
|
I think the issue might just be that the comment is misleading, as comments often are. Both of the calls to This line isn't under a lock by the way, it's in a handler that may not run until the lock is released: |
Sorry, you're right my eyes aren't so great today apparently :D
Fixed :) |
Can someone explain this hypothesis before I review this? I don't understand how this can lead to this issue. Even if the channel is closed before the listener added, the listener is still called. The listener is just executed by the thread that is adding the listener. So I guess my point is that the additional synchronization does not seem to add much here. I'm not opposed to it as this is just test code, I just want to make sure I understand the proposed issues at play here.
YESish. The assertions introduced in the As a separate note, I want to point out that discussions about the |
|
@tbrooks8
At least I did not know that, sorry about the confusion here.
Yea, if you're ok with it I'd vote to keep it. It makes the whole thing a lot easier to think through (IMO).
Ah, that makes perfect sense :) Thanks! (and sorry for the annoying back and forth here, but now it's all much clearer to me :)) => I added the wait back now in ce879da |
Sure...
The issue seems to be that we can:
The failing assertion
Indeed - I wasn't tracing things very accurately. |
Tim-Brooks
left a comment
There was a problem hiding this comment.
Okay. I think we are all agreement about the potential problems here and how the PR addresses them.
LGTM
|
@tbrooks8 thanks! |
MockTransportServicewait30sfor close listeners to run before failing the assertion