Shutdown socket in some cases to prevent endless hang on blocking operation#26898
Shutdown socket in some cases to prevent endless hang on blocking operation#26898wfurt wants to merge 9 commits intodotnet:masterfrom
Conversation
|
@dotnet-bot test Outerloop Linux x64 Debug Build |
|
@dotnet-bot test innerloop Fedora24 debug |
|
@dotnet-bot test Outerloop Linux x64 Debug Build |
| } | ||
| } | ||
|
|
||
| private void TryWakeup(InnerSafeCloseSocket innerSocket) |
There was a problem hiding this comment.
nit: I would rename this to remove the 'Try' prefix since methods with that prefix have an expected pattern that doesn't match this. Comment applies to other files as well.
| } | ||
| } | ||
|
|
||
| public bool GetNonBlocking() { |
There was a problem hiding this comment.
nit: bracing style doesn't match other methods in file. should use Allman-style bracing.
|
|
||
| private void TryWakeup(InnerSafeCloseSocket innerSocket) | ||
| { | ||
| if ((AsyncContext == null || !AsyncContext.GetNonBlocking()) && innerSocket != null && !_underlyingHandleNonBlocking && !innerSocket.IsClosed && !innerSocket.IsInvalid) |
There was a problem hiding this comment.
This is a fairly complicated condition and it's not clear to me exactly what it's validating. Can you add a comment?
Also, are both the IsClosed and IsInvalid checks necessary?
There was a problem hiding this comment.
I'll check if I can simplify this. What I want is to find out if we have valid OS blocking socket. In all other cases I want to skip the shutdown and use existing behavior. There are existing test verifying we throw if dispose is called in middle of pending async operation. I did not want to change that behavior.
| { | ||
| if ((AsyncContext == null || !AsyncContext.GetNonBlocking()) && innerSocket != null && !_underlyingHandleNonBlocking && !innerSocket.IsClosed && !innerSocket.IsInvalid) | ||
| { | ||
| Interop.Sys.Shutdown(innerSocket, SocketShutdown.Receive); |
There was a problem hiding this comment.
Just Receive? Will this fix blocked sends/connects, or do those require shutting down the Send direction as well?
There was a problem hiding this comment.
I was thinking about it @stephentoub . But I did not want to interfere with writing logic. At least on Unix, close() would always finish writing data. Also the write would finish or fail even if that may take some some. I'm not sure if there is case when it would hang forever. I can revisit that if you think we should.
|
|
||
| private void TryWakeup(InnerSafeCloseSocket innerSocket) | ||
| { | ||
| return; |
There was a problem hiding this comment.
Nit: this isn't necessary; you can just remove the "return;" and leave the method empty. You might also add a comment indicating that it's empty on purpose, and explaining why
There was a problem hiding this comment.
I may need to revisit this. comments from the bugs suggest that Windows may have problems in single core configuration. I plan to test it but I was using this PR to get wide test coverage from CI run.
| Dispose(); | ||
| if (innerSocket != null) | ||
| { | ||
| TryWakeup(innerSocket); |
There was a problem hiding this comment.
What is the "Wait until it's safe" below referring to? I'm wondering if it's ok to do this here (or even required to do this here), or if it should move down to before the InnerReleaseHandle.
| // Blocking read. | ||
| try | ||
| { | ||
| client.Receive(buffer); |
There was a problem hiding this comment.
It'd be good to have tests for the other main blocking operations, e.g. Send, Connect, Accept, etc.
There was a problem hiding this comment.
yes, I was planning to. There is separate bug for accept. The problem is that the shutdown does not work for that on OSX (works ok on Linux) I'm still investigating.
| catch (ObjectDisposedException) | ||
| { | ||
| // Closed and disposed by parent thread. | ||
| return; |
There was a problem hiding this comment.
We don't still want to call client.Close in this case?
| clientThread.Start(); | ||
| Socket s = server.Accept(); | ||
| // Close client socket from parent thread. | ||
| client.Close(); |
There was a problem hiding this comment.
Why is the client.Close duplicated here as well?
There was a problem hiding this comment.
This is way how to force close from other thread. The clientThread should be blocked in receive() . This mimics example from #26034
|
I updated unit tests to cover both known issues. Note, that the shutdown() trick does not work on OSX for accept() and it fails with ENOTCONN. It works just fine on Linux. Also note I did not find reliable way how to detect that child thread is blocked. The documentation says Running = "The thread has been started, it is not blocked" but that does not seems to count cases when we are blocked on IO. I added Send/Receive sequence and that seems to be reasonably reliable in my tests (Linux & OSX) If timing does not work right, the thread will get exception that object was disposed. That could possibly return TestSkip one day. |
|
@dotnet-bot test Outerloop Linux x64 Debug Build |
|
@dotnet-bot test Outerloop Linux x64 Debug Build |
|
@wfurt on Linux, you can use |
I compared this with |
|
Thanks for the note @tmds. The shutdown should also flush any pending data from socket buffers. I'll experiment with AF_UNSPEC on OSX. I still did not figure out how to wake up blocking accept there. |
|
@wfurt what is left to do here? Any ETA? |
|
I'm not sure @karelz. Perkaps @stephentoub and @geoffkizer can take another look. |
|
Why are the tests failing? Once the test runs are clean, please work with @stephentoub to clarify his comment & to get final code review. |
| } | ||
| } | ||
|
|
||
| private void UnblockSocket(InnerSafeCloseSocket innerSocket) {} |
There was a problem hiding this comment.
Please add comments to this to explain why this method is a no-op on Windows.
There was a problem hiding this comment.
Would you see harm of calling shutdown() on Windows even if it is not needed @davidsh -> just for consistency reason? For now, I was trying to keep the change minimal as possible.
There was a problem hiding this comment.
Would you see harm of calling shutdown() on Windows even if it is not needed
We already have an explicit API that developers can call to do Shutdown(). I don't think it is a good idea to do this automatically.
I also don't understand why these changes are needed in this PR. It seems like it isn't clear as to the root cause.
There was a problem hiding this comment.
The difference is that on Unix calling read() or accept() will block calling thread until operation fails or succeeds -> possibly indefinitely. Unlike Async or non-blocking, there is no timeout associated with that call. That part seems to be working different way on Windows where we are really never blocked e.g. when there is request for cancelation we can stop pending operation from our socket code.
There was a problem hiding this comment.
So, if this is a platform difference in behavior (Windows vs. Linux), then you shouldn't add any code for Windows. But you should still add comments to this no-op method to explain. And the words above are probably what you need in your comment for source code.
|
I don't know why tests are failing @karelz. I get 404 for all the links. I can try to kick the tests again. |
|
@dotnet-bot test Outerloop Linux x64 Debug Build |
|
most test failed on authentication failures or for unknown reasons. Valid runs have failures unrelated to this pr. (like crypto) |
|
@dotnet-bot test Outerloop Linux x64 Debug Build |
|
@dotnet-bot test Linux x64 Release Build |
|
@wfurt, are we still pursuing this? |
|
Yes, I would like to. I need to investigate the CI failures. |
|
@dotnet-bot test Outerloop Linux x64 Debug Build |
|
rebased with current master |
|
@wfurt this is ancient PR. If it is not ready for merge, let's close it and reopen once you have time to finish it please. |
|
@dotnet-bot test Outerloop Linux x64 Debug Build |
|
I updated the tests and I did several hundreds runs with full parallel suite. |
|
@wfurt looks like hte only failures passed on rerun. |
| { | ||
| // This test verifies blocking behavior. Always run it as remote task | ||
| // to isolate any possible failures. | ||
| RemoteInvoke((address) => |
There was a problem hiding this comment.
Guess: failure could manifest as exception on threadpool thread?
There was a problem hiding this comment.
Isn't Task.Run going to take care of all of the exceptions softly? Could you give example (or simulation) of failure which we are afraid of?
There was a problem hiding this comment.
I wanted to be sure that if something goes wrong we do not block whole xunit suite @krwq.
RemoteProcess seems more resilient. (even if maybe harder to debug)
There was a problem hiding this comment.
I agree with @krwq here. If the test is failing, it doesn't really matter whether it is blocking the rest of xunit or not. Adding RemoteInvoke just seems to make the test more complicated.
|
@dotnet-bot test Outerloop OSX x64 Debug Build |
|
all the OSX failures are in test going to external echo server and seems unrelated to this PR. |
|
@wfurt please merge it today or close it. |
|
|
||
| private void UnblockSocket(InnerSafeCloseSocket innerSocket) | ||
| { | ||
| if ((AsyncContext == null || !AsyncContext.GetNonBlocking()) && innerSocket != null && !_underlyingHandleNonBlocking && !innerSocket.IsClosed && !innerSocket.IsInvalid) |
There was a problem hiding this comment.
I think we are doing all these tests to try to minimize the impact of this change. I.e. we want to only do the shutdown call in the case where we know it's hanging today. Correct?
I think it's fine to be cautious here, but we should add a comment explaining that these checks are basically just to be cautious (unless that's not true for some of them). If/when we revisit the socket close logic (and I hope we will do this at some point in 3.0), we will want to revisit this, and a comment will help explain the rationale behind this.
There was a problem hiding this comment.
yes. I can add comment with explanation. For Async/non-blocking we have no problem being stuck in system call.
| byte[] buffer = new byte[1]; | ||
| buffer[0] = Convert.ToByte('a'); | ||
|
|
||
| Thread clientThread = new Thread(() => |
There was a problem hiding this comment.
Use Task.Run instead of creating a new thread
|
@wfurt how far are we from merging? As I said earlier, if we can't finish it in couple of days, please close it for now. |
Fixes #22564
Fixes #26034
Use Shutdown() when closing on blocking sockets to prevent infinite blockage.