comm/tcp listener: do not pass comm with failed handshake to comm_handler#4240
Conversation
fjetter
left a comment
There was a problem hiding this comment.
Thanks @jochen-ott-by for spotting this!
Will wait a bit to give other maintainers the possibility to chime in before I merge but I think this is a safe thing
| logger.info("Connection closed before handshake completed") | ||
| return |
There was a problem hiding this comment.
This looks good. BTW this has been added to the inproc handling of the handshake, already. A pity that this diverges
distributed/distributed/comm/inproc.py
Lines 266 to 270 in d5c8af6
|
Both py3.7/3.8 fail in |
They fail only on windows. Not sure how to debug this tbh; I don't have access to a Windows machine here and the tests run fine for me on my linux machine for python3.6 and python3.8. I'd need some advice here from someone more familiar with the test setup and the differences between Windows and Linux that may come in here. |
|
Restarting Windows CI to see if they fail differently next time It looks like |
|
Thank you for finding this @jochen-ott-by |
|
|
Indeed, they did: the first time, python3.7/python3.8 had errors while python3.6 was fine. The second run was the the exact opposite: python3.6 had errors (not test failure, only 1 error: "ERROR at teardown of test_get_task_stream_save"), python3.7/python3.8 were fine. So we've seen exactly one green build for each of python3.6/3.7/3.8 on windows. |
|
I tried restarting the GH actions one more time since I haven't seen the other breaking test anywhere, yet. However, the github security changes now break the builds. The fix is already on master. @jochen-ott-by can you please rebase once more? |
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks for your work on this @jochen-ott-by! Just a couple of small comments
|
|
||
| assert not handle_comm_called | ||
|
|
||
| writer.close() |
There was a problem hiding this comment.
We should also wait until writer has closed
| writer.close() | |
| writer.close() | |
| await writer.wait_closed() |
There was a problem hiding this comment.
I added it in the second commit, if the wait_closed method exists (it does not for python3.6).
| host, port = listener.get_host_port() | ||
| # connect without handshake: | ||
| reader, writer = await asyncio.open_connection(host=host, port=port) | ||
| await asyncio.sleep(0.02) |
There was a problem hiding this comment.
I'm curious what prompted the addition of this asyncio.sleep
There was a problem hiding this comment.
This ensures we actually hit the timeout of the handshake path on the server side. Without this, the test would be useless. I'll add a comment.
ad0e50d to
181ac92
Compare
Co-authored-by: Lucas Rademaker <44430780+lr4d@users.noreply.github.com>
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks for the fix @jochen-ott-by! And @fjetter @lr4d for reviewing
A return was missing here, which would pass on
commto the handler, even if the handshake had errors.