Raise when workers initialization times out#2784
Conversation
This changes Worker / Nanny startup to raise when they timeout. This bubbles up to the `dask-worker` CLI. Closes dask#2781
|
Here's the output Could maybe cleanup that second traceback a bit, but OK for now I hope. |
| logger.info("-" * 49) | ||
| while True: | ||
| if self.death_timeout and time() > start + self.death_timeout: | ||
| logger.exception( |
There was a problem hiding this comment.
I can't easily comment on it here, but down in https://github.com/dask/distributed/pull/2784/files#diff-3f78e3c9a9a81b6355c8e594dce0b8f3R766 we also may handle TimeoutError. In that case we logger.info and move on.
I'm not quite sure when that is reached, but I think since we're under this while True we'll still end up here.
| except (KeyboardInterrupt, TimeoutError): | ||
| except TimeoutError: | ||
| # We already log the exception in nanny / worker. Don't do it again. | ||
| raise TimeoutError("Timed out starting worker.") from None |
There was a problem hiding this comment.
Whoa, raise from? New syntax for me.
There was a problem hiding this comment.
Welcome to the brave new world of Python 3 :)
raise ... from None specifically is useful when you think the original traceback isn't helpful, which I think it isn't here. Otherwise, you can raise from e if you want to include it.
|
@mrocklin the previous behavior of gracefully closing the worker, without error, was tested. I've removed those tests since I changed the behavior to raise when a worker times out. Does that sound OK? Alternatively, I could somehow check in the |
| @gen_cluster(client=False, nthreads=[]) | ||
| def test_nanny_death_timeout(s): | ||
| yield s.close() | ||
| w = yield Nanny(s.address, death_timeout=1) |
There was a problem hiding this comment.
I think that this test is still valid, but you just want to add something like the following:
w = Nanny(s.address, death_timeout=1
with pytest.raises(gen.TimeoutError):
yield wThere was a problem hiding this comment.
Got it, thanks.
I've removed my new tests, since I think they're duplicative of these now.
This reverts commit acad50a.
|
All green. |
|
Thanks @TomAugspurger ! |
This changes Worker / Nanny startup to raise when they timeout.
This bubbles up to the
dask-workerCLI.Closes #2781