Deprecate --nprocs option for dask-worker CLI#5641
Deprecate --nprocs option for dask-worker CLI#5641jrbourbeau merged 18 commits intodask:mainfrom bryanwweber:deprecate-nprocs
--nprocs option for dask-worker CLI#5641Conversation
Per #2471, nprocs is confusing and should be replaced by num-workers.
|
Can one of the admins verify this patch? |
|
add to allowlist |
|
cc @guillaumeeb for visibility (as this change will impact |
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks @bryanwweber! I know this is still a draft PR, but I still wanted to leave some early feedback.
Also, unfortunately distributed has several known flaky tests at the moment. The test failures popping up here are all unrelated to the changes in this PR.
|
@jrbourbeau Sure, thanks for the review! I'll take a look tomorrow. I also noticed that edge case, but wanted to follow the pre-existing convention for the I just pushed 3 more commits to catch the few other places where |
Thanks @jrbourbeau, I'll try to follow the evolution here! |
|
@jacobtomlinson if you get a moment, would you mind taking a look at the |
|
@jrbourbeau I can't reproduce this test failure locally: https://github.com/dask/distributed/runs/4791743011?check_suite_focus=true#step:12:182 possibly because I'm on a mac and this is a Windows failure? |
jacobtomlinson
left a comment
There was a problem hiding this comment.
The SSH changes look reasonable to me.
This will be breaking for old_ssh but perhaps we should think about removing that now anyway.
|
Sorry for the confusion with the review dismissal. I'm happy with this change. We should remove |
|
@jrbourbeau I tried to use distributed/distributed/cli/dask_ssh.py Lines 216 to 217 in f4451b6 I suspect this might be why I needed to issue the SIGINT to get any output, if that function is blocking until SIGINT is sent? |
I suspect this is because the stdout stream is block-buffered by default (when non-interactive) you could set PYTHONUNBUFFERED if you want the output as soon as possible. |
|
@graingert Thanks for the thought! I don't think that's the problem here, for a few reasons. The modified test code with def test_ssh_cli_nprocs_renamed_to_num_workers(loop):
num_workers = 2
with popen(
["dask-ssh", f"--nprocs={num_workers}", "--nohost", "localhost"]
) as cluster:
with Client("tcp://127.0.0.1:8786", loop=loop) as c:
c.wait_for_workers(num_workers, timeout="15 seconds")
|
|
Sorry to be super late to the conversation, but is there a good reason to use |
No problem @jsignell! I believe |
|
Yeah, I just meant consolidate on a prefix :) |
|
Thanks @jsignell! Do you prefer |
No preference. There is an |
|
Whoops, sorry @jacobtomlinson I hit the wrong button there. Feel free to dismiss the review request (or do a review if you want to 😄 ) |
|
@jsignell Based on a conversation with @jrbourbeau we agreed that using |
The client only has 5 seconds to connect to the scheduler by default, so it seems to sometimes time out on CI.
|
@jrbourbeau I think I tracked down the SSH failures to the |
|
@jrbourbeau The test failing on Ubuntu/3.9/no ci1 doesn't look like it's due to my change, but I can't quite tell. Do you have any insight? https://github.com/dask/distributed/runs/4968972824?check_suite_focus=true#step:12:1434 Otherwise, waiting on macOS like some other PRs |
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks for all your efforts on this @bryanwweber. Agreed those test failures are unrelated to the changes here. I left a few final review comments, but this looks good overall
| nanny_port=None, | ||
| remote_dask_worker="distributed.cli.dask_worker", | ||
| local_directory=None, | ||
| **kwargs, |
There was a problem hiding this comment.
Adding **kwargs here will cause invalid keyword arguments to go unnoticed. For example, today a typo of memory_limt instead of memory_limit will raise an error, while with this change as is such a typo will be silently ignored.
Instead I'd rather us either keep an explicit nprocs= keyword around, but still raise a ValueError / FutureWarning accordingly, or do something like (below is pseudocode)
nprocs = kwargs.pop("nprocs")
if kwargs:
raise TypeError("...")
<existing ValueError / FutureWarning logic>| elif n_workers is None: | ||
| n_workers = 1 | ||
|
|
||
| self.n_workers = n_workers |
There was a problem hiding this comment.
Can we add a small self.nprocs property which redirects users to self.n_workers instead?
There was a problem hiding this comment.
I'm happy to add this. However, setting nprocs or n_workers as an attribute (after __init__() runs) doesn't actually change anything about the cluster, unless the user subsequently runs add_worker(). For that matter, it's not all that clear to me what the behavior should be... do all the worker nodes need to be updated with the number of worker processes? Is there a method to do that (there doesn't seem to be from my quick read of the code)?
There was a problem hiding this comment.
After an offline discussion, we decided that although these attributes are not necessarily meant to be set, a belt-and-suspenders approach to backward compatibility means we should add a property to warn instead of an AttributeError if nprocs is not defined. This is particularly true for the getting case
| ) | ||
| self.n_workers = self.kwargs.pop("nprocs") | ||
| else: | ||
| self.n_workers = self.kwargs.pop("n_workers", 1) |
There was a problem hiding this comment.
Similar comment here about adding a self.nprocs property redirecting to self.n_workers
There was a problem hiding this comment.
Similar response as above 😄
distributed/deploy/ssh.py
Outdated
| "renamed to n_workers.", | ||
| FutureWarning, | ||
| ) | ||
| self.n_workers = self.kwargs.pop("nprocs") |
There was a problem hiding this comment.
Not sure if this is needed or not
| self.n_workers = self.kwargs.pop("nprocs") | |
| self.n_workers = self.kwargs.pop("nprocs", 1) |
There was a problem hiding this comment.
It shouldn't be needed, but wouldn't hurt to add it.
--nprocs option for dask-worker CLI
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks @bryanwweber! This is in
Per #2471,
nprocsis potentially confusing and should be replaced bynum-workers.pre-commit run --all-files