Add new CLI that is extensible#9283
Conversation
docs/source/cli.rst
Outdated
| ====================== | ||
|
|
||
| Dask provides a command line tool (``dask``) to accomplish a number of | ||
| tasks. |
There was a problem hiding this comment.
We probably want to change this to reflect the current state of this work. Maybe say that it's experimental and include the version info for when it was added?
There was a problem hiding this comment.
At this stage I don't think that we should have an experimental label in docs for something top-level and that is this important. At least not if it's going to be in the index. I think that we should invest in this page by adding dask scheduler and whatnot and then not think of it as experimental.
There was a problem hiding this comment.
Yeah I agree that is an even better idea!
There was a problem hiding this comment.
I'll third this-- I think the docs will be a large (if not the most significant) part of this PR
|
|
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks @douglasdavis! Could you provide a bit more context around the motivation for adding this new feature? For example, was this based on user feedback?
Do we want to make
clicka hard dependency?
I don't have a strong opinion. I view click as a lightweight dependency that already being installed in a lot of cases (e.g. pip install dask[complete], conda install dask), so if it makes other things more straightforward, I wouldn't object to adding it as a dependency
|
Hey @jrbourbeau! The motivation is a bit multidimensional. In general a group of Dask contributors that were at the SciPy sprint last week discussed improving CLI based on some feedback and a desire to bring CLI tooling under one extensible umbrella. @jacobtomlinson adds some detail on a PR in distributed here: dask/distributed#6738 (comment) |
|
From my perspective this needs docs, and then we should drop anything that isn't implemented (info). I'm good to merge after that. |
|
Matthew Rocklin ***@***.***> writes:
From my perspective this needs docs, and then we should drop anything that isn't implemented (info). I'm good to merge after that.
OK that also sounds good to me
|
| from dask.cli import cli | ||
|
|
||
|
|
||
| def test_version(): |
There was a problem hiding this comment.
It'd be good to also add test for other commands being added (e.g. dask version) and register_third_party
There was a problem hiding this comment.
I refactored the registering code a bit to make registering commands somewhat test-able
Just for reference, (which is calling from dask.cli import info
import click
@click.command(name="fun")
def fun():
print(42)
info.add_command(fun)to get |
I don't think we should. I'd be interested to explore how much work it would be to also support existing tools that just use
Ooh this is news to me too! We should definitely switch out |
Typer apps can be used in Click apps, but I haven't been able to find anything that connects argparse and Click. Since the existing tools already use Click, I'm inclined to keep that as the foundation. If we don't make Click a hard dependency, perhaps we can make |
|
Alright I think this is ready for review. The partner PR (dask/distributed#6735) should go in with this one, but it relies on installing my dask branch associated with this PR (I've changed all of the CI to use dask at version douglasdavis/dask@new-cli). I'm guessing this situation requires:
|
There was a problem hiding this comment.
This is generally looking great, thanks @douglasdavis.
I had a go at migrating dask-ctl over to this and it was pretty easy. dask-contrib/dask-ctl#50
I left a couple of small comments but happy to coordinate the merging of this and dask/distributed#6735 when ready. Ping me on Slack if you want to sync up to do it.
| if not isinstance(command, (click.Command, click.Group)): | ||
| raise TypeError( | ||
| "entry points in 'dask_cli' must be instances of " | ||
| "click.Command or click.Group" | ||
| ) |
There was a problem hiding this comment.
Perhaps this should be a warning instead? If a package registers a bad entrypoint it breaks the whole CLI.
$ dask --help
Traceback (most recent call last):
File "/home/jtomlinson/miniconda3/envs/dask/bin/dask", line 33, in <module>
sys.exit(load_entry_point('dask', 'console_scripts', 'dask')())
File "/home/jtomlinson/Projects/dask/dask/dask/__main__.py", line 23, in main
run_cli()
File "/home/jtomlinson/Projects/dask/dask/dask/cli.py", line 79, in run_cli
_register_command_ep(cli, ep)
File "/home/jtomlinson/Projects/dask/dask/dask/cli.py", line 66, in _register_command_ep
raise TypeError(
TypeError: entry points in 'dask_cli' must be instances of click.Command or click.GroupThere was a problem hiding this comment.
Ah yes I think a warning would be better, just made that change
| "entry points in 'dask_cli' must be instances of " | ||
| "click.Command or click.Group" | ||
| ) | ||
| interface.add_command(command) |
There was a problem hiding this comment.
What happens in the case of a name conflict?
There was a problem hiding this comment.
I tested this out and found that a repeated name will just cause an overwrite. At first I added a commit to raise if this happens, but based on your other comment I decided to do another warning.
|
thanks @jacobtomlinson! Added a few commits to address those comments |
| sub-group in `interface`. | ||
|
|
||
| """ | ||
| command = entry_point.load() |
There was a problem hiding this comment.
Just to note that this dynamic registration approach allows third party packages to register arbitrary code to be executed when the dask command is run.
There was a problem hiding this comment.
Thanks for raising this, it's a fair point.
I don't think we should worry about this though, I agree that it allows nasty people to register an entrypoint that would cause the dask CLI command to run arbitrary code. But they could equally set a console_scripts entrypoint that just overrides the dask command altogether. If multiple packages register the same CLI command they just get overridden, so any package who's name starts with a letter greater than d can override the dask command.
Entrypoints as a concept allows arbitrary packages to register code that will be executed by some other package. Typically we have to trust that everything installed in the Python environment is trusted. There isn't really any other option.
jrbourbeau
left a comment
There was a problem hiding this comment.
Okay, all lingering review comments have been addressed. I think this PR, and the accompanying PR in distributed (xref dask/distributed#6735), are both good to go. Thanks for your work here @douglasdavis
In dask/dask#9283 we are adding a new top level `dask` CLI command which can be extended by other modules using entry points. A primary motivation here is to improve discoverability by uniting everything under one tool and allowing folks to run `dask --help` and `dask <subcommand> --help` to learn more about the various tools. This PR adds a new `click` group called `cuda` and moves the `dask-cuda-worker` command under that group with the name `worker`. This means the `dask-cuda-worker` becomes `dask cuda worker` in the new CLI tool. I haven't made any changes to the existing `dask-cuda-worker` console script so that will still continue to work, but maybe we should add a deprecation warning to it? I went with this name rather than `dask cuda-worker` because I think it is more readable and also leaves us open to adding more subcommands in the future without cluttering up the top-level `dask` namespace. ```console $ dask --help Usage: dask [OPTIONS] COMMAND [ARGS]... Dask command line interface. Options: --version Show the version and exit. -h, --help Show this message and exit. Commands: cluster Manage dask clusters. cuda GPU subcommands. docs Open Dask documentation (https://docs.dask.org/) in a web browser. info Information about your dask installation. scheduler Launch a distributed scheduler. ssh Launch a distributed cluster over SSH. worker Launch a distributed worker attached to an existing SCHEDULER. ``` ```console $ dask cuda --help Usage: dask cuda [OPTIONS] COMMAND [ARGS]... GPU subcommands. Options: -h, --help Show this message and exit. Commands: worker Launch a distributed worker with GPUs attached to an existing SCHEDULER. ``` ```console $ dask cuda worker --help Usage: dask cuda worker [OPTIONS] [SCHEDULER] [PRELOAD_ARGV]... Launch a distributed worker with GPUs attached to an existing SCHEDULER. See https://docs.rapids.ai/api/dask-cuda/stable/quickstart.html#dask-cuda-worker for info. Options: --host TEXT IP address of serving host; should be visible to the scheduler and other workers. Can be a string (like ``"127.0.0.1"``) or ``None`` to fall back on the address of the interface specified by ``--interface`` or the default interface. --nthreads INTEGER Number of threads to be used for each Dask worker process. [default: 1] ... ``` The CLI PR needs to be merged and released before this can be merged. Fixes #1038 Authors: - Jacob Tomlinson (https://github.com/jacobtomlinson) - Ray Douglass (https://github.com/raydouglass) - Peter Andreas Entschev (https://github.com/pentschev) - Charles Blackmon-Luca (https://github.com/charlesbluca) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - https://github.com/jakirkham - Peter Andreas Entschev (https://github.com/pentschev) URL: #981
New CLI discussed at SciPy sprint
This creates a new
daskcommand line tool that can be augmented by external packages (e.g. distributed) with a[dask_cli]entry point. The new tool can be used with eitheror
Adding an entry point is pretty straightforward; we'll use distributed as an example:
In an external-to-dask module at the location:
distributed.cli.dask_schedulerwith a function calledmainwe can add topyproject.toml(PEP-621):or an "old school"
setup.pyorsetup.cfgThe function needs to be a click command:
Since the
scheduler = ...entry point under thedask_clisection is defined in the distributed package; upon import ofdask.clitherun_clifunction will discover the externalschedulercommand. This allows use ofor
Related PRs in distributed: