[WIP] Use of new dask CLI: job submission #6738
Conversation
Initial commit for structure and basic implementation of `dask job submit`
Can one of the admins verify this patch?
Unit Test Results — see the test report for an extended history of previous test failures (useful for diagnosing flaky tests). 15 files ±0, 15 suites ±0, 6h 7m 17s ⏱️ (−18m 37s). For more details on these failures, see this check. Results for commit b854dc9; comparison against base commit 930d3dc.
```python
def runscript():
    # TODO: use a tempfile
    filepath = '/tmp/tmpfile'
    with open(filepath, 'w') as f:
        f.write(script)

    st = os.stat(filepath)
    os.chmod(filepath, st.st_mode | stat.S_IEXEC | stat.S_IREAD)

    # TODO: need to set DASK_SCHEDULER_ADDRESS env variable in call to execute
    # for our script; should be the address of the scheduler
    # might need to use another approach in subprocess for this
    return subprocess.check_output(filepath)
```
I recommend moving this out to a top-level private function so that serialization is easier: `client.run_on_scheduler(_run_script, script)`
```python
@job.command()
def gather():
    ...
```
I think that we should drop this function. I'm not sure what it would do.
I think that we need to think a little bit about how this would be used. When submitting a script, how does that script get access to the scheduler? Naively I might think that the following should work:

```python
from dask.distributed import Client

client = Client()  # connects to the current Dask scheduler

df = dask.dataframe.read_parquet(...)
df....
```

Great. If this is the kind of workflow that we're thinking of then we'll need to address a few issues. If this isn't the kind of workflow that we're thinking of then we should figure out what that workflow is and make sure that it works well.
Like @mrocklin said, the cluster discovery seems like a major question here (and something that I believe has been out of scope for the core dask project up until now). This seems closely related to dask-ctl (https://github.com/dask-contrib/dask-ctl), cc @jacobtomlinson. Maybe a
@gjoseph92 this work was done at the SciPy sprints, where @mrocklin, @jsignell, @jcrist, @charlesbluca and I made some longer-term plans about the CLI in Dask generally. I think the plan is to migrate some of the core functionality. @douglasdavis started some of this work to move us towards an extensible CLI. I have an action from the sprints to write this topic up in design-docs.
Haven't had time to circle back to this, folks. Apologies for the delay, and thank you for the feedback!
Implementation of `dask job submit`. This is intended to consume a python script. There are potentially two different usage patterns we are anticipating:

- A script that uses a `dask` collection or builds `delayed` objects and calls `<object>.compute()`; this should use the `dask` cluster to execute all `.compute()` calls.
- A script that doesn't use `dask`-isms at all, but is intended to be a single worker job, as if the script were just a function submitted via `client.submit`.

It may make sense for these two patterns to be served by two different subcommands. Exploring that here to settle on an approach.
`pre-commit run --all-files`