Reverse precedence between collection and distributed default #9869
Conversation
```python
elif computation == "dask.compute":
    dask.compute(ddf, scheduler=scheduler)
elif computation == "compute_as_if_collection":
    compute_as_if_collection(
```
I noticed this is specifically a problem since it passes the class as an argument, which might then affect the compute precedence.
I put in these three ways of triggering a compute. If there are others, we might want to add them here as well.
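The distinction between the three entry points can be sketched with a toy collection (hypothetical names, not dask's actual implementation). The point of interest is that a `compute_as_if_collection`-style function receives the collection *class* as an explicit argument, so precedence logic keyed on the collection instance can behave differently there:

```python
# Toy sketch (hypothetical, NOT dask's implementation) of the three ways a
# compute can be triggered in the test above.

def run(cls, data, scheduler):
    # Stand-in for "dispatch to the chosen scheduler's get()".
    return (cls.__name__, scheduler or "default", sum(data))

class ToyCollection:
    def __init__(self, data):
        self.data = data

    def compute(self, scheduler=None):          # like ddf.compute()
        return run(type(self), self.data, scheduler)

def compute(coll, scheduler=None):              # like dask.compute(ddf, ...)
    return coll.compute(scheduler=scheduler)

def compute_as_if_collection(cls, data, scheduler=None):
    # like dask.base.compute_as_if_collection: the class is passed explicitly
    # rather than inferred from an instance
    return run(cls, data, scheduler)
```

All three paths should resolve the scheduler the same way, which is exactly what parametrizing the test over them checks.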
jrbourbeau left a comment:
Thanks @fjetter -- switching back to the previous order here seems reasonable.
Left one non-blocking comment about the test added here. Happy to merge this as is, or possibly simplify it.
```python
# Count how many submits/update-graph were received by the scheduler
assert c.run_on_scheduler(
    lambda dask_scheduler: sum(
        len(comp.code) for comp in dask_scheduler.computations
    )
) == (2 if use_distributed else 1)
```
I think this could probably be simplified if we used `gen_cluster` for this test instead, though maybe that's problematic for the single-machine schedulers (not sure)?
Also, maybe `len(task_prefixes)` could be used here instead?
A couple of things could be used. I'm a bit disappointed that there isn't a simple counter for this that doesn't require internal/domain knowledge.
This reverses the precedence change introduced in https://github.com/dask/dask/pull/9808/files/524009275e21394892f6161c53c6d250e3cd4a2a#r1062672836
Basically, this determines whether a collection should use its default executor or the available distributed cluster when being computed on a worker.
I can see arguments for both sides. This PR reverts the behavior to the pre-#9808 order.
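The precedence in question can be illustrated with a small toy resolver (a hypothetical function, not dask's actual `get_scheduler`), assuming the pre-#9808 order this PR restores, where the collection's own default beats an active distributed client:

```python
# Toy model of scheduler-precedence resolution (hypothetical, NOT dask's
# actual implementation). An explicit scheduler= argument wins, then the
# collection's own default, then any active distributed client. This PR is
# about the relative order of the last two.

def resolve_scheduler(explicit=None, collection_default=None,
                      distributed_client=None):
    """Return which scheduler a compute call would use."""
    if explicit is not None:
        return explicit                  # user override always wins
    if collection_default is not None:
        return collection_default        # pre-#9808 order: collection first
    if distributed_client is not None:
        return distributed_client        # fall back to the running cluster
    return "threads"                     # library-wide default

# With both a collection default and a client present, the collection wins:
resolve_scheduler(collection_default="synchronous",
                  distributed_client="distributed")
```

Swapping the middle two branches gives the #9808 behavior, where being on a worker with a live cluster would override the collection's default.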