
Reverse precedence between collection and distributed default#9869

Merged
fjetter merged 2 commits into dask:main from fjetter:get_scheduler_precedence
Jan 26, 2023
Conversation


@fjetter fjetter commented Jan 24, 2023

This reverses the precedence change introduced in https://github.com/dask/dask/pull/9808/files/524009275e21394892f6161c53c6d250e3cd4a2a#r1062672836

Basically, this determines whether a collection is supposed to use its own default scheduler or the available distributed cluster when it is computed on a worker.

I can see arguments for both sides. This PR reverses the behavior back to what it was before #9808.
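To make the shape of the change concrete, here is a stdlib-only sketch of a scheduler precedence chain. This is illustrative, not dask's actual `get_scheduler` implementation; `resolve_scheduler` and the candidate names are made up. The PR swaps the relative order of the collection's default and an ambient distributed cluster in such a chain, while an explicit `scheduler=` argument wins over both in either ordering:

```python
# Illustrative only: a precedence chain resolves to the first available
# candidate. This PR swaps the relative order of the collection default
# and the distributed cluster; an explicit scheduler= still wins.

def resolve_scheduler(*candidates):
    """Return the first candidate that is not None."""
    for candidate in candidates:
        if candidate is not None:
            return candidate
    raise ValueError("no scheduler available")

explicit = None
collection_default = "threads"   # e.g. the collection's own default scheduler
distributed_client = "client"    # e.g. a cluster discovered on the worker

# One ordering: the distributed cluster wins over the collection default
print(resolve_scheduler(explicit, distributed_client, collection_default))  # client
# The reversed ordering: the collection default wins
print(resolve_scheduler(explicit, collection_default, distributed_client))  # threads
```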

```python
elif computation == "dask.compute":
    dask.compute(ddf, scheduler=scheduler)
elif computation == "compute_as_if_collection":
    compute_as_if_collection(
```
@fjetter (Member, Author) commented:

I noticed this one specifically being a problem, since it passes the class as an argument, which can then affect the compute precedence.

I put in these three ways of triggering a compute. If there are others, we might want to add them here as well.
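For context, `compute_as_if_collection` is the interesting case because it receives the collection class rather than an instance. The following stdlib-only mock shows why that matters for precedence; `FakeDataFrame`, `sync_scheduler`, and the simplified `compute_as_if_collection` below are illustrative stand-ins, not dask's real implementation:

```python
# Stdlib-only sketch: passing the collection *class* explicitly means the
# class's default scheduler (__dask_scheduler__ in dask's collection
# protocol) enters scheduler resolution even though no instance is
# inspected. All names here are illustrative.

def sync_scheduler(graph, keys):
    # Toy scheduler: just records which keys it was asked to compute
    return f"sync:{keys}"

class FakeDataFrame:
    # dask collections advertise their default scheduler on the class
    __dask_scheduler__ = staticmethod(sync_scheduler)

def compute_as_if_collection(cls, graph, keys, scheduler=None):
    # An explicit scheduler= wins; otherwise fall back to the class default
    get = scheduler or cls.__dask_scheduler__
    return get(graph, keys)

print(compute_as_if_collection(FakeDataFrame, {}, ["x"]))  # sync:['x']
```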

@fjetter fjetter changed the title Reverse predence between collection and distributed default Reverse precedence between collection and distributed default Jan 26, 2023
@fjetter fjetter requested a review from jrbourbeau January 26, 2023 12:24
@jrbourbeau (Member) left a comment:

Thanks @fjetter -- switching back to the previous order here seems reasonable.

Left one non-blocking comment about the test added here. Happy to merge this as is, or possibly simplify it first.

Comment on lines +194 to +204
```python
# Count how many submits/update-graph were received by the scheduler
assert c.run_on_scheduler(
    lambda dask_scheduler: sum(
        len(comp.code) for comp in dask_scheduler.computations
    )
) == (2 if use_distributed else 1)
```
@jrbourbeau (Member) commented:
I think this could probably be simplified if we used gen_cluster for this test instead. Though maybe that's problematic for the single-machine schedulers (not sure)?

Also, maybe len(task_prefixes) could be used here instead?

@fjetter (Member, Author) replied:

A couple of things could be used. I'm a bit disappointed that there isn't a simple counter for this that doesn't require internal/domain knowledge.
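For reference, the counting that the test performs can be mocked with the stdlib alone; `ToyScheduler` and `Computation` below are simplified stand-ins for distributed's internals, not its real API:

```python
# Toy mock of the counting above: the real test sums len(comp.code) over
# dask_scheduler.computations via run_on_scheduler. Here a toy scheduler
# records one Computation per submitted graph, each with one code frame.

class Computation:
    def __init__(self, code):
        self.code = code  # source frames that triggered this graph

class ToyScheduler:
    def __init__(self):
        self.computations = []

    def update_graph(self, code):
        self.computations.append(Computation(code))

s = ToyScheduler()
s.update_graph(["df.sum().compute()"])
s.update_graph(["dask.compute(df)"])
n = sum(len(comp.code) for comp in s.computations)
print(n)  # 2
```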

@fjetter fjetter merged commit 579b191 into dask:main Jan 26, 2023
@fjetter fjetter deleted the get_scheduler_precedence branch January 26, 2023 17:09
