
Reverse precedence between collection and distributed default#9869

Merged
fjetter merged 2 commits into dask:main from fjetter:get_scheduler_precedence
Jan 26, 2023
Conversation


@fjetter fjetter commented Jan 24, 2023

This reverses the precedence change introduced in https://github.com/dask/dask/pull/9808/files/524009275e21394892f6161c53c6d250e3cd4a2a#r1062672836

Basically, this determines whether a collection is supposed to use its own default scheduler or the available distributed cluster when it is computed on a worker.

I can see arguments for both sides. This PR reverses the behavior back to what it was before #9808.
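To make the shape of the change concrete, here is a stdlib-only sketch of a scheduler precedence chain. This is illustrative, not dask's actual `get_scheduler` implementation; `resolve_scheduler` and the candidate names are made up. The PR swaps the relative order of the collection's default and an ambient distributed cluster in such a chain, while an explicit `scheduler=` argument wins over both in either ordering:

```python
# Illustrative only: a precedence chain resolves to the first available
# candidate. This PR swaps the relative order of the collection default
# and the distributed cluster; an explicit scheduler= still wins.

def resolve_scheduler(*candidates):
    """Return the first candidate that is not None."""
    for candidate in candidates:
        if candidate is not None:
            return candidate
    raise ValueError("no scheduler available")

explicit = None
collection_default = "threads"   # e.g. the collection's own default scheduler
distributed_client = "client"    # e.g. a cluster discovered on the worker

# One ordering: the distributed cluster wins over the collection default
print(resolve_scheduler(explicit, distributed_client, collection_default))  # client
# The reversed ordering: the collection default wins
print(resolve_scheduler(explicit, collection_default, distributed_client))  # threads
```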

```python
elif computation == "dask.compute":
    dask.compute(ddf, scheduler=scheduler)
elif computation == "compute_as_if_collection":
    compute_as_if_collection(
```
@fjetter (Member, Author) commented:

I noticed this one specifically being a problem, since it passes the class as an argument, which can then affect the compute precedence.

I put in these three ways of triggering a compute. If there are others, we might want to add them here as well.
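For context, `compute_as_if_collection` is the interesting case because it receives the collection class rather than an instance. The following stdlib-only mock shows why that matters for precedence; `FakeDataFrame`, `sync_scheduler`, and the simplified `compute_as_if_collection` below are illustrative stand-ins, not dask's real implementation:

```python
# Stdlib-only sketch: passing the collection *class* explicitly means the
# class's default scheduler (__dask_scheduler__ in dask's collection
# protocol) enters scheduler resolution even though no instance is
# inspected. All names here are illustrative.

def sync_scheduler(graph, keys):
    # Toy scheduler: just records which keys it was asked to compute
    return f"sync:{keys}"

class FakeDataFrame:
    # dask collections advertise their default scheduler on the class
    __dask_scheduler__ = staticmethod(sync_scheduler)

def compute_as_if_collection(cls, graph, keys, scheduler=None):
    # An explicit scheduler= wins; otherwise fall back to the class default
    get = scheduler or cls.__dask_scheduler__
    return get(graph, keys)

print(compute_as_if_collection(FakeDataFrame, {}, ["x"]))  # sync:['x']
```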

@fjetter fjetter changed the title Reverse predence between collection and distributed default Reverse precedence between collection and distributed default Jan 26, 2023
@fjetter fjetter requested a review from jrbourbeau January 26, 2023 12:24
@jrbourbeau (Member) left a comment:

Thanks @fjetter -- switching back to the previous order here seems reasonable.

Left one non-blocking comment about the test added here. Happy to merge this as is, or possibly simplify it first.

Comment on lines +194 to +204
```python
# Count how many submits/update-graph were received by the scheduler
assert c.run_on_scheduler(
    lambda dask_scheduler: sum(
        len(comp.code) for comp in dask_scheduler.computations
    )
) == (2 if use_distributed else 1)
```
@jrbourbeau (Member) commented:
I think this could probably be simplified if we used gen_cluster for this test instead. Though maybe that's problematic for the single-machine schedulers (not sure)?

Also, maybe len(task_prefixes) could be used here instead?

@fjetter (Member, Author) replied:

A couple of things could be used. I'm a bit disappointed that there isn't a simple counter for this that doesn't require internal/domain knowledge.
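For reference, the counting that the test performs can be mocked with the stdlib alone; `ToyScheduler` and `Computation` below are simplified stand-ins for distributed's internals, not its real API:

```python
# Toy mock of the counting above: the real test sums len(comp.code) over
# dask_scheduler.computations via run_on_scheduler. Here a toy scheduler
# records one Computation per submitted graph, each with one code frame.

class Computation:
    def __init__(self, code):
        self.code = code  # source frames that triggered this graph

class ToyScheduler:
    def __init__(self):
        self.computations = []

    def update_graph(self, code):
        self.computations.append(Computation(code))

s = ToyScheduler()
s.update_graph(["df.sum().compute()"])
s.update_graph(["dask.compute(df)"])
n = sum(len(comp.code) for comp in s.computations)
print(n)  # 2
```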

@fjetter fjetter merged commit 579b191 into dask:main Jan 26, 2023
@fjetter fjetter deleted the get_scheduler_precedence branch January 26, 2023 17:09
