Update scheduling policy docs for root-ish task colocation by gjoseph92 · Pull Request #5018 · dask/distributed

gjoseph92 · 2021-07-01T21:49:59Z

Reflects changes from #4967.

gjoseph92 · 2021-07-01T21:50:44Z

docs/source/scheduling-state.rst


 .. autoclass:: Scheduler
   :members:
+   :inherited-members:


I added this to get SchedulerState.decide_worker, but maybe we don't want all the Server, etc. methods showing up, and I should add decide_worker explicitly?

gjoseph92 · 2021-07-01T21:53:38Z

docs/source/scheduling-policies.rst

+Calculating these cousin tasks directly by traversing the graph would be expensive.
+Instead, we use the task's TaskGroup, which is the collection of all tasks with the
+same key prefix. (``(random-a1b2c3, 0)``, ``(random-a1b2c3, 1)``, ``(random-a1b2c3, 2)``
+would all belong to the TaskGroup ``random-a1b2c3``.)


I wish we could describe the heuristic without explaining TaskGroups (or at least that they were documented elsewhere), but I couldn't figure out how without something verbose and awkward.

@mrocklin in general I recognize that this addition is probably too much detail on the implementation, and could be much shorter and just describe the scheduler's objective of co-assigning related tasks. I'm not sure if we want to go into the details of the implementation here or not.
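
To make the grouping in the hunk above concrete, here is a minimal sketch of collecting task keys by their shared first element. It is an illustration only: `group_name` and `group_tasks` are hypothetical helpers written for this comment, not functions from distributed, and the sketch assumes tuple keys of the form `("random-a1b2c3", 0)`.

```python
from collections import defaultdict


def group_name(key):
    """Hypothetical helper: return the group a task key belongs to.

    Assumption: tuple keys look like ("random-a1b2c3", 0) and the group
    is the first element; any other key is treated as its own group.
    """
    if isinstance(key, tuple):
        return key[0]
    return key


def group_tasks(keys):
    """Collect keys into {group name: [keys]} in a single pass, no graph traversal."""
    groups = defaultdict(list)
    for key in keys:
        groups[group_name(key)].append(key)
    return dict(groups)


keys = [
    ("random-a1b2c3", 0),
    ("random-a1b2c3", 1),
    ("random-a1b2c3", 2),
    ("sum-d4e5f6", 0),
]
groups = group_tasks(keys)
```

The point of the sketch is that membership is read off the key itself, which is why it is cheaper than walking the graph to find cousin tasks.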

mrocklin · 2021-07-01T22:05:00Z

Personally I will probably not spend too much time reviewing this one. I'm happy to defer to your judgement. While I strongly support documentation efforts, docs on this site aren't as visible to users, and this implementation seems like something that might change or is still in flux. I like that we have something, but I don't think that we should work too hard to refine it.

…

On Thu, Jul 1, 2021, 2:54 PM Gabe Joseph ***@***.***> wrote:

***@***.**** commented on this pull request.

In docs/source/scheduling-state.rst:

@@ -310,5 +310,6 @@ API
 .. autoclass:: Scheduler
   :members:
+   :inherited-members:

I added this to get SchedulerState.decide_worker, but maybe we don't want all the Server, etc. methods showing up, and I should add decide_worker explicitly?

In docs/source/scheduling-policies.rst:

+    a  b   c  d
+     \  \ /  /
+        X
+
+In the above case, we want ``a`` and ``b`` to run on the same worker,
+and ``c`` and ``d`` to run on the same worker, reducing future
+data transfer. We can also ignore the location of ``X``, because assuming
+we split the ``a b c d`` group across all workers to maximize parallelism,
+then ``X`` will eventually get transferred everywhere.
+(Note that wanting to co-locate ``a b`` and ``c d`` would still apply even if
+``X`` didn't exist.)
+
+Calculating these cousin tasks directly by traversing the graph would be expensive.
+Instead, we use the task's TaskGroup, which is the collection of all tasks with the
+same key prefix. (``(random-a1b2c3, 0)``, ``(random-a1b2c3, 1)``, ``(random-a1b2c3, 2)``
+would all belong to the TaskGroup ``random-a1b2c3``.)

I wish we could describe the heuristic without explaining TaskGroups (or at least that they were documented elsewhere), but I couldn't figure out how without something verbose and awkward.

@mrocklin in general I recognize that this addition is probably too much detail on the implementation, and could be much shorter and just describe the scheduler's objective of co-assigning related tasks. I'm not sure if we want to go into the details of the implementation here or not.

gjoseph92 · 2021-07-02T16:22:48Z

@mrocklin then in that case, I think this is decent. Should we merge it?

mrocklin · 2021-07-02T16:23:47Z

Done

Update scheduling policy docs for dask#4967

e85aeb6

mrocklin merged commit cbcec9c into dask:main Jul 2, 2021

gjoseph92 deleted the docs/root-task-scheduling branch July 2, 2021 16:26

ncclementi mentioned this pull request Aug 12, 2021

array data without reference kept on workers? dask/dask#7212

Closed
