Adjust transfer costs in worker_objective #5326
Conversation
I'd like to incorporate measured latency somehow too instead of a magic 10ms, but it's a start.
As discussed in dask#5325. The idea is that if a key we need has many dependents, we should amortize the cost of transferring it to a new worker, since those other dependents could then run on the new worker more cheaply. "We'll probably have to move this at some point anyway, might as well do it now." This isn't actually intended to encourage transfers, though; it's meant to discourage transferring keys that could have just stayed in one place. The goal is that if A and B are on different workers, and we're the only task that will ever need A, but plenty of other tasks will need B, we should schedule alongside A even if B is a bit larger to move. But this is all a theory and needs some tests.
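The amortization idea could be sketched roughly like this. This is a simplified stand-in for the real objective, not scheduler code: `estimated_comm_cost`, `dep_sizes`, `dep_waiters`, and the 100 MB/s bandwidth are all hypothetical names and values.

```python
# Hypothetical sketch (not the actual dask code): estimate the comm cost
# of running a task on a worker that is missing some dependencies,
# amortizing each dependency's transfer cost over its number of waiters.

def estimated_comm_cost(dep_sizes, dep_waiters, bandwidth=100e6):
    """dep_sizes: bytes per missing dep; dep_waiters: waiter counts."""
    comm_bytes = 0.0
    for nbytes, nwaiters in zip(dep_sizes, dep_waiters):
        # A dependency with many waiters will likely be transferred
        # somewhere anyway, so charge this task only its share.
        comm_bytes += nbytes / nwaiters
    return comm_bytes / bandwidth

# A 100 MB key needed by 10 tasks weighs the same as a 10 MB key needed
# only by this task, so the objective leans toward staying next to the
# narrowly-needed data:
assert estimated_comm_cost([100e6], [10]) == estimated_comm_cost([10e6], [1])
```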
distributed/scheduler.py (outdated)

```python
# amortize transfer cost over all waiters
comm_bytes += nbytes / len(dts._waiters)
```
Can you add an in-code comment explaining how this division amortizes cost? I assume this is again a "local topology" argument related to the fan-out tasks (#5325 (comment)) where we try to "ignore" tasks which will likely end up everywhere anyhow?
Will do. It's related to that, but actually a simpler idea. Basically, if we transfer to this worker now, that opens up the potential for N other tasks to run on this worker without transferring the data. So you could look at it as: rather than this task paying the whole cost up front and others getting the benefit for free, all the sibling tasks split the cost of the transfer evenly between them. (That's an analogy, of course: once transferred, the other tasks don't actually pay anything!)
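That "siblings split the cost" analogy can be made concrete with a toy sketch. All names here are hypothetical; the real scheduler tracks worker contents via `WorkerState`/`TaskState`, not a plain set.

```python
# Toy illustration of the cost-splitting analogy: the first task to pull
# the key pays only its amortized share, and once the key is local,
# later sibling tasks pay nothing at all.

def comm_cost_for_task(worker_has, dep, nbytes, nwaiters, bandwidth=100e6):
    if dep in worker_has:
        return 0.0  # key already local: siblings ride for free
    # first task to trigger the transfer pays only its amortized share
    return nbytes / nwaiters / bandwidth

worker_has = set()
first = comm_cost_for_task(worker_has, "b", 100e6, 10)  # 1/10 of a 1 s move
worker_has.add("b")  # the transfer has happened
later = comm_cost_for_task(worker_has, "b", 100e6, 10)  # nothing to pay

assert abs(first - 0.1) < 1e-12 and later == 0.0
```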
Unit Test Results

16 files ±0  16 suites ±0  7h 38m 11s ⏱️ +22m 54s

For more details on these failures and errors, see this check. Results for commit e3d62f6. ± Comparison against base commit baf05c0.

♻️ This comment has been updated with latest results.
```python
# amortize transfer cost over all waiters
comm_bytes += nbytes / len(dts.waiters)
xfers += 1
```
Suggested change:

```diff
- # amortize transfer cost over all waiters
- comm_bytes += nbytes / len(dts.waiters)
- xfers += 1
+ nwaiters = len(dts.waiters)
+ # amortize transfer cost over all waiters
+ comm_bytes += nbytes / nwaiters
+ xfers += 1 / nwaiters
```
@gjoseph92 do you agree?
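For illustration, here is a toy comparison of the two variants. The 10 ms penalty and 100 MB/s bandwidth are placeholder numbers, and `cost_original`/`cost_suggested` are hypothetical names, not scheduler code.

```python
PENALTY = 0.01     # fixed per-transfer penalty, seconds (the magic 10 ms)
BANDWIDTH = 100e6  # bytes per second, assumed

def cost_original(nbytes, nwaiters):
    # amortize only the byte cost; the flat penalty is paid in full
    return nbytes / nwaiters / BANDWIDTH + PENALTY

def cost_suggested(nbytes, nwaiters):
    # amortize both the byte cost and the per-transfer penalty
    return (nbytes / BANDWIDTH + PENALTY) / nwaiters

# For a 1 MB key with 100 waiters, the suggested form also spreads the
# fixed penalty, shrinking the estimate from roughly 10.1 ms to 0.2 ms.
```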
However, this would not be replicable in get_comm_cost above.
This should maybe be two PRs, since there are two different things happening:

1. Add a fixed (currently 10ms) penalty per transfer, as discussed in "Scheduler underestimates data transfer cost for small transfers" #5324 (comment). This should help discourage small transfers. I'd prefer if this cost weren't just a magic 0.01 number though.
2. Amortize the transfer cost by the number of waiters. This is related to "Ignore widely-shared dependencies in decide_worker" #5325. See the commit message b4ebbee for more description.

I haven't tested this at all yet; it's just a theory right now. Just looking for thoughts.
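The fixed-penalty half of the change can be illustrated on its own. The 10 ms value is the placeholder from this PR, the 100 MB/s bandwidth is an assumed figure, and the function names are hypothetical.

```python
# Illustrative only: how a flat per-transfer penalty changes the cost
# estimate for small transfers.

BANDWIDTH = 100e6  # bytes per second (assumed)
PENALTY = 0.01     # seconds per transfer (the magic 10 ms from this PR)

def transfer_time(nbytes):
    # naive estimate: bytes / bandwidth only
    return nbytes / BANDWIDTH

def transfer_time_with_penalty(nbytes):
    # add a flat per-transfer penalty so tiny transfers aren't "free"
    return nbytes / BANDWIDTH + PENALTY

# A 1 kB transfer looks nearly free without the penalty (~10 microseconds),
# but costs ~10 ms with it, which discourages scheduling tiny transfers.
```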
cc @fjetter @crusaderky @mrocklin