
P2P for array rechunking#7534

Merged
fjetter merged 77 commits into dask:main from hendrikmakait:rechunking
Feb 24, 2023
Conversation

@hendrikmakait
Member

@hendrikmakait hendrikmakait commented Feb 9, 2023

Blocked by and sibling to dask/dask#9939

Partially addresses #7507

  • Tests added / passed
  • Passes pre-commit run --all-files

@github-actions
Contributor

github-actions bot commented Feb 9, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

24 files ±0   24 suites ±0   11h 25m 10s ⏱️ +39m 48s
3 486 tests +74   3 382 ✔️ +75   103 💤 ±0   1 ❌ −1
40 962 runs +805   39 121 ✔️ +806   1 840 💤 +1   1 ❌ −2

For more details on these failures, see this check.

Results for commit d554c62. ± Comparison against base commit f9306da.

♻️ This comment has been updated with latest results.

Member

@fjetter fjetter left a comment

There are a couple of nits but overall looks good.

out: dict[str, list[tuple[ArrayRechunkShardID, bytes]]] = defaultdict(list)
for id, nslice in self._slicing[input_partition]:
    out[self.worker_for[id.new_index]].append(
        (id, pickle.dumps((id.sub_index, data[nslice])))
    )

Member

nit: It's not entirely clear why the pickle.dumps includes the sub_index if the first tuple entry is already the full index. Is this intentional?


Member Author

I've extended the docstring above. Does that help?

Comment on lines +108 to +111
#: Index of the new chunk the shard belongs to
new_index: NIndex
#: Sub-index of the shard within the new chunk
sub_index: NIndex

Member

Can you elaborate what "new" index and "sub_index" is? Can this be expressed with a simple example?

I assume new_index corresponds to the "partition"


Member Author

I've tried clearing things up with additional docs and better (?) naming.
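To make the two indices concrete, here is a minimal sketch. The class name mirrors the `ArrayRechunkShardID` from the diff above, but the values and the simplified definition are hypothetical, for illustration only:

```python
from dataclasses import dataclass

NIndex = tuple[int, ...]  # n-dimensional chunk index


@dataclass(frozen=True)
class ShardID:
    """Simplified stand-in for the shard ID discussed above."""

    #: Index of the output ("new") chunk this shard belongs to
    new_index: NIndex
    #: Position of the shard within that output chunk
    sub_index: NIndex


# One input chunk may contribute several shards to a single output chunk:
# this shard is the (0, 1)-th piece of output chunk (2, 3).
shard = ShardID(new_index=(2, 3), sub_index=(0, 1))
print(shard)
```

The `sub_index` is what lets the receiving worker reassemble the output chunk from shards that arrive in arbitrary order, which is also why it travels inside the pickled payload.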

Comment on lines +372 to +373
with self.time("cpu"):
arr = convert_chunk(data, subdims)

Member

Suggested change:
-        with self.time("cpu"):
-            arr = convert_chunk(data, subdims)
+        arr = await self.offload(convert_chunk, data, subdims)

Pretty sure this should be offloaded; you even instrument it with CPU time, so this is likely nontrivial.


Member Author

Good catch, I just noticed we don't do that on the P2P shuffle either (which is probably why I haven't done so here).


Member Author


I'll improve offloading for both rechunking and shuffling in a follow-up PR.
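The offloading pattern the reviewer suggests can be sketched roughly like this, using a plain thread-pool executor; the `offload` helper below is an assumption for illustration, and distributed's actual implementation may differ:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Dedicated executor so CPU-bound work never blocks the event loop.
_executor = ThreadPoolExecutor(max_workers=1)


async def offload(func, *args):
    """Run a CPU-bound callable off the event loop (sketch of the pattern)."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, func, *args)


async def main():
    # Hypothetical CPU-heavy work standing in for convert_chunk(data, subdims)
    return await offload(sum, range(1_000_000))


print(asyncio.run(main()))
```

The point is that the event loop stays responsive to network traffic while the conversion runs, which matters when shards keep arriving during a large rechunk.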

Comment on lines +304 to +307
def get_worker_for_hash_sharding(output_partition: NIndex, workers: list[str]) -> str:
"""Get address of target worker for this output partition using hash sharding"""
i = hash(output_partition) % len(workers)
return workers[i]

Member

Are output_partitions always symmetrical or can they vary significantly?

I assume this hash-mod mapping is a sufficiently uniform distribution but assuming the output chunks are asymmetric we can likely do better with a different mapping that creates a better output distribution.

Out of scope for this PR, just a question
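The hash-mod mapping quoted above can be exercised standalone to see how partitions spread across workers. The worker addresses below are made up:

```python
from itertools import product


def get_worker_for_hash_sharding(output_partition, workers):
    """Hash-mod assignment, as in the function quoted above."""
    return workers[hash(output_partition) % len(workers)]


workers = ["tcp://a:8786", "tcp://b:8786", "tcp://c:8786"]
counts = {w: 0 for w in workers}
for idx in product(range(10), range(10)):  # 100 output partition indices
    counts[get_worker_for_hash_sharding(idx, workers)] += 1
print(counts)  # roughly uniform for int-tuple indices
```

Note this balances the *number* of partitions per worker, not their *sizes*, which is exactly the asymmetry concern raised here.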


Member Author


Theoretically, output partitions can vary significantly, e.g., you could have a chunked array of ((1, 4), (64, 1024)), which would correspond to partitions of sizes [(1, 64), (1, 1024), (4, 64), (4, 1024)].

In practice, I'd assume that they are mostly homogeneous. Then again, I'm not an arrays person.
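The partition sizes in this example can be reproduced with plain itertools, no dask required (`partition_sizes` is just a throwaway helper for illustration):

```python
from itertools import product


def partition_sizes(chunks):
    """Expand dask-style chunk tuples into per-partition shapes."""
    return list(product(*chunks))


chunks = ((1, 4), (64, 1024))
print(partition_sizes(chunks))
# [(1, 64), (1, 1024), (4, 64), (4, 1024)]
```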

Co-authored-by: Florian Jetter <fjetter@users.noreply.github.com>
@fjetter
Member

fjetter commented Feb 24, 2023

@hendrikmakait I pushed another commit to fix the linter issues that were introduced by my earlier suggestion

Edit: Sorry, wasn't enough. I'll let you handle it :)

@hendrikmakait hendrikmakait marked this pull request as draft February 24, 2023 14:26
@hendrikmakait
Member Author

Moved to draft to avoid accidental merge before dask/dask#9939 has been merged and the requirements in this PR have been cleaned up.

@fjetter fjetter mentioned this pull request Feb 24, 2023
@hendrikmakait
Member Author

CI goes red because I pushed dask/dask#9939 too late and CI still expects the old keyword. Waiting for dask/dask#9939 to push the commit that points to dask@main again and turns CI green. I'll then move this PR to Ready for Review. Tests work locally.

@hendrikmakait hendrikmakait marked this pull request as ready for review February 24, 2023 18:06
@hendrikmakait
Member Author

hendrikmakait commented Feb 24, 2023

CI seems to keep running on an outdated commit on dask/dask. https://github.com/dask/distributed/actions/runs/4265042270/jobs/7424015784#step:11:66

The commit is from yesterday (dask/dask@970da68), does CI cache something here? If so, how to invalidate those caches?
