P2P rechunking by hendrikmakait · Pull Request #9939 · dask/dask

hendrikmakait · 2023-02-10T09:51:29Z

Partially addresses "Efficient scalable shuffle - P2P shuffle extension" (distributed#7507)
Tests added / passed
Passes pre-commit run --all-files

hendrikmakait · 2023-02-23T15:47:20Z

For now, I'm only adding the keyword to rechunk(...) itself, not to functions that use rechunk(...) under the hood.

fjetter · 2023-02-24T14:26:56Z

dask/array/core.py

+        threshold=None,
+        block_size_limit=None,
+        balance=False,
+        rechunk=None,


I don't love the name for this. Naming is hard. I see three options:

1. rechunk (as you are proposing)

2. algo (more generic but also more descriptive)

3. shuffle (not great but it would offer some consistency; probably the worst option :D )

I've chosen 1. for consistency with the shuffle keyword, which I'm not a huge fan of either.

The most descriptive option might be a fourth one:

4. rechunk_{impl|algo}

It generalizes well to other methods that use a rechunk under the hood, similar to how we use shuffle in other functions as well. impl might give us slightly more freedom (I could see different implementations of a P2P-like algorithm, e.g., an RDMA-based one).

In that case, I'd even go as far as proposing to rename the shuffle keyword.

That's also on the table (with a deprecation cycle, of course)
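A deprecation cycle for a renamed keyword could look roughly like the sketch below. It is only an illustration: the function name `set_index_like` and the new keyword name `shuffle_algorithm` are hypothetical and are not taken from dask.

```python
import warnings


def set_index_like(data, shuffle=None, shuffle_algorithm=None):
    """Hypothetical function; only the keyword handling matters here."""
    if shuffle is not None:
        if shuffle_algorithm is not None:
            # Refuse ambiguous calls that pass both the old and the new name.
            raise TypeError("Pass only one of 'shuffle' and 'shuffle_algorithm'")
        # Keep accepting the old keyword for now, but tell the caller to migrate.
        warnings.warn(
            "The 'shuffle' keyword is deprecated; use 'shuffle_algorithm' instead",
            FutureWarning,
            stacklevel=2,
        )
        shuffle_algorithm = shuffle
    return data, shuffle_algorithm
```

Callers that still pass the old keyword keep working during the cycle and see a `FutureWarning`; once the cycle ends, the old keyword can be removed.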

I am -epsilon on "rechunk" and -lots on "shuffle". I think that "algorithm" or "implementation" are better choices ("rechunk_implementation" seems too long given that the function is already named rechunk).

I don't think that consistency with the shuffle keyword necessarily makes much sense: as noted, that's a bad choice too and it takes quite a lot of knowledge of the algorithm implementations to realise that these keywords are doing the same thing.

The keyword is now algorithm within rechunk itself; functions that use rechunk under the hood should name it rechunk_algorithm. In the config, it's array.rechunk.algorithm.
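The naming scheme can be sketched with a toy example: an explicit `algorithm` keyword wins, otherwise the `array.rechunk.algorithm` config value is used, and callers forward the choice under the longer name `rechunk_algorithm`. Everything here is a stand-in under stated assumptions: `CONFIG` is a plain dict rather than dask's config system, the `"tasks"` default and the set of valid names are assumptions, and `reshape_like` is a hypothetical caller.

```python
# Hypothetical stand-in for a configuration store; this is not dask.config.
CONFIG = {"array.rechunk.algorithm": "tasks"}  # the default value is an assumption

KNOWN_ALGORITHMS = ("tasks", "p2p")  # assumed set of valid names


def rechunk(chunks, algorithm=None):
    """Toy rechunk: returns the chunks and the algorithm that would be used."""
    # An explicit keyword takes precedence over the configured default.
    chosen = algorithm if algorithm is not None else CONFIG["array.rechunk.algorithm"]
    if chosen not in KNOWN_ALGORITHMS:
        raise NotImplementedError(f"Unknown rechunking algorithm: {chosen!r}")
    return chunks, chosen


def reshape_like(chunks, rechunk_algorithm=None):
    """Hypothetical caller that forwards the choice under the longer name."""
    return rechunk(chunks, algorithm=rechunk_algorithm)
```

The short name reads naturally on `rechunk` itself, while the prefixed name stays unambiguous in functions that have other algorithmic choices of their own.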

wence-

I see a change happened as I was reading (and your crystal ball is good!)

wence- · 2023-02-24T15:48:02Z

dask/array/rechunk.py

    if ndim <= 1 or not all(new_chunks) or any(has_nans):
        # Trivial array / unknown dim => no need / ability for an intermediate
-        return steps + [new_chunks]
+        return [new_chunks]


Maybe along with this cleanup, move the initialisation of steps to line 589. Since this short-circuiting also doesn't rely on any of the threshold and block_size_limit setup, you could fast-path this a bit earlier.

I would also inline the definitions of has_nans and ndim into the condition since those names are not subsequently used in the function.

if len(new_chunks) <= 1 or not all(new_chunks) or any(math.isnan(x) for x in chain.from_iterable(old_chunks)): ...

I'm hesitant to inline has_nans since it would turn this into a three-line if statement that's arguably harder to read. I've taken care of the other suggestions though.
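The short-circuit condition discussed in this thread can be sketched in isolation. This is a minimal illustration, not the code in dask/array/rechunk.py: the helper name `is_trivial_rechunk` is hypothetical, and it assumes that `ndim` is the number of dimensions in `new_chunks` and that chunks are tuples of per-dimension block sizes.

```python
import math
from itertools import chain


def is_trivial_rechunk(old_chunks, new_chunks):
    """Return True when no intermediate rechunk step is needed or possible."""
    # Keeping has_nans as a named value keeps the condition on one line.
    has_nans = any(math.isnan(size) for size in chain.from_iterable(old_chunks))
    # At most one dimension, an empty dimension, or unknown (NaN) chunk sizes.
    return len(new_chunks) <= 1 or not all(new_chunks) or has_nans


def plan(old_chunks, new_chunks):
    """Fast path only: return the single-step plan before any other setup."""
    if is_trivial_rechunk(old_chunks, new_chunks):
        return [new_chunks]
    return None  # the real multi-step planning is out of scope for this sketch
```

Because the check depends only on the two chunk specifications, it can run before any threshold or block size setup.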

Use p2p rechunk

c7756f4

github-actions bot added the array label Feb 10, 2023

hendrikmakait added 3 commits February 10, 2023 11:29

Add kw to rechunk

f1cb821

Driveby

d568906

Merge branch 'main' into rechunking

e116d33

github-actions bot added the dataframe label Feb 16, 2023

hendrikmakait added 3 commits February 16, 2023 16:23

Rechunk config

4d6f1f3

Merge branch 'main' into rechunking

9fefb0c

Docs

9c33959

hendrikmakait mentioned this pull request Feb 20, 2023

P2P for array rechunking dask/distributed#7534

Merged

2 tasks

hendrikmakait marked this pull request as ready for review February 23, 2023 15:45

hendrikmakait requested a review from fjetter February 23, 2023 15:47

Merge branch 'main' into rechunking

3f0ff62

fjetter reviewed Feb 24, 2023

Rename

8930917

wence- reviewed Feb 24, 2023

hendrikmakait added 2 commits February 24, 2023 17:07

More renaming

06e6bf4

Config

6845205

fjetter approved these changes Feb 24, 2023

fjetter mentioned this pull request Feb 24, 2023

Release 2023.2.1 dask/community#308

Closed

4 tasks

fjetter merged commit a58a2a7 into dask:main Feb 24, 2023

fjetter merged 11 commits into dask:main from hendrikmakait:rechunking

3 participants
