[WIP] MultiIndex by jsignell · Pull Request #8153 · dask/dask

jsignell · 2021-09-15T20:38:23Z

Closes #1493 (Full support for multiindex in dataframes)
Tests added / passed
Passes black dask / flake8 dask / isort dask

This first commit is pulled from @TomAugspurger's original branch: TomAugspurger@0e741e1

My plan is to keep moving that work forward and raise NotImplementedError wherever MultiIndex support is still missing.

rjzamora · 2021-10-01T16:16:08Z

dask/dataframe/partitionquantiles.py

+def _collapse(partition):
+    return pd.Series(
+        list(partition.itertuples(index=False, name=None)),
+        index=partition.index,
+        name=tuple(partition.columns),
+    )


Oooo - It may make sense to precede this PR with a simpler PR to support multi-column sort_values using this trick.

@charlesbluca - Note that this approach is not as performant as direct DataFrame.quantiles/DataFrame.searchsorted support in pandas, but it should "unblock" multi-column sorting :)

charlesbluca · 2021-10-01

Yeah, this looks nice. Thanks for the heads up! I can start up a WIP using this in sort_values.

jsignell · 2021-10-01

Cool, yeah! I expect this to take a while to work out. There are still some open questions about how things should behave, so anything that can come out of here and be useful is great!

rjzamora · 2021-10-01

@charlesbluca - I started exploring this a bit in this branch (couldn't help myself). It is quite slow compared to partitioning on the 0th column alone, but it does seem to work for cases where multiple columns are required for sufficient repartitioning.
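For readers following the thread, here is a minimal sketch of why collapsing rows into tuples helps when the first column alone cannot split the data evenly. It is not the code from this PR or from the linked branch: `collapse` mirrors `_collapse` from the diff above, while the sample frame, the evenly spaced boundary selection, and the use of Python's `bisect` (instead of Dask's partition-quantile machinery) are assumptions made purely for illustration.

```python
import bisect

import pandas as pd


def collapse(partition: pd.DataFrame) -> pd.Series:
    """Pack each row into a tuple so that rows compare lexicographically."""
    return pd.Series(
        list(partition.itertuples(index=False, name=None)),
        index=partition.index,
        name=tuple(partition.columns),
    )


# Column "a" is heavily skewed: half of the rows share a == 1, so
# boundaries taken from "a" alone cannot split those rows apart.
df = pd.DataFrame({"a": [2, 1, 2, 1, 3, 1], "b": [9, 5, 1, 7, 0, 2]})

keys = collapse(df)      # object-dtype Series of (a, b) tuples
ordered = sorted(keys)   # plain Python tuple ordering

# Hypothetical boundary selection: evenly spaced rows of the sorted keys.
npartitions = 3
step = len(ordered) // npartitions
boundaries = [ordered[i * step] for i in range(1, npartitions)]

# Assign every row to an output partition by comparing whole tuples.
part = [bisect.bisect_right(boundaries, key) for key in keys]

print(boundaries)  # [(1, 7), (2, 9)]
print(part)        # [2, 0, 1, 1, 2, 0]
```

Because the boundaries are tuples, the three rows with `a == 1` land in two different partitions, and each of the three partitions receives two rows. The price is that every comparison runs on Python tuple objects, which is consistent with the slowdown noted in the comments above.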

TomAugspurger and others added 2 commits April 20, 2020 15:54

WIP: Add support for MultiIndex (0e741e1)

Merge in 0e741e1 from Tom's fork (2a1cf0b)

github-actions bot added the dataframe, dispatch, and io labels Sep 15, 2021

jsignell marked this pull request as draft September 15, 2021 20:38

rjzamora reviewed Oct 1, 2021

charlesbluca mentioned this pull request Oct 14, 2021

Add support for multi-column sort_values #8263

Merged

github-actions bot added the needs attention label Nov 1, 2021

jsignell mentioned this pull request Nov 5, 2021

inconsistency in dataframe: no multiindex from pandas, but allowed from parquet #8323

Closed

jsignell removed the needs attention label Mar 10, 2022

alxmrs mentioned this pull request Feb 29, 2024

Consider Extending Dask Dataframes to provide missing key features. alxmrs/xarray-sql#26

Closed

jsignell wants to merge 2 commits into dask:main from jsignell:multiindex
