Skip to content

Added compute method to raise error on use#8734

Merged
jsignell merged 6 commits intodask:mainfrom
Dranaxel:handling_dataframegroupby
Feb 21, 2022
Merged

Added compute method to raise error on use#8734
jsignell merged 6 commits intodask:mainfrom
Dranaxel:handling_dataframegroupby

Conversation

@Dranaxel
Copy link
Copy Markdown
Contributor

First (very) naive implementation of compute method

@GPUtester
Copy link
Copy Markdown
Collaborator

Can one of the admins verify this patch?

graph = HighLevelGraph.from_collections(name, dask, dependencies=dependencies)
return new_dd_object(graph, name, chunk(self._meta), self.obj.divisions)

def _compute(self):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should just be compute and it should accept kwarg to match:

dask/dask/base.py

Lines 266 to 267 in bd8e8dc

def compute(self, **kwargs):
"""Compute this dask collection

Suggested change
def _compute(self):
def compute(self, **kwargs):

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it mandatory to accept kwargs please ?

Copy link
Copy Markdown
Member

@jsignell jsignell Feb 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to match the compute method on the base class so that if someone accidentally calls gb.compute(scheduler="threads") for instance, they will get the NotImplementedError

@jsignell
Copy link
Copy Markdown
Member

ok to test

@jsignell
Copy link
Copy Markdown
Member

Thanks for opening this @Dranaxel - I think with a few tweaks it will be good to go!

@jsignell
Copy link
Copy Markdown
Member

It would also be great if you could add a test for this. Probably you can just add a few lines to this test:

def test_groupby_error():
pdf = pd.DataFrame({"x": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10], "y": list("abcbabbcda")})
ddf = dd.from_pandas(pdf, 3)
with pytest.raises(KeyError):
ddf.groupby("A")
with pytest.raises(KeyError):
ddf.groupby(["x", "A"])
dp = ddf.groupby("y")
msg = "Column not found: "
with pytest.raises(KeyError) as err:
dp["A"]
assert msg in str(err.value)
msg = "Columns not found: "
with pytest.raises(KeyError) as err:
dp[["x", "A"]]
assert msg in str(err.value)

Dranaxel and others added 3 commits February 18, 2022 18:20
@jsignell
Copy link
Copy Markdown
Member

Thanks for taking this on @Dranaxel! I think this will really help people :)

@jsignell jsignell merged commit e2397f7 into dask:main Feb 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Nicer handling of incorrect usage of DataFrameGroupBy

3 participants