Do not allow iterating a DataFrameGroupBy #8696
Conversation
This is definitely an improvement.

To expand on my previous comment, I think this should be merged now, and if additional work is needed to get this into a desired state, I'd put that in a separate PR.
ian-r-rose left a comment
Thanks! I'm also trying to think of ways to better communicate that you can't directly `compute()` a `DataFrameGroupBy` object. It has a lot in common with `DataFrame`, and can produce Dask DataFrames, but it's not a true collection in the sense of:

```python
import dask

ddf = dask.datasets.timeseries()
dask.is_dask_collection(ddf)                  # True
dask.is_dask_collection(ddf.groupby("name"))  # False
```

Suggested change to the error message:

```diff
-"may be slow. You probably want to use 'apply' to execute a function for "
-"all the columns. To access individual groups, use 'get_group'. To list "
-"all the group names, use 'df[<group column>].unique().compute()'."
+"may be slow. You may want to use 'apply' or 'transform' to execute a function for "
+"all the groups. To access individual groups, use 'get_group'. To list "
+"all the group names, use 'df.groupby(<group-columns-or-index>).size().compute()'."
```
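To illustrate what the suggested error message points users toward, here's a minimal pandas-level sketch (dask's groupby mirrors this part of the pandas API; the small DataFrame below is hypothetical):

```python
import pandas as pd

# Hypothetical example frame
df = pd.DataFrame({"name": ["alice", "bob", "alice"], "x": [1, 2, 3]})
gb = df.groupby("name")

# 'apply' executes a function for all the groups...
sums = gb["x"].apply(lambda s: s.sum())

# ...while 'get_group' accesses a single group by its key.
alice = gb.get_group("alice")
print(sums["alice"], len(alice))  # 4 2
```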
Suggesting a different way to get the group names which should also work for multiple columns or the index.
Thanks @ian-r-rose! The only quibble I have is with the last change, from `.unique()` to `.size()`. I think the former would give you the values of the groups to pass into `get_group()`, whereas the latter just tells you how many rows are in the dataset, right?
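For context on this exchange: with pandas semantics (which dask mirrors here), `.size()` on a groupby does return per-group row counts, but its *index* carries the group keys, so it can serve the same discovery purpose as `.unique()` while also working for multi-column groupings. A small sketch with a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "x", "y"], "b": [1, 1, 2], "v": [10, 20, 30]})

# Single group column: the unique values of that column...
print(sorted(df["a"].unique()))  # ['x', 'y']

# ...match the index of the groupby sizes.
sizes = df.groupby("a").size()
print(list(sizes.index))         # ['x', 'y']

# For multiple group columns, .unique() on one column isn't enough,
# but the size() index still lists every group key as a tuple.
multi = df.groupby(["a", "b"]).size()
print(list(multi.index))         # [('x', 1), ('y', 2)]
```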
I opened #8695 to discuss that, since it's a little bit different than this case.
I ran into this issue yesterday (similar to #8695, a natural mistake for someone used to Pandas) and the fix seemed simple enough.
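The pattern that trips up pandas users here is group iteration: in pandas a `GroupBy` is eagerly iterable, which is what this PR disallows on dask's lazy groupby. A pandas-only sketch of the pattern in question (the frame is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "a"], "x": [1, 2, 3]})

# In pandas, iterating a GroupBy yields (key, sub-frame) pairs.
keys = [key for key, group in df.groupby("name")]
print(keys)  # ['a', 'b']
```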
`pre-commit run --all-files`