Add attrs property to Series/Dataframe #6742
This fails currently when testing series because df.iloc[:0], which is used in make_meta_pandas(x, index=None), does not keep the attrs.
TomAugspurger
left a comment
Thanks for working on this. A few comments.
def test_attrs():
    df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
    df.attrs = {"date": "2020-10-16"}
    df.A.attrs["unit"] = "kg"
I wouldn't recommend setting the attrs on a Series like this. It's not clear to me that it's a case supported by pandas (indexing into a DataFrame and setting on a Series).
I don't think it would be consistent with the rest of the DataFrame methods if it wasn't supported.
Why should df.A[0] = 10 or df.A.values[0] = 100 or df.A.name = "A_new" be allowed but not df.A.attrs["unit"] = "kg"?
What way were you thinking? I'm not even sure how to do this in any other way to be honest. I've never initialized series separately and then appended them to a dataframe.
There's some discussion at pandas-dev/pandas#35425, but let's not focus on it here.
For this PR we just need two tests. One on a dd.from_pandas(dataframe_with_attrs) and a second test with dd.from_pandas(series_with_attrs).
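As a rough sketch of the inputs those two tests would need, the Series can be built on its own rather than pulled out of a DataFrame. The helper names `make_dataframe_with_attrs` and `make_series_with_attrs` are hypothetical and only illustrate the idea; this uses pandas alone and assumes pandas 1.0 or newer, where attrs exists.

```python
import pandas as pd


def make_dataframe_with_attrs():
    """Build a small DataFrame carrying frame-level metadata (hypothetical helper)."""
    df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
    df.attrs = {"date": "2020-10-16"}
    return df


def make_series_with_attrs():
    """Build a standalone Series with its own metadata (hypothetical helper).

    The attrs are set on a Series that was created directly, not on one
    obtained by indexing into a DataFrame.
    """
    s = pd.Series([1, 2], name="A")
    s.attrs["unit"] = "kg"
    return s


dataframe_with_attrs = make_dataframe_with_attrs()
series_with_attrs = make_series_with_attrs()
```

Each object would then be passed to dd.from_pandas, and the test would compare the attrs of the resulting dask collection with the attrs of the pandas input.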
I think
@TomAugspurger The tests are now skipped on older pandas versions, but I suppose I have to increase the minimum pandas version as well?
TomAugspurger
left a comment
When we extract the attrs, we should do so conditional on the pandas version.
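One way such a guard could look is sketched below. The names `PANDAS_HAS_ATTRS` and `extract_attrs` are hypothetical and not taken from the dask code base; the only fact relied on is that attrs was introduced in pandas 1.0.

```python
import pandas as pd

# Hypothetical flag: attrs was added in pandas 1.0, so only the major
# version number needs to be inspected.
PANDAS_HAS_ATTRS = int(pd.__version__.split(".")[0]) >= 1


def extract_attrs(obj):
    """Return a shallow copy of obj.attrs, or an empty dict on older pandas."""
    if PANDAS_HAS_ATTRS:
        return dict(obj.attrs)
    return {}
```

Returning a copy keeps later changes to the extracted dict from leaking back into the original pandas object.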
Co-authored-by: Tom Augspurger <TomAugspurger@users.noreply.github.com>
Looks good, thanks!
Pandas has a property called attrs that supports attaching arbitrary metadata, such as physical units, to DataFrames and persisting it across operations. I've added support for that property in the dask Series/DataFrame as well.

The attrs property doesn't work well with dask Series because the pandas iloc method doesn't currently persist the attrs dict. Because dask uses df.iloc when creating the _meta dataframe in make_meta_pandas(x, index=None), the attrs of all the Series are lost.

I've added some simple tests that pass for DataFrames but fail for Series. They shouldn't fail once iloc persists the attrs dict, but I find this important enough to be reminded of it by the failing tests.
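To check how a given pandas installation behaves, a small probe along these lines could be used. The function name `attrs_survive_empty_slice` is hypothetical, and no output is shown because the result depends on the installed pandas version.

```python
import pandas as pd


def attrs_survive_empty_slice(obj):
    """Report whether obj.iloc[:0] carries the same attrs as obj."""
    empty = obj.iloc[:0]  # the same kind of empty slice used to build _meta
    return dict(empty.attrs) == dict(obj.attrs)


s = pd.Series([1, 2], name="A")
s.attrs["unit"] = "kg"

df = pd.DataFrame({"A": [1, 2]})
df.attrs["date"] = "2020-10-16"

print("Series keeps attrs:", attrs_survive_empty_slice(s))
print("DataFrame keeps attrs:", attrs_survive_empty_slice(df))
```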