TST/CLN: break up & parametrize tests for df.set_index#22236
TST/CLN: break up & parametrize tests for df.set_index#22236jreback merged 13 commits intopandas-dev:masterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #22236 +/- ##
=======================================
Coverage 92.17% 92.17%
=======================================
Files 169 169
Lines 50708 50708
=======================================
Hits 46740 46740
Misses 3968 3968
Continue to review full report at Codecov.
|
pandas/tests/frame/common.py
Outdated
| return pd.DataFrame(arr, columns=['one', 'two', 'three'], | ||
| index=['a', 'b', 'c']) | ||
|
|
||
| @cache_readonly |
There was a problem hiding this comment.
what is this? pls use a fixture
There was a problem hiding this comment.
Not 100% sure, but don't the attributes of TestData have the roles of fixtures for class-based tests?
There was a problem hiding this comment.
no that is a very bad pattern. we don't want to do that. pls use fixtures instead
There was a problem hiding this comment.
I don't think it's fair to blame me for following an existing pattern - just look at
@cache_readonly
def simple(self):
arr = np.array([[1., 2., 3.],
[4., 5., 6.],
[7., 8., 9.]])
return pd.DataFrame(arr, columns=['one', 'two', 'three'],
index=['a', 'b', 'c'])
directly above.
Do you want me to change all of test_alter_axes from classes to functions? That's the only (clean) way I can think of.
There was a problem hiding this comment.
not blaming you, asking you to use fixtures.
There was a problem hiding this comment.
that is orthogonal to switching from classes.
|
|
||
| from pandas.tests.frame.common import TestData | ||
|
|
||
| key = lambda x: x.name |
There was a problem hiding this comment.
they are "container"-types for testing, in the sense that -- applied to df['A'] -- they give the correct type of container. They are to be thought of like Series or Index, but since the bare MultiIndex()- constructor does not take a vector, I wrote it as a lambda. Makes testing all allowed cases nicely parametrisable.
There was a problem hiding this comment.
this is obtuse, I can't tell what you are doing. if you need something create it where you need it.
There was a problem hiding this comment.
Refactored. Hope you like it better like this - the advantage is that it stays local and that the generated test names are understandable
|
|
||
| class TestDataFrameAlterAxes(TestData): | ||
|
|
||
| def test_set_index_manually(self): |
There was a problem hiding this comment.
can you find a different name than manually
There was a problem hiding this comment.
I really think df.index = idx deserves the name "manual". It should also be discouraged, that's why there's set_index in the first place. df.set_index(idx) is then the API-supported (i.e. "non-manual") way of doing it
There was a problem hiding this comment.
Nevermind, how do you like "directly"? ;-)
| class TestDataFrameAlterAxes(TestData): | ||
|
|
||
| def test_set_index_manually(self): | ||
| df = self.mixed_frame.copy() |
There was a problem hiding this comment.
so I would rather make mixed_frame a fixture to avoid lots of boilerplate here
There was a problem hiding this comment.
This is a class-based test module. mixed_frame already existed before, and has - for all intents and purposes - the role of a fixture.
| @pytest.mark.parametrize('inplace', [True, False]) | ||
| @pytest.mark.parametrize('drop', [True, False]) | ||
| def test_set_index_drop_inplace(self, drop, inplace, keys): | ||
| df = self.dummy.copy() |
There was a problem hiding this comment.
make dummy a fixture (and rename it to something else)
There was a problem hiding this comment.
Refactored. Hope you like it better like this - the advantage is that it stays local and that the generated test names are understandable
There was a problem hiding this comment.
yes this is better (still pls make dummy a fixture)
| result = df2.set_index('key') | ||
| tm.assert_frame_equal(result, expected) | ||
|
|
||
| @pytest.mark.parametrize('container', [Series, Index, np.array, mi]) |
There was a problem hiding this comment.
call this box
put comments before the test
There was a problem hiding this comment.
I think that "box" is less clear than "container", but OK
|
|
||
| tm.assert_frame_equal(result, expected) | ||
|
|
||
| @pytest.mark.parametrize('container', [Series, Index, np.array, list, mi]) |
|
|
||
| tm.assert_frame_equal(result, expected) | ||
|
|
||
| @pytest.mark.parametrize('elem2', [key, Series, Index, np.array, list, mi]) |
There was a problem hiding this comment.
elem is not very descriptive
|
|
||
| keys = [elem1(df['A']), elem2(df['A'])] | ||
|
|
||
| # == gives ambiguous Boolean for Series |
There was a problem hiding this comment.
I said
strongly parametrized test_set_index_pass_arrays and test_set_index_duplicate_names, including several combinations and cases that were not tested before
This is (essentially) a very beefed-up version of test_set_index_duplicate_names. It test appending duplicate arrays in various forms (and with various kwargs, e.g. against drop), and tests for the error of passing duplicate column keys directly.
pandas/core/frame.py
Outdated
| if not isinstance(keys, list): | ||
| keys = [keys] | ||
|
|
||
| col_labels = [x for x in keys |
There was a problem hiding this comment.
what is all this for? if this is just reorging tests, why are you changing code?
There was a problem hiding this comment.
I wrote that:
I also added better warnings in case there are duplicate column labels in keys, and some corresponding tests.
Currently, df.set_index(['A', 'A']) yields something like:
ValueError: Duplicated level name: "A", assigned to level 1, is already used for level 0.
which was
- untested
- not a very clear warning
- bad for perf, because all arrays get processed already and only upon creation of the
MultiIndexis the error raised.
I fixed that with the code block above ('better warnings" is also in the title).
f773b94 to
ee569b3
Compare
pandas/tests/frame/common.py
Outdated
| return pd.DataFrame(arr, columns=['one', 'two', 'three'], | ||
| index=['a', 'b', 'c']) | ||
|
|
||
| @cache_readonly |
There was a problem hiding this comment.
no that is a very bad pattern. we don't want to do that. pls use fixtures instead
|
|
||
| from pandas.tests.frame.common import TestData | ||
|
|
||
| key = lambda x: x.name |
There was a problem hiding this comment.
this is obtuse, I can't tell what you are doing. if you need something create it where you need it.
| @pytest.mark.parametrize('inplace', [True, False]) | ||
| @pytest.mark.parametrize('drop', [True, False]) | ||
| def test_set_index_drop_inplace(self, drop, inplace, keys): | ||
| df = self.dummy.copy() |
|
Hello @h-vetinari! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on August 31, 2018 at 20:35 Hours UTC |
7aa1af4 to
f46c1e4
Compare
|
@jreback Don't why my comments aren't showing up here - copying them to the thread.
I don't think it's fair to blame me for following an existing pattern - just look at directly above (https://github.com/pandas-dev/pandas/blob/v0.23.4/pandas/tests/frame/common.py#L97-104) Do you want me to change all of test_alter_axes from classes to functions? That's the only other (clean) way I can think of.
Refactored. Hope you like it better like this - the advantage is that it stays local and that the generated test names are understandable |
pandas/core/frame.py
Outdated
| if not isinstance(x, (Series, Index, MultiIndex, | ||
| list, np.ndarray))] | ||
| if any(x not in self for x in col_labels): | ||
| missing = [x for x in col_labels if x not in self] |
There was a problem hiding this comment.
are there explict tests for each of the branches. pls add a comment to each one indicating what you are checking.
There was a problem hiding this comment.
Yest there are tests for each branch, see test_set_index_pass_arrays_duplicate and test_set_index_raise. Added comments
pandas/tests/frame/common.py
Outdated
| return pd.DataFrame(arr, columns=['one', 'two', 'three'], | ||
| index=['a', 'b', 'c']) | ||
|
|
||
| @cache_readonly |
There was a problem hiding this comment.
not blaming you, asking you to use fixtures.
pandas/tests/frame/common.py
Outdated
| return pd.DataFrame(arr, columns=['one', 'two', 'three'], | ||
| index=['a', 'b', 'c']) | ||
|
|
||
| @cache_readonly |
There was a problem hiding this comment.
that is orthogonal to switching from classes.
| @pytest.mark.parametrize('inplace', [True, False]) | ||
| @pytest.mark.parametrize('drop', [True, False]) | ||
| def test_set_index_drop_inplace(self, drop, inplace, keys): | ||
| df = self.dummy.copy() |
There was a problem hiding this comment.
yes this is better (still pls make dummy a fixture)
| tm.assert_frame_equal(result, expected) | ||
|
|
||
| # also test index name if append=True (name is duplicate here for B) | ||
| @pytest.mark.parametrize('box', [Series, Index, np.array, 'MultiIndex']) |
There was a problem hiding this comment.
dont' use a string, just directly put the lambda here. ideally we have NO if/else in test functions, sometimes its unavoidable, but making tests as simple as possible is the key
| df = self.dummy.copy() | ||
| df.index.name = index_name | ||
|
|
||
| # update constructor in case of MultiIndex |
| df.index.name = index_name | ||
|
|
||
| # transform strings to correct box constructor | ||
| def rebox(x): |
There was a problem hiding this comment.
this is not good at all, pls don't do this inside the test function, just use a lambda in the box itself
| # keep the timezone | ||
| result = i.to_series(keep_tz=True) | ||
| assert_series_equal(result.reset_index(drop=True), expected) | ||
| # convert to series while keeping the timezone |
There was a problem hiding this comment.
The whole test is about converting a DatetimeIndex to a Series, so I renamed the test as such.
I renamed i to idx for better readability.
Finally, idx.to_series(keep_tz=True) yields:
B
2013-01-01 13:00:00-08:00 2013-01-01 13:00:00-08:00
2013-01-02 14:00:00-08:00 2013-01-02 14:00:00-08:00
Name: B, dtype: datetime64[ns, US/Pacific]
so needs an index change to fit with expected. I just found that using the index-kwarg of to_series is cleaner and more understandable than using reset_index - and since it's the conversion that's being tested and not the manner of the index reset, I changed it.
h-vetinari
left a comment
There was a problem hiding this comment.
Thanks for the detailed review. Should be getting close now.
pandas/core/frame.py
Outdated
| if not isinstance(x, (Series, Index, MultiIndex, | ||
| list, np.ndarray))] | ||
| if any(x not in self for x in col_labels): | ||
| missing = [x for x in col_labels if x not in self] |
There was a problem hiding this comment.
Yest there are tests for each branch, see test_set_index_pass_arrays_duplicate and test_set_index_raise. Added comments
pandas/tests/frame/common.py
Outdated
| return pd.DataFrame(arr, columns=['one', 'two', 'three'], | ||
| index=['a', 'b', 'c']) | ||
|
|
||
| @cache_readonly |
| # keep the timezone | ||
| result = i.to_series(keep_tz=True) | ||
| assert_series_equal(result.reset_index(drop=True), expected) | ||
| # convert to series while keeping the timezone |
There was a problem hiding this comment.
The whole test is about converting a DatetimeIndex to a Series, so I renamed the test as such.
I renamed i to idx for better readability.
Finally, idx.to_series(keep_tz=True) yields:
B
2013-01-01 13:00:00-08:00 2013-01-01 13:00:00-08:00
2013-01-02 14:00:00-08:00 2013-01-02 14:00:00-08:00
Name: B, dtype: datetime64[ns, US/Pacific]
so needs an index change to fit with expected. I just found that using the index-kwarg of to_series is cleaner and more understandable than using reset_index - and since it's the conversion that's being tested and not the manner of the index reset, I changed it.
jreback
left a comment
There was a problem hiding this comment.
ok looks closer a few more comments
pandas/core/frame.py
Outdated
| raise KeyError('{}'.format(missing)) | ||
| elif len(set(col_labels)) < len(col_labels): | ||
| # if all are valid labels, but there are duplicates | ||
| dup = Series(col_labels) |
There was a problem hiding this comment.
rather use a set difference operation here
There was a problem hiding this comment.
maybe I'm not seeing it, but IMO set difference isn't gonna show duplicates (because as a set, they'll be the same)...
There was a problem hiding this comment.
subtract the sets of all - others
There was a problem hiding this comment.
this is not about the set of all inputs - it's strictly about column labels. E.g.
df.set_index(['A', 'A', 'B', df.A, some_series, some_index])
[...]
col_labels = ['A', 'A', 'B'] # at the line of your comment (in this case)
No set operation that I can think of would yield A, which is what I want to raise. (multisets would work, but that's hardly more intuitive than duplicated, IMO)
|
|
||
|
|
||
| @pytest.fixture | ||
| def frame_of_index_cols(): |
There was a problem hiding this comment.
pls move to pandas/tests/frame/conftest.py need to start this.
| class TestDataFrameAlterAxes(TestData): | ||
|
|
||
| def test_set_index_directly(self): | ||
| df = self.mixed_frame.copy() |
There was a problem hiding this comment.
add this to the conftest as well as a fixture
|
I translated all the "fixture"-attributes from |
a1da79c to
32b618f
Compare
|
@jreback All green, and should be good to go. |
|
@jreback Another ping :) |
|
i will look soon |
|
@jreback I know you're busy, but you already said "ok looks closer a few more comments", and the changes since then are easy: just the fixtures you asked for. Would like to get this through as it's blocking two other PRs. |
jreback
left a comment
There was a problem hiding this comment.
ok, pls open a new issue that refs this, to remove use of TestData in favor of fixtures
pandas/core/frame.py
Outdated
|
|
||
| # collect elements from "keys" that are not allowed array types | ||
| col_labels = [x for x in keys | ||
| if not isinstance(x, (Series, Index, MultiIndex, |
There was a problem hiding this comment.
why isn't this just isinstance(x, tuple) or is_scalar(x)?
There was a problem hiding this comment.
These are the cases (except valid column keys) that currently don't raise a KeyError. I've kept the error reporting exactly the same as before:
pd.DataFrame([[0,1], [2,3]]).set_index(map(str,[1,2]))
# KeyError: <map object at 0x000001D380115550>
Could be changed, but then I need to know the desired allowed signature / error reporting
There was a problem hiding this comment.
Tried the ABC Versions, but getting errors that indexes are not hashable. Left it as is for the moment.
____________________ TestDataFrameAlterAxes.test_set_index ____________________
self = <pandas.tests.frame.test_alter_axes.TestDataFrameAlterAxes object at 0x0000018FC05ED400>
mixed_frame = A B C D foo
lem22AYh2f -0.599136 -0.811715 -0.135242 0.764100 bar
iBogSI...05 bar
vDdWVTLtoj -0.433490 -1.416721 0.460720 -0.539860 bar
Pw5bUHt4sR 0.180016 1.651358 -1.041539 -0.832112 bar
def test_set_index(self, mixed_frame):
df = mixed_frame
idx = Index(np.arange(len(df))[::-1])
> df = df.set_index(idx)
pandas\tests\frame\test_alter_axes.py:41:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas\core\frame.py:3874: in set_index
if any(x not in self for x in col_labels):
pandas\core\frame.py:3874: in <genexpr>
if any(x not in self for x in col_labels):
pandas\core\generic.py:1654: in __contains__
return key in self._info_axis
pandas\core\indexes\base.py:1994: in __contains__
hash(key)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = Int64Index([29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13,
12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
dtype='int64')
def __hash__(self):
> raise TypeError("unhashable type: %r" % type(self).__name__)
E TypeError: unhashable type: 'Int64Index'
pandas\core\indexes\base.py:2021: TypeError
_________________ TestDataFrameAlterAxes.test_set_index_cast __________________
self = <pandas.tests.frame.test_alter_axes.TestDataFrameAlterAxes object at 0x0000018FCA8EE668>
def test_set_index_cast(self):
# issue casting an index then set_index
df = DataFrame({'A': [1.1, 2.2, 3.3], 'B': [5.0, 6.1, 7.2]},
index=[2010, 2011, 2012])
> df2 = df.set_index(df.index.astype(np.int32))
pandas\tests\frame\test_alter_axes.py:50:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas\core\frame.py:3874: in set_index
if any(x not in self for x in col_labels):
pandas\core\frame.py:3874: in <genexpr>
if any(x not in self for x in col_labels):
pandas\core\generic.py:1654: in __contains__
return key in self._info_axis
pandas\core\indexes\base.py:1994: in __contains__
hash(key)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = Int64Index([2010, 2011, 2012], dtype='int64')
def __hash__(self):
> raise TypeError("unhashable type: %r" % type(self).__name__)
E TypeError: unhashable type: 'Int64Index'
pandas\core\indexes\base.py:2021: TypeError
_______________ TestDataFrameAlterAxes.test_set_index_timezone ________________
self = <pandas.tests.frame.test_alter_axes.TestDataFrameAlterAxes object at 0x0000018FCAD1AEF0>
def test_set_index_timezone(self):
# GH 12358
# tz-aware Series should retain the tz
idx = to_datetime(["2014-01-01 10:10:10"],
utc=True).tz_convert('Europe/Rome')
df = DataFrame({'A': idx})
> assert df.set_index(idx).index[0].hour == 11
pandas\tests\frame\test_alter_axes.py:343:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas\core\frame.py:3874: in set_index
if any(x not in self for x in col_labels):
pandas\core\frame.py:3874: in <genexpr>
if any(x not in self for x in col_labels):
pandas\core\generic.py:1654: in __contains__
return key in self._info_axis
pandas\core\indexes\base.py:1994: in __contains__
hash(key)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = DatetimeIndex(['2014-01-01 11:10:10+01:00'], dtype='datetime64[ns, Europe/Rome]', freq=None)
def __hash__(self):
> raise TypeError("unhashable type: %r" % type(self).__name__)
E TypeError: unhashable type: 'DatetimeIndex'
pandas\core\indexes\base.py:2021: TypeError
______________ TestDataFrameAlterAxes.test_dti_set_index_reindex ______________
self = <pandas.tests.frame.test_alter_axes.TestDataFrameAlterAxes object at 0x0000018FCA7FDD68>
def test_dti_set_index_reindex(self):
# GH 6631
df = DataFrame(np.random.random(6))
idx1 = date_range('2011/01/01', periods=6, freq='M', tz='US/Eastern')
idx2 = date_range('2013', periods=6, freq='A', tz='Asia/Tokyo')
> df = df.set_index(idx1)
pandas\tests\frame\test_alter_axes.py:413:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas\core\frame.py:3874: in set_index
if any(x not in self for x in col_labels):
pandas\core\frame.py:3874: in <genexpr>
if any(x not in self for x in col_labels):
pandas\core\generic.py:1654: in __contains__
return key in self._info_axis
pandas\core\indexes\base.py:1994: in __contains__
hash(key)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = DatetimeIndex(['2011-01-31 00:00:00-05:00', '2011-02-28 00:00:00-05:00',
'2011-03-31 00:00:00-04:00', '... '2011-05-31 00:00:00-04:00', '2011-06-30 00:00:00-04:00'],
dtype='datetime64[ns, US/Eastern]', freq='M')
def __hash__(self):
> raise TypeError("unhashable type: %r" % type(self).__name__)
E TypeError: unhashable type: 'DatetimeIndex'
There was a problem hiding this comment.
huh? pls use the ABC version, we do this everywhere else. To be very honest I would split this PR up into 2 parts as I think you are actually changing something here but its very very hard to tell.
pandas/core/frame.py
Outdated
| # if there are any labels that are invalid, we raise a KeyError | ||
| missing = [x for x in col_labels if x not in self] | ||
| raise KeyError('{}'.format(missing)) | ||
| elif len(set(col_labels)) < len(col_labels): |
pandas/core/frame.py
Outdated
| raise KeyError('{}'.format(missing)) | ||
| elif len(set(col_labels)) < len(col_labels): | ||
| # if all are valid labels, but there are duplicates | ||
| dup = Series(col_labels) |
There was a problem hiding this comment.
subtract the sets of all - others
| @@ -28,244 +25,284 @@ | |||
|
|
|||
| class TestDataFrameAlterAxes(TestData): | |||
There was a problem hiding this comment.
you shouldn't need TestData any longer
32b618f to
d2f1e78
Compare
h-vetinari
left a comment
There was a problem hiding this comment.
Thanks for the review; opened #22471 as requested
pandas/core/frame.py
Outdated
|
|
||
| # collect elements from "keys" that are not allowed array types | ||
| col_labels = [x for x in keys | ||
| if not isinstance(x, (Series, Index, MultiIndex, |
There was a problem hiding this comment.
These are the cases (except valid column keys) that currently don't raise a KeyError. I've kept the error reporting exactly the same as before:
pd.DataFrame([[0,1], [2,3]]).set_index(map(str,[1,2]))
# KeyError: <map object at 0x000001D380115550>
Could be changed, but then I need to know the desired allowed signature / error reporting
pandas/core/frame.py
Outdated
| raise KeyError('{}'.format(missing)) | ||
| elif len(set(col_labels)) < len(col_labels): | ||
| # if all are valid labels, but there are duplicates | ||
| dup = Series(col_labels) |
There was a problem hiding this comment.
this is not about the set of all inputs - it's strictly about column labels. E.g.
df.set_index(['A', 'A', 'B', df.A, some_series, some_index])
[...]
col_labels = ['A', 'A', 'B'] # at the line of your comment (in this case)
No set operation that I can think of would yield A, which is what I want to raise. (multisets would work, but that's hardly more intuitive than duplicated, IMO)
pandas/core/frame.py
Outdated
|
|
||
| # collect elements from "keys" that are not allowed array types | ||
| col_labels = [x for x in keys | ||
| if not isinstance(x, (Series, Index, MultiIndex, |
There was a problem hiding this comment.
Tried the ABC Versions, but getting errors that indexes are not hashable. Left it as is for the moment.
____________________ TestDataFrameAlterAxes.test_set_index ____________________
self = <pandas.tests.frame.test_alter_axes.TestDataFrameAlterAxes object at 0x0000018FC05ED400>
mixed_frame = A B C D foo
lem22AYh2f -0.599136 -0.811715 -0.135242 0.764100 bar
iBogSI...05 bar
vDdWVTLtoj -0.433490 -1.416721 0.460720 -0.539860 bar
Pw5bUHt4sR 0.180016 1.651358 -1.041539 -0.832112 bar
def test_set_index(self, mixed_frame):
df = mixed_frame
idx = Index(np.arange(len(df))[::-1])
> df = df.set_index(idx)
pandas\tests\frame\test_alter_axes.py:41:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas\core\frame.py:3874: in set_index
if any(x not in self for x in col_labels):
pandas\core\frame.py:3874: in <genexpr>
if any(x not in self for x in col_labels):
pandas\core\generic.py:1654: in __contains__
return key in self._info_axis
pandas\core\indexes\base.py:1994: in __contains__
hash(key)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = Int64Index([29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13,
12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
dtype='int64')
def __hash__(self):
> raise TypeError("unhashable type: %r" % type(self).__name__)
E TypeError: unhashable type: 'Int64Index'
pandas\core\indexes\base.py:2021: TypeError
_________________ TestDataFrameAlterAxes.test_set_index_cast __________________
self = <pandas.tests.frame.test_alter_axes.TestDataFrameAlterAxes object at 0x0000018FCA8EE668>
def test_set_index_cast(self):
# issue casting an index then set_index
df = DataFrame({'A': [1.1, 2.2, 3.3], 'B': [5.0, 6.1, 7.2]},
index=[2010, 2011, 2012])
> df2 = df.set_index(df.index.astype(np.int32))
pandas\tests\frame\test_alter_axes.py:50:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas\core\frame.py:3874: in set_index
if any(x not in self for x in col_labels):
pandas\core\frame.py:3874: in <genexpr>
if any(x not in self for x in col_labels):
pandas\core\generic.py:1654: in __contains__
return key in self._info_axis
pandas\core\indexes\base.py:1994: in __contains__
hash(key)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = Int64Index([2010, 2011, 2012], dtype='int64')
def __hash__(self):
> raise TypeError("unhashable type: %r" % type(self).__name__)
E TypeError: unhashable type: 'Int64Index'
pandas\core\indexes\base.py:2021: TypeError
_______________ TestDataFrameAlterAxes.test_set_index_timezone ________________
self = <pandas.tests.frame.test_alter_axes.TestDataFrameAlterAxes object at 0x0000018FCAD1AEF0>
def test_set_index_timezone(self):
# GH 12358
# tz-aware Series should retain the tz
idx = to_datetime(["2014-01-01 10:10:10"],
utc=True).tz_convert('Europe/Rome')
df = DataFrame({'A': idx})
> assert df.set_index(idx).index[0].hour == 11
pandas\tests\frame\test_alter_axes.py:343:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas\core\frame.py:3874: in set_index
if any(x not in self for x in col_labels):
pandas\core\frame.py:3874: in <genexpr>
if any(x not in self for x in col_labels):
pandas\core\generic.py:1654: in __contains__
return key in self._info_axis
pandas\core\indexes\base.py:1994: in __contains__
hash(key)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = DatetimeIndex(['2014-01-01 11:10:10+01:00'], dtype='datetime64[ns, Europe/Rome]', freq=None)
def __hash__(self):
> raise TypeError("unhashable type: %r" % type(self).__name__)
E TypeError: unhashable type: 'DatetimeIndex'
pandas\core\indexes\base.py:2021: TypeError
______________ TestDataFrameAlterAxes.test_dti_set_index_reindex ______________
self = <pandas.tests.frame.test_alter_axes.TestDataFrameAlterAxes object at 0x0000018FCA7FDD68>
def test_dti_set_index_reindex(self):
# GH 6631
df = DataFrame(np.random.random(6))
idx1 = date_range('2011/01/01', periods=6, freq='M', tz='US/Eastern')
idx2 = date_range('2013', periods=6, freq='A', tz='Asia/Tokyo')
> df = df.set_index(idx1)
pandas\tests\frame\test_alter_axes.py:413:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas\core\frame.py:3874: in set_index
if any(x not in self for x in col_labels):
pandas\core\frame.py:3874: in <genexpr>
if any(x not in self for x in col_labels):
pandas\core\generic.py:1654: in __contains__
return key in self._info_axis
pandas\core\indexes\base.py:1994: in __contains__
hash(key)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = DatetimeIndex(['2011-01-31 00:00:00-05:00', '2011-02-28 00:00:00-05:00',
'2011-03-31 00:00:00-04:00', '... '2011-05-31 00:00:00-04:00', '2011-06-30 00:00:00-04:00'],
dtype='datetime64[ns, US/Eastern]', freq='M')
def __hash__(self):
> raise TypeError("unhashable type: %r" % type(self).__name__)
E TypeError: unhashable type: 'DatetimeIndex'
pandas/core/frame.py
Outdated
|
|
||
| # collect elements from "keys" that are not allowed array types | ||
| col_labels = [x for x in keys | ||
| if not isinstance(x, (Series, Index, MultiIndex, |
There was a problem hiding this comment.
huh? pls use the ABC version, we do this everywhere else. To be very honest I would split this PR up into 2 parts as I think you are actually changing something here but its very very hard to tell.
Fair enough, this was actually a good intuition. I split out all the new things into a new issue/PR (#22484 #22486), and reverted all the new warnings here. |
| @@ -0,0 +1,121 @@ | |||
| import pytest | |||
There was a problem hiding this comment.
A couple of things about this file:
- Do you use all of these fixtures in your changes?
- I would prefer if the naming is a little more consistent e.g.:
- You have the word "frame" is some of your fixture names but not othres
- You have underscores between words in some names but not others
There was a problem hiding this comment.
For this PR, I turned an often-used DF into a fixture - together with the other attributes of TestData. @jreback then told me to start a conftest.py and put it there, together with the other TestData-attributes used in this module (frame, mixed_frame).
I translated all the attributes into conftest.py without renaming them, so that they can be replaced on a per-module-basis as laid out in #22471. The names of the fixtures are clearly suboptimal, but following up on fixturizing the other modules would be much harder if I start renaming now.
There was a problem hiding this comment.
That's fair. Let's save for a follow-up then.
There was a problem hiding this comment.
Since the names now changed after all, I thought I'd ask for your opinion on the naming/consistency.
There was a problem hiding this comment.
@gfyoung
Thanks. I just added a few more consistency fixes (but squashed to simplify reviewing), but only the first three fixture names were affected.
c175d85 to
befb356
Compare
79541b8 to
d050112
Compare
|
I've had a |
d050112 to
f42793a
Compare
f42793a to
4ac9633
Compare
|
@jreback Green (after 5 retriggers...) |
|
thanks. in the future, not necessary to push multiple times, just needed a single one, if the failure is unrelated. |
While working on #22225 I had the strong urge to clean up the tests for
df.set_index(to be able to work off of them forseries.set_index)Since the diff is pretty busted, here's a description. In
tests/frame/test_alter_axes.py, I did:test_set_index2into several piecestest_set_index_bugtotest_set_index_after_mutation(following corresponding GH issue)test_set_index_nonuniqtotest_set_index_verify_integrity(also added a MI-case)test_set_index_pass_arraysandtest_set_index_duplicate_names, including several combinations and cases that were not tested beforepd.with direct imports (best practice according to review in TST/CLN: series.duplicated; parametrisation; fix warning #21899):assert_series_equaletc. withtm.assert_series_equal(best practice according to review in TST/CLN: series.duplicated; parametrisation; fix warning #21899):check_names=Falsein several cases.I also added better warnings in case there are duplicate column labels in
keys, and some corresponding tests.