[MRG] Fix incorrect error when OneHotEncoder.transform called prior to fit#12443
[MRG] Fix incorrect error when OneHotEncoder.transform called prior to fit#12443jnothman merged 6 commits intoscikit-learn:masterfrom
Conversation
rth
left a comment
There was a problem hiding this comment.
Please add a non regression test and rename the title to a summary of what this PR does.
| n_features = X.shape[1] | ||
| sel = np.zeros(n_features, dtype=bool) | ||
| sel[np.asarray(self.categorical_features)] = True | ||
| if sum(sel) == 0: |
There was a problem hiding this comment.
I have not looked at the code in detail, but isn't this equivalent to,
not self.categorical_features.any()or something similar ?
There was a problem hiding this comment.
categorical_features can be a array of indices or a mask. This works in either case. I followed the pattern in sklearn.preprocessing.base._transform_selected
There was a problem hiding this comment.
Can you add a comment above this code block to explain the specific case that this code is checking for?
|
the failure is because you don't ignore the FutureWarning, I think. |
| X_tr = enc.fit_transform(X) | ||
| expected_features = np.array(list(), dtype='object') | ||
| assert_array_equal(X, X_tr) | ||
| assert_equal(enc.categories_, list()) |
There was a problem hiding this comment.
With the adoption of pytest, we are phasing out use of test helpers assert_equal, assert_true, etc. Please use bare assert statements, e.g. assert x == y, assert not x, etc.
There was a problem hiding this comment.
Is this true for assert_array_equal? Is it preferable to do assert np.array_equal(x, y)
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Add a note in the whatsnew file?
| n_features = X.shape[1] | ||
| sel = np.zeros(n_features, dtype=bool) | ||
| sel[np.asarray(self.categorical_features)] = True | ||
| if sum(sel) == 0: |
There was a problem hiding this comment.
Can you add a comment above this code block to explain the specific case that this code is checking for?
|
Thanks @dillongardner |
…ybutton * upstream/master: Fix max_depth overshoot in BFS expansion of trees (scikit-learn#12344) TST don't test utils.fixes docstrings (scikit-learn#12576) DOC Fix typo (scikit-learn#12563) FIX Workaround limitation of cloudpickle under PyPy (scikit-learn#12566) MNT bare asserts (scikit-learn#12571) FIX incorrect error when OneHotEncoder.transform called prior to fit (scikit-learn#12443)
…ikit-learn into add_codeblock_copybutton * 'add_codeblock_copybutton' of https://github.com/thoo/scikit-learn: Move an extension under sphinx_copybutton/ Move css/js file under sphinxext/ Fix max_depth overshoot in BFS expansion of trees (scikit-learn#12344) TST don't test utils.fixes docstrings (scikit-learn#12576) DOC Fix typo (scikit-learn#12563) FIX Workaround limitation of cloudpickle under PyPy (scikit-learn#12566) MNT bare asserts (scikit-learn#12571) FIX incorrect error when OneHotEncoder.transform called prior to fit (scikit-learn#12443) Retrigger travis:max time limit error DOC: Clarify `cv` parameter description in `GridSearchCV` (scikit-learn#12495) FIX remove FutureWarning in _object_dtype_isnan and add test (scikit-learn#12567) DOC Add 's' to "correspond" in docs for Hamming Loss. (scikit-learn#12565) EXA Fix comment in plot-iris-logistic example (scikit-learn#12564) FIX stop words validation in text vectorizers with custom preprocessors / tokenizers (scikit-learn#12393) DOC Add skorch to related projects (scikit-learn#12561) MNT Don't change self.n_values in OneHotEncoder.fit (scikit-learn#12286) MNT Remove unused assert_true imports (scikit-learn#12560) TST autoreplace assert_true(...==...) with plain assert (scikit-learn#12547) DOC: add a testimonial from JP Morgan (scikit-learn#12555)
|
This change breaks In I propose: |
… to fit (scikit-learn#12443)" This reverts commit 6d389ba.
… to fit (scikit-learn#12443)" This reverts commit 6d389ba.
Reference Issues/PRs
Fixes #12395
What does this implement/fix? Explain your changes.
Fix checks for
categories_attribute when transform is called.The edge case when class is instantiated with
categorical_featuresset to all False,_legacy_modeis set to True but_legacy_fit_transformnever gets called. To fix this, explicitly tested for this and setcategories_to an empty list. This fix also allowsget_feature_namesto return an empty array.Any other comments?