Skip to content

[MRG+2] Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) #2121#6181

Merged
amueller merged 12 commits intoscikit-learn:masterfrom
antoinewdg:master
Oct 24, 2016
Merged

[MRG+2] Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) #2121#6181
amueller merged 12 commits intoscikit-learn:masterfrom
antoinewdg:master

Conversation

@antoinewdg
Copy link
Copy Markdown
Contributor

As @jnothman mentionned in the mailing list it would be nice to also add this option for RFE. The unit tests for RFE cover the case of a sparse coeff matrix, which I have trouble to handle, so I would need a little help if this were to be done.

@jnothman
Copy link
Copy Markdown
Member

Thanks! Unfortunately, norm along an axis is not supported in the earliest version of numpy we support. You could add an axis-supporting variant to sklearn.utils.fixes. Or reuse sklearn.preprocessing.normalize which supports fewer ord options.

Also, please add tests.

@MechCoder MechCoder changed the title Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) #2121 [MRG] Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) #2121 Jan 20, 2016
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @jnothman says, you can use sklearn.preprocessing.normalize to handle ord not being supported in older versions of NumPy

@MechCoder
Copy link
Copy Markdown
Member

For the test, I would just say to

  1. Fit SelectFromModel on multi-target data, transform the data
  2. Fit the same base estimator on the same data, manually compute the mask by comparing norm(coefficients) < threshold, mask the data
    Check 1] and 2] are same. Similar to (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_selection/tests/test_from_model.py#L73)

@jnothman
Copy link
Copy Markdown
Member

(I think this is more about multiclass than multi-target.) You could
alternatively create a dummy estimator where fit() stores a fixed coef_.

On 21 January 2016 at 08:24, Manoj Kumar notifications@github.com wrote:

For the test, I would just say to

  1. Fit SelectFromModel on multi-target data, transform the data
  2. Fit the same base estimator on the same data, manually compute the mask
    by comparing norm(coefficients) < threshold, mask the data
    Check 1] and 2] are same. Similar to (
    https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_selection/tests/test_from_model.py#L73
    )


Reply to this email directly or view it on GitHub
#6181 (comment)
.

@MechCoder
Copy link
Copy Markdown
Member

Were you referring to the common use-case or to the test that I linked to?

@jnothman
Copy link
Copy Markdown
Member

Probably taking your comments out of context.

On 21 January 2016 at 08:57, Manoj Kumar notifications@github.com wrote:

Were you referring to the common use-case or to the test that I linked to?


Reply to this email directly or view it on GitHub
#6181 (comment)
.

@jnothman
Copy link
Copy Markdown
Member

Is multi-target supported here?

@MechCoder
Copy link
Copy Markdown
Member

It seems so, although untested.

@jnothman
Copy link
Copy Markdown
Member

Remind me what the shape of multitarget multiclass coef_ is?

@MechCoder
Copy link
Copy Markdown
Member

Oh I meant multitarget/multioutput in a regression setting, not "multitarget multiclass"

@jnothman
Copy link
Copy Markdown
Member

Oh, okay.

On 21 January 2016 at 14:21, Manoj Kumar notifications@github.com wrote:

Oh I meant multitarget in a regression setting, not "multitarget
multiclass"


Reply to this email directly or view it on GitHub
#6181 (comment)
.

@antoinewdg
Copy link
Copy Markdown
Contributor Author

Thank you for all these pieces of advice, if all goes well I will update the pull request within this week.

@antoinewdg
Copy link
Copy Markdown
Contributor Author

I chose to implement a fix for the norm function in sklearn.utils.fixes inspired by sklearn.preprocessing.normalize The code in normalize is consequently not really DRY, is it OK to refractor it to use the fix ?

@antoinewdg
Copy link
Copy Markdown
Contributor Author

It seems I got confused with the numpy version to check when implementing the fix (keepdims only exists in numpy 1.10), I'll fix this as soon as possible.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The keepdims argument seems to be YAGNI to me atleast in this context.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you mention it, it seems silly to have bothered with it !

@MechCoder
Copy link
Copy Markdown
Member

LGTM pending comments

@MechCoder
Copy link
Copy Markdown
Member

[MRG] -> [MRG+1]

@MechCoder MechCoder changed the title [MRG] Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) #2121 [MRG+1] Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) #2121 Feb 5, 2016
@MechCoder
Copy link
Copy Markdown
Member

@jnothman Merge?


norm_order : non-zero int, inf, -inf, default 1
Order of the norm used to filter the vectors of coefficients below
``threshold`` in the case where the ``coeff_`` attribute of the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*coef_ (one f)

@jnothman
Copy link
Copy Markdown
Member

  1. Please handle that not (0, 1) case
  2. Fix that typo
  3. Add an entry in doc/whats_new.rst
  4. LGTM; Merge!

@amueller
Copy link
Copy Markdown
Member

any updates on this?

# Conflicts:
#	sklearn/feature_selection/tests/test_from_model.py
#	sklearn/utils/fixes.py
#	sklearn/utils/tests/test_fixes.py
@antoinewdg
Copy link
Copy Markdown
Contributor Author

Here are the last fixes, sorry about this ridiculous delay.

@amueller
Copy link
Copy Markdown
Member

@antoinewdg don't sweat it, recently found an issue from 2013 that I forgot to reply to ;)

else:
from numpy.ma import MaskedArray # noqa

if 'axis' not in signature(np.linalg.norm).parameters:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you maybe add the numpy version this was added in a comment?

@amueller
Copy link
Copy Markdown
Member

Other than that, LGTM, too

@amueller amueller changed the title [MRG+1] Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) #2121 [MRG+2] Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) #2121 Oct 24, 2016
@antoinewdg
Copy link
Copy Markdown
Contributor Author

Sure! I'm not really sure what version that is though. The docs start mentioning it from 1.8 so I'll go with that, can't find any patch note about that.

@antoinewdg
Copy link
Copy Markdown
Contributor Author

antoinewdg commented Oct 24, 2016

While looking for patch notes I stumbled upon this stackoverflow thread.
This makes me kinda sad, but maybe using einsum directly instead of implementing a whole fix is simpler ?

EDIT: no this is stupid, we would have to make a big 'switch' over the order then.

@amueller
Copy link
Copy Markdown
Member

Yeah you should go with when the docs mention it. I'm confused on why you would do einsum instead of the transpose but whatever. The patch looks good :)

@amueller amueller merged commit 74a9756 into scikit-learn:master Oct 24, 2016
sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
…ntSelectorMixin) scikit-learn#2121 (scikit-learn#6181)

* Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) scikit-learn#2121

* safe_pwr utility

* Norm fix

* Removed safe_pwr

* 1D arrays support for norm fix

* Test case for 2d coef in SelectFromModel

* Fix numpy version requirement for norm fix

* Implement fixes suggested by @jnothman

* Add numpy version requiring the fix.
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
…ntSelectorMixin) scikit-learn#2121 (scikit-learn#6181)

* Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) scikit-learn#2121

* safe_pwr utility

* Norm fix

* Removed safe_pwr

* 1D arrays support for norm fix

* Test case for 2d coef in SelectFromModel

* Fix numpy version requirement for norm fix

* Implement fixes suggested by @jnothman

* Add numpy version requiring the fix.
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…ntSelectorMixin) scikit-learn#2121 (scikit-learn#6181)

* Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) scikit-learn#2121

* safe_pwr utility

* Norm fix

* Removed safe_pwr

* 1D arrays support for norm fix

* Test case for 2d coef in SelectFromModel

* Fix numpy version requirement for norm fix

* Implement fixes suggested by @jnothman

* Add numpy version requiring the fix.
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
…ntSelectorMixin) scikit-learn#2121 (scikit-learn#6181)

* Norm inconsistency between RFE and SelectFromModel (was _LearntSelectorMixin) scikit-learn#2121

* safe_pwr utility

* Norm fix

* Removed safe_pwr

* 1D arrays support for norm fix

* Test case for 2d coef in SelectFromModel

* Fix numpy version requirement for norm fix

* Implement fixes suggested by @jnothman

* Add numpy version requiring the fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants