Skip to content

ENH Allow SelectFromModel's max_features to accept callables#22356

Merged
adrinjalali merged 43 commits intoscikit-learn:mainfrom
Micky774:sfm_max_features_upgrade
Mar 22, 2022
Merged

ENH Allow SelectFromModel's max_features to accept callables#22356
adrinjalali merged 43 commits intoscikit-learn:mainfrom
Micky774:sfm_max_features_upgrade

Conversation

@Micky774
Copy link
Copy Markdown
Contributor

@Micky774 Micky774 commented Feb 1, 2022

Reference Issues/PRs

Fixes #21117

What does this implement/fix? Explain your changes.

This PR improves SelectFromModel's max_features by allowing it to:

  • Accept integral values to preserve prior behaviour
  • Accept real values to prescribe a proportion of features to allow
  • Accepts a callable to return an integral value for the number of features to allow

To accomplish this, I introduced a new private attribute max_features_ which is calculated in fit based on what type was received for max_features. The calculations for integral, and callable values, e.g.:

  • max_features_=max_features
  • max_features_=round(max_features*X.shape[1])
  • max_features_=max_features(X)

I then augmented existing unit tests to account for these new cases

Any other comments?

@Micky774 Micky774 changed the title ENH Allow SelectFromModel's max_features to accept floats and callables ENH Allow SelectFromModel's max_features to accept floats and callables Feb 1, 2022
Copy link
Copy Markdown
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having both floats and callables in a single PR may stall it. I think it's best to pick one for the first PR.

Callable is the more interesting one since it will be the first time we will support a callable for a *_features parameter.

Copy link
Copy Markdown
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! This looks good.

For new code, I prefer using single backticks. (We changed the setting ~2.5 years ago to make single backticks == double backticks)

@Micky774 Micky774 changed the title ENH Allow SelectFromModel's max_features to accept floats and callables ENH Allow SelectFromModel's max_features to accept callables Feb 2, 2022
@Micky774
Copy link
Copy Markdown
Contributor Author

Micky774 commented Feb 3, 2022

Updated the code to incorporate your suggestions! I noticed the different styles of back-ticks and was curious about that -- thanks for clearing it up! Right now there's not really any style standardization regarding single vs double back-ticks and quotes right?

Copy link
Copy Markdown
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update!

@Micky774
Copy link
Copy Markdown
Contributor Author

@adrinjalali @glemaitre Would either of you be interested in reviewing this PR?

@thomasjpfan
Copy link
Copy Markdown
Member

The CI failures are related to the changes in this PR.

@Micky774 Micky774 closed this Feb 22, 2022
@Micky774 Micky774 deleted the sfm_max_features_upgrade branch February 22, 2022 18:06
@Micky774 Micky774 restored the sfm_max_features_upgrade branch March 4, 2022 03:55
@Micky774 Micky774 reopened this Mar 4, 2022
@adrinjalali
Copy link
Copy Markdown
Member

CI failure is related to our doc changes. I'm not sure how to fix them (has to to with indentations and new lines). You can try and see if adding new lines or changing indentation would fix it:

+ echo '<strong>Sphinx Warnings in affected files</strong><ul>'
+ echo '/home/circleci/project/sklearn/feature_selection/_from_model.py:docstring of sklearn.feature_selection._from_model.SelectFromModel:44: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/circleci/project/sklearn/feature_selection/_from_model.py:docstring of sklearn.feature_selection._from_model.SelectFromModel:47: WARNING: Bullet list ends without a blank line; unexpected unindent.'
+ sed 's/\/home\/circleci\/project\//<li>/g'
+ echo '</ul></body></html>'
+ '[' '/home/circleci/project/sklearn/feature_selection/_from_model.py:docstring of sklearn.feature_selection._from_model.SelectFromModel:44: WARNING: Bullet list ends without a blank line; unexpected unindent.
/home/circleci/project/sklearn/feature_selection/_from_model.py:docstring of sklearn.feature_selection._from_model.SelectFromModel:47: WARNING: Bullet list ends without a blank line; unexpected unindent.' 

@Micky774
Copy link
Copy Markdown
Contributor Author

Should be good to go now!

@adrinjalali adrinjalali merged commit db24a30 into scikit-learn:main Mar 22, 2022
@Micky774 Micky774 deleted the sfm_max_features_upgrade branch March 22, 2022 23:38
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Apr 6, 2022
…kit-learn#22356)

* Initial implementation

* Improved error handling and stability

* Added unit tests

* Updated test to use `max_features_` instead of `max_features`

* Added documentation for new private attribute `max_features_`

* Improved error handling for callables

* Updated whats_new

* Removed incorrect term reference to `max_features`

* Removed float case and improved testing

* Updated test names to more clearly reflect intention

* Added a sample callable in `max_features` description

* Improved documentation and streamlined error handling

* Updated example to include demonstrate using a callable for max_features

* Separated out callable demo into separate example

* Removed demo from `max_features` docs (now in example)

* Updated changelog

* Apply suggestions from code review

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* Trimmed unneeded comments

* Updated tests to reflect new error handling

* Removed new line at end of docstring

* Updated docstring

* Fixed example syntax error

* Fixed example syntax

* Apply suggestions from code review

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* Reverted irrelevant changes

* Update sklearn/feature_selection/_from_model.py

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

* Fixed error message

* Improved test coverage

* Minor doc improvement -- added a list for `max_features` type

* Update sklearn/feature_selection/_from_model.py

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

* Improved input validation and added test for array-like

* Updated doc to use no longer use lambda function

* Fixed docstring list

* Added missing whitespace for list format in docstring

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SelectFromModel: max_features can't be greater than number of features

3 participants