ENH add multioutput support for RFE#16103
ENH add multioutput support for RFE#16103rth merged 20 commits intoscikit-learn:masterfrom divyaprabha123:0.23.x
Conversation
Checked the type of the target variable, if the target is equal to multioutput then multioutput is set to zero before check_X_y
|
Thanks for the pull request! Please fix the pep8 error, and please add a test |
|
You need to add a test. Could you test that the multioutput is supported in RFECV as well. Since RFECV inherit from RFE, it should work as well quite easily |
Sure @glemaitre I will do that :) |
sklearn/feature_selection/_rfe.py
Outdated
| X, y = check_X_y(X, y, "csc", ensure_min_features=2, | ||
| force_all_finite=not tags.get('allow_nan', True)) | ||
| force_all_finite=not tags.get('allow_nan', True), | ||
| multi_output=multioutput) |
There was a problem hiding this comment.
Why can't we just use multi_output=True in all cases?
There was a problem hiding this comment.
Yeah! that's actually possible! I thought we may doing extra computation by calling check_array function in validation.py. I will change that now :)
|
Sorry accidentally closed this PR reopening it again. @rth Can you please check this and let me know what you think? |
rth
left a comment
There was a problem hiding this comment.
Thanks @divyaprabha123 ! LTGM, aside for a minor comment.
Please add an entry to the change log at doc/whats_new/v0.23.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself with :user:.
| rfe.transform(X) | ||
|
|
||
|
|
||
| def test_multioutput(): |
There was a problem hiding this comment.
I imagine the same should work with RFECV? If so please parametrize this test with,
@pytest.mark.parametrize('ClsRFE', [RFE, RFECV])
def test_multioutput(ClsRFE):
...
rfe_test = ClsRFE(clf)There was a problem hiding this comment.
Yeah this will work with REFCV I will also add this in the test. Thank you so much @rth : )
| X, y = check_X_y(X, y, "csc", ensure_min_features=2, | ||
| force_all_finite=not tags.get('allow_nan', True)) | ||
| force_all_finite=not tags.get('allow_nan', True), | ||
| multi_output=True) |
There was a problem hiding this comment.
As a side comment, I think it is indeed best to assume that multi_output is supported here, pass it to the underlying estimator and let it error if it doesn't. As opposed to relying on multioutput estimator tag, which can potentially be not accurate.
jnothman
left a comment
There was a problem hiding this comment.
LGTM pending @rth's testing suggestion. Thanks @divyaprabha123
Hi @rth and @jnothman I have added the test function as suggested, please go through and let me know if I have change or improve anything! |
rth
left a comment
There was a problem hiding this comment.
One last comment below, also you need to resolve conflicts (can be done in Github UI).
| :class:`ensemble.GradientBoostingClassifier` as well as ``predict`` method of | ||
| :class:`tree.DecisionTreeRegressor`, :class:`tree.ExtraTreeRegressor`, and | ||
| :class:`ensemble.GradientBoostingRegressor`. | ||
| :pr:`16331` by :user:`Alexandre Batisse <batalex>`. |
There was a problem hiding this comment.
The merge conflict resolution went wrong somehow for the what's new. There should be no diff here aside for your changes. Will push a commit to fix it.
There was a problem hiding this comment.
@rth Thanks for accepting the pull request :)
What does this implement/fix? Explain your changes.
Regressor like random forest natively supports multi-output regression but when we use RFE bad input shape error is thrown, making it insufficient to utilize the multi-output facility.
This implementation checks the type of the target variable using utils.multiclass.type_of_target, if the type is equal to multioutput then "multi_output" argument of check_X_y is set to True.