RFE/RFECV step enhancements #12578

Closed
hermidalc wants to merge 6 commits into scikit-learn:master from hermidalc:rfe_step_enhancements

Conversation

@hermidalc
Contributor

Reference Issues/PRs

#10368

What does this implement/fix? Explain your changes.

Adds two new RFE/RFECV step parameters, step_one_threshold and step_decay.

step_one_threshold (int, default=None) sets a threshold number of remaining features at which step reverts to 1, for cases where the specified step int or float equates to more than one feature per iteration.

step_decay (boolean, default=False) modifies the behavior when a float step is specified. Instead of removing a fixed number of features at each iteration, calculated as a percentage of the starting number of features, it removes a decaying number of features, calculated as a percentage of the features remaining at each iteration.
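Roughly, the combined effect of step, step_one_threshold, and step_decay on the per-iteration step size could be sketched like this (the helper compute_step is illustrative, not the PR's actual code):

```python
def compute_step(step, n_remaining, n_initial, step_one_threshold=None,
                 step_decay=False):
    """Illustrative per-iteration step-size calculation (hypothetical helper)."""
    if step_one_threshold is not None and n_remaining <= step_one_threshold:
        return 1  # at or below the threshold, eliminate one feature at a time
    if isinstance(step, float):
        # step_decay: fraction of *remaining* features; else fraction of initial
        base = n_remaining if step_decay else n_initial
        n = max(1, int(step * base))
    else:
        n = step
    if step_one_threshold is not None:
        # never step past the threshold (the behavior described in this PR)
        n = min(n, n_remaining - step_one_threshold)
    return max(1, n)

print(compute_step(100, 3000, 3000, step_one_threshold=250))  # 100
print(compute_step(100, 300, 3000, step_one_threshold=250))   # 50
print(compute_step(100, 250, 3000, step_one_threshold=250))   # 1
```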

Comments

Still need to add tests. Need feedback on the parameter names and on whether the design is consistent with sklearn principles.

(rounded down) of features to remove at each iteration.

step_one_threshold : int or None (default=None)
If specified step int or float equates to > 1 feature then the
Member

This is not very clear.

Contributor Author

@hermidalc hermidalc Nov 14, 2018

Yes, I agree, that's why I need feedback!

I've had my own customized RFE with these features for over a year, and they are features described by Guyon et al. in the original SVM-RFE paper.

I am totally open to param name changes etc. The param here means that if you've set the step param to an int or float that corresponds to stepping multiple features at a time, then you can specify a threshold number of features at which it starts stepping by one feature at a time. Maybe threshold isn't the right word, not sure.

Member

I think the effect of this is that "Feature sets sized between n_features_to_select and step_one_threshold, inclusive, will all be considered regardless of the step parameter."

Possible names:

  • try_all_below
  • step_if_at_least

threshold number of remaining features when to revert to step = 1

step_decay : boolean, optional (default=False)
If step is a float whether to calculate the percentage of features
Member

How about "If step is a float, whether to remove a step fraction of the original number of features, or a step fraction of the current number of features, in each step". Still not very clear though :-/
"Whether to remove a fraction of the original number of features or of the current number of features at each step?"

Member

If true and step is a float, the number of features removed is calculated as a fraction of the remaining features in that iteration. If false, the number of features removed is constant (a fraction of the original number of features) across iterations.
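A small numeric illustration of the difference (the helper below is hypothetical, not code from the PR): with 1000 starting features, step=0.5, and a target of 100 features, the fixed mode removes 500 features every iteration, while the decaying mode removes half of whatever remains.

```python
def schedule(n_start, frac, n_select, decay):
    """Illustrative feature-set-size schedule for a float step (hypothetical)."""
    sizes, n = [n_start], n_start
    while n > n_select:
        base = n if decay else n_start  # decay: fraction of remaining features
        n = max(n_select, n - max(1, int(frac * base)))
        sizes.append(n)
    return sizes

print(schedule(1000, 0.5, 100, decay=False))  # [1000, 500, 100]
print(schedule(1000, 0.5, 100, decay=True))   # [1000, 500, 250, 125, 100]
```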


If specified step int or float equates to > 1 feature then the
threshold number of remaining features when to revert to step = 1

step_decay : boolean, optional (default=False)
Member

I would consider calling this reducing_step


self.scores_ = []

step_one_threshold = (self.step_one_threshold if
self.step_one_threshold is not None else 0)
Member

Should we be handling a float here too?

Contributor Author

Sure, why not? We could also make it an int. That would mean changing the step rate to a different float or int at the threshold, corresponding to fewer features being removed per step. This behavior wasn't mentioned in the Guyon SVM-RFE paper, but I guess it doesn't hurt (they only mentioned removing more features at the beginning with a reducing step and then, at some number of remaining features chosen for your particular problem and dataset, starting to step by 1).

Contributor Author

@hermidalc hermidalc Nov 16, 2018

@jnothman I updated the code and doc based on the suggestions above, except I still don't understand the suggestion here for try_all_below or step_if_at_least? I think there is maybe a slight misunderstanding.

Suppose you have a dataset with 3000 features. When specifying step=100 (or an equivalent float) and step_one_threshold=250, this is what will happen:

3000, 2900, 2800, ... , 400, 300, 250, 249, 248, 247, ... , n_features_to_select

The goal is to step by the specified larger number of features (step) until you reach the specified threshold number of features (step_one_threshold), where it starts stepping 1 at a time until it gets to n_features_to_select. You can also see that, the way I wrote the code, it won't ever step over the threshold no matter how large the step size.

Member

If step_one_threshold is clearer to you, fine. What I mean by try_all_below is that from 250 down, all feature set sizes will be tried. Similarly, step_if_at_least indicates that if the feature set size is smaller than that value, step is disregarded.

Contributor Author

@hermidalc hermidalc Nov 22, 2018

I'm not super happy with step_one_threshold either, but I also feel the two suggestions could be confusing to users. try_all_below makes it seem like it's going to try but could fail, when it will always do all; also, the word "below" implies it doesn't include the number of features passed in the param, i.e. "try all below 250" doesn't include 250, which it should.

step_if_at_least can be hard to understand because I ask myself "step what, if at least?"; it's not clear from the param name that it means the step param is active at at least that many remaining features.

But I guess people should read the doc explanations; I'm just thinking we could come up with a param name that is almost self-explanatory.

I lean towards the style of try_all_below over step_if_at_least, since it's easier to understand. Maybe try_all_from, so it sounds inclusive of the param number?

while np.sum(support_) > n_features_to_select:
# Remaining features
features = np.arange(n_features)[support_]
n_remaining_features = np.sum(support_)
Member

Is this ever not deterministic? I.e. can we determine the sequence of feature set sizes in advance, or in a separate generator function (given X.shape[1], step, n_features_to_select, step_one_threshold, step_decay)?

Contributor Author

RFE is deterministic and never changes based on intermediate results of an iteration. I guess it's somewhat separate from this pull request, but a great idea.

Member

The idea is that it belongs in this PR because it isolates the added complexity of step size / feature set size calculation from the selection of features. It would mean we don't need perplexingly redundant computation like np.sum(support_).

Contributor Author

@hermidalc hermidalc Nov 15, 2018

Sorry, what I meant (and you probably saw this) is that all of the code you are talking about, the loop and the redundant computation, is how RFE was before; I didn't add this code. That's why I thought it should be in a second PR, but it's fine.

Member

Yes, but until now, the loop looked relatively straightforward :)

Contributor Author

@hermidalc hermidalc Nov 21, 2018

The idea is that it belongs in this PR because it isolates the added complexity of step size / feature set size calculation from the selection of features. It would mean we don't need perplexingly redundant computation like np.sum(support_).

The latest commit slightly improves the preexisting num-remaining-features logic. I will refactor, though, to precompute all the steps in one iterable and then just loop over that in the elimination loop.
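Such a precomputed schedule could look roughly like the following sketch (a hypothetical generator; the name feature_set_sizes and its logic are illustrative, not code from this PR):

```python
def feature_set_sizes(n_features, step, n_features_to_select,
                      step_one_threshold=None, step_decay=False):
    """Yield the sequence of feature-set sizes an RFE run would visit
    (illustrative sketch of precomputing the elimination schedule)."""
    n = n_features
    yield n
    while n > n_features_to_select:
        if step_one_threshold is not None and n <= step_one_threshold:
            remove = 1  # one at a time once at/below the threshold
        else:
            # float step: fraction of remaining (decay) or original features
            base = n if step_decay else n_features
            remove = max(1, int(step * base)) if isinstance(step, float) else step
            if step_one_threshold is not None:
                # never step over the threshold
                remove = min(remove, n - step_one_threshold)
        n = max(n_features_to_select, n - remove)
        yield n

sizes = list(feature_set_sizes(3000, 100, 10, step_one_threshold=250))
print(sizes[:5])     # [3000, 2900, 2800, 2700, 2600]
print(sizes[27:31])  # [300, 250, 249, 248]
```

The elimination loop could then simply iterate over consecutive pairs of this sequence, removing the difference at each step, isolating the step-size logic from the feature-selection logic as suggested above.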

@hermidalc
Contributor Author

@amueller @jnothman quick question: in my other pull request, for univariate_score_params, you told me it's important to maintain positional order for backwards compat, always putting a new kwarg at the end. But here I see that in sklearn 0.20.0 someone added a new min_features_to_select param that is not at the end, which doesn't maintain positional backwards compat for people who specify their params positionally without keywords.

@hermidalc
Contributor Author

The latest commit changes the step_decay param name to reducing_step, updates the doc using #12578 (comment) above, and most importantly fixes the logic in the loop, as the previous logic didn't work as intended in all cases.

@jnothman
Member

in my other pull request for the univariate_score_params you told me it's important to maintain positional order for backwards compat and always putting a new kwarg at the end. But here I see in sklearn 0.20.0 someone added a new min_features_to_select param that is not at the end and doesn't maintain positional order for backwards compat for people who just specify their params positionally without keywords.

Indeed... others might not have done that. I generally go by the assumption that after the first two or so parameters we require users to specify by kwarg. Not that we say this loudly, although it is noted in the glossary.

fixed the logic in the loop as previous logic didn't work as intended in all cases.

This is one reason to pull the logic into a separate iterable.

self.scores_.append(step_score(estimator, features))
support_[features[ranks][:threshold]] = False
ranking_[np.logical_not(support_)] += 1
n_remaining_features = np.sum(support_)
Member

Shouldn't this just be n_remaining_features - threshold?

Contributor Author

Yes, that's right, sorry for missing that.

ranks = np.ravel(ranks)

# Adjust step using special parameters if specified
if self.step_one_threshold is not None:
Member

I still would appreciate making this less nested and having a clearer interface by pulling it out into a method or generator

Contributor Author

@hermidalc hermidalc Nov 22, 2018

After thinking about it for a bit and trying to make it more concise, I don't think it's possible, unless I've missed some way to code the logic. Performing the necessary logic actually takes those if statements, nested like that. Remember that it has to handle when only reducing_step is set, only step_one_threshold, or both parameters.

Contributor Author

I will move it out, like you said, into a separate generator function. Or would you prefer a private method that generates all the remaining feature sizes in one go and returns a list, with the main loop changed to iterate over it?

Member

I'm not too fussed about this. It is cosmetic. I'm just finding it harder to read than I think it should be.

@jnothman
Member

Please add tests

@hermidalc
Contributor Author

@jnothman I'll get back to this soon; it's the last couple of weeks of the semester for me, with projects and final exams. Once the semester is over we'll finish this.

@calemagruder

@hermidalc any updates? Would love to see this functionality added :-) Thanks!

@hermidalc
Contributor Author

@calemagruder I’ll get back to it in a couple days and will finish the implementation and pull request with @jnothman

@hermidalc
Contributor Author

hermidalc commented Mar 11, 2019

Hi all - sorry for the delay... grad school is consuming my time. I'm back working on this now; give me a couple days to make another commit.

I've made the step change more general, so that at the threshold it doesn't have to revert to 1 but to a user-specified float or int value. So you can, for example, set step=0.1 and then change step to 0.05 or 5 at a specific threshold number of features.

@hermidalc
Contributor Author

hermidalc commented Mar 17, 2019

I screwed up my forked repository so now the pull request is also messed up... apologies. I'm going to close this and open a new pull request! I will refer back to this in that request.

@hermidalc hermidalc closed this Mar 17, 2019
@jnothman
Copy link
Copy Markdown
Member

jnothman commented Mar 17, 2019 via email
