FIX ignore single node trees in gbm's feature importances #13620
jnothman merged 8 commits into scikit-learn:master
Conversation
---
So does this fix the issue?
---
I believe so, but I'd be happier if we could actually test it. @NicolasHug do you happen to have an example for this case?
---
I don't understand what the issue is here. The original issue (FIs not summing to 1) has been fixed, see #7406 (comment):

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10, random_state=1)
gbc = GradientBoostingClassifier(min_samples_leaf=5)
gbc.fit(X, y)
for stage in gbc.estimators_:
    for tree in stage:
        print(tree.tree_.node_count)
        print(tree.tree_.compute_feature_importances())
    print('-' * 10)
print(gbc.feature_importances_)
```

This will give you an ensemble with 3-node trees and 1-node trees at the end. The FIs look good.
---
In your example, the FIs are the same before and after this PR, because there's only one class, and hence one tree per stage: the single-node trees contribute a zero vector, which drops out after normalization either way. A multiclass example:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=15, n_informative=3, random_state=1,
                           n_classes=3)
gbc = GradientBoostingClassifier(min_samples_leaf=7, random_state=42,
                                 n_estimators=300)
gbc.fit(X, y)
for stage in gbc.estimators_:
    print("==========================")
    for tree in stage:
        print(tree.tree_.node_count)
        print(tree.tree_.compute_feature_importances())
    print('-' * 10)
print(gbc.feature_importances_)
```

will report two different sets of FIs before and after this PR. They're not too different in this case, but they're not the same:
```python
    normalize=False) for tree in trees) / len(trees)
total_sum += stage_sum

importances = total_sum / total_sum.sum()
```
This will give a ZeroDivision error if all the trees in the ensemble have only one node. You can reproduce with something like:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=10, random_state=1)
# min_samples_leaf equal to n_samples forces every tree to a single node
gbc = GradientBoostingClassifier(min_samples_leaf=10)
gbc.fit(X, y)
gbc.feature_importances_
```
That's why I have the `if len(trees) == 0: continue` above it.
Yes, and if you hit `continue` for all trees then `total_sum.sum()` is 0 and you get an error.
---
@NicolasHug like this?
```diff
      continue
  stage_sum = sum(tree.tree_.compute_feature_importances(
-     normalize=False) for tree in stage) / len(stage)
+     normalize=False) for tree in trees) / len(trees)
```
I'm not sure this is correct: this will give more weight to the stages where one (or more) tree is "missing" (because it's been filtered out). Let's say you have 3 classes and one of the trees is filtered out at a given stage: `stage_sum` will be an average over 2 trees instead of an average over 3 trees. Hence these 2 trees will have more weight than the trees at the other stages.

After all, we just want to compute the average feature importances across all trees that have more than one node, right? I feel like this does what we want:
```python
relevant_trees = [tree
                  for stage in self.estimators_ for tree in stage
                  if tree.tree_.node_count > 1]
if not relevant_trees:
    # degenerate case where all trees have only one node
    return np.zeros(shape=self.n_features_)

relevant_feature_importances = [
    tree.tree_.compute_feature_importances(normalize=False)
    for tree in relevant_trees
]
avg_feature_importances = np.mean(relevant_feature_importances, axis=0)
return avg_feature_importances / np.sum(avg_feature_importances)
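Stripped of the estimator plumbing, this proposal is just "filter, average, normalize, with an all-zeros fallback". A self-contained numpy sketch of that logic (the importance vectors below are made up, and an all-zero vector stands in for a single-node tree, since a lone root node has no splits):

```python
import numpy as np

def average_importances(per_tree_importances, n_features):
    # drop single-node trees; their unnormalized importances are all zero
    relevant = [imp for imp in per_tree_importances if imp.any()]
    if not relevant:
        # degenerate case: every tree is a single node
        return np.zeros(n_features)
    avg = np.mean(relevant, axis=0)
    return avg / avg.sum()  # normalize so the result sums to 1

trees = [np.array([0.2, 0.1, 0.0]),   # a "real" tree
         np.array([0.4, 0.0, 0.1]),   # another real tree
         np.zeros(3)]                 # a single-node tree, ignored
print(average_importances(trees, 3))           # sums to 1
print(average_importances([np.zeros(3)], 3))   # [0. 0. 0.]
```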
I agree, changed it to your code
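As a side note, the stage-weighting concern raised above can be checked numerically. A numpy-only toy calculation (made-up importance vectors; an all-zero vector plays the role of a filtered-out single-node tree) comparing division by the filtered count against division by the full stage size:

```python
import numpy as np

# two stages of a 3-class ensemble; the last tree of stage_b is a
# single-node tree with an all-zero importance vector
stage_a = [np.array([0.3, 0.0]), np.array([0.0, 0.3]), np.array([0.3, 0.0])]
stage_b = [np.array([0.0, 0.3]), np.array([0.3, 0.0]), np.zeros(2)]

def stage_sum(stage, denom):
    kept = [imp for imp in stage if imp.any()]  # drop single-node trees
    return sum(kept) / denom

# dividing by the filtered count (2 for stage_b) gives its
# surviving trees extra weight
biased = stage_sum(stage_a, 3) + stage_sum(stage_b, 2)
# dividing by the full stage size keeps all trees equally weighted
even = stage_sum(stage_a, 3) + stage_sum(stage_b, 3)

print(biased / biased.sum())  # normalized importances differ...
print(even / even.sum())      # ...from these
```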
```python
                  if tree.tree_.node_count > 1]
if not relevant_trees:
    # degenerate case where all trees have only one node
    return np.zeros(shape=self.n_features_)
```
We can test this by growing single node trees:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.zeros((10, 10))  # constant features, so no split is ever made
y = np.ones((10,))
gbr = GradientBoostingRegressor()
gbr.fit(X, y)
```

Do we want this degenerate case to also sum to one?
I don't think it makes sense to return non-zero values for the degenerate case. At least in my mind, the zero vector is a better representative of what we have. It's like, are features equally important, or are they not important at all? I think they're not important at all in this case.
If this does not sum to one, we should document it.
added to the docstring
NicolasHug left a comment:
If you want a non-regression test making sure we ignore the single-node trees, you could simply compute the FIs starting with

```python
relevant_trees = [tree
                  for stage in self.estimators_ for tree in stage]
```

without filtering, and assert that the results are different.
LGTM anyway.
Some additional thoughts: I'm not an FI expert, but maybe in some cases one would want the intra-class FIs. In multiclass classification we build C trees per iteration, and for some datasets I wouldn't be surprised if the FIs differed quite a bit between classes.
Right now we're averaging over all the (non-single-node) trees without considering the classes. Other aggregating strategies would be to first compute intra-class FIs and only do a (weighted) average after. Or even return an array of shape (C, n_features). Anyway, just some thoughts, nothing that would prevent this PR from being merged ;)
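A rough numpy sketch of that aggregation idea, with made-up per-tree importances (in the real model, `estimators_` is an array of shape `(n_stages, C)` whose entries are fitted trees):

```python
import numpy as np

n_stages, n_classes, n_features = 4, 3, 5
rng = np.random.RandomState(0)
# fake unnormalized importances for the tree of each (stage, class) pair
per_tree = rng.rand(n_stages, n_classes, n_features)

# intra-class FIs: average over stages, then normalize each class row
per_class = per_tree.mean(axis=0)
per_class /= per_class.sum(axis=1, keepdims=True)  # shape (C, n_features)

# current behaviour: one flat average over all trees
flat = per_tree.mean(axis=(0, 1))
flat /= flat.sum()  # shape (n_features,)

print(per_class.shape)  # (3, 5)
print(flat.shape)       # (5,)
```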
I really like that, but it's backward incompatible, and since …
---
I just realized we have the same issue in …
doc/whats_new/v0.21.rst (Outdated)
```rst
:class:`ensemble.GradientBoostingRegressor` now ignores the single
node trees in feature importance calculation, and returns an
array of zeros if all trees have only one single node (i.e. the root node).
:issue:`13620` by `Adrin Jalali`_.
```
@NicolasHug I'm not sure how you'd like to merge the two changelogs here.
From the other PR:

```rst
- sum up to ``1``
- all the single node trees in feature importance calculation are ignored
- in case all trees have only one single node (i.e. a root node),
  feature importances will be an array of all zeros.
```
All these cases apply here, it seems?
Not the first one, but merged them anyway.
Force-pushed 733a2fb to 6720395.
---
Thanks @adrinjalali
…ikit-learn#13620)" This reverts commit 71172f6.
Fixes #7406
I'm not sure how to test it though. The feature importances of the original post's example don't change after this PR.