FIX feature importances in random forests should sum up to 1 #13636
NicolasHug merged 7 commits into scikit-learn:master
Conversation
NicolasHug
left a comment
LGTM. I think we want to have a single entry in the what's new for all the ensemble models, so I would suggest to update this PR or #13620, whichever gets merged last
```python
                       n_classes=3)
clf = RandomForestClassifier(min_samples_leaf=5, random_state=42,
                             n_estimators=200).fit(X, y)
assert math.isclose(1, clf.feature_importances_.sum(), abs_tol=1e-7)
```
Just curious, why not `np.isclose()`?
numpy's isclose is older than Python's math.isclose and has some issues compared to Python's. Here's a nice conversation on the topic: numpy/numpy#10161. Since we now support Python >= 3.5, math.isclose seems like a good choice to me if comparing two floats is what we want to do.
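To illustrate the difference being discussed, here is a minimal comparison of the two functions; the concrete numbers are illustrative and not taken from the PR:

```python
import math
import numpy as np

# math.isclose (Python >= 3.5) uses a symmetric check:
#     |a - b| <= max(rel_tol * max(|a|, |b|), abs_tol)
# defaults: rel_tol=1e-9, abs_tol=0.0
#
# np.isclose uses an asymmetric check (the second argument is the reference):
#     |a - b| <= atol + rtol * |b|
# defaults: rtol=1e-5, atol=1e-8

# The default tolerances differ: np.isclose is much more permissive.
print(math.isclose(1.0, 1.00001))  # False (rel_tol=1e-9 is strict)
print(np.isclose(1.0, 1.00001))    # True  (within rtol=1e-5)

# Comparing a sum of importances against 1 with an explicit abs_tol,
# as the test in this PR does:
print(math.isclose(1, 0.999999, abs_tol=1e-5))  # True
```

Because `np.isclose` scales its relative tolerance only by the second argument, swapping the arguments can change the answer, which is one of the issues raised in numpy/numpy#10161.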
Sure, could do. But there's a slight difference: on GB{C/R} models the importances already do sum up to 1, which is not the case for forests.
```
.. _Hanmin Qin: https://github.com/qinhanmin2014
.. _Adrin Jalali: https://github.com/adrinjalali
```
We should have a separate PR adding our names to this list. 😅
lol, yeah, I've added mine in a few PRs, it'll be there once one of them gets merged lol
Thanks Adrin
…-learn#13636)" This reverts commit ae20ae6.
Same as the discussion in #7406 and #13620: the feature importances of random-forest-based models should sum up to 1, and root-only trees should be ignored.
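The behaviour described above can be sketched roughly as follows; `forest_importances` is a hypothetical helper written for illustration, not scikit-learn's actual implementation:

```python
import numpy as np

def forest_importances(per_tree_importances):
    """Average per-tree feature importances, skipping root-only trees.

    Sketch of the idea discussed in this PR: each real tree contributes a
    normalized importance vector (summing to 1); a tree that consists of a
    single root node made no splits, so it carries no importance information
    and is ignored. The averaged result then still sums to 1.
    """
    # Keep only trees that actually made splits (non-zero importance vectors).
    useful = [imp for imp in per_tree_importances if np.any(imp)]
    if not useful:
        # A forest made entirely of root-only trees has no importances.
        return np.zeros_like(per_tree_importances[0])
    return np.mean(useful, axis=0)

trees = [
    np.array([0.7, 0.3, 0.0]),  # normalized importances of tree 1
    np.array([0.0, 0.0, 0.0]),  # a root-only tree (no splits)
    np.array([0.5, 0.1, 0.4]),  # normalized importances of tree 3
]
print(forest_importances(trees))        # averages only trees 1 and 3
print(forest_importances(trees).sum())  # close to 1
```

Averaging the zero vector of the root-only tree together with the others would drag the total below 1, which is exactly the bug this PR fixes.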