min_weight_fraction_leaf suggested improvements #6945
Description
I've been using the min_weight_fraction_leaf parameter of DecisionTreeClassifier and RandomForestClassifier incorrectly and I think it's likely other people are doing the same thing as me.
For example, the documentation for min_weight_fraction_leaf in DecisionTreeClassifier says:
The minimum weighted fraction of the input samples required to be at a leaf node.
It was really unclear to me what the docs meant by "weighted fraction of the input samples". Initially I thought the weighting was based on the size of the classes or on the values given by class_weight. I think a slight change in the parameter description could clear up this confusion. Perhaps something like:
The minimum weighted fraction of the input samples required to be at a leaf node where weights are determined by sample_weight in the fit() method.
Furthermore, it appears min_weight_fraction_leaf only applies if sample_weight is provided in the call to fit(). If sample_weight is not provided, min_weight_fraction_leaf is silently ignored. I think that either min_weight_fraction_leaf should still apply, under the assumption that all samples are equally weighted, or a warning should be raised saying that min_weight_fraction_leaf will not be used because sample_weight was not provided.
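Here is a minimal sketch for checking this on a given installation. It only assumes the public DecisionTreeClassifier API (fit() with sample_weight, and apply() to get the leaf each sample lands in). The helper smallest_leaf_weight_fraction, the synthetic data, and the 0.1 threshold are made up for illustration and are not part of scikit-learn. It fits one tree with explicit weights and one without, then computes the smallest share of the total weight that ends up in any leaf.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def smallest_leaf_weight_fraction(tree, X, sample_weight):
    """Hypothetical helper: smallest share of total sample weight in any leaf."""
    leaf_ids = tree.apply(X)  # index of the leaf each training sample falls into
    # Map leaf ids to 0..n_leaves-1 and sum the weights that land in each leaf.
    _, leaf_index = np.unique(leaf_ids, return_inverse=True)
    leaf_weights = np.bincount(leaf_index, weights=sample_weight)
    return leaf_weights.min() / sample_weight.sum()


# Synthetic data and weights, chosen only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
rng = np.random.RandomState(0)
weights = rng.uniform(0.5, 2.0, size=X.shape[0])

# Case 1: sample_weight is passed to fit(), so the constraint should hold.
clf_weighted = DecisionTreeClassifier(min_weight_fraction_leaf=0.1, random_state=0)
clf_weighted.fit(X, y, sample_weight=weights)
fraction_weighted = smallest_leaf_weight_fraction(clf_weighted, X, weights)

# Case 2: no sample_weight; treat every sample as having weight 1 when measuring.
clf_unweighted = DecisionTreeClassifier(min_weight_fraction_leaf=0.1, random_state=0)
clf_unweighted.fit(X, y)
fraction_unweighted = smallest_leaf_weight_fraction(
    clf_unweighted, X, np.ones(X.shape[0])
)

print("smallest leaf fraction with sample_weight:   ", fraction_weighted)
print("smallest leaf fraction without sample_weight:", fraction_unweighted)
```

If the second value comes out below 0.1, the parameter was ignored for the unweighted fit, which is what I see on 0.17.1. I have not checked other versions, so the result may differ there.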
Versions
Darwin-15.5.0-x86_64-i386-64bit
Python 3.5.1 |Continuum Analytics, Inc.| (default, Dec 7 2015, 11:24:55)
[GCC 4.2.1 (Apple Inc. build 5577)]
NumPy 1.11.0
SciPy 0.17.1
Scikit-Learn 0.17.1
Also, I would love to make the changes I suggested (if they're deemed worthy), but I have little experience contributing to open-source libraries. Might need a bit of hand-holding if someone would be willing to help me out.