min_weight_fraction_leaf suggested improvements #6945

@ben519

Description

I've been using the min_weight_fraction_leaf parameter of DecisionTreeClassifier and RandomForestClassifier incorrectly, and I suspect other people are making the same mistake.

For example, the documentation for min_weight_fraction_leaf in DecisionTreeClassifier says

The minimum weighted fraction of the input samples required to be at a leaf node.

It was really unclear to me what the docs meant by "weighted fraction of the input samples". Initially I thought it was a weighting based on the size of the classes or the values given by class_weight. I think a slight change in the parameter description could clear up this confusion. Perhaps something like

The minimum weighted fraction of the input samples required to be at a leaf node where weights are determined by sample_weight in the fit() method.
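To make the proposed wording concrete, here is a small sketch of what I understand "weighted fraction" to mean: the sum of the sample_weight values that land in a leaf, divided by the sum of all sample_weight values. The helper name weighted_fraction and the numbers are made up for illustration; this is not scikit-learn code.

```python
def weighted_fraction(leaf_weights, all_weights):
    """Share of the total sample weight that falls into one leaf.

    Hypothetical helper for illustration only, not part of scikit-learn.
    """
    return sum(leaf_weights) / sum(all_weights)


# Ten training samples: one carries weight 3.0, the other nine carry 1.0.
all_weights = [3.0] + [1.0] * 9

# Suppose a candidate leaf holds the heavy sample plus one ordinary sample.
leaf_weights = [3.0, 1.0]

# Fraction by weight: (3 + 1) / 12
by_weight = weighted_fraction(leaf_weights, all_weights)

# Fraction by plain count (every sample weighted equally): 2 / 10
by_count = weighted_fraction([1.0, 1.0], [1.0] * 10)

print(by_weight, by_count)
```

With a threshold of, say, 0.25, this leaf would satisfy the constraint when measured by weight but not when measured by count, which is why it matters that the docs say where the weights come from.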

Furthermore, it appears min_weight_fraction_leaf only applies if sample_weight is provided in the call to fit(). If sample_weight is not provided, min_weight_fraction_leaf is silently ignored. I think one of two things should happen instead: either min_weight_fraction_leaf still applies under the assumption that all samples are equally weighted, or a warning is raised saying that min_weight_fraction_leaf will not be used because sample_weight was not provided.
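Here is a rough way to check this on any installed version. It fits the same tree twice, once with explicit all-ones sample_weight and once without, and reports the smallest share of the training weight held by any leaf. The helper smallest_leaf_fraction, the synthetic data, and the 0.3 threshold are my own choices for illustration; I'm only assuming that DecisionTreeClassifier.apply returns the leaf index for each sample.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def smallest_leaf_fraction(clf, X, sample_weight=None):
    """Smallest share of the total training weight held by any leaf.

    Hypothetical helper for this check, not part of scikit-learn.
    """
    if sample_weight is None:
        sample_weight = np.ones(X.shape[0])
    leaf_ids = clf.apply(X)  # leaf index for every training sample
    per_leaf = np.bincount(leaf_ids, weights=sample_weight)
    per_leaf = per_leaf[per_leaf > 0]  # drop internal nodes (no samples end there)
    return per_leaf.min() / sample_weight.sum()


# Synthetic data, chosen only so the tree has something to split on.
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

threshold = 0.3
ones = np.ones(len(y))

with_weights = DecisionTreeClassifier(
    min_weight_fraction_leaf=threshold, random_state=0
).fit(X, y, sample_weight=ones)

without_weights = DecisionTreeClassifier(
    min_weight_fraction_leaf=threshold, random_state=0
).fit(X, y)

frac_with = smallest_leaf_fraction(with_weights, X, ones)
frac_without = smallest_leaf_fraction(without_weights, X)

print("smallest leaf fraction, sample_weight given:  ", frac_with)
print("smallest leaf fraction, sample_weight omitted:", frac_without)
```

If the second number comes out below the threshold, the parameter was ignored when sample_weight was omitted, which is what I see on the version listed below. If both numbers are at or above the threshold, the installed version treats missing weights as equal weights.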

Versions

Darwin-15.5.0-x86_64-i386-64bit
Python 3.5.1 |Continuum Analytics, Inc.| (default, Dec 7 2015, 11:24:55)
[GCC 4.2.1 (Apple Inc. build 5577)]
NumPy 1.11.0
SciPy 0.17.1
Scikit-Learn 0.17.1

Also, I would love to make the changes I suggested (if they're deemed worthy), but I have little experience contributing to open-source libraries. Might need a bit of hand-holding if someone would be willing to help me out.
