Modified BaseDecisionTree so that min_weight_fraction_leaf works when… #6947
ben519 wants to merge 1 commit into scikit-learn:master
Conversation
… sample_weight is None and improved parameter description
-    if self.min_weight_fraction_leaf != 0. and sample_weight is not None:
+    if self.min_weight_fraction_leaf != 0.:
+        if sample_weight is None:
+            sample_weight = np.repeat(1., n_samples)
Firstly, we don't need to do this explicitly, since all we want is the sum of the weights, i.e. the "total weight". So you could just do min_weight_leaf = self.min_weight_fraction_leaf * n_samples.

Secondly, I see why you might want this, but it now replicates the function of min_samples_leaf. So if this is the way we go, i.e. rather than a warning, you can actually just use

min_samples_leaf = max(min_samples_leaf, int(ceil(self.min_weight_fraction_leaf * n_samples)))

In terms of testing, you should look at the existing tests for min_samples_leaf and check that min_weight_fraction_leaf offers the same behaviour.
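To make the reviewer's point concrete, here is a small sketch (made-up numbers, not scikit-learn internals): when every sample has unit weight, the total weight is just n_samples, so a weight-fraction threshold reduces to a plain sample-count threshold.

```python
# Sketch only: with unit weights, "fraction of total weight" is the same
# constraint as "fraction of n_samples".
from math import ceil

import numpy as np

n_samples = 100
min_weight_fraction_leaf = 0.25

# Explicit unit weights (what the PR constructs via np.repeat)...
sample_weight = np.repeat(1., n_samples)
min_weight_leaf = min_weight_fraction_leaf * np.sum(sample_weight)

# ...give the same threshold as using n_samples directly, as suggested.
assert min_weight_leaf == min_weight_fraction_leaf * n_samples

# And that threshold matches a min_samples_leaf of ceil(fraction * n).
equivalent_min_samples_leaf = int(ceil(min_weight_fraction_leaf * n_samples))
print(min_weight_leaf, equivalent_min_samples_leaf)  # 25.0 25
```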
When sample_weight is None, then samples are equally weighted in the Cython code. Does this addition really change anything? I believe behaviours should be identical. Do you have counter-examples where the trees that are built are actually different?
If sample_weight is None then min_weight_fraction_leaf has no effect. I think that's quite clear from the else case here.
Oh my bad, I was not aware of that else clause. Indeed. Ouch.
Could we fix this in the Cython code?
As for advice via Skype and the like, you might find the scikit-learn channel on Gitter helpful.
     # Set min_weight_leaf from min_weight_fraction_leaf
-    if self.min_weight_fraction_leaf != 0. and sample_weight is not None:
+    if self.min_weight_fraction_leaf != 0.:
You don't actually need this `if`.
@ben519 do you intend to finish off your work on this?
@jnothman I definitely can't this week. I can give it a shot this weekend, but I'd be happier if someone more knowledgeable than me took this over.
If no one else is available, I can do it. I'll be traveling next week, but starting Sept 3 I'll have about a week free that I can use to fix this up. It might be a good test of whether all that GSoC work with the tree code paid off.
@nelson-liu please go ahead and submit a PR!
Closing this in favor of #7301. |
Reference Issue
Addresses #6945
Changes
min_weight_fraction_leaf should work even when sample_weight is not given (in which case samples are assumed to have equal weight). I also tweaked the description of min_weight_fraction_leaf to make its purpose a bit clearer.
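A minimal check of the intended behaviour, assuming a scikit-learn build where this is fixed (the fix eventually landed via the follow-up work): fitting with sample_weight=None should produce the same tree as fitting with explicit uniform weights.

```python
# Check that min_weight_fraction_leaf behaves the same with sample_weight=None
# as with explicit unit weights (the equivalence this PR is about).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# sample_weight defaults to None here.
tree_none = DecisionTreeClassifier(
    min_weight_fraction_leaf=0.1, random_state=0
).fit(X, y)

# Explicit uniform weights should not change the result.
tree_unit = DecisionTreeClassifier(
    min_weight_fraction_leaf=0.1, random_state=0
).fit(X, y, sample_weight=np.ones(len(y)))

# The two trees should have identical structure.
assert tree_none.tree_.node_count == tree_unit.tree_.node_count
```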
Other comments
I'm fairly new to open-source collaboration as well as to scikit-learn, and I think my changes have room for improvement. For example, if sample_weight is given as None by the user, I reset it to an array of all 1s; I'm guessing this is a bad solution, but I could use help figuring out where and how to implement the correct one. I'm also slightly unsure of the best way to test my changes. I would really appreciate someone offering 5 or 10 minutes of their time via Skype or Google Hangouts so I can become a contributor and hopefully add more contributions in the future.