[MRG+1] Add sample_weight support to Dummy Regressor by arjoly · Pull Request #3779 · scikit-learn/scikit-learn

arjoly · 2014-10-16T14:25:10Z

No description provided.

arjoly · 2014-10-16T14:25:32Z

sklearn/dummy.py

I removed accept_sparse='csr' since it's not supported.

arjoly · 2014-10-16T14:26:47Z

This is ready for review and it will fix #3420

MechCoder · 2014-10-16T14:54:49Z

sklearn/dummy.py

Noob question: out of curiosity, is there any difference between doing this and

`y = y[:, np.newaxis]`

I always use the latter.

`y = y[:, np.newaxis]` doesn't preserve contiguity. This is a known bug and will be solved in the current or next release of numpy.

oh yes, I remember now :)
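
For reference, a small sketch of the two spellings compared in this thread. It assumes the reviewed line uses `np.reshape(y, (-1, 1))`, which the thread itself does not show. Whether the `np.newaxis` form reports a contiguous result depends on the NumPy version, so the flags are printed rather than asserted.

```python
import numpy as np

y = np.arange(5, dtype=np.float64)

# Assumed form of the reviewed line: an explicit reshape to a column vector.
y_reshape = np.reshape(y, (-1, 1))

# The alternative from the question: add a trailing axis by indexing.
y_newaxis = y[:, np.newaxis]

# Both spellings produce the same (n_samples, 1) column.
print(y_reshape.shape, y_newaxis.shape)

# Contiguity flags may differ between NumPy releases, so just inspect them.
print(y_reshape.flags["C_CONTIGUOUS"], y_newaxis.flags["C_CONTIGUOUS"])
```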

MechCoder · 2014-10-16T15:40:18Z

sklearn/utils/stats.py

If I'm understanding right, these two lines can be replaced by

`percentile_idx = np.searchsorted(weight_cdf, (percentile / 100.) * weight_cdf[-1])`

or am I wrong?

Do you think this could be optimized in another pr? I have just taken what @pprett has done previously and put it there to be useful to more than just gradient boosting.

Okay, unless @pprett thinks it is fine to change this here.
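
A rough sketch of why the `np.searchsorted` suggestion should pick the same index as the boolean-mask version. The second line of the original pair is not quoted in this thread, so the `np.argmax` step below is an assumption about what it does (take the first entry at or above the threshold). Both helper names are hypothetical. The equivalence relies on `weight_cdf` being non-decreasing, which holds for non-negative weights.

```python
import numpy as np


def first_index_mask(weight_cdf, percentile):
    """Boolean-mask version: first position where the CDF reaches the threshold."""
    threshold = (percentile / 100.0) * weight_cdf[-1]
    percentile_or_above = weight_cdf >= threshold
    # Assumed second line: argmax on a boolean array returns the first True.
    return int(np.argmax(percentile_or_above))


def first_index_searchsorted(weight_cdf, percentile):
    """Suggested version: binary search on the sorted cumulative weights."""
    threshold = (percentile / 100.0) * weight_cdf[-1]
    # side="left" (the default) gives the first i with weight_cdf[i] >= threshold.
    return int(np.searchsorted(weight_cdf, threshold))


rng = np.random.RandomState(0)
sample_weight = rng.randint(1, 5, size=20).astype(np.float64)
weight_cdf = sample_weight.cumsum()

percentiles = (0, 10, 25, 50, 75, 100)
mask_result = [first_index_mask(weight_cdf, p) for p in percentiles]
search_result = [first_index_searchsorted(weight_cdf, p) for p in percentiles]
print(mask_result == search_result)
```

The mask version scans the whole array, while `np.searchsorted` does a binary search, so the suggestion is mainly a small efficiency and readability change.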

MechCoder · 2014-10-16T15:53:30Z

@arjoly Done with my review. LGTM 👍 Would greatly appreciate it if you have the time to look at #3772 and give your comments.

MechCoder · 2014-10-17T09:32:43Z

@arjoly Updated the PR description.

arjoly · 2014-10-17T11:32:35Z

Thanks @MechCoder !

SaurabhJha · 2014-10-19T10:26:28Z

sklearn/utils/stats.py

Sorry for being stupid, but I am not able to get this to work. My arguments are [3, 2, 4] and [1, 2, 3] for array and sample_weight respectively. sorted_idx is an array, so indexing sample_weight with it throws a TypeError. I wonder what the expected arguments are here.

sample_weight should be a numpy array

On Sun, Oct 19, 2014 at 12:26 PM, Saurabh Jha wrote:

In sklearn/utils/stats.py:

    @@ -44,3 +44,16 @@ def _rankdata(a, method="average"):
     except TypeError as e:
         rankdata = _rankdata
    +
    +
    +def _weighted_percentile(array, sample_weight, percentile=50):
    +    """Compute the weighted percentile of array with sample_weight. """
    +    sorted_idx = np.argsort(array)
    +
    +    # Find index of median prediction for each sample
    +    weight_cdf = sample_weight[sorted_idx].cumsum()
    +    percentile_or_above = weight_cdf >= (percentile / 100.0) * weight_cdf[-1]

Thanks @MechCoder !
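
To make the exchange above concrete, here is a minimal sketch of the failure and the fix. The function is a hypothetical stand-in, not the code merged in this PR: the diff quoted above stops before the return, so the last two lines are an assumed completion, and the `np.asarray` conversions are added here only so that plain lists work.

```python
import numpy as np


def weighted_percentile(array, sample_weight, percentile=50):
    """Hypothetical wrapper around the logic quoted in the diff above."""
    # Added for illustration: accept lists by converting to arrays first.
    array = np.asarray(array)
    sample_weight = np.asarray(sample_weight)

    sorted_idx = np.argsort(array)
    weight_cdf = sample_weight[sorted_idx].cumsum()
    percentile_or_above = weight_cdf >= (percentile / 100.0) * weight_cdf[-1]

    # Assumed completion: take the first sorted sample at or above the threshold.
    percentile_idx = np.argmax(percentile_or_above)
    return array[sorted_idx[percentile_idx]]


# A plain list cannot be indexed with an index array, hence the TypeError.
try:
    [1, 2, 3][np.argsort([3, 2, 4])]
    list_indexing_failed = False
except TypeError:
    list_indexing_failed = True

median = weighted_percentile([3, 2, 4], [1, 2, 3])
print(list_indexing_failed, median)  # True 3
```

With these inputs the sorted values are [2, 3, 4] with weights [2, 1, 3], so the cumulative weights are [2, 3, 6] and half the total weight (3) is first reached at the value 3.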

arjoly · 2014-10-21T13:37:48Z

A last reviewer ? ping @ogrisel, @pprett, @glouppe

arjoly · 2014-10-24T11:15:22Z

In the long term, I hope to replace the dummy estimator in gradient boosting with the dummy regressor and classifier. Any last reviewer?

MechCoder · 2014-10-28T08:28:10Z

sklearn/tests/test_dummy.py

Would it be better to generate X randomly? Just for a sanity check.

MechCoder · 2014-10-28T08:30:41Z

I suppose you can go ahead and merge if no one replies in 2-3 days, or whenever you feel like it. I believe no one else is following these changes, and this is not that big a diff.

arjoly · 2014-10-31T12:11:33Z

Thanks @MechCoder ! I am ok to merge. Still I would appreciate a quick last review for this small pr.

MechCoder · 2014-11-04T08:49:50Z

I think this should go in.

[MRG+1] Add sample_weight support to Dummy Regressor

MechCoder · 2014-11-04T08:50:24Z

Thanks @arjoly .

arjoly reviewed Oct 16, 2014
arjoly mentioned this pull request Oct 16, 2014

Adding sample_weight support to DummyRegressor #3420

Closed

arjoly force-pushed the sw-dummy-regressor branch 2 times, most recently from 9e4f6d5 to de49e67 Compare October 16, 2014 14:41

Add sample_weight support to Dummy Regressor

da31344

arjoly force-pushed the sw-dummy-regressor branch from de49e67 to da31344 Compare October 16, 2014 14:41

MechCoder reviewed Oct 16, 2014
Use np.average instead of np.mean

e0f4cd9
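
As a brief illustration of the commit above: `np.mean` takes no weights, while `np.average` accepts a `weights` argument and falls back to the plain mean when it is `None`. The data below is made up for the example, and this is only a sketch of the idea, not the code from the commit.

```python
import numpy as np

# Hypothetical targets with shape (n_samples, n_outputs) and per-sample weights.
y = np.array([[1.0], [2.0], [4.0]])
sample_weight = np.array([1.0, 1.0, 2.0])

# Unweighted mean over samples.
unweighted = np.mean(y, axis=0)

# Weighted mean over samples: (1*1 + 1*2 + 2*4) / 4 = 2.75.
weighted = np.average(y, axis=0, weights=sample_weight)

# With weights=None, np.average matches np.mean, so one call covers both cases.
uniform = np.average(y, axis=0, weights=None)

print(unweighted, weighted, uniform)
```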

MechCoder reviewed Oct 16, 2014
MechCoder changed the title ~~[MRG] Add sample_weight support to Dummy Regressor~~ [MRG+1] Add sample_weight support to Dummy Regressor Oct 17, 2014

SaurabhJha reviewed Oct 19, 2014
MechCoder reviewed Oct 28, 2014
MechCoder force-pushed the master branch from 6deaea0 to 3f49cee Compare November 3, 2014 12:36

MechCoder closed this Nov 4, 2014

MechCoder reopened this Nov 4, 2014

MechCoder added a commit that referenced this pull request Nov 4, 2014

Merge pull request #3779 from arjoly/sw-dummy-regressor

e6835a7

[MRG+1] Add sample_weight support to Dummy Regressor

MechCoder merged commit e6835a7 into scikit-learn:master Nov 4, 2014

3 participants