Skip to content

[MRG+1] Add sample_weight support to Dummy Regressor#3779

Merged
MechCoder merged 2 commits intoscikit-learn:masterfrom
arjoly:sw-dummy-regressor
Nov 4, 2014
Merged

[MRG+1] Add sample_weight support to Dummy Regressor#3779
MechCoder merged 2 commits intoscikit-learn:masterfrom
arjoly:sw-dummy-regressor

Conversation

@arjoly
Copy link
Copy Markdown
Member

@arjoly arjoly commented Oct 16, 2014

No description provided.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed accept_sparse='csr' since it's not supported.

@arjoly
Copy link
Copy Markdown
Member Author

arjoly commented Oct 16, 2014

This is ready for review and it will fix #3420

@arjoly arjoly force-pushed the sw-dummy-regressor branch 2 times, most recently from 9e4f6d5 to de49e67 Compare October 16, 2014 14:41
@arjoly arjoly force-pushed the sw-dummy-regressor branch from de49e67 to da31344 Compare October 16, 2014 14:41
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noob question: out of curiosity, is there any difference between doing this and

`y = y[:, np.newaxis]`

I always use the latter.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The y = y[:, np.newaxis] doesn't preserve contiguity. This is a known bug and will be solve in the current or next release of numpy.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yes, I remember now :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding right, these two lines can be replaced by

precentile_idx = np.searchsorted(weight_cdf, (percentile / 100.) * weight_cdf[-1])

or am I wrong?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this could be optimized in another pr? I have just taken what @pprett has done previously and put it there to be useful to more than just gradient boosting.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, unless @pprett thinks if it is ok, to change this over here.

@MechCoder
Copy link
Copy Markdown
Member

@arjoly Done with my review. LGTM 👍 Would greatly appreciate it if you have the time to look at #3772 and give your comments.

@MechCoder MechCoder changed the title [MRG] Add sample_weight support to Dummy Regressor [MRG+1] Add sample_weight support to Dummy Regressor Oct 17, 2014
@MechCoder
Copy link
Copy Markdown
Member

@arjoly Updated the PR description.

@arjoly
Copy link
Copy Markdown
Member Author

arjoly commented Oct 17, 2014

Thanks @MechCoder !

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being stupid, but I am not able to get this to work. My arguments are [3, 2, 4] and [1, 2, 3] for array and sample_weight respectively. The sorted_idx is an array and thus throwing a TypeError. I wonder what are the expected arguments here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sample_weight should be a numpy array

On Sun, Oct 19, 2014 at 12:26 PM, Saurabh Jha notifications@github.com
wrote:

In sklearn/utils/stats.py:

@@ -44,3 +44,16 @@ def _rankdata(a, method="average"):

except TypeError as e:
rankdata = _rankdata
+
+
+def _weighted_percentile(array, sample_weight, percentile=50):

  • """Compute the weighted percentile of array with sample_weight. """
  • sorted_idx = np.argsort(array)
  • Find index of median prediction for each sample

  • weight_cdf = sample_weight[sorted_idx].cumsum()
  • percentile_or_above = weight_cdf >= (percentile / 100.0) * weight_cdf[-1]

Sorry for being stupid, but I am not able to get this to work. My
arguments are [3, 2, 4] and [1, 2, 3] for array and sample_weight
respectively. The sorted_idx is an array and thus throwing a TypeError. I
wonder what are the expected arguments here.


Reply to this email directly or view it on GitHub
https://github.com/scikit-learn/scikit-learn/pull/3779/files#r19059282.

Godspeed,
Manoj Kumar,
Intern, Telecom ParisTech
Mech Undergrad
http://manojbits.wordpress.com

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MechCoder !

@arjoly
Copy link
Copy Markdown
Member Author

arjoly commented Oct 21, 2014

A last reviewer ? ping @ogrisel, @pprett, @glouppe

@arjoly
Copy link
Copy Markdown
Member Author

arjoly commented Oct 24, 2014

In the long term, I hope to replace the dummy estimator in gradient boosting by the dummy regressor and classifier. any last reviewer?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it better to generate X randomly? Just for a sanity check.

@MechCoder
Copy link
Copy Markdown
Member

I suppose you can go ahead and merge if noone replies in 2-3 days, or when you feel like. I believe none is following these changes and this is not that big a diff.

@arjoly
Copy link
Copy Markdown
Member Author

arjoly commented Oct 31, 2014

Thanks @MechCoder ! I am ok to merge. Still I would appreciate a quick last review for this small pr.

@MechCoder
Copy link
Copy Markdown
Member

I think this should go in.

@MechCoder MechCoder closed this Nov 4, 2014
@MechCoder MechCoder reopened this Nov 4, 2014
MechCoder added a commit that referenced this pull request Nov 4, 2014
[MRG+1] Add sample_weight support to Dummy Regressor
@MechCoder MechCoder merged commit e6835a7 into scikit-learn:master Nov 4, 2014
@MechCoder
Copy link
Copy Markdown
Member

Thanks @arjoly .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants