[MRG+1] Add sample_weight support to Dummy Regressor#3779
MechCoder merged 2 commits into scikit-learn:master from
I removed accept_sparse='csr' since it's not supported.
This is ready for review and will fix #3420.
Noob question: out of curiosity, is there any difference between doing this and
`y = y[:, np.newaxis]`
I always use the latter.
`y = y[:, np.newaxis]` doesn't preserve contiguity. This is a known bug and will be solved in the current or next release of NumPy.
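For comparison, here is a minimal sketch of two ways to turn a 1-D target into a column vector. The thread does not show the line under review, so an explicit `np.reshape` is assumed as the alternative, and the variable names are illustrative. The contiguity flags depend on the installed NumPy version, so the sketch prints them rather than assuming a value.

```python
import numpy as np

y = np.arange(5, dtype=float)          # illustrative 1-D target

y_reshaped = np.reshape(y, (-1, 1))    # explicit reshape to a column (assumed alternative)
y_newaxis = y[:, np.newaxis]           # indexing with a new axis

# Both approaches yield the same shape and the same values.
same_shape = y_reshaped.shape == y_newaxis.shape
same_values = np.array_equal(y_reshaped, y_newaxis)

# Contiguity may differ between NumPy versions; inspect it instead of assuming.
print(y_reshaped.flags["C_CONTIGUOUS"], y_newaxis.flags["C_CONTIGUOUS"])
```

If a downstream routine requires a contiguous buffer, `np.ascontiguousarray` can be applied to either result.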
If I'm understanding right, these two lines can be replaced by
`percentile_idx = np.searchsorted(weight_cdf, (percentile / 100.) * weight_cdf[-1])`
or am I wrong?
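A small sketch of why the suggestion should be equivalent. It assumes the two lines in question build a boolean mask and then take its first `True` with `argmax` (the thread only shows the mask line). It also assumes non-negative weights, so that `weight_cdf` is non-decreasing, and a percentile between 0 and 100. The sample values are made up for illustration.

```python
import numpy as np

weight_cdf = np.array([2.0, 3.0, 6.0])   # cumulative weights, non-decreasing
percentile = 50
threshold = (percentile / 100.0) * weight_cdf[-1]

# Mask-based lookup: first position whose cumulative weight reaches the threshold.
idx_mask = int((weight_cdf >= threshold).argmax())

# searchsorted with the default side="left" returns the first index i
# such that weight_cdf[i] >= threshold, which is the same position.
idx_search = int(np.searchsorted(weight_cdf, threshold))

print(idx_mask, idx_search)
```

The `searchsorted` form uses a binary search on the sorted cumulative weights instead of scanning a full boolean mask.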
Do you think this could be optimized in another PR? I have just taken what @pprett did previously and moved it there so that it is useful to more than just gradient boosting.
Okay, unless @pprett thinks it is fine to change this here.
@arjoly Updated the PR description.
Thanks @MechCoder!
Sorry for being stupid, but I am not able to get this to work. My arguments are [3, 2, 4] and [1, 2, 3] for array and sample_weight respectively. Since sorted_idx is an array, indexing with it throws a TypeError. What are the expected arguments here?
sample_weight should be a NumPy array.
On Sun, Oct 19, 2014 at 12:26 PM, Saurabh Jha notifications@github.com wrote:
In sklearn/utils/stats.py:
@@ -44,3 +44,16 @@ def _rankdata(a, method="average"):
 except TypeError as e:
     rankdata = _rankdata
+
+
+def _weighted_percentile(array, sample_weight, percentile=50):
+    """Compute the weighted percentile of array with sample_weight. """
+    sorted_idx = np.argsort(array)
+    # Find index of median prediction for each sample
+    weight_cdf = sample_weight[sorted_idx].cumsum()
+    percentile_or_above = weight_cdf >= (percentile / 100.0) * weight_cdf[-1]
Godspeed,
Manoj Kumar,
Intern, Telecom ParisTech
Mech Undergrad
http://manojbits.wordpress.com
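The TypeError reported above can be reproduced with plain lists, which cannot be indexed by an index array. The sketch below uses a hypothetical helper, `weighted_percentile_sketch`, that mirrors the quoted diff but converts its inputs with `np.asarray` first. It is not the library function, and its final step (taking the first position at or above the threshold) is an assumption, since the quoted hunk ends before the return statement.

```python
import numpy as np


def weighted_percentile_sketch(array, sample_weight, percentile=50):
    """Hypothetical helper mirroring the quoted diff; not the library function."""
    array = np.asarray(array)
    sample_weight = np.asarray(sample_weight, dtype=float)
    sorted_idx = np.argsort(array)
    # Cumulative weight of the samples, taken in increasing order of value.
    weight_cdf = sample_weight[sorted_idx].cumsum()
    percentile_or_above = weight_cdf >= (percentile / 100.0) * weight_cdf[-1]
    # Assumption: the first True in the mask marks the weighted percentile.
    return array[sorted_idx[percentile_or_above.argmax()]]


# A plain list cannot be indexed with an index array, hence the TypeError.
try:
    [1, 2, 3][np.argsort([3, 2, 4])]
    list_indexing_failed = False
except TypeError:
    list_indexing_failed = True

# With the conversion in place, the arguments from the question work.
result = weighted_percentile_sketch([3, 2, 4], [1, 2, 3])
print(list_indexing_failed, result)
```

For these inputs the sorted weights are 2, 1, 3, the cumulative weights are 2, 3, 6, and half of the total weight (3) is first reached at the value 3.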
In the long term, I hope to replace the dummy estimator in gradient boosting with the dummy regressor and classifier. Any last reviewer?
Would it be better to generate X randomly? Just as a sanity check.
I suppose you can go ahead and merge if no one replies in 2-3 days, or whenever you feel like it. I believe no one is following these changes, and this is not that big a diff.
Thanks @MechCoder! I am OK to merge. Still, I would appreciate a quick last review for this small PR.
I think this should go in.
Thanks @arjoly.