[MRG+2] Sparse label_binarizer #3203
hamsal wants to merge 41 commits into scikit-learn:master from hamsal:sprs-lbl-bin
Conversation
I am trying to figure out how to handle the case where the input
Supporting sparse in type_of_target is definitely something that's needed.
Sparse support for type_of_target has been included based on the comments in Rohit's PR. The sparse-input support as it stands really only validates that the input sparse matrix is correctly constructed; there is no conversion from any format other than multilabel-indicator to multilabel-indicator. This does seem more correct than failing on the input when it is a sparse multilabel-indicator matrix.
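Since this support did land, a quick sketch of how a sparse multilabel-indicator matrix is recognized (assuming the post-merge sklearn.utils.multiclass API):

```python
import numpy as np
from scipy import sparse
from sklearn.utils.multiclass import type_of_target

# A sparse multilabel indicator matrix: rows are samples, columns are labels
Y = sparse.csr_matrix(np.array([[1, 0, 1],
                                [0, 1, 0]]))
print(type_of_target(Y))  # 'multilabel-indicator'
```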
I am getting some numerical precision issues in the doctests that I think may be unrelated to the changes in this PR. Advice on how to proceed would be appreciated. I am running on a 64-bit machine, but the numbers calculated contain fewer significant figures.
If I understand the failure correctly, a piece of code uses the label binarizer with the multilabel argument, but you deprecate this parameter. I would simply correct the function call in the appropriate place.
The multilabel error has been fixed; the precision errors have come back on Travis CI.
There is probably a change in the dtype which causes the small difference.
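For context, a dtype change alone can shift the printed significant figures; a generic illustration (not the exact doctest in question):

```python
import numpy as np

x64 = np.float64(1) / np.float64(3)
x32 = np.float32(1) / np.float32(3)
print(x64)  # full float64 precision
print(x32)  # float32 carries fewer significant figures, so reprs differ
```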
sklearn/preprocessing/label.py
Outdated
Can you rename this parameter sparse_output, to be consistent with the other modules? Can you also add a comma after boolean?
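The rename to sparse_output did stick; a minimal sketch of the resulting API (as it exists in merged scikit-learn):

```python
from scipy import sparse
from sklearn.preprocessing import LabelBinarizer

# With sparse_output=True the indicator matrix is returned as a sparse matrix
lb = LabelBinarizer(sparse_output=True)
Y = lb.fit_transform([1, 2, 6, 4, 2])
print(sparse.issparse(Y))  # True
```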
Thanks for the pointer to the ellipsis directive! The test coverage needs to be improved.
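For reference, the ellipsis directive mentioned here lets a doctest elide the platform-dependent tail of a float; a self-contained sketch:

```python
import doctest

DOCSTRING = """
>>> 1.0 / 3  # doctest: +ELLIPSIS
0.333...
"""

# Parse and run the single example; +ELLIPSIS lets '...' match the long tail
test = doctest.DocTestParser().get_doctest(DOCSTRING, {}, "ellipsis_demo", None, 0)
results = doctest.DocTestRunner().run(test)
print(results)  # TestResults(failed=0, attempted=1)
```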
sklearn/preprocessing/label.py
Outdated
I would avoid doing any kind of magic with the label inference. What is the current behavior?
OK, this was revised in a recent commit; it was based on a testing error.
Hm, apparently you deleted and added some files.
You forgot the ellipsis.
sklearn/utils/multiclass.py
Outdated
To properly undo the file deletion, you can use: … But use it with care; it rewrites the history of the branch.
I don't understand how you get a -2.66% decrease in coverage.
I did not change code in my last commit, and yet coverage changed significantly, which is curious.
About the files: they were not deletions; they were changes of the access mode in the OS. I accidentally did
… and made formatting and wording revisions
…rse agrmax, and (row_max.ravel() <= 0) -> (row_max.ravel() == 0)
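The second commit message above refers to a row-max check (changing (row_max.ravel() <= 0) to (row_max.ravel() == 0)); a hedged sketch of what that check computes, assuming non-negative indicator data, where the two forms are equivalent:

```python
import numpy as np
from scipy import sparse

Y = sparse.csr_matrix(np.array([[0, 1, 0],
                                [0, 0, 0],   # all-zero row: no positive label
                                [1, 0, 0]]))
row_max = Y.max(axis=1).toarray()   # sparse max along rows -> shape (n_samples, 1)
no_label = (row_max.ravel() == 0)   # flags rows with no positive entry
print(no_label)
```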
Apparently, you have merge conflicts. Looks good to me; switch the title to MRG+1.
Yes, I think I am also +1 for merge. Thank you @hamsal. One small comment: there are a few tests, in this PR and elsewhere, that iterate through a number of sparse formats. In a new PR after this one is merged, could we please make this list a constant in
+1 for your proposition @jnothman. Can you open an issue or make a pull request?
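The shared-constant idea suggested above might look like this hypothetical sketch (the constant name and its location are assumptions; no such constant existed at the time):

```python
import numpy as np
from scipy import sparse

# Hypothetical shared constant the tests would all import (name is an assumption)
SPARSE_FORMATS = ("csr", "csc", "coo", "lil", "dok")

X = np.array([[1, 0], [0, 1]])
for fmt in SPARSE_FORMATS:
    # Each test iterates the same list instead of hard-coding its own formats
    Xs = sparse.csr_matrix(X).asformat(fmt)
    assert Xs.format == fmt
```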
merging |
merged as 2bfca14 |
In my git rebases and history edits, it looks like there are 3 commits in this PR that only came into the PR after the most recent rebase, so I don't think this was ready for a merge! Please look at the commit history and see that there are three commits shared by @jnothman and @arjoly. Edit: these commits apparently also introduced the two new files in the changed files that have to do with the tree classifier.
I should have made it clearer that I was also attempting to fix errors; I do not think the most recent Travis build passed.
Oh! Whoops... What do you mean by there being commits shared by jnothman and arjoly?
You are right, a test fails. Could you please submit a patch?
The commit history has three commits, two from you and one from Arnaud:

- arjoly: MAINT reduce amount of boiler code using standard C operations (bd7db7b)
- jnothman: DOC give example of binarizing binary targets … (eec979c)
- jnothman: DOC Note shape of binary binarized output (34872e9)

Yes, I can submit a patch for the Travis error. I am not sure what to do about these stray commits or if they are a problem.
don't worry about that. it's squashed to one commit. |
OK, good to know.
But the test error makes it seem like the label binarizer is outputting non-integer labels. That should probably be tested locally rather than triggering an error in
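A local check along the lines suggested here (a sketch, not the failing test itself) could assert the output dtype directly:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
Y = lb.fit_transform([1, 2, 3])
# The binarized indicator matrix is expected to hold integer labels
print(np.issubdtype(Y.dtype, np.integer))
```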
I'm very sorry not to have properly tested before pushing; hopefully we can fix this quickly.
This error has been confusing me because I cannot recreate it locally. I have rebased, and I have verified that my numpy is over 1.6.2, so for now the only place I have seen it is on Travis. Edit: I have recreated it locally.
I am working on the patch here: #3314. I have fixed the error locally; Travis is building it now.
The patch has been merged. I apologize for the headache, and thank you for the reviews!
Thanks for the good work! 🍻
This is a continuation of a part of PR #2458 based on the code already written by @rsivapr and @arjoly from the branch https://github.com/arjoly/scikit-learn/tree/sparse-label_binarizer.
The first steps are to write tests to motivate correctness and then implement the appropriate changes in label.py.
Initial Checklist:
- LabelBinarizer() - with testing
- label_binarizer() - with testing
- is_label_indicator_matrix with sparse input
- label_binarize with binary one class case

Rebase Checklist:
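The binary case from the checklist can be illustrated with a short sketch (assuming the merged label_binarize behavior of one indicator column for binary targets):

```python
from sklearn.preprocessing import label_binarize

# Binary targets binarize to a single indicator column, shape (n_samples, 1)
Y = label_binarize([1, 0, 1], classes=[0, 1])
print(Y.shape)  # (3, 1)
```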