[MRG-0] Make LabelEncoder more friendly to new labels #3483
mjbommar wants to merge 9 commits into scikit-learn:master
You have a rebase issue. I ran into this problem once. If I remember correctly, I think that I have fixed it. Hope it helps.
@arjoly, appears to have worked. Thanks for the recommendation and sorry for the mistake.
No problem, I ran into that issue once and it was frustrating. I am happy that it works for you!
But it's important to have a consistent API across the scikit. Could you use a property instead?
@mblondel, thanks for the recommendation. Implemented as suggested.
Could it work with string labels?
@mjbommar, should we expect you won't be completing this any time soon and label it "needs contributor" for someone to adopt? I will do so, but you should say if you'd rather complete it.
@jnothman, my recollection is fuzzy, but I think this issue was primarily blocked by design disagreements. If we can come to an agreement about desired behavior, I could see how easily the work could be completed and merged into master.
This is a final, cleanly rebased version of PR 3243 (#3243) incorporating discussions.
Summary:
This PR intends to make `preprocessing.LabelEncoder` more friendly for production/pipeline usage by adding a `new_labels` constructor argument.
Instead of always raising `ValueError` for unseen/new labels in transform, `LabelEncoder` may be initialized with `new_labels` as:
- `"raise"`: current behavior, i.e., raise `ValueError`; to remain default behavior
- `"update"`: update classes with new IDs `[N, ..., N+m-1]` for m new labels and assign
N.B.: `.classes_` is not a property to support the `new_labels="update"` behavior.
Tests and documentation updates included.
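For readers following the discussion, the two modes described above can be illustrated with a small standalone sketch. This is not the code from this PR and not scikit-learn's `LabelEncoder`: the class name `SketchLabelEncoder`, the private `_index` dict, and the choice to number unseen labels in sorted order are all assumptions made for illustration. It only mirrors the `"raise"` and `"update"` semantics stated in the summary.

```python
class SketchLabelEncoder:
    """Hypothetical stand-in that mimics the proposed new_labels modes."""

    def __init__(self, new_labels="raise"):
        if new_labels not in ("raise", "update"):
            raise ValueError("new_labels must be 'raise' or 'update'")
        self.new_labels = new_labels

    def fit(self, y):
        # Known classes get IDs 0..N-1 in sorted order.
        self.classes_ = sorted(set(y))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, y):
        unseen = sorted(set(y) - set(self._index))
        if unseen:
            if self.new_labels == "raise":
                # Strict mode: unseen labels are an error.
                raise ValueError("y contains new labels: %s" % unseen)
            # "update" mode: the m unseen labels receive IDs N..N+m-1
            # (sorted order among the new labels is an assumption here).
            for label in unseen:
                self._index[label] = len(self.classes_)
                self.classes_.append(label)
        return [self._index[label] for label in y]


# Usage with string labels.
enc = SketchLabelEncoder(new_labels="update").fit(["a", "b", "c"])
codes = enc.transform(["b", "e", "d"])  # [1, 4, 3]
```

With the default `new_labels="raise"`, the same `transform` call would raise `ValueError`, which matches the behavior the summary says should remain the default.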