Skip to content

[MRG-0] Make LabelEncoder more friendly to new labels#3483

Closed
mjbommar wants to merge 9 commits intoscikit-learn:masterfrom
mjbommar:label-encoder-unseen-final
Closed

[MRG-0] Make LabelEncoder more friendly to new labels#3483
mjbommar wants to merge 9 commits intoscikit-learn:masterfrom
mjbommar:label-encoder-unseen-final

Conversation

@mjbommar
Copy link
Copy Markdown
Contributor

This is a final, cleanly rebased version of PR 3243 (#3243) incorporating discussions.

Summary:

This PR intends to make preprocessing.LabelEncoder more friendly for production/pipeline usage by adding a new_labels constructor argument.

Instead of always raising ValueError for unseen/new labels in transform, LabelEncoder may be initialized with new_labels as:

  • "raise": current behavior, i.e., raise ValueError; to remain default behavior
  • "update": update classes with new IDs [N, ..., N+m-1] for m new labels and assign
  • an integer value: set newly seen labels to have fixed value corresponding to this integer value

N.B.: .classes_ is not a property to support the new_labels="update" behavior.

Tests and documentation updates included.

@arjoly
Copy link
Copy Markdown
Member

arjoly commented Jul 24, 2014

You have a rebase issue. I got once the problem. If I remember, I think that I have fixed
the issue using the following strategy

git checkout master
git fetch upstream
git rebase upstream/master
git checkout new-branch
git rebase master
git push -f 

Hope it helps.

@mjbommar
Copy link
Copy Markdown
Contributor Author

@arjoly, appears to have worked. Thanks for the recommendation and sorry for the mistake.

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage decreased (-0.0%) when pulling 866e939 on mjbommar:label-encoder-unseen-final into 376ac51 on scikit-learn:master.

@arjoly
Copy link
Copy Markdown
Member

arjoly commented Jul 24, 2014

No problem, I ran once in that issue and this was frustrating. I am happy that it works for you!

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage decreased (-0.0%) when pulling da4cafb on mjbommar:label-encoder-unseen-final into 376ac51 on scikit-learn:master.

@mblondel
Copy link
Copy Markdown
Member

N.B.: Direct access to .classes_ should be replaced with .get_classes() in order to properly handle the new_labels="update".

But it's important to have a consistent API across the scikit. Could you use a property instead?

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage decreased (-0.01%) when pulling 0f3e3d3 on mjbommar:label-encoder-unseen-final into 376ac51 on scikit-learn:master.

@mjbommar
Copy link
Copy Markdown
Contributor Author

@mblondel , thanks for the recommendation. Implemented as suggested.

@mjbommar mjbommar changed the title Make LabelEncoder more friendly to new labels [MRG-0] Make LabelEncoder more friendly to new labels Jul 25, 2014
@mjbommar
Copy link
Copy Markdown
Contributor Author

mjbommar commented Aug 3, 2014

@arjoly, @jnothman , anything I can do to make this easier for you to review?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it work with string label (string)?

@jnothman
Copy link
Copy Markdown
Member

@mjbommar, should we expect you won't be completing this any time soon and label it "needs contributor" for someone to adopt? I will do so, but you should say if you'd rather complete it.

@mjbommar
Copy link
Copy Markdown
Contributor Author

mjbommar commented Apr 28, 2016

@jnothman, my recollection is fuzzy, but I think this issue was primarily blocked by design disagreements. If we can come to an agreement about desired behavior, I could see how easily the work could be completed and merged into master.

@mjbommar mjbommar closed this Jul 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants