[MRG+2] DOC add info to conventions for multi-label fitting by jkarno · Pull Request #7519 · scikit-learn/scikit-learn

jkarno · 2016-09-29T04:03:48Z

Reference Issue

What does this implement/fix? Explain your changes.

I added documentation to the Quick Start conventions describing how multiclass classifiers behave when fit on class labels vs. binary label indicators.
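The following is a minimal sketch of the behaviour being documented. It assumes `OneVsRestClassifier` wrapping `SVC` as the multiclass classifier and a small toy dataset; both are chosen here for illustration and are not taken from the PR text.

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer
from sklearn.svm import SVC

# Toy data chosen for illustration only.
X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
y = [0, 0, 1, 1, 2]

classif = OneVsRestClassifier(SVC(random_state=0))

# Fit on a 1d array of class labels: a multiclass problem,
# so predict() returns a 1d array with one label per sample.
multiclass_pred = classif.fit(X, y).predict(X)

# Fit on a 2d array of binary label indicators: a multilabel problem,
# so predict() returns a 2d indicator array of shape (n_samples, n_classes).
Y = LabelBinarizer().fit_transform(y)
multilabel_pred = classif.fit(X, Y).predict(X)

print(multiclass_pred.shape)  # (5,)
print(multilabel_pred.shape)  # (5, 3)
```

The same estimator object is reused on purpose: only the format of the target passed to `fit()` differs between the two calls.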

Any other comments?

This is my first pull request to scikit-learn, so please let me know if there is anything I should do differently. I spoke with @amueller about this contribution, and he suggested this location for the new documentation. However, I'm not sure whether it should go under the type casting section.

Andy, please let me know if I misunderstood the issue. I tried to demonstrate the difference you described, but I'm not 100% sure this was what the issue was referring to.

nelson-liu

Yeah, I'm not sure type casting is the proper place for this. It most definitely is a convention, but I don't think it fits cleanly into either category.

nelson-liu · 2016-09-29T04:07:45Z

doc/tutorial/basic/tutorial.rst

 array, since ``iris.target_names`` was for fitting.

+Similarly, when using `multiclass classifiers <http://scikit-learn.org/stable/modules/multiclass.html>`_,
+the format of the target data used for fitting determines whether multiclass or multilabel predictions will be returned::


This line is a bit long; can you split it into two?

nelson-liu · 2016-09-29T04:11:15Z

FWIW, it seems there's an existing PR, #5837, that edits the multiclass documentation. I do think this should be mentioned here as well, though.

jkarno · 2016-09-29T18:24:24Z

Created a new section in the conventions and tried to make shorter/clearer sentences.

I hadn't seen the other PR since it wasn't linked to the issue - should there be some sort of correspondence between the two PRs?

nelson-liu · 2016-09-29T18:26:32Z

> I hadn't seen the other PR since it wasn't linked to the issue - should there be some sort of correspondence between the two PRs?

I suppose they should give similar information and not conflict with each other, but besides that they could probably be treated fairly independently, in my opinion.

amueller · 2016-09-29T19:50:45Z

I think the two PRs are independent.

jnothman

Should we also briefly note that multilabel data can be provided as a sparse matrix? Or does that belong elsewhere? Otherwise LGTM
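As a hedged illustration of what such a note could show: `MultiLabelBinarizer(sparse_output=True)` returns a SciPy sparse indicator matrix, and `OneVsRestClassifier` accepts a sparse indicator matrix as `y`. The toy label sets and the choice of `SVC` are assumptions made for this sketch.

```python
from scipy.sparse import issparse
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC

# Toy data chosen for illustration only.
X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
label_sets = [[0, 1], [0, 2], [1, 3], [0, 2, 3], [2, 4]]

# sparse_output=True yields a scipy.sparse (CSR) label indicator matrix.
mlb = MultiLabelBinarizer(sparse_output=True)
Y_sparse = mlb.fit_transform(label_sets)

classif = OneVsRestClassifier(SVC(random_state=0))
pred = classif.fit(X, Y_sparse).predict(X)

# The prediction may itself be sparse when y was sparse; densify for display.
pred_dense = pred.toarray() if issparse(pred) else pred
print(issparse(Y_sparse), pred_dense.shape)
```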

jnothman · 2016-10-05T12:28:58Z

doc/tutorial/basic/tutorial.rst

+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When using :class:`multiclass classifiers <sklearn.multiclass>`,
+the format of returned predictions is dependent on the format of the target data fit upon. More specifically, a classifier fit on multiclass labels will return multiclass


Please make the line shorter to be consistent with the rest of the docs.

jnothman · 2016-10-05T12:30:04Z

doc/tutorial/basic/tutorial.rst

+The ``predict()`` method therefore provides corresponding multiclass predictions.
+In the second case, the classification target is provided to ``fit()``  as
+a 2d array of binary label indicators, using the :class:`LabelBinarizer <sklearn.preprocessing.LabelBinarizer>`.
+In this case ``predict()`` returns a 2d array representing the corresponding multi-label predictions.


You might highlight the last two instances receiving no label at all.

And maybe add that it's possible that multiple ones appear in a row? (or maybe change the example so that that actually happens?)

If so, refer to MultiLabelBinarizer... Is that too much for a tutorial?
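A sketch touching on both review points, assuming made-up label sets encoded with `MultiLabelBinarizer`: in a multilabel indicator array a row can contain several 1s (several labels for one sample) or none at all (no label predicted). Counting 1s per row makes both cases visible; the classifier choice is again an assumption.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC

X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
# Each sample may carry several labels (toy label sets for illustration).
label_sets = [[0, 1], [0, 2], [1, 3], [0, 2, 3], [2, 4]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(label_sets)  # dense indicator array, one column per label
print(Y)
# [[1 1 0 0 0]
#  [1 0 1 0 0]
#  [0 1 0 1 0]
#  [1 0 1 1 0]
#  [0 0 1 0 1]]

pred = OneVsRestClassifier(SVC(random_state=0)).fit(X, Y).predict(X)

# Labels predicted per sample: 0 means "no label at all",
# values greater than 1 mean several labels for the same sample.
labels_per_sample = pred.sum(axis=1)
no_label = np.flatnonzero(labels_per_sample == 0)
print(labels_per_sample, no_label)

# Map indicator rows back to tuples of the original labels.
print(mlb.inverse_transform(pred))
```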

amueller · 2016-10-05T15:07:46Z

looks good apart from minor comments :)

amueller · 2016-10-05T15:06:08Z

doc/tutorial/basic/tutorial.rst

+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When using :class:`multiclass classifiers <sklearn.multiclass>`,
+the format of returned predictions is dependent on the format of the target data fit upon. More specifically, a classifier fit on multiclass labels will return multiclass


Maybe I'd say "the learning and prediction task that is performed is dependent", because it's not only the output format that changes; what happens in the algorithm is different too.
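One rough way to see this point, assuming `OneVsRestClassifier` with `SVC` (which exposes `decision_function`): with 1d labels, `predict` picks a single class per sample from the per-class scores, whereas with an indicator matrix each column is treated as an independent binary problem and its score is thresholded at 0 on its own. This reflects how `OneVsRestClassifier` handles classifiers with a `decision_function`; other meta-estimators may behave differently.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer
from sklearn.svm import SVC

X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
y = [0, 0, 1, 1, 2]

# Multiclass task: exactly one label per sample, taken from the
# highest-scoring one-vs-rest classifier.
mc = OneVsRestClassifier(SVC(random_state=0)).fit(X, y)
mc_pred = mc.predict(X)

# Multilabel task: every column is its own binary problem and each
# score is compared with 0 independently of the other columns.
Y = LabelBinarizer().fit_transform(y)
ml = OneVsRestClassifier(SVC(random_state=0)).fit(X, Y)
scores = ml.decision_function(X)  # shape (n_samples, n_classes)
ml_pred = ml.predict(X)

print(mc_pred.ndim, np.array_equal(ml_pred, (scores > 0).astype(int)))
```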

amueller · 2016-10-05T15:07:29Z

doc/tutorial/basic/tutorial.rst

+The ``predict()`` method therefore provides corresponding multiclass predictions.
+In the second case, the classification target is provided to ``fit()``  as
+a 2d array of binary label indicators, using the :class:`LabelBinarizer <sklearn.preprocessing.LabelBinarizer>`.
+In this case ``predict()`` returns a 2d array representing the corresponding multi-label predictions.


And maybe add that it's possible that multiple ones appear in a row? (or maybe change the example so that that actually happens?)

jnothman · 2016-10-06T00:41:37Z

LGTM

NelleV · 2016-10-06T16:23:12Z

The lines still seem too long. If we keep lines short (80 characters), future merges will be easier.
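A small sketch for checking the 80-character guideline locally before pushing. The function name `find_long_lines` is hypothetical and not part of scikit-learn; the sample text reuses wording from the diff under review.

```python
def find_long_lines(text, limit=80):
    """Return (line_number, length) for every line longer than ``limit``.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    return [
        (lineno, len(line))
        for lineno, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


sample = (
    "When using :class:`multiclass classifiers <sklearn.multiclass>`,\n"
    "the format of returned predictions is dependent on the format of the "
    "target data fit upon.\n"
)
print(find_long_lines(sample))  # [(2, 90)]
```

For a real file, pass its contents, e.g. `find_long_lines(open(path).read())`, where `path` points at the `.rst` file being edited.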

jkarno · 2016-10-06T21:13:58Z

My mistake, I was misunderstanding what exactly was meant by line lengths. Does this seem better now?

NelleV · 2016-10-06T21:16:03Z

Thanks!

jnothman · 2016-10-06T22:20:45Z

Thanks @jkarno!

amueller · 2016-10-07T16:48:35Z

sweet, grats @jkarno :)

…earn#7519)

* DOC add info to conventions for multi-label fitting
* DOC move to multilabel section and small edits
* DOC clean multilabel examples and clean information
* DOC fix line lengths for multiclass

DOC add info to conventions for multi-label fitting

5dc589b

nelson-liu reviewed Sep 29, 2016

DOC move to multilabel section and small edits

44d8aeb

jkarno changed the title ~~DOC add info to conventions for multi-label fitting~~ [MRG] DOC add info to conventions for multi-label fitting Sep 29, 2016

jnothman requested changes Oct 5, 2016

amueller requested changes Oct 5, 2016

DOC clean multilabel examples and clean information

99105f9

jnothman approved these changes Oct 6, 2016

jnothman changed the title ~~[MRG] DOC add info to conventions for multi-label fitting~~ [MRG+1] DOC add info to conventions for multi-label fitting Oct 6, 2016

DOC fix line lengths for multiclass

af9fe44

NelleV approved these changes Oct 6, 2016

NelleV changed the title ~~[MRG+1] DOC add info to conventions for multi-label fitting~~ [MRG+2] DOC add info to conventions for multi-label fitting Oct 6, 2016

jnothman merged commit 45cb11d into scikit-learn:master Oct 6, 2016

jnothman mentioned this pull request Nov 2, 2016

DecisionTreeClassifier unknown label type: 'continuous-multioutput' #7801

Closed

5 participants
