[MRG+2] DOC add info to conventions for multi-label fitting#7519
jnothman merged 4 commits into scikit-learn:master from
nelson-liu left a comment
Yeah, I'm not sure type casting is the proper place for this. It most definitely is a convention, but I don't think it fits cleanly into either category.
doc/tutorial/basic/tutorial.rst (outdated)
    array, since ``iris.target_names`` was for fitting.

    Similarly, when using `multiclass classifiers <http://scikit-learn.org/stable/modules/multiclass.html>`_,
    the format of the target data used for fitting determines whether multiclass or multilabel predictions will be returned::
this line is a bit long, can you split it up into 2?
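A minimal sketch of the behaviour this hunk describes, assuming `OneVsRestClassifier` wrapping `SVC` as the multiclass classifier; the toy data is made up for illustration. Fitting on a 1d array of class labels gives 1d multiclass predictions, while fitting on the 2d indicator matrix produced by `LabelBinarizer` gives 2d multilabel predictions.

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer
from sklearn.svm import SVC

# Toy data (illustrative only): five samples, two features, three classes.
X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
y = [0, 0, 1, 1, 2]

classif = OneVsRestClassifier(estimator=SVC(random_state=0))

# Case 1: a 1d array of class labels -> multiclass fit, one label per sample.
multiclass_pred = classif.fit(X, y).predict(X)
print(multiclass_pred.shape)  # (5,)

# Case 2: a 2d binary indicator matrix -> multilabel fit, one column per class.
Y = LabelBinarizer().fit_transform(y)
multilabel_pred = classif.fit(X, Y).predict(X)
print(multilabel_pred.shape)  # (5, 3)
```

The same estimator object is reused on purpose: only the format of the target changes between the two calls to `fit()`.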
FWIW, it seems there's an existing PR, #5837, that edits the multiclass documentation. I do think this should be mentioned here as well, though.
Created a new section in the conventions and tried to make shorter, clearer sentences. I hadn't seen the other PR since it wasn't linked to the issue. Should there be some sort of correspondence between the two PRs?
I suppose they should give similar information and not conflict with each other, but beyond that they could probably be treated fairly independently, in my opinion.
I think the two PRs are independent.
jnothman left a comment
Should we also briefly note that multilabel data can be provided as a sparse matrix? Or does that belong elsewhere? Otherwise LGTM
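On the sparse-matrix question, a minimal sketch assuming `MultiLabelBinarizer(sparse_output=True)` to build the target and `OneVsRestClassifier(SVC())` as the estimator; the data is invented for illustration. The binarizer returns a `scipy.sparse` matrix, which the classifier accepts directly as `y`.

```python
from scipy import sparse
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import SVC

# Illustrative data: each sample carries one or more of the labels 0..3.
X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
labels = [[0, 1], [0, 2], [1, 3], [0, 2, 3], [2]]

# sparse_output=True makes the binarizer return a scipy.sparse matrix.
Y_sparse = MultiLabelBinarizer(sparse_output=True).fit_transform(labels)
print(sparse.issparse(Y_sparse))  # True

# The sparse indicator matrix is passed to fit() like a dense one would be.
classif = OneVsRestClassifier(estimator=SVC(random_state=0))
pred = classif.fit(X, Y_sparse).predict(X)
print(pred.shape)  # (5, 4)
```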
doc/tutorial/basic/tutorial.rst (outdated)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    When using :class:`multiclass classifiers <sklearn.multiclass>`,
    the format of returned predictions is dependent on the format of the target data fit upon. More specifically, a classifier fit on multiclass labels will return multiclass
Please make the line shorter to be consistent with the rest of the docs.
doc/tutorial/basic/tutorial.rst (outdated)
    The ``predict()`` method therefore provides corresponding multiclass predictions.
    In the second case, the classification target is provided to ``fit()`` as
    a 2d array of binary label indicators, using the :class:`LabelBinarizer <sklearn.preprocessing.LabelBinarizer>`.
    In this case ``predict()`` returns a 2d array representing the corresponding multi-label predictions.
You might highlight the last two instances receiving no label at all.
And maybe add that it's possible for multiple 1s to appear in a row? (Or maybe change the example so that that actually happens?)
If so, refer to MultiLabelBinarizer... Is that too much for a tutorial?
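To illustrate the suggestions in this thread (several labels in one row, and a row with no label at all), here is a small sketch using `MultiLabelBinarizer`; the label sets are made up for illustration.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Each tuple lists the labels of one sample; the last sample has none.
labels = [(0, 1), (0, 2), (1, 3), (0, 2, 3), ()]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)
print(mlb.classes_)  # [0 1 2 3]
print(Y)
# [[1 1 0 0]
#  [1 0 1 0]
#  [0 1 0 1]
#  [1 0 1 1]
#  [0 0 0 0]]

# Number of labels per sample: rows may hold several 1s, or none.
print(Y.sum(axis=1))  # [2 2 2 3 0]

# inverse_transform maps an all-zero row back to an empty tuple.
restored = [tuple(int(c) for c in row) for row in mlb.inverse_transform(Y)]
print(restored)  # [(0, 1), (0, 2), (1, 3), (0, 2, 3), ()]
```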
looks good apart from minor comments :)
doc/tutorial/basic/tutorial.rst (outdated)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    When using :class:`multiclass classifiers <sklearn.multiclass>`,
    the format of returned predictions is dependent on the format of the target data fit upon. More specifically, a classifier fit on multiclass labels will return multiclass
Maybe I'd say "the learning and prediction task that is performed is dependent", because not only does the output format change, but what happens in the algorithm is also different.
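One way to see that the target format selects a different task, not just a different output shape: scikit-learn's `type_of_target` utility (in `sklearn.utils.multiclass`) classifies a 1d label array and its `LabelBinarizer` encoding differently. A minimal sketch with made-up labels:

```python
from sklearn.preprocessing import LabelBinarizer
from sklearn.utils.multiclass import type_of_target

y = [0, 0, 1, 1, 2]                  # one class label per sample
Y = LabelBinarizer().fit_transform(y)  # same labels as a 2d 0/1 matrix

print(type_of_target(y))  # multiclass
print(type_of_target(Y))  # multilabel-indicator
```

Under a multiclass target each sample gets exactly one class; under a multilabel-indicator target each column is an independent binary decision, which is why a row can end up with several labels or none.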
LGTM
The lines still seem too long. If we keep lines short (80 characters), future merges will be easier.
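A rough way to check the 80-character guideline before pushing. `long_lines` is a hypothetical helper written for this illustration, not part of scikit-learn's tooling.

```python
def long_lines(text, limit=80):
    """Hypothetical helper: return (line_number, length) for each line over `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


sample = "short line\n" + "x" * 85 + "\nanother short line\n"
print(long_lines(sample))  # [(2, 85)]
```

In practice one would read the `.rst` file from disk and pass its contents as `text`.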
My mistake; I misunderstood what exactly was meant by line lengths. Does this seem better now?
Thanks!
Thanks @jkarno!
sweet, grats @jkarno :)
…earn#7519)

* DOC add info to conventions for multi-label fitting
* DOC move to multilabel section and small edits
* DOC clean multilabel examples and clean information
* DOC fix line lengths for multiclass
Reference Issue
Fixes #4639
What does this implement/fix? Explain your changes.
I added documentation to the Quick Start conventions explaining how multiclass classifiers behave when fit on class labels vs. binary label indicators.
Any other comments?
This is my first pull request to scikit-learn, so please let me know if there is anything that should be done differently. I spoke with @amueller regarding this contribution, and he suggested this location for the new documentation. However, I'm not sure whether it belongs under the type casting section.
Andy, please let me know if I misunderstood the issue. I tried to demonstrate the difference you described, but I'm not 100% sure this was what the issue was referring to.