[MRG+1] DOC more detailed note on SVC and SVR scalability#13209
Merged
qinhanmin2014 merged 4 commits intoscikit-learn:masterfrom Mar 11, 2019
Merged
[MRG+1] DOC more detailed note on SVC and SVR scalability#13209qinhanmin2014 merged 4 commits intoscikit-learn:masterfrom
qinhanmin2014 merged 4 commits intoscikit-learn:masterfrom
Conversation
3a9e30a to
29d36aa
Compare
qinhanmin2014
approved these changes
Feb 21, 2019
Contributor
|
LGTM |
chkoar
reviewed
Feb 21, 2019
sklearn/svm/classes.py
Outdated
| The implementation is based on libsvm. The fit time complexity | ||
| is more than quadratic with the number of samples which makes it hard | ||
| to scale to dataset with more than a couple of 10000 samples. | ||
| to scale to dataset with more than a couple of 10000 samples. For large |
sklearn/svm/classes.py
Outdated
| The implementation is based on libsvm. | ||
| The implementation is based on libsvm. The fit time complexity | ||
| is more than quadratic with the number of samples which makes it hard | ||
| to scale to dataset with more than a couple of 10000 samples. For large |
ogrisel
reviewed
Feb 21, 2019
sklearn/svm/classes.py
Outdated
| is more than quadratic with the number of samples which makes it hard | ||
| to scale to dataset with more than a couple of 10000 samples. | ||
| to scale to dataset with more than a couple of 10000 samples. For large | ||
| datasets consider using :class:`LinearSVC` or :class:`SGDClassifier` |
Member
There was a problem hiding this comment.
we need the full module path for the link to work, no? sklearn.linear_model.SGDClassifier
ogrisel
reviewed
Feb 21, 2019
sklearn/svm/classes.py
Outdated
| to scale to dataset with more than a couple of 10000 samples. | ||
| to scale to dataset with more than a couple of 10000 samples. For large | ||
| datasets consider using :class:`LinearSVC` or :class:`SGDClassifier` | ||
| instead. |
Member
There was a problem hiding this comment.
possibly after a class:sklearn.kernel_approximation.Nystroem transformer.
Member
Author
|
Addressed both comments. |
jnothman
approved these changes
Feb 27, 2019
Member
jnothman
left a comment
There was a problem hiding this comment.
I like this. I wonder if it's explicit enough in the user guide
sklearn/svm/classes.py
Outdated
| is more than quadratic with the number of samples which makes it hard | ||
| to scale to dataset with more than a couple of 10000 samples. | ||
| to scale to datasets with more than a couple of 10000 samples. For large | ||
| datasets consider using :class:`sklearn.linear_model.LinearSVR` or |
Member
There was a problem hiding this comment.
use classifiers instead of regressors?
glemaitre
requested changes
Feb 28, 2019
Member
glemaitre
left a comment
There was a problem hiding this comment.
@qinhanmin2014 is right there I think. It seems to be a typo.
sklearn/svm/classes.py
Outdated
| is more than quadratic with the number of samples which makes it hard | ||
| to scale to dataset with more than a couple of 10000 samples. | ||
| to scale to datasets with more than a couple of 10000 samples. For large | ||
| datasets consider using :class:`sklearn.linear_model.LinearSVR` or |
Member
There was a problem hiding this comment.
Suggested change
| datasets consider using :class:`sklearn.linear_model.LinearSVR` or | |
| datasets consider using :class:`sklearn.linear_model.LinearSVC` or |
sklearn/svm/classes.py
Outdated
| to scale to dataset with more than a couple of 10000 samples. | ||
| to scale to datasets with more than a couple of 10000 samples. For large | ||
| datasets consider using :class:`sklearn.linear_model.LinearSVR` or | ||
| :class:`sklearn.linear_model.SGDRegressor` instead, possibly after a |
Member
There was a problem hiding this comment.
Suggested change
| :class:`sklearn.linear_model.SGDRegressor` instead, possibly after a | |
| :class:`sklearn.linear_model.SGDClassifier` instead, possibly after a |
agramfort
approved these changes
Mar 6, 2019
qinhanmin2014
approved these changes
Mar 11, 2019
Member
Author
|
Thanks for addressing the review comment @qinhanmin2014 ! (and for other reviews) |
xhluca
pushed a commit
to xhluca/scikit-learn
that referenced
this pull request
Apr 28, 2019
koenvandevelde
pushed a commit
to koenvandevelde/scikit-learn
that referenced
this pull request
Jul 12, 2019
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This extends the note in the SVC / SVR on their bad scalability with the suggestion to use LinearSVC/LinearSVR or SDG classifier/regressor on larger datasets.
Just saw some usage of
SVC(kernel='linear')on large datasets, so putting it in the docstring in addition to the user manual might be useful.