
[MRG+1] DOC more detailed note on SVC and SVR scalability#13209

Merged
qinhanmin2014 merged 4 commits into scikit-learn:master from rth:doc-svc-scalability
Mar 11, 2019

Conversation

@rth
Member

@rth rth commented Feb 21, 2019

This extends the note in the SVC / SVR docstrings on their poor scalability with a suggestion to use LinearSVC/LinearSVR or an SGD classifier/regressor on larger datasets.

Just saw some usage of SVC(kernel='linear') on large datasets, so putting it in the docstring in addition to the user manual might be useful.
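
For illustration, here is a minimal sketch of the swap the note recommends: fitting the libsvm-based `SVC(kernel='linear')` and the liblinear-based `LinearSVC` on the same data. The synthetic dataset, its size, and the `max_iter` value are assumptions chosen only so the example runs quickly; they are not taken from this PR. The two estimators do not solve exactly the same optimization problem by default, so their scores need not match, and any timing gap will depend on the machine and the data.

```python
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Synthetic data (an assumption for illustration); kept small so that the
# libsvm-based SVC still finishes quickly.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = {
    "SVC(kernel='linear')": SVC(kernel="linear"),
    "LinearSVC": LinearSVC(max_iter=10000, random_state=0),
}

fit_times = {}
scores = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X, y)
    fit_times[name] = time.perf_counter() - start
    # Training accuracy only; this is a sanity check, not a benchmark.
    scores[name] = model.score(X, y)
    print(f"{name}: fit in {fit_times[name]:.3f}s, "
          f"train accuracy {scores[name]:.3f}")
```

Increasing `n_samples` is the simplest way to see how the fit time of each estimator grows on a given machine.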

@chkoar
Contributor

chkoar commented Feb 21, 2019

LGTM

The implementation is based on libsvm. The fit time complexity
is more than quadratic with the number of samples which makes it hard
to scale to dataset with more than a couple of 10000 samples.
to scale to dataset with more than a couple of 10000 samples. For large
Contributor

@chkoar chkoar Feb 21, 2019

to datasets?

The implementation is based on libsvm.
The implementation is based on libsvm. The fit time complexity
is more than quadratic with the number of samples which makes it hard
to scale to dataset with more than a couple of 10000 samples. For large
Contributor

to datasets?

is more than quadratic with the number of samples which makes it hard
to scale to dataset with more than a couple of 10000 samples.
to scale to dataset with more than a couple of 10000 samples. For large
datasets consider using :class:`LinearSVC` or :class:`SGDClassifier`
Member

we need the full module path for the link to work, no? sklearn.linear_model.SGDClassifier

to scale to dataset with more than a couple of 10000 samples.
to scale to dataset with more than a couple of 10000 samples. For large
datasets consider using :class:`LinearSVC` or :class:`SGDClassifier`
instead.
Member

possibly after a class:sklearn.kernel_approximation.Nystroem transformer.
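
A rough sketch of what this suggestion could look like in practice: an approximate RBF feature map from `Nystroem` feeding a linear `SGDClassifier`. The dataset, `n_components=100`, the `StandardScaler` step, and the SGD settings are all assumptions made for the example, not values proposed in this review.

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration).
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

approx_kernel_clf = make_pipeline(
    # SGD is sensitive to feature scale, so standardize first.
    StandardScaler(),
    # Low-rank approximation of the RBF kernel feature map.
    Nystroem(kernel="rbf", n_components=100, random_state=0),
    # Hinge loss gives a linear SVM trained by stochastic gradient descent.
    SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3, random_state=0),
)
approx_kernel_clf.fit(X_train, y_train)

test_accuracy = approx_kernel_clf.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

The number of components trades approximation quality against cost, so it would normally be tuned rather than fixed as it is here.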

@rth
Member Author

rth commented Feb 27, 2019

Addressed both comments.

Member

@jnothman jnothman left a comment

I like this. I wonder if it's explicit enough in the user guide

is more than quadratic with the number of samples which makes it hard
to scale to dataset with more than a couple of 10000 samples.
to scale to datasets with more than a couple of 10000 samples. For large
datasets consider using :class:`sklearn.linear_model.LinearSVR` or
Member

use classifiers instead of regressors?

Member

@glemaitre glemaitre left a comment

@qinhanmin2014 is right there I think. It seems to be a typo.

is more than quadratic with the number of samples which makes it hard
to scale to dataset with more than a couple of 10000 samples.
to scale to datasets with more than a couple of 10000 samples. For large
datasets consider using :class:`sklearn.linear_model.LinearSVR` or
Member

Suggested change
datasets consider using :class:`sklearn.linear_model.LinearSVR` or
datasets consider using :class:`sklearn.linear_model.LinearSVC` or

to scale to dataset with more than a couple of 10000 samples.
to scale to datasets with more than a couple of 10000 samples. For large
datasets consider using :class:`sklearn.linear_model.LinearSVR` or
:class:`sklearn.linear_model.SGDRegressor` instead, possibly after a
Member

Suggested change
:class:`sklearn.linear_model.SGDRegressor` instead, possibly after a
:class:`sklearn.linear_model.SGDClassifier` instead, possibly after a

@agramfort agramfort changed the title DOC more detailed note on SVC and SVR scalability [MRG+1] DOC more detailed note on SVC and SVR scalability Mar 6, 2019
@qinhanmin2014 qinhanmin2014 merged commit cd37fed into scikit-learn:master Mar 11, 2019
@rth
Member Author

rth commented Mar 13, 2019

Thanks for addressing the review comment, @qinhanmin2014! (And for the other reviews.)

@rth rth deleted the doc-svc-scalability branch March 13, 2019 21:41
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

7 participants