[MRG+2] modify disadvantage #8521
Merged
jmschrei merged 1 commit into scikit-learn:master on Mar 4, 2017
Conversation
SVM can work effectively when the number of features is much greater than the number of samples. But over-fitting usually happens in such situations, so avoiding it by choosing an appropriate kernel (model selection) is important.
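The situation described above can be sketched as follows. This is an illustrative example only (not code from this pull request): random high-dimensional data with far more features than samples, fit with a linear-kernel SVC, where a linear kernel is often a safer choice than RBF since such data tends to be linearly separable already.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(50, 1000)          # 50 samples, 1000 features: n_features >> n_samples
y = rng.randint(0, 2, size=50)   # random binary labels (pure noise)

# Linear kernel keeps model capacity down relative to RBF in this regime.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Even noise labels are typically fit almost perfectly here, which is
# exactly the over-fitting risk the disadvantage note warns about.
print("training accuracy:", clf.score(X, y))
```

A high training score on random labels like this is a red flag that kernel choice and regularization, not raw accuracy, should drive model selection.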
Member
LGTM.
amueller reviewed on Mar 4, 2017

      - If the number of features is much greater than the number of
    -     samples, the method is likely to give poor performances.
    +     samples, avoid over-fitting in choosing :ref:`svm_kernels` and regularization
Member
Did you respect the 80 character line length?
Member
Looks like it's 84 characters, is that a major issue?
Codecov Report

    @@           Coverage Diff            @@
    ##           master    #8521   +/-   ##
    =======================================
      Coverage   95.48%   95.48%
    =======================================
      Files         342      342
      Lines       60913    60913
    =======================================
      Hits        58160    58160
      Misses       2753     2753

Continue to review the full report at Codecov.
Member
LGTM as well.
Member
Congrats @Ellen-Co2 !
herilalaina pushed a commit to herilalaina/scikit-learn that referenced this pull request on Mar 26, 2017: [MRG+2] modify disadvantage
massich pushed a commit to massich/scikit-learn that referenced this pull request on Apr 26, 2017: [MRG+2] modify disadvantage
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request on Jun 14, 2017: [MRG+2] modify disadvantage
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request on Aug 11, 2017: [MRG+2] modify disadvantage
paulha pushed a commit to paulha/scikit-learn that referenced this pull request on Aug 19, 2017: [MRG+2] modify disadvantage
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request on Nov 15, 2017: [MRG+2] modify disadvantage
Reference Issue
Fixes #8450
What does this implement/fix? Explain your changes.
In the case of high dimensionality, SVM can still work effectively, but the over-fitting issue still needs to be considered, because the VC dimension might be close to infinite in such cases; choosing the kernel or controlling the regularization factor "C" is therefore essential.
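A minimal sketch of the point about controlling "C" (illustrative data and parameter values, not part of this PR): smaller C means stronger regularization, which matters when feature count dwarfs sample count.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(42)
X = rng.randn(80, 500)               # 80 samples, 500 features
w = rng.randn(500)
y = (X @ w > 0).astype(int)          # labels with a true linear structure

# Compare cross-validated accuracy across regularization strengths.
# Smaller C shrinks the margin penalty, i.e. stronger regularization.
for C in (0.01, 1.0, 100.0):
    scores = cross_val_score(SVC(kernel="linear", C=C), X, y, cv=5)
    print(f"C={C}: mean CV accuracy = {scores.mean():.2f}")
```

Sweeping C this way (typically via `GridSearchCV` in practice) is the standard route to picking the regularization level rather than trusting training accuracy.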
Any other comments?
To test for over-fitting, using cross-validation or a larger hold-out set can be useful. Check some discussions regarding dimensionality here.
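The hold-out check described above can be sketched like this (a hypothetical example, assuming noise-only labels to make the over-fit visible): compare training accuracy against held-out accuracy, and a large gap signals over-fitting.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.randn(100, 2000)             # many more features than samples
y = rng.randint(0, 2, size=100)      # pure noise labels: nothing real to learn

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# A flexible kernel with weak regularization (large C) memorizes the noise.
clf = SVC(kernel="rbf", C=100.0).fit(X_tr, y_tr)

print("train accuracy:", clf.score(X_tr, y_tr))  # typically near-perfect
print("test accuracy: ", clf.score(X_te, y_te))  # typically near chance (0.5)
```

The train/test gap, not the training score itself, is the quantity to monitor; cross-validation averages this check over several splits.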