[MRG] EXA Improve example plot_svm_anova.py #11731
jnothman merged 6 commits into scikit-learn:master from qinhanmin2014:svm-anova-example
Conversation
what if you keep the old data and just scale the features?
@agramfort
ok thanks for giving it a try.
I agree now that Iris is better suited.
Wondering if someone can review it. The original example is wrong IMO.
thanks @adrinjalali, I agree that your version is better.
examples/svm/plot_svm_anova.py (Outdated)
transform = SelectPercentile(chi2)
clf = Pipeline([('anova', transform), ('svc', SVC(gamma="auto"))])
The transform variable is only used here, so I guess we can remove it and put SelectPercentile(chi2) directly in the pipeline. The comment above the pipeline also needs to change accordingly.
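The suggested change might look like the following sketch, with SelectPercentile(chi2) inlined into the pipeline (the surrounding imports and the iris data are assumptions based on the example, not the exact merged code):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectPercentile, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Inline SelectPercentile(chi2) directly in the pipeline, removing the
# intermediate `transform` variable as suggested in the review.
clf = Pipeline([('anova', SelectPercentile(chi2)),
                ('svc', SVC(gamma="auto"))])
clf.fit(X, y)
```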
adrinjalali left a comment
otherwise LGTM, thanks @qinhanmin2014 !
ping @jnothman @agramfort
This reverts commit d8214fe.

I think the example plot_svm_anova.py is misleading. It claims:

"This example shows how to perform univariate feature selection to improve the classification scores."

However, with some non-informative features added, we actually get the worst result when the number of selected features equals that of the original dataset. The reason is that the features in the digits dataset are either 0 or 1, so it does not seem reasonable to add non-informative features drawn with np.random.random. Related comment: #11588 (review)
In the new example, I use the iris dataset (4 features) and add 36 non-informative features. We can see that the model achieves its best performance when we select around 10% of the features.
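The revised example described above can be sketched as follows. The random seed, the exact percentile grid, and the CV settings are assumptions for illustration, not necessarily the exact merged code:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectPercentile, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Iris: 4 informative features, plus 36 uniformly random noise features
# (non-negative, so chi2 scoring remains valid).
X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(0)
X = np.hstack((X, 2 * rng.random_sample((X.shape[0], 36))))

clf = Pipeline([('anova', SelectPercentile(chi2)),
                ('svc', SVC(gamma="auto"))])

# Evaluate the pipeline at several feature-selection percentiles.
score_means = []
for percentile in (1, 3, 6, 10, 15, 20, 30, 40, 60, 80, 100):
    clf.set_params(anova__percentile=percentile)
    scores = cross_val_score(clf, X, y, cv=5)
    score_means.append(scores.mean())
```

Plotting `score_means` against the percentiles then shows the accuracy peaking when only a small fraction of the 40 features is kept.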
Before the PR:
After the PR: