[MRG+2] Fix SVC predict_proba fails with new-style kernel strings#10412
[MRG+2] Fix SVC predict_proba fails with new-style kernel strings#10412qinhanmin2014 merged 11 commits intoscikit-learn:masterfrom
Conversation
|
Questions answered: Q1: yes, modify and test all applicable methods. Q2: I think fit was fixed in a previous patch Q3: I don't think we should go out of our way to support bytes in Python 3. As opposed to the magic of interchange between unicode and str, Python 3 tries to make a clear delineation between str and bytes. We clearly don't need/want bytes here. Please change WIP to MRG when you feel this is complete enough to be merged (pending review) |
|
Thanks for your answers, I have updated the title to MRG. |
sklearn/svm/libsvm.pyx
Outdated
|
|
||
| cdef void set_predict_params( | ||
| svm_parameter *param, int svm_type, kernel, int degree, double gamma, | ||
| svm_parameter *param, int svm_type, int kernel_type, int degree, double gamma, |
There was a problem hiding this comment.
Should we consider this breaking a public interface, @amueller??
qinhanmin2014
left a comment
There was a problem hiding this comment.
Please fix the flake8 errors.
I believe we should also modify dense fit in the same way.
|
_dense_fit was handled in #7064
|
|
@qinhanmin2014 The flake8 error is introduced in #7064. See scikit-learn/sklearn/svm/tests/test_svm.py Lines 509 to 511 in 6fdcb3b This test is designed to be run under Python2 and the name unicode is not defined under Python3.May be we should silence flake8 at that line like this? clf = svm.SVC(kernel=unicode('linear'), # noqa: F821
probability=True) |
|
I think u'linear' should work in place of unicode('linear') in all
supported pythons
…On 10 Jan 2018 2:46 pm, "Jiongyan Zhang" ***@***.***> wrote:
@qinhanmin2014 <https://github.com/qinhanmin2014> The flake8 error is
introduced in #7064
<#7064>. See
https://github.com/scikit-learn/scikit-learn/blob/
6fdcb3b/sklearn/svm/tests/
test_svm.py#L509-L511
This test is designed to be run under Python2 and the name unicode is not
defined under Python3.
May be we should silence flake8 at that line like this?
clf = svm.SVC(kernel=unicode('linear'), # noqa: F821
probability=True)
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#10412 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz69uozsoz7AaqiecRJ2_ee3MhXGxRks5tJDKFgaJpZM4RVUeO>
.
|
It is the case, but I might prefer to use an unified way to handle the problem (i.e. revert #7064 and apply the same way to dense fit). This might make the code easier to maintain (see my comment #10338 (comment)). I agree with amuller (#10338 (comment)) that current solution is better than #7064. WDYT? Thanks. |
|
LGTM apart from the following concerns: |
|
yes, I suspect we are breaking a public interface ... even one that's
rarely used. So we are better off just being inconsistent and making it
work without breaking current code.
|
|
Not sure whether we should break the public interface. But if don't want to break the public interface, we have two ways to do that (as @qinhanmin2014 proposed in #10338): (1) Remove type constraint of (2) Add string handling code to |
|
Do (1). set_predict_params does already does not type kernel, so we lose
absolutely nothing by removing the type. This would also be consistent with
predict.
I think we should do the same in fit. I'm not entirely sure how cython
handles LIBSVM_KERNEL_TYPES.index, but since that list is not typed, I
expect it should use CPython's list.index and object matching, hence we're
likely already casting the str back to generic object there too.
…On 12 January 2018 at 12:44, Jiongyan Zhang ***@***.***> wrote:
If don't want to break the public interface, we have two ways to do that:
(1) Remove type constraint of libsvm.predict_proba() parameter. That is
change str kernel to kernel. Though changes the interface, it still has
backward compatibility.
(2) Add string handling code to _dense_predict_proba(), just like what
_dense_fit() has done.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#10412 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz60u-ztcID9ubq8_wAQUrjRhkb7wuks5tJrkHgaJpZM4RVUeO>
.
|
|
I changed parameter |
|
I think that I would change |
|
otherwise LGTM |
|
Thanks for the advice, updated. |
sklearn/svm/tests/test_svm.py
Outdated
|
|
||
| # Test ascii bytes (same as str on python2) | ||
| clf = svm.SVC(kernel=bytes('linear', 'ascii')) | ||
| clf = svm.SVC(kernel=str('linear'), probability=True) |
There was a problem hiding this comment.
I think this is now the same as the case below
There was a problem hiding this comment.
Yes, these lines are duplicated, I have removed them.
|
Happy to merge when green |
|
Thanks for your reviews! |
qinhanmin2014
left a comment
There was a problem hiding this comment.
LGTM. merging. Thanks @qmick
|
This did not include a change to what's new. @qmick: |
|
Sorry, did't realize it before, what can I do to handle this problem now? |
|
@qmick Please just open another PR. |
Reference Issues/PRs
Fixes #10374. Closes #10338
What does this implement/fix? Explain your changes.
This fixes SVC predict_proba fails. As discussion in #10338, I change dense case to be like sparse case, that is passing int kernel_type instead of str kernel.
It is done by moving
kernel_index = LIBSVM_KERNEL_TYPES.index(kernel)from cypthon code(svm/libsvm.pyx) to python code(svm/base.py)Any other comments?
Q1: Since both
predict()anddecision_function()insvm/libsvm.pyxalso useset_predict_params(), wherekernel_index = LIBSVM_KERNEL_TYPES.index(kernel)locates in, I modifypredict()anddecision_function()for consistency. Is it what we want?Q2: Should we make a change to
fit()insvm/libsvm.pyxas well?Q3: Both dense and sparse predict fail when
kernel=b'linear'is given under Python3. In fact, only_dense_fit()handles this case. Should we handle this case?NOTE: I reuse the test code from #10338, thanks @JoshuaMeyers for the great work!