TST Extend tests for scipy.sparse.*array in sklearn/svm/tests/test_sparse.py#27511
TST Extend tests for scipy.sparse.*array in sklearn/svm/tests/test_sparse.py#27511work-mohit wants to merge 13 commits intoscikit-learn:mainfrom
scipy.sparse.*array in sklearn/svm/tests/test_sparse.py#27511Conversation
|
@Charlie-XIAO would love to take any help from you whenever you feel free. |
|
@work-mohit Personally I would suggest refactoring these two tests a bit because I find them hard to read (not sure if maintainers agree). First add Then, I think we can refactor def check_svm_model_equal(svm_model, X_train, y_train, X_test):
# Use the original svm model for dense fit and clone an exactly same
# svm model for sparse fit
sparse_svm = base.clone(svm_model)
svm_model.fit(X_train.toarray(), y_train)
if sparse.issparse(X_test):
X_test_dense = X_test.toarray()
else:
X_test_dense = X_test
sparse_svm.fit(X_train, y_train)
assert sparse.issparse(sparse_svm.support_vectors_)
assert sparse.issparse(sparse_svm.dual_coef_)
assert_array_almost_equal(
svm_model.support_vectors_, sparse_svm.support_vectors_.toarray()
)
assert_array_almost_equal(svm_model.dual_coef_, sparse_svm.dual_coef_.toarray())
if svm_model.kernel == "linear":
assert sparse.issparse(sparse_svm.coef_)
assert_array_almost_equal(svm_model.coef_, sparse_svm.coef_.toarray())
assert_array_almost_equal(svm_model.support_, sparse_svm.support_)
assert_array_almost_equal(
svm_model.predict(X_test_dense), sparse_svm.predict(X_test)
)
assert_array_almost_equal(
svm_model.decision_function(X_test_dense), sparse_svm.decision_function(X_test)
)
assert_array_almost_equal(
svm_model.decision_function(X_test_dense),
sparse_svm.decision_function(X_test_dense),
)
if isinstance(svm_model, svm.OneClassSVM):
msg = "cannot use sparse input in 'OneClassSVM' trained on dense data"
else:
assert_array_almost_equal(
svm_model.predict_proba(X_test_dense), sparse_svm.predict_proba(X_test), 4
)
msg = "cannot use sparse input in 'SVC' trained on dense data"
if sparse.issparse(X_test):
with pytest.raises(ValueError, match=msg):
svm_model.predict(X_test)Then @skip_if_32bit
@pytest.mark.parametrize(
"X_train, y_train, X_test",
[
[X, Y, T],
[X2, Y2, T2],
[X_blobs[:80], y_blobs[:80], X_blobs[80:]],
[iris.data, iris.target, iris.data],
],
)
@pytest.mark.parametrize("kernel", ["linear", "poly", "rbf", "sigmoid"])
@pytest.mark.parametrize("sparse_container", CSR_CONTAINERS + LIL_CONTAINERS)
def test_svc(X_train, y_train, X_test, kernel, sparse_container):
"""Check that sparse SVC gives the same result as SVC."""
X_train = sparse_container(X_train)
clf = svm.SVC(
gamma=1,
kernel=kernel,
probability=True,
random_state=0,
decision_function_shape="ovo",
)
check_svm_model_equal(clf, X_train, y_train, X_test)You can then do something similar to @pytest.mark.parametrize(
"X_train, y_train, X_test",
[
[X, None, T],
[X2, None, T2],
[X_blobs[:80], None, X_blobs[80:]],
[iris.data, None, iris.data],
],
)
@pytest.mark.parametrize("kernel", ["linear", "poly", "rbf", "sigmoid"])
@pytest.mark.parametrize("sparse_container", CSR_CONTAINERS + LIL_CONTAINERS)
@skip_if_32bit
def test_sparse_oneclasssvm(X_train, y_train, X_test, kernel, sparse_container):
# Check that sparse OneClassSVM gives the same result as dense OneClassSVM
X_train = sparse_container(X_train)
clf = svm.OneClassSVM(gamma=1, kernel=kernel)
check_svm_model_equal(clf, X_train, y_train, X_test) |
|
I have also seen you getting |
Intersting ! |
Yes sure, I have checked test yet, just raised as a draft so that I can take help with others. |
|
from ._radius_neighbors_classmode import ( facing this issue, any help ? |
|
Please recompile @work-mohit. This Cython module is created a few months ago (if I remembered correctly). |
|
C:\Users\Mohit\anaconda3\envs\sklearndev\lib\site-packages\scipy\sparse_compressed.py:415: ValueError below is the code at 415 at def multiply(self, other):
"""Point-wise multiplication by another matrix, vector, or
scalar.
"""
# Scalar multiplication.
if isscalarlike(other):
return self._mul_scalar(other)
# Sparse matrix or vector.
if issparse(other):
if self.shape == other.shape:
other = self.__class__(other)
return self._binopt(other, '_elmul_')
# Single element.
elif other.shape == (1, 1):
return self._mul_scalar(other.toarray()[0, 0])
elif self.shape == (1, 1):
return other._mul_scalar(self.toarray()[0, 0])
# A row times a column.
elif self.shape[1] == 1 and other.shape[0] == 1:
return self._mul_sparse_matrix(other.tocsc())
elif self.shape[0] == 1 and other.shape[1] == 1:
return other._mul_sparse_matrix(self.tocsc())
# Row vector times matrix. other is a row.
elif other.shape[0] == 1 and self.shape[1] == other.shape[1]:
other = self._dia_container(
(other.toarray().ravel(), [0]),
shape=(other.shape[1], other.shape[1])
)
return self._mul_sparse_matrix(other)
# self is a row.
elif self.shape[0] == 1 and self.shape[1] == other.shape[1]:
copy = self._dia_container(
(self.toarray().ravel(), [0]),
shape=(self.shape[1], self.shape[1])
)`
` return other._mul_sparse_matrix(copy)
# Column vector times matrix. other is a column.
elif other.shape[1] == 1 and self.shape[0] == other.shape[0]:
other = self._dia_container(
(other.toarray().ravel(), [0]),
shape=(other.shape[0], other.shape[0])
)
return other._mul_sparse_matrix(self)
# self is a column.
elif self.shape[1] == 1 and self.shape[0] == other.shape[0]:
copy = self._dia_container(
(self.toarray().ravel(), [0]),
shape=(self.shape[0], self.shape[0])
)
return copy._mul_sparse_matrix(other)
else:
raise ValueError("inconsistent shapes")`` |
|
@work-mohit I think I've already told you how to debug this: #27511 (comment). I've checked that this is indeed the case, and it appears in the test instead of in the module. The thing is, if you have |
…tst/svm/test_sparse.py
…mohit/scikit-learn into fix/tst/svm/test_sparse.py
|
@ogrisel I'm getting
should this be reported? |
ogrisel
left a comment
There was a problem hiding this comment.
Could you please quote the full traceback of the exception to understand the context of the problem?
| assert sparse.issparse(sparse_svm.dual_coef_) | ||
| assert_array_almost_equal( | ||
| dense_svm.support_vectors_, sparse_svm.support_vectors_.toarray() | ||
| svm_model.support_vectors_, sparse_svm.support_vectors_.toarray() |
There was a problem hiding this comment.
I would rather not rename this variable. The previous name was more informative.
There was a problem hiding this comment.
Just to point out that, instead of taking two params (dense_svm & sparse_svm) separately, we're combining both. So I guess only naming dense_svm would not be sufficient. You can suggest some more meaningful name.
|
csr_container = <class 'scipy.sparse._csr.csr_array'>
|
|
Hi @work-mohit, are you still working on this PR? |
I got stuck somewhere in the issue, now I'm not tracking any more. |
|
@Charlie-XIAO: are you interested to continue this PR? |
|
Yes, I can continue this PR if needed. However, if I need to contribute to this branch I will need access granted by @work-mohit since I am not a maintainer. If @work-mohit no longer decides to continue, I would personally prefer to open another PR and mention him as collaborator in a commit. Would that be okay or do you suggest an alternative approach @jjerphan? |
|
Yes, this is usually the approach we take, giving credits to the original authors. |
|
@Charlie-XIAO this PR can be closed, right? |
|
I confirm it can be closed since it was superseded by #27723. |
Refer Issue #27090
currently a draft, would love to take any help.