TST Extend tests for scipy.sparse.*array in test_pairwise.py#27288
TST Extend tests for scipy.sparse.*array in test_pairwise.py#27288glemaitre merged 6 commits intoscikit-learn:mainfrom
scipy.sparse.*array in test_pairwise.py#27288Conversation
The failing test from the CI ( |
I'm not maintainer but I think you can simply remove |
|
Thank you, @Charlie-XIAO I will try this. |
|
Hi @StefanieSenger, I think the ids are basically used for identification purposes, otherwise some defaults will be used depending on the data structure. For example with ids the results are like: test_pairwise.py::test_euclidean_distances_known_result[dense-dense] PASSED [ 11%]
test_pairwise.py::test_euclidean_distances_known_result[dense-sparse_matrix] PASSED [ 22%]
test_pairwise.py::test_euclidean_distances_known_result[sparse_matrix-dense] PASSED [ 44%]
test_pairwise.py::test_euclidean_distances_known_result[sparse_matrix-sparse_array] PASSED [ 66%]without ids the results are test_pairwise.py::test_euclidean_distances_known_result[array-array] PASSED [ 11%]
test_pairwise.py::test_euclidean_distances_known_result[array-csr_matrix] PASSED [ 22%]
test_pairwise.py::test_euclidean_distances_known_result[csr_matrix-array] PASSED [ 44%]
test_pairwise.py::test_euclidean_distances_known_result[csr_matrix-csr_array] PASSED [ 66%]So we can either remove these ids or conditionally set the ids based on the container length. |
|
Hi, @OmarManzoor, I see and to be on the secure side I did now set the ids conditionally based on whether the container contained more than one element. It seems that without passing the ids, these are named differently, but the combinations tested are the same. "dense", "sparse_matrix", "sparse_array" translates directly to , "array", "csr_matrix", "csr_array". |
OmarManzoor
left a comment
There was a problem hiding this comment.
Thanks @StefanieSenger. LGTM.
OmarManzoor
left a comment
There was a problem hiding this comment.
I think there are a number of other instances as well where we might needs to set ids conditionally. Since they all involve CSR_CONTAINERS we can simply have a variable at the top like
container_ids = (
["dense", "sparse_matrix"] if len(CSR_CONTAINERS) == 1
else ["dense", "sparse_matrix", "sparse_array"]
)and then use this in all the required places.
Or you can remove the ids if you prefer. Since I added the second reviewer tag, if another maintainer feels we need them, then they can be added.
Thank you, @OmarManzoor. I've taken a look into the pytests documentation. It seems that ids are strings, that pytest automatically assigns for identifying each test case, but we can also explicitly name them, if we want, for clarity or debugging. Since the ids are not further used in the code, I will omit them. |
|
@StefanieSenger You may want to look at this one: #27262 (comment) |
Ah, thank you, @Charlie-XIAO. I will keep them then. Do you know why? |
I think it's still for compatibility, for instance, one may have a script that selects tests to run based on names. But not sure, maybe @glemaitre? |
|
The |
|
It's easier to trace back failing tests then, I see. Thank you @glemaitre . |
OmarManzoor
left a comment
There was a problem hiding this comment.
LGTM. Thanks @StefanieSenger
glemaitre
left a comment
There was a problem hiding this comment.
Otherwise LGTM for the other tests.
| # currently only supported for Euclidean, L1 and cosine. | ||
| X_sparse = csr_matrix(X) | ||
| Y_sparse = csr_matrix(Y) | ||
| X_sparse = csr_container(X) |
There was a problem hiding this comment.
Could we isolate the code that test with the sparse matrices.
It would avoid to repeat the tests for the dense case multiple times.
There was a problem hiding this comment.
@glemaitre, do you mean to turn this test into two separate tests?
There was a problem hiding this comment.
Okay, I will try that. :)
Edit: It's done. :)
| Y_sparse = csr_container(Y) | ||
|
|
||
| S = pairwise_distances(X_sparse, Y_sparse, metric="euclidean") | ||
| S2 = euclidean_distances(X_sparse, Y_sparse) |
There was a problem hiding this comment.
In l. 168-169, I see the following:
S = pairwise_distances(X_sparse, Y_sparse.tocsc(), metric="manhattan")
S2 = manhattan_distances(X_sparse.tobsr(), Y_sparse.tocoo())It means that somehow, we expect to try with other containers as well, BSR, COO, and CSC.
I assume that we just decorate the new test by parametrizing for all combinations. I don't think that we need to mix the type of X and y, but try the same type for both.
There was a problem hiding this comment.
Okay, I think I know what you mean here.
glemaitre
left a comment
There was a problem hiding this comment.
LGTM. Thanks @StefanieSenger
Reference Issues/PRs
Towards #27090.
What does this implement/fix? Explain your changes.
This PR substitutes scipy sparse matrices with the scipy containers introduced in #27095 in the
sklearn/metrics/tests/test_pairwise.pytest file.