[MRG+1] TST cover sparse matrix case for passing through NaN in transformer#11012
[MRG+1] TST cover sparse matrix case for passing through NaN in transformer#11012jnothman merged 6 commits intoscikit-learn:masterfrom
Conversation
| [QuantileTransformer(n_quantiles=10, random_state=42)] | ||
| ) | ||
| @pytest.mark.parametrize( | ||
| "func_sparse_format", |
| @pytest.mark.parametrize( | ||
| "func_sparse_format", | ||
| [sparse.csr_matrix, | ||
| sparse.csc_matrix] |
There was a problem hiding this comment.
This should include all sparse array formats IMO (see e.g.
)| for i in range(X.shape[1]): | ||
| # train only on non-NaN | ||
| col_train_sparse = func_sparse_format( | ||
| X_train[:, [i]][~np.isnan(X_train[:, i])]) |
There was a problem hiding this comment.
Maybe define this in a function,
def _get_valid_samples_by_column(X, i):
"""Get non NaN samples in column i of X"""
return X_train[:, [i]][~np.isnan(X_train[:, i])])and use here and in the test_missing_value_handling_dense function (twice)
|
@jnothman We forgot to include the sparse case in the common test. |
|
LGTM |
jnothman
left a comment
There was a problem hiding this comment.
This is quite rigorous, but if all our estimators supporting sparse matrices also support dense ones, I don't understand why we are not just checking that for X_train and X_test containing some NaN, the sparse and dense transforms are equivalent.
jnothman
left a comment
There was a problem hiding this comment.
I.e. there seems to be redundant testing of invariants here, when the only invariant we really care about is between sparse and dense (in the case that NaN occurs).
Good point. I thought that we needed to dissociate but actually this is only a single line to test and we can parametrize if the transformer support sparse matrices. I will change it accordingly. |
aeacc10 to
47ed0bf
Compare
47ed0bf to
fc4042c
Compare
|
@jnothman Made the changes. |
| Xt_sparse = (est_sparse.fit(sparse_constructor(X_train)) | ||
| .transform(sparse_constructor(X_test))) | ||
| assert_allclose(Xt_dense, Xt_sparse.A) | ||
| # check that inverse transform lead to the input data |
There was a problem hiding this comment.
This is surely not always the case. again we should be testing consistency with the dense case and no more
jnothman
left a comment
There was a problem hiding this comment.
I didn't mean this shouldn't test inverse_transform, but only test it for consistency with dense. But we should perhaps merge and fix it up
| est_sparse = clone(est) | ||
|
|
||
| Xt_dense = est_dense.fit(X_train).transform(X_test) | ||
| Xt_inv_dense = est_dense.transform(Xt_dense) |
There was a problem hiding this comment.
should this be inverse_transform?
| Xt_sparse = (est_sparse.fit(sparse_constructor(X_train)) | ||
| .transform(sparse_constructor(X_test))) | ||
| assert_allclose(Xt_sparse.A, Xt_dense) | ||
| Xt_inv_sparse = est_sparse.transform(Xt_sparse) |
There was a problem hiding this comment.
should this be inverse_transform?
|
thanks stupid mistake |
We previously make a common test for testing consistency of transformers letting pass NaN values.
However, we did not have a common test using sparse inputs.
This PR is covering this case.