[MRG+1] TST cover sparse matrix case for passing through NaN in transformer by glemaitre · Pull Request #11012 · scikit-learn/scikit-learn

glemaitre · 2018-04-22T12:02:10Z

We previously added a common test checking that transformers consistently let NaN values pass through.
However, we did not have a common test using sparse inputs.

This PR covers this case.

rth · 2018-04-22T21:20:34Z

sklearn/preprocessing/tests/test_common.py

+    [QuantileTransformer(n_quantiles=10, random_state=42)]
+)
+@pytest.mark.parametrize(
+    "func_sparse_format",


maybe sparse_constructor ?

rth · 2018-04-22T21:20:37Z

sklearn/preprocessing/tests/test_common.py

+@pytest.mark.parametrize(
+    "func_sparse_format",
+    [sparse.csr_matrix,
+     sparse.csc_matrix]


This should include all sparse array formats IMO (see e.g. scikit-learn/sklearn/utils/estimator_checks.py, line 440 in 7124d87):

    for sparse_format in ['csr', 'csc', 'dok', 'lil', 'coo', 'dia', 'bsr']:
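As a rough illustration of the suggestion above, the list of format names can be turned into constructors that a test is parametrized over. This is only a sketch: `SPARSE_FORMATS` and `sparse_constructors` are hypothetical names, not part of the PR, and the loop simply checks that each format round-trips a dense array containing NaN.

```python
import numpy as np
from scipy import sparse

# Format names as listed in estimator_checks.py (quoted in the comment above).
SPARSE_FORMATS = ['csr', 'csc', 'dok', 'lil', 'coo', 'dia', 'bsr']


def sparse_constructors(formats=SPARSE_FORMATS):
    """Hypothetical helper: map format names to scipy.sparse constructors."""
    return [getattr(sparse, fmt + "_matrix") for fmt in formats]


X = np.array([[1.0, np.nan],
              [0.0, 2.0],
              [3.0, 0.0]])

for constructor in sparse_constructors():
    X_sparse = constructor(X)
    # NaN is stored explicitly, so converting back gives the original array.
    np.testing.assert_array_equal(X_sparse.toarray(), X)
```

In a pytest test, the same list could be passed to `@pytest.mark.parametrize("sparse_constructor", ...)`.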

rth · 2018-04-22T21:27:19Z

sklearn/preprocessing/tests/test_common.py

+    for i in range(X.shape[1]):
+        # train only on non-NaN
+        col_train_sparse = func_sparse_format(
+            X_train[:, [i]][~np.isnan(X_train[:, i])])


Maybe define this in a function,

    def _get_valid_samples_by_column(X, i):
        """Get non-NaN samples in column i of X"""
        return X[:, [i]][~np.isnan(X[:, i])]

and use it here and in the test_missing_value_handling_dense function (twice).
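For reference, a small runnable version of the suggested helper, assuming it should index the array passed as `X` (the toy data below is made up for illustration):

```python
import numpy as np


def _get_valid_samples_by_column(X, col):
    """Return the non-NaN samples of column `col` of X as a 2-D column."""
    # X[:, [col]] keeps the column 2-D; the boolean mask drops NaN rows.
    return X[:, [col]][~np.isnan(X[:, col])]


X = np.array([[1.0, np.nan],
              [np.nan, 2.0],
              [3.0, 4.0]])

col0 = _get_valid_samples_by_column(X, 0)
col1 = _get_valid_samples_by_column(X, 1)
```

Keeping the result 2-D means it can be passed directly to `fit` or to a sparse constructor.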

glemaitre · 2018-04-23T15:20:31Z

@jnothman We forgot to include the sparse case in the common test.
If you can look at it, that would be great.

rth · 2018-04-23T21:48:40Z

LGTM

jnothman

This is quite rigorous, but if all our estimators supporting sparse matrices also support dense ones, I don't understand why we are not just checking that for X_train and X_test containing some NaN, the sparse and dense transforms are equivalent.

jnothman

I.e. there seems to be redundant testing of invariants here, when the only invariant we really care about is between sparse and dense (in the case that NaN occurs).
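A minimal sketch of the sparse/dense invariant described above. It assumes a scikit-learn version in which `MaxAbsScaler` passes NaN through for both dense and sparse input (the behaviour discussed in #11011); the data and the choice of estimator are illustrative only, not the test from the PR.

```python
import numpy as np
from scipy import sparse
from sklearn.base import clone
from sklearn.preprocessing import MaxAbsScaler

# Toy data with NaN in both the training and the test set.
X_train = np.array([[1.0, np.nan, 0.0],
                    [np.nan, 2.0, 4.0],
                    [-3.0, 0.0, 2.0]])
X_test = np.array([[np.nan, 1.0, 2.0],
                   [1.5, np.nan, 0.0]])

est = MaxAbsScaler()
Xt_dense = clone(est).fit(X_train).transform(X_test)

for sparse_constructor in (sparse.csr_matrix, sparse.csc_matrix):
    Xt_sparse = (clone(est).fit(sparse_constructor(X_train))
                 .transform(sparse_constructor(X_test)))
    # assert_allclose treats NaN in matching positions as equal by default.
    np.testing.assert_allclose(Xt_sparse.toarray(), Xt_dense)
```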

glemaitre · 2018-04-24T11:33:46Z

This is quite rigorous, but if all our estimators supporting sparse matrices also support dense ones, I don't understand why we are not just checking that for X_train and X_test containing some NaN, the sparse and dense transforms are equivalent.

Good point. I thought we needed to keep the two cases separate, but this is actually a single line to test, and we can parametrize on whether the transformer supports sparse matrices. I will change it accordingly.

glemaitre · 2018-04-24T12:30:01Z

@jnothman Made the changes.

jnothman · 2018-04-25T05:38:59Z

sklearn/preprocessing/tests/test_common.py

+            Xt_sparse = (est_sparse.fit(sparse_constructor(X_train))
+                         .transform(sparse_constructor(X_test)))
+            assert_allclose(Xt_dense, Xt_sparse.A)
+            # check that inverse transform lead to the input data


This is surely not always the case. Again, we should be testing consistency with the dense case and no more.

jnothman

I didn't mean this shouldn't test inverse_transform, but only test it for consistency with dense. But we should perhaps merge and fix it up
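One way to read this: call `inverse_transform` on both paths and compare the two results to each other, without asserting that either one recovers the input. A sketch under the same assumptions as before (`MaxAbsScaler` as a stand-in estimator with NaN pass-through support, toy data):

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler

X_train = np.array([[1.0, np.nan, 0.0],
                    [np.nan, 2.0, 4.0],
                    [-3.0, 0.0, 2.0]])
X_test = np.array([[np.nan, 1.0, 2.0],
                   [1.5, np.nan, 0.0]])

est_dense = MaxAbsScaler().fit(X_train)
est_sparse = MaxAbsScaler().fit(sparse.csr_matrix(X_train))

Xt_dense = est_dense.transform(X_test)
Xt_sparse = est_sparse.transform(sparse.csr_matrix(X_test))

# Note inverse_transform here, not transform.
Xt_inv_dense = est_dense.inverse_transform(Xt_dense)
Xt_inv_sparse = est_sparse.inverse_transform(Xt_sparse)

# Only dense/sparse consistency is asserted, not recovery of X_test.
np.testing.assert_allclose(Xt_inv_sparse.toarray(), Xt_inv_dense)
```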

jnothman · 2018-04-26T14:29:27Z

sklearn/preprocessing/tests/test_common.py

+        est_sparse = clone(est)
+
+        Xt_dense = est_dense.fit(X_train).transform(X_test)
+        Xt_inv_dense = est_dense.transform(Xt_dense)


should this be inverse_transform?

jnothman · 2018-04-26T14:29:40Z

sklearn/preprocessing/tests/test_common.py

+            Xt_sparse = (est_sparse.fit(sparse_constructor(X_train))
+                         .transform(sparse_constructor(X_test)))
+            assert_allclose(Xt_sparse.A, Xt_dense)
+            Xt_inv_sparse = est_sparse.transform(Xt_sparse)


should this be inverse_transform?

glemaitre · 2018-04-26T16:21:20Z

Thanks, stupid mistake.

TST cover sparse case for passing through NaN

6c49cee

rth reviewed Apr 22, 2018

rth mentioned this pull request Apr 23, 2018

[MRG] Ignore and pass-through NaN values in MaxAbsScaler and maxabs_scale #11011

Merged

glemaitre changed the title ~~[WIP] TST cover sparse matrix case for passing through NaN in transformer~~ [MRG] TST cover sparse matrix case for passing through NaN in transformer Apr 23, 2018

rth changed the title ~~[MRG] TST cover sparse matrix case for passing through NaN in transformer~~ [MRG+1] TST cover sparse matrix case for passing through NaN in transformer Apr 23, 2018

jnothman reviewed Apr 23, 2018

glemaitre force-pushed the fix_test_sparse_nan branch from aeacc10 to 47ed0bf Compare April 24, 2018 11:38

EHN factorize function and test all sparse format

fc4042c

glemaitre force-pushed the fix_test_sparse_nan branch from 47ed0bf to fc4042c Compare April 24, 2018 11:39

EHN address joel comments

75ba50e

jnothman reviewed Apr 25, 2018

FIX only check dense-sparse equivalence

0266236

jnothman reviewed Apr 26, 2018

FIX check invese equivalence dense sparse

3cb7f1d

jnothman reviewed Apr 26, 2018

FIX bug in inverse

814bbbc

jnothman merged commit 96a02f3 into scikit-learn:master Apr 26, 2018
