In a number of places in tests, `numpy.testing.assert_allclose` is used with the default absolute tolerance parameter, `atol=0`.
This means in particular that `np.testing.assert_allclose(0, 1e-16)` will fail. More context for the reasons behind this choice can be found in numpy/numpy#3183 (comment) and PEP 485, which can be summed up as:
> If, for a given use case, a user needs to compare to zero, the test will be guaranteed to fail the first time, and the user can select an appropriate value.
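To illustrate, a minimal sketch of the behavior: with the default `atol=0`, the check reduces to `|actual - desired| <= rtol * |desired|`, which can never hold for a nonzero value compared against zero.

```python
import numpy as np

# With the default atol=0, comparing against exact zero always fails:
# the check is |actual - desired| <= atol + rtol * |desired|, and
# rtol * |1e-16| is far smaller than the difference itself.
try:
    np.testing.assert_allclose(0, 1e-16)
except AssertionError:
    print("fails with the default atol=0")

# An explicit absolute tolerance makes the comparison meaningful near zero.
np.testing.assert_allclose(0, 1e-16, atol=1e-9)
```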
The issue is that occasionally the tests will pass, but may then fail on some other platform.
For instance, this test in `estimator_checks.py` passes CI on master, but then randomly fails on OSX: [1], [2] correspond to the same commit, one for the PR, one in the forked repo (this test fails with the message below, and not necessarily for the same Python versions):
```
________________________ test_non_meta_estimators[1045] ________________________
args = ('GaussianProcess', GaussianProcess(beta0=None, corr='squared_exponential', normalize=True,
...orage_mode='full', theta0=0.1, thetaL=None, thetaU=None,
verbose=False))
[...]
> assert_allclose(y_pred.ravel(), y_pred_2d.ravel())
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E
E (mismatch 40.0%)
E x: array([ -1.089129e-13, 1.000000e+00, 2.000000e+00, 1.272316e-13,
E 1.000000e+00, 2.000000e+00, 5.051515e-14, 1.000000e+00,
E 2.000000e+00, 9.214851e-15])
E y: array([ -1.086908e-13, 1.000000e+00, 2.000000e+00, 1.155742e-13,
E 1.000000e+00, 2.000000e+00, 5.062617e-14, 1.000000e+00,
E 2.000000e+00, 9.325873e-15])
```
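The failure above can be reproduced with the near-zero values from the traceback (rounded here; illustrative only): the arrays differ by only ~1e-16, but since `rtol * |desired|` is ~1e-20 for values of magnitude 1e-13, the relative check alone cannot pass.

```python
import numpy as np

# Near-zero values taken from the traceback above (illustrative).
x = np.array([-1.089129e-13, 1.272316e-13])
y = np.array([-1.086908e-13, 1.155742e-13])

# With the default atol=0, the purely relative check fails:
# |x - y| ~ 1e-14, but rtol * |y| ~ 1e-20.
try:
    np.testing.assert_allclose(x, y, rtol=1e-7, atol=0)
except AssertionError:
    print("fails with atol=0")

# A small absolute tolerance absorbs the near-zero noise.
np.testing.assert_allclose(x, y, rtol=1e-7, atol=1e-9)
```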
Meanwhile, `test_gradient_boosting_loss_functions.py` compares floats to 0 with `atol=0` but doesn't fail. When `atol` is used explicitly, the chosen values are not very consistent across tests.
As for `sklearn.utils.testing.assert_allclose_dense_sparse`, it defaults to `atol=1e-9`, not 0.
While the necessary absolute tolerance is test dependent, it might still be useful to:
- have a default value (e.g. a `DEFAULT_ATOL = 1e-9` constant in `sklearn.utils.testing`), except for the cases where it has to be increased for specific reasons
- use `atol` when it is definitely reasonable to do so (e.g. in probability equalities)
- make `sklearn.utils.testing.assert_allclose_dense_sparse` and `sklearn.utils.testing.assert_allclose` have the same default `atol`
- check that we don't have floating point equalities with 0.0 even if the CI tests pass ([1], [2], [3], ...)
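A minimal sketch of the first proposal (the constant name and wrapper are hypothetical, not existing sklearn API): a shared nonzero default `atol`, still overridable per test.

```python
import numpy as np

# Hypothetical shared default, as proposed above.
DEFAULT_ATOL = 1e-9

def assert_allclose(actual, desired, rtol=1e-7, atol=DEFAULT_ATOL, **kwargs):
    """Like np.testing.assert_allclose, but with a nonzero default atol."""
    np.testing.assert_allclose(actual, desired, rtol=rtol, atol=atol, **kwargs)

# Near-zero comparisons now pass by default...
assert_allclose(0.0, 1e-16)
# ...and a specific test can still loosen the tolerance when justified.
assert_allclose(1e-6, 2e-6, atol=1e-5)
```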
This might help improve the numerical stability of the tests and prevent some test failures on less common platforms. cc @yarikoptic
What do you think?