name = 'LinearDiscriminantAnalysis'
estimator_orig = LinearDiscriminantAnalysis()

    @ignore_warnings(category=FutureWarning)
    def check_decision_proba_consistency(name, estimator_orig):
        # Check whether an estimator having both decision_function and
        # predict_proba methods has outputs with perfect rank correlation.
        centers = [(2, 2), (4, 4)]
        X, y = make_blobs(n_samples=100, random_state=0, n_features=4,
                          centers=centers, cluster_std=1.0, shuffle=True)
        X_test = np.random.randn(20, 2) + 4
        estimator = clone(estimator_orig)
        if (hasattr(estimator, "decision_function") and
                hasattr(estimator, "predict_proba")):
            estimator.fit(X, y)
            # Since the link function from decision_function() to predict_proba()
            # is sometimes not precise enough (typically expit), we round to the
            # 10th decimal to avoid numerical issues.
            a = estimator.predict_proba(X_test)[:, 1].round(decimals=10)
            b = estimator.decision_function(X_test).round(decimals=10)
>           assert_array_equal(rankdata(a), rankdata(b))
E           AssertionError:
E           Arrays are not equal
E
E           Mismatched elements: 2 / 20 (10%)
E           Max absolute difference: 0.5
E           Max relative difference: 0.02631579
E           x: array([ 7. ,  8. , 11. ,  9. , 17. , 10. ,  5. , 14. ,  6. ,  1. , 19.5,
E                  4. ,  2. , 16. , 12. , 13. ,  3. , 15. , 19.5, 18. ])
E           y: array([ 7.,  8., 11.,  9., 17., 10.,  5., 14.,  6.,  1., 20.,  4.,  2.,
E                 16., 12., 13.,  3., 15., 19., 18.])
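The 19.5/19.5 vs 20/19 mismatch in the ranks is what you get when two predict_proba values become equal after rounding (saturated expit tail) while the corresponding decision_function values stay distinct. A minimal illustration with made-up values (not taken from the actual failure):

```python
import numpy as np
from scipy.stats import rankdata

# Two probabilities in the saturated tail of expit: distinct before rounding,
# both equal to 1.0 after rounding to 10 decimals.
proba = np.array([1 - 4e-11, 1 - 1e-11]).round(decimals=10)

# The corresponding decision_function scores remain distinct after rounding.
scores = np.array([23.9, 25.3]).round(decimals=10)

print(rankdata(proba))   # tied ranks: [1.5 1.5]
print(rankdata(scores))  # distinct ranks: [1. 2.]
```

So any test point far enough from the decision boundary can trigger this mismatch, regardless of the estimator.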
This common check (shown above) recently started to fail on a Windows CI job. It happened on #17743, a PR that should not have any impact on the behavior of LinearDiscriminantAnalysis.predict_proba:
https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=25474&view=logs&j=d32b16b6-cb9d-571b-e765-de83708fb8dd&t=b93f76c1-c2c9-579e-c2ec-c4f438af1261
I suspect the test is too brittle. Maybe using a test set drawn from the same distribution as the training data (blobs) would avoid the ties caused by the arbitrary rounding?
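A sketch of that suggestion: draw the test points from the same blobs as the training data instead of `np.random.randn(20, 2) + 4`, so the predicted probabilities stay away from the saturated 0/1 region where rounding creates ties. This is only an illustration of the idea, not the actual fix:

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.base import clone
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

centers = [(2, 2), (4, 4)]
X, y = make_blobs(n_samples=100, random_state=0, n_features=4,
                  centers=centers, cluster_std=1.0, shuffle=True)

# Test set from the same distribution as the training data: decision_function
# values stay moderate, so expit does not saturate and rounding to the 10th
# decimal no longer collapses distinct probabilities into ties.
X_test, _ = make_blobs(n_samples=20, random_state=1, n_features=4,
                       centers=centers, cluster_std=1.0, shuffle=True)

estimator = clone(LinearDiscriminantAnalysis()).fit(X, y)
a = estimator.predict_proba(X_test)[:, 1].round(decimals=10)
b = estimator.decision_function(X_test).round(decimals=10)
assert (rankdata(a) == rankdata(b)).all()
```

Alternatively the check could compare ranks with a tolerance, or skip the rounding entirely and use a rank correlation threshold instead of exact equality.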