Bug in LedoitWolf Shrinkage #6195

@GaelVaroquaux

Description

The estimate of the shrinkage in the Ledoit-Wolf estimator is pretty broken:

import numpy as np
from sklearn import covariance
np.random.seed(42)
signals = np.random.random(size=(75, 4))
print(covariance.ledoit_wolf(signals))

This outputs:

(array([[ 0.08626827,  0.        , -0.        , -0.        ],
       [ 0.        ,  0.08626827,  0.        ,  0.        ],
       [-0.        ,  0.        ,  0.08626827, -0.        ],
       [-0.        ,  0.        , -0.        ,  0.08626827]]), 1.0)

In other words, the estimator has deduced that there should be a shrinkage of 1: it is returning something proportional to the identity.

A shrinkage of 1 means the estimate has collapsed entirely onto m_n * I_n, where m_n is defined in lemma 3.2 of "A well-conditioned estimator for large-dimensional covariance matrices" by Olivier Ledoit and Michael Wolf: "m_n = <S_n, I_n>", where "<., .>" is the (normalized) canonical matrix inner product, I_n the identity, and S_n the data scatter matrix. In other words, m_n is the average variance of the data, and full shrinkage discards every off-diagonal entry of the covariance. For generic random signals like these, the optimal shrinkage cannot be exactly 1, so this result is false. Not that I believed it at all.
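A quick numerical check of m_n on the same data (variable names here follow the lemma, not scikit-learn's internals) confirms that the diagonal value in the output above is exactly m_n, i.e. the estimator has returned m_n * I_n:

```python
import numpy as np

np.random.seed(42)
X = np.random.random(size=(75, 4))

X = X - X.mean(axis=0)            # center the signals
S = X.T.dot(X) / X.shape[0]       # empirical scatter matrix S_n
m_n = np.trace(S) / S.shape[0]    # <S_n, I_n> with <A, B> = trace(A B^T) / p

print(m_n)  # ≈ 0.0863, matching the diagonal above and nowhere near 1
```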

I know where the bug is (n_splits == 0). I just need to find a robust test so that these things don't happen again.

This is quite bad: we have shipped a broken Ledoit-Wolf for a few releases :(. Ledoit-Wolf is the most useful covariance estimator.
