Description
The estimate of the shrinkage in the Ledoit-Wolf estimator is pretty broken:
```python
import numpy as np
from sklearn import covariance

np.random.seed(42)
signals = np.random.random(size=(75, 4))
print(covariance.ledoit_wolf(signals))
```
This outputs:
```
(array([[ 0.08626827,  0.        , -0.        , -0.        ],
       [ 0.        ,  0.08626827,  0.        ,  0.        ],
       [-0.        ,  0.        ,  0.08626827, -0.        ],
       [-0.        ,  0.        , -0.        ,  0.08626827]]), 1.0)
```
In other words, the estimator has deduced that there should be a shrinkage of 1: the estimate it returns is simply proportional to the identity.
That shrinkage is given by "m_n" in lemma 3.2 of "A well-conditioned estimator for large-dimensional covariance matrices" by Olivier Ledoit and Michael Wolf: "m_n = <S_n, I_n>", where "<., .>" is the canonical (normalized) matrix inner product, I_n is the identity, and S_n is the data scatter matrix. As can be seen from this equation, m_n == 1 is possible only if the average diagonal element of the scatter matrix is 1, which is clearly not the case here. Hence this result is false; not that I believed it for a second.
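To make this concrete, m_n can be computed directly from the data above. This is a sketch assuming the normalized inner product <A, B> = tr(A B^T) / p used in the paper:

```python
import numpy as np

np.random.seed(42)
signals = np.random.random(size=(75, 4))
n_samples, n_features = signals.shape

# Scatter (empirical covariance) matrix, normalized by n_samples
centered = signals - signals.mean(axis=0)
S = centered.T @ centered / n_samples

# m_n = <S_n, I_n> = trace(S_n) / p under the normalized inner product
m_n = np.trace(S) / n_features
print(m_n)  # close to 1/12, the variance of a uniform(0, 1) variable
```

The result is far from 1, matching the diagonal value 0.08626827 in the output above rather than the reported shrinkage of 1.0.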
I know where the bug is (n_splits == 0). I just need to find a robust test so that these things don't happen again.
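A regression test along the following lines could catch the degenerate case; the data shape, scales, and threshold here are illustrative, not taken from an actual fix:

```python
import numpy as np
from sklearn import covariance

# Data with a clearly non-spherical covariance: for such a sample the
# optimal shrinkage towards a scaled identity must stay well below 1.
rng = np.random.RandomState(0)
X = rng.standard_normal((500, 4)) * np.array([1.0, 2.0, 4.0, 8.0])

lw_cov, shrinkage = covariance.ledoit_wolf(X)

# A shrinkage of exactly 1 would discard all structure in the data.
assert 0.0 <= shrinkage < 0.5
print(shrinkage)
```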
This is quite bad: we have had a broken Ledoit-Wolf for a few releases :(. Ledoit-Wolf is the most useful covariance estimator.