BUG?: PCA output changed in 1.5 #28826

@larsoner

Because svd_flip is now called with u_based_decision=False here:

https://github.com/scikit-learn/scikit-learn/pull/27491/files#diff-b17877cd9b0663deb819cce9f4cc84533c4ca88ca0ebd2380f9c8fc5864acf26R646

The PCA sign flipping differs between sklearn 1.4 and 1.5.0.dev0, see this failing MNE-Python CI:

https://github.com/mne-tools/mne-python/actions/runs/8663512660/job/23757842032?pr=12362#step:17:4581

The short version is that we vendor the 1.4-and-older sklearn code for PCA when svd_solver="full" (the only case we use/care about), so we noticed the difference when it changed. I'm not sure it matters much in practice, but it seems bad that a PCA fit_transform with the same options gives a different result in 1.5 than in 1.4. I can replicate locally with this code:

import numpy as np
from sklearn.decomposition import PCA
from mne.utils.numerics import _PCA

n_components = 0.9999
n_samples, n_dim = 1000, 10
X = np.random.RandomState(0).randn(n_samples, n_dim)
X[:, -1] = np.mean(X[:, :-1], axis=-1)  # true rank of X is n_dim - 1
X_orig = X.copy()
pca_skl = PCA(n_components, whiten=False, svd_solver="full")
pca_mne = _PCA(n_components, whiten=False)
X_skl = pca_skl.fit_transform(X)
X_mne = pca_mne.fit_transform(X)
np.testing.assert_allclose(X_mne, X_skl)  # Fails!

But if in MNE I change our line to have svd_flip(..., u_based_decision=False):

https://github.com/mne-tools/mne-python/blob/bf74c045d5220682e6e229b95a6e406014c0c73a/mne/utils/numerics.py#L911

It "passes", indicating that this is indeed the difference.
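If the sign convention is the only difference, the two outputs should agree up to a per-component sign. A minimal sketch of such a check, using only NumPy; the helper name `assert_allclose_up_to_sign` is hypothetical (it is not part of sklearn, MNE, or NumPy), and the data here is synthetic rather than actual PCA output:

```python
import numpy as np


def assert_allclose_up_to_sign(a, b, **kwargs):
    """Hypothetical helper: compare a and b column by column,
    allowing each column of b to be negated."""
    a = np.asarray(a)
    b = np.asarray(b)
    assert a.shape == b.shape
    # Sign of the inner product of matching columns; treat 0 as +1.
    signs = np.sign(np.sum(a * b, axis=0))
    signs[signs == 0] = 1
    np.testing.assert_allclose(a, b * signs, **kwargs)


# Synthetic stand-in for two sets of PCA scores that differ only in sign.
rng = np.random.RandomState(0)
scores = rng.randn(1000, 9)
flipped = scores * np.array([1, -1, 1, 1, -1, 1, 1, 1, -1])
assert_allclose_up_to_sign(scores, flipped)  # passes
```

With the reproduction above, this would be called as `assert_allclose_up_to_sign(X_mne, X_skl)`; a failure there would point to a difference beyond sign flips.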

Perhaps this isn't really a bug, in the sense that signs are ambiguous in the SVD anyway, but a note that these have changed in 1.5 would probably be worthwhile!
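To illustrate the ambiguity, here is a rough NumPy-only sketch of the two sign conventions. `svd_flip_sketch` is a hypothetical re-implementation written for this example, under the assumption that the u-based rule fixes signs from the largest-magnitude entry of each column of u and the v-based rule from the largest-magnitude entry of each row of v; it is not sklearn's actual code:

```python
import numpy as np


def svd_flip_sketch(u, v, u_based_decision=True):
    """Hypothetical sketch of a deterministic SVD sign convention."""
    if u_based_decision:
        # Sign of the largest-magnitude entry in each column of u.
        max_abs_rows = np.argmax(np.abs(u), axis=0)
        signs = np.sign(u[max_abs_rows, np.arange(u.shape[1])])
    else:
        # Sign of the largest-magnitude entry in each row of v.
        max_abs_cols = np.argmax(np.abs(v), axis=1)
        signs = np.sign(v[np.arange(v.shape[0]), max_abs_cols])
    # Flip matching columns of u and rows of v together.
    return u * signs, v * signs[:, np.newaxis]


rng = np.random.RandomState(0)
X = rng.randn(1000, 10)
X -= X.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)

U_u, Vt_u = svd_flip_sketch(U, Vt, u_based_decision=True)
U_v, Vt_v = svd_flip_sketch(U, Vt, u_based_decision=False)

# Both conventions are valid factorizations of the same matrix.
recon_u = (U_u * S) @ Vt_u
recon_v = (U_v * S) @ Vt_v

# +1 where the two conventions agree on a component, -1 where they differ.
relative_signs = np.sign(np.sum(Vt_u * Vt_v, axis=1))
print(relative_signs)
```

Any component where `relative_signs` is -1 has its scores (`U * S`) negated between the two conventions, which is the kind of difference the `assert_allclose` above trips on.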

EDIT: The comment immediately preceding the changed line is # flip eigenvectors' sign to enforce deterministic output. If the idea is for the output to be deterministic across sklearn versions as much as possible, then maybe this change should be considered a bug? 🤷
