KernelPCA: fit_transform and transform methods are inconsistent in case of zero eigenvalues #12141

@smarie

Description

In the current implementation of KernelPCA, when there are zero eigenvalues that are not removed (remove_zero_eig=False), fit_transform and fit + transform give inconsistent results.

  • when fit_transform is run, an optimization avoids recomputing the Gram matrix, so the transformed X is computed directly as X_transformed = self.alphas_ * np.sqrt(self.lambdas_).
  • when transform is run, that shortcut is not available, so the Gram matrix is recomputed and a dot product with the eigenvectors is performed. Because the eigenvectors self.alphas_ are not stored in scaled form (I do not know why), they are scaled at that point by dividing by the square root of the eigenvalues, np.sqrt(self.lambdas_). When an eigenvalue is zero, this produces infinite values in the scaled eigenvectors, which after the dot product may result in inf or nan values.
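
The two code paths described above can be imitated with plain NumPy. This is only a minimal sketch, not the scikit-learn code itself: the helper center_gram, the toy data, and the clamping of tiny eigenvalues to exactly zero are assumptions made for the illustration.

```python
import numpy as np


def center_gram(K):
    """Double-center a square Gram matrix (hypothetical helper for this sketch)."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n


# Toy data: four points on a line with a linear kernel, so the centered
# Gram matrix has rank 1 and therefore several zero eigenvalues.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
K = center_gram(X @ X.T)

# Eigendecomposition, sorted by decreasing eigenvalue.
lambdas, alphas = np.linalg.eigh(K)
order = np.argsort(lambdas)[::-1]
lambdas, alphas = lambdas[order], alphas[:, order]

# Assumption for the demo: treat numerically tiny eigenvalues as exact zeros.
lambdas[np.abs(lambdas) < 1e-10] = 0.0

# Path 1 (fit_transform-style shortcut): no division, always finite.
X_fit_transform = alphas * np.sqrt(lambdas)

# Path 2 (transform-style): divide by sqrt(lambdas), then project with K.
with np.errstate(divide="ignore", invalid="ignore"):
    X_transform = K @ (alphas / np.sqrt(lambdas))

print(np.isfinite(X_fit_transform).all())  # shortcut path stays finite
print(np.isfinite(X_transform).all())      # division path does not
```

On the component with a nonzero eigenvalue the two paths agree; only the columns associated with zero eigenvalues differ.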

To fix this issue, I guess that we should not scale the eigenvectors when the eigenvalue is zero.

There are two ways to do this:

  • either do this in the transform method, for example:
def transform(self, X):
        """Transform X.

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)

        Returns
        -------
        X_new : array-like, shape (n_samples, n_components)
        """
        check_is_fitted(self, 'X_fit_')

        # Compute centered gram matrix between X and training data X_fit_
        K = self._centerer.transform(self._get_kernel(X, self.X_fit_))

        # scale eigenvectors
        scaled_alphas = self.alphas_ / np.sqrt(self.lambdas_)

        # properly take null-space into account for the dot product
        scaled_alphas[:, self.lambdas_ == 0] = 0

        # Project by doing a scalar product between K and the scaled eigenvectors
        return np.dot(K, scaled_alphas)
  • or scale self.alphas_ directly when they are created (at the end of _fit), in which case both fit_transform and transform would need to be adapted.
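
A rough sketch of the second option, using plain NumPy rather than the actual KernelPCA internals. The function name scale_eigenvectors and the small matrices are hypothetical, and the diagonal matrix K only stands in for a centered Gram matrix. The idea is to scale once, leaving null-space columns at zero, and then derive both the fit_transform shortcut and the transform projection from the same scaled eigenvectors.

```python
import numpy as np


def scale_eigenvectors(alphas, lambdas):
    """Return alphas / sqrt(lambdas), with zero columns where lambdas is zero.

    Hypothetical helper: only strictly positive eigenvalues are used as divisors.
    """
    scaled = np.zeros_like(alphas, dtype=float)
    nonzero = lambdas > 0
    scaled[:, nonzero] = alphas[:, nonzero] / np.sqrt(lambdas[nonzero])
    return scaled


# Stand-in eigendecomposition of a symmetric PSD matrix with one zero eigenvalue.
alphas = np.eye(3)
lambdas = np.array([4.0, 1.0, 0.0])
K = alphas @ np.diag(lambdas) @ alphas.T

scaled_alphas = scale_eigenvectors(alphas, lambdas)

# fit_transform-style shortcut, adapted to scaled eigenvectors:
# (alphas / sqrt(lambdas)) * lambdas == alphas * sqrt(lambdas)
shortcut = scaled_alphas * lambdas

# transform-style projection applied to the training Gram matrix.
projected = K @ scaled_alphas

print(np.allclose(shortcut, projected))
```

With this layout no division by zero ever happens, so no inf or nan values need to be patched afterwards.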
