Fixed issue with KernelPCA.inverse_transform mean #16655
glemaitre merged 5 commits into scikit-learn:master from lrjball:kernel_pca_inverse
Conversation
Added the mean to the inverse_transform to fix the issue, and added tests as well.
jnothman
left a comment
Thanks
Please add an entry to the change log at doc/whats_new/v0.23.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.
Please also note ``versionchanged`` in the docstring of ``inverse_transform``.
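A hypothetical sketch of what the requested docstring note might look like, following numpydoc conventions. The wording and the exact directive text are illustrative only, not the actual scikit-learn docstring:

```python
def inverse_transform(self, X):
    """Transform X back to original space.

    ``inverse_transform`` approximates the inverse of the feature map by
    a learned regression from the transformed data back to the original
    space, so it requires ``fit_inverse_transform=True``.

    .. versionchanged:: 0.23
       ``inverse_transform`` now correctly reconstructs data that does
       not have zero mean (see :pr:`16655`).
    """
```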
kp = KernelPCA(n_components=2, kernel=kernel, fit_inverse_transform=True)
X_trans = kp.fit_transform(X)
X_inv = kp.inverse_transform(X_trans)
assert np.isclose(X.mean(axis=0), X_inv.mean(axis=0)).all()
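For comparison, the linear `PCA` estimator already preserves the mean in its `inverse_transform` (it stores the training mean in `mean_` and adds it back on inversion), which is the property the test above asks of `KernelPCA`. A minimal self-contained sketch:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Data with a clearly non-zero mean (4 features, centered around [1, 1, 1, 1]).
X, _ = make_blobs(n_samples=100, centers=[[1, 1, 1, 1]], random_state=0)

pca = PCA(n_components=4)
X_inv = pca.inverse_transform(pca.fit_transform(X))

# PCA adds `mean_` back during inversion, so the round trip preserves
# the per-feature means (exactly here, since n_components equals the
# number of features, making the reconstruction lossless).
assert np.allclose(X.mean(axis=0), X_inv.mean(axis=0))
```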
Should we be able to assert re the values, not just the means?
Good point, I've updated the test now.
- Added entry in whats new v0.23
- Updated tests to check for closeness of full data set, not just closeness of the mean.
- Updated the fix for this bug. Realised that it wasn't an issue with the mean not being added on, but instead that self.alpha was not being taken into account in the inverse transform.
I realized that the fix to the bug wasn't actually due to the mean not being added, but was due to the alpha not being handled properly in the inverse_transform. So I have updated the fix and have also added an entry in the change log.
:func:`decomposition.non_negative_factorization` now preserves float32 dtype.
:pr:`16280` by :user:`Jeremie du Boisberranger <jeremiedbb>`.

- |Fix| :class:`decomposition.KernelPCA` method ``inverse_transform`` now
Maybe "fixed .... in the case that data was not centred" would be more helpful to users.
So actually after doing some digging, it was still returning the wrong thing even for centered data. I just only noticed the bug in non-centered data because the mean of the inverse-transformed data was zero when the original data set was not centered.
For example, in 0.22.0 the following still does not work:
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import KernelPCA
X, _ = make_blobs(n_samples=100, centers=[[1, 1, 1, 1]], random_state=0)
X = X - X.mean(axis=0)
kp = KernelPCA(n_components=2, fit_inverse_transform=True)
X_trans = kp.fit_transform(X)
X_inv = kp.inverse_transform(X_trans)
assert np.isclose(X, X_inv).all()
So this PR fixes the inverse_transform function for all X.
However, I can still update the message if there is a better way of phrasing this change.
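One way to see why the mean cannot, in general, be recovered from the embedding alone: with the default linear kernel, the centred kernel matrix is unchanged when every sample is shifted by a constant, so the embedding itself carries no information about the data's mean. A sketch, assuming the current `KernelPCA` API (component signs from the eigensolver are arbitrary, so absolute values are compared):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import KernelPCA

X, _ = make_blobs(n_samples=100, centers=[[1, 1, 1, 1]], random_state=0)

# Embed the data twice: once as-is, once shifted by a constant offset.
Z_orig = KernelPCA(n_components=2).fit_transform(X)
Z_shift = KernelPCA(n_components=2).fit_transform(X + 10.0)

# Kernel centring removes any constant shift, so the two embeddings agree
# (up to the sign of each component, which eigensolvers leave arbitrary).
assert np.allclose(np.abs(Z_orig), np.abs(Z_shift), atol=1e-6)
```

This is why any correct inverse_transform must carry the mean information separately from the embedding.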
glemaitre
left a comment
It seems correct. Just a small change in the test to make an assert on the numpy array.
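The suggestion presumably refers to the `numpy.testing` helpers, which compare whole arrays and produce informative failure messages. A small sketch of the difference:

```python
import numpy as np
from numpy.testing import assert_allclose

a = np.array([1.0, 2.0, 3.0])
b = a + 1e-9

# Unlike a bare `assert np.isclose(a, b).all()`, assert_allclose reports
# which elements mismatch and the maximum difference when it fails.
assert_allclose(a, b, atol=1e-7)
```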
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@glemaitre, does this have your approval?

@lrjball Thanks for the fix

For me, it seems that this PR broke the code. Could you check #18902?
Though it is concluded here that the alpha was not handled correctly, it was in fact handled correctly in the original implementation. The problem was that the mean was not added back to the data when the kernel is linear. When the linear kernel is used, the information about the mean is completely lost by the centering of the kernel, whereas with a non-linear kernel it is only partially lost.
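The point about the linear kernel can be checked directly with numpy: after double-centring, the Gram matrix annihilates the all-ones vector, so any reconstruction of the form `K @ coef` (as a kernel-ridge inverse would produce) is blind to a constant offset in the targets. A self-contained sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) + 5.0  # data with a non-zero mean

# Double-centre the linear Gram matrix, as kernel PCA does internally.
K = X @ X.T
n = K.shape[0]
H = np.eye(n) - np.ones((n, n)) / n  # centring matrix
Kc = H @ K @ H

# The centred kernel maps the constant (all-ones) vector to numerical zero,
# so the mean direction is invisible to anything built on top of Kc.
assert np.allclose(Kc @ np.ones(n), 0.0, atol=1e-8)
```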
@jnothman @glemaitre |
Currently KernelPCA.inverse_transform returns a data set with zero mean, even if the original data did not have zero mean. I believe that this PR fixes that issue, so that the mean of the inverse-transformed data set is the same as the mean of the original data set.
I've also added a test for this update.
Reference Issues/PRs
Fixes #16654