FIX Fixes test_scale_and_stability #18746
Conversation
ogrisel
left a comment
Retracting my previous +1 review because I now realize that I had misunderstood the code in my first review. I will need more time to understand what's going on and what kind of fix would make sense.
I think it would help if the failing test could be split/decomposed or parametrized to show for which kind of data the problem is happening.
As background, the issue stems from a dot product having a different result depending on the platform and NumPy version: on Windows, one of the iterations gets a slightly different value. Edit: there are small differences in the output of the underlying linear algebra routines.
Is this just a matter of setting a tolerance? (CCA is notoriously unstable, especially for poorly conditioned matrices.)
thomasjpfan
left a comment
Updated this PR by adjusting the `cond` of `pinv2`.
NicolasHug
left a comment
LGTM then, thanks @thomasjpfan
This might need a what's new entry, since it may induce a change in results in some cases.
ogrisel
left a comment
The solution looks fine to me, but I would be more confident if you could update test_scale_and_stability to split it into 3 tests: test_scale_and_stability_linnerud, test_scale_and_stability_non_regression_2821 and test_scale_and_stability_non_regression_7819, so that we know instantly which dataset is causing the instability should we observe a new failure in the future.
Also, we could use sklearn.preprocessing.scale instead of manually scaling the input datasets.
I don't have a Windows machine handy so I cannot quickly check if the changes I suggest would still cause the failure to happen in master (scaling with a different ddof in particular).
Or alternatively, if you understand the cause of the problem well enough, could you implement a new test that would trigger a similar problem on any platform (not just Windows + whatever BLAS is used on the failing CI) by making the instability even stronger?
> predictions for `est.transform(Y)` when the training data is single-target.
> :pr:`17095` by `Nicolas Hug`_.
>
> - |Fix| Increases the stability of :class:`cross_decomposition.CCA` :pr:`18746`
Could you specify that it's only for poorly conditioned matrices, so that not all users expect to see changes in the results?
The issue comes from the call to pinv2. On Windows with the PyPI version of numpy, the singular values for Xk are:
[0.7, 0.14, 9e-16], which results in a rank of 3.
On other platforms, where the original test passes, the singular values are:
[0.7, 0.14, 1.4e-17], which results in a rank of 2. This is by design, as stated in the reference paper on page 12: the rank should decrease when Xk gets updated.
Xk is updated in the deflation step.
Given this, I do not think it is about poorly conditioned matrices; it is our usage of pinv2 that was flaky.
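The rank-cutoff behaviour described above can be sketched numerically. The singular values are the ones quoted in the discussion; the cutoff constants here are illustrative assumptions, not scipy's exact defaults:

```python
import numpy as np

eps = np.finfo(np.float64).eps  # ~2.22e-16

# Singular values of Xk reported in the discussion for the two platforms.
s_windows = np.array([0.7, 0.14, 9e-16])   # spurious third value
s_linux = np.array([0.7, 0.14, 1.4e-17])

def rank_with_cutoff(s, cond):
    # A pinv2-style rank estimate: count singular values above cond * s.max().
    return int((s > cond * s.max()).sum())

# With a loose cutoff on the order of machine epsilon, the platforms disagree:
print(rank_with_cutoff(s_windows, 3 * eps))   # 3
print(rank_with_cutoff(s_linux, 3 * eps))     # 2

# A stricter cutoff, cond = 10 * eps, gives rank 2 on both platforms:
print(rank_with_cutoff(s_windows, 10 * eps))  # 2
print(rank_with_cutoff(s_linux, 10 * eps))    # 2
```

This is why a fixed `cond` makes the rank decision deterministic across platforms: the tiny third singular value lands on different sides of a cutoff that scales only with machine epsilon.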
> When pinv2 is called on Windows with the PyPI version of numpy, the singular values for Xk are [0.7, 0.14, 9e-16].
After which iteration? If this happens at the first iteration, then this is indeed a problem of condition number, I believe.
(by first iteration I mean the first component, or equivalently the first call to _get_first_singular_vectors_power_method)
> After which iteration? If this happens at the first iteration, then this is indeed a problem of condition number, I believe.
This happened on the second call to _get_first_singular_vectors_power_method.
I think it would still be valuable to identify in which cases there's a stability improvement. Clearly, not all users will be impacted by this
I think this happens randomly. I updated test_scale_and_stability to generate random ys and found that there were seeds that fail on my machine and on windows.
From a user point of view, for some datasets they may have gotten an incorrect model.
Both random y and X? Because otherwise the problem might just be coming from X.
I still believe this is related to poorly conditioned matrices, because what you pass to _get_first_singular_vectors_power_method at iteration i are the deflated matrices from iteration i - 1 (deflation = subtracting a rank-1 matrix to obtain a rank (r - 1) matrix).
I'm quite certain that the name of the parameter to pinv (cond or rcond) is related to the condition number
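The deflation step described above can be illustrated directly. This is a hedged sketch using SVD-based directions rather than the power-method scores the actual implementation computes:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(20, 3)  # full column rank with probability 1

# Take scores t along the first right singular direction and the matching
# loadings p, then deflate: subtracting the rank-1 outer product drops the rank.
_, _, vt = np.linalg.svd(X, full_matrices=False)
t = X @ vt[0]              # scores
p = X.T @ t / (t @ t)      # loadings
X_deflated = X - np.outer(t, p)

print(np.linalg.matrix_rank(X))           # 3
print(np.linalg.matrix_rank(X_deflated))  # 2
```

After deflation the smallest "true" singular value is exactly zero, so whatever tiny value floating-point rounding leaves behind is what pinv2's cutoff has to classify, which is where the platform differences come from.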
> Both random y and X? Because otherwise the problem might just be coming from X.
Updated with randomly generated X and y. The condition number always gets much bigger when the matrix becomes a rank (r - 1) matrix.
When the rank becomes r - 1, pinv2 by default has trouble determining the rank of the matrix. Setting cond=10 * eps helps in this case.
Specifically, when pinv2 thinks a rank (r - 1) matrix is rank r, it divides by a really small singular value when computing the inverse.
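A sketch of that amplification, building a pseudo-inverse from SVD factors with the singular values quoted earlier in the thread (the orthonormal factors are generated arbitrarily, and the cutoff constants are illustrative, not scipy's exact defaults):

```python
import numpy as np

eps = np.finfo(np.float64).eps
rng = np.random.RandomState(0)
U, _ = np.linalg.qr(rng.randn(3, 3))
V, _ = np.linalg.qr(rng.randn(3, 3))
s = np.array([0.7, 0.14, 9e-16])  # spurious third singular value

def pinv_from_svd(U, s, V, cond):
    # Keep only singular values above cond * s.max(), then invert them:
    # pinv = sum_i v_i u_i^T / s_i over the retained singular triplets.
    keep = s > cond * s.max()
    return (V[:, keep] / s[keep]) @ U[:, keep].T

# A loose cutoff keeps the 9e-16 value: the inverse blows up to ~1/9e-16.
loose = pinv_from_svd(U, s, V, 3 * eps)
# A strict cutoff (cond = 10 * eps) drops it: the inverse stays well-behaved.
strict = pinv_from_svd(U, s, V, 10 * eps)

print(np.linalg.norm(loose, 2))   # ~1.1e15
print(np.linalg.norm(strict, 2))  # ~7.14 (= 1 / 0.14)
```

The spectral norm of the pseudo-inverse is the reciprocal of the smallest retained singular value, so misclassifying one spurious value inflates the result by roughly fifteen orders of magnitude.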
This is ready to be merged, right, @glemaitre? Thanks!
The explanation @thomasjpfan provided above looks good to me and empirically backed by the fact that the updated test can fail with different random seeds even on linux 64 bit.
I am wondering: shall we try to remove the _UnstableArchMixin from the CCA mro?
> - |Fix| Increases the stability of :class:`cross_decomposition.CCA` :pr:`18746`
I think this is not just for CCA but also for the related PLSCanonical model, isn't it? PLSSVD uses a different fit method.
PLSCanonical uses mode="A", which does not rely on those pinv2 calls, so the what's new entry is correct.
Let me try to push this and trigger a full build of all the wheels on all the architectures (32-bit and 64-bit Windows and Linux).
…st_scale_and_stability_round_2
Both the 32-bit Linux and Windows builds passed without the failure. I will also launch the arm64 test just to be sure. Unfortunately we don't have any PPC CI configuration, but I am pretty sure that this fix would also solve the numerical stability problem we observed there as well.
It also passed on arm64. Waiting for the final Azure builds to complete, and then I plan to merge, unless @NicolasHug or @thomasjpfan have an objection.
Merged! Thanks @thomasjpfan for tracking down the cause of this issue. I am really glad to have a numerically stable CCA in scikit-learn after all these years.
Reference Issues/PRs
Fixes #18613
Fixes #6279
Similar to #13903
What does this implement/fix? Explain your changes.
Mapping small weights to zero helps with stabilization.
Any other comments?
This error can be reproduced on Windows with the latest numpy + scipy installed from PyPI and Python 3.8.