FIX use specific threshold to discard eigenvalues with 32 bits fp#18149
FIX use specific threshold to discard eigenvalues with 32 bits fp#18149rth merged 3 commits intoscikit-learn:masterfrom
Conversation
…heck_psd_eigenvalues`. Small positive eigenvalues were not correctly discarded by `_check_psd_eigenvalues` for 32bit data. Fixes scikit-learn#18146
|
@glemaitre > ready for review |
|
That's was super fast :). I will check this in the afternoon. Thanks @smarie |
glemaitre
left a comment
There was a problem hiding this comment.
LGTM. ping @rth @adrinjalali @NicolasHug for a second quick review
|
You're welcome :) |
|
Hey @smarie! The Do you mind to have a look? |
|
@alfaro96 , the issue seems only to happen on mac os x but only for python >= 3.7. This is therefore probably related to a very low-level numerical computation behaviour in the numpy or linear algebra libraries, that is different in this versions.
From all of this we'll be able to determine if raising a little bit the |
I could not reproduce this issue in my machine (macOS-10.15.6-x86_64-i386-64bit). Nevertheless, I am experiencing the same issue here: #17921. So, I assume that this is "generalized" problem.
These are the arrays for the # 64 bit data
array([-2.75182923e-14 -2.82984727e-15 -2.32422419e-15 -2.28444794e-15,
-1.88368307e-15 -1.03866288e-15 -9.64184248e-16 -8.61787743e-16,
-8.00057562e-16 -5.87978712e-16 -3.30763427e-16 -2.54953209e-16,
-1.82340883e-16 -1.56598272e-16 9.53318764e-18 8.86546109e-17,
2.40790473e-16 2.74047675e-16 5.38711820e-16 8.08403061e-16,
1.06110990e-15 1.32942923e-15 1.53738341e-15 2.06144632e-15,
3.44642618e-15 3.58062038e-15 5.06066380e-14 9.09952230e-01,
1.30832231e+00 8.77817255e+01])
# 32 bit data
array([-5.4791326e-06 -3.4567588e-06 -1.8017618e-06 -1.0867386e-06
-4.6405938e-07 -3.6561633e-07 -3.5079950e-07 -1.8227604e-07
-1.6541932e-07 -1.1700362e-07 1.6083459e-07 2.1471459e-07
2.6229949e-07 2.7478683e-07 3.4371274e-07 3.7471702e-07
5.2492760e-07 6.4274491e-07 6.8415738e-07 7.0834585e-07
9.6557756e-07 1.1529023e-06 1.3964327e-06 1.4208520e-06
1.5312862e-06 3.3968322e-06 8.9305595e-06 9.0995252e-01
1.3083217e+00 8.7781647e+01])
|
|
ok so if I'm not mistaken, the ratio between 4th and 1st eigenvalue is Setting the threshold to (the new line should be |
You are right, the ratio is slightly above the
Changing
Do you mind to submit a PR to solve this issue? Thank you @smarie! |
…scussed in scikit-learn#18149. Updated docstring accordingly.
…kit-learn#18149) Co-authored-by: Sylvain MARIE <sylvain.marie@se.com>
Reference Issues/PRs
Fixes #18146
What does this implement/fix? Explain your changes.
Fixed 32/64bit consistency for
KernelPCAand other models using_check_psd_eigenvalues. Small positive eigenvalues were not correctly discarded by_check_psd_eigenvaluesfor 32bit data.Any other comments?
Note that all of these rules are just "rule of thumb" validated/confirmed by experience (although strongly influenced by the minimum value available in single and double precision) so it is very important to add this example (and other failing ones if we find any) in the test harness to ensure non-regression over time.
Reminder:
np.finfo('float32').eps=1.2e-07np.finfo('float64').eps=2.2e-16Another note: we could imagine to extend the proposed test to generate various toy dataset configurations, in order to be even more sure of the thresholds used.