Skip to content

Precision errors in LDA.explained_variance_ratio_ #6031

@hlin117

Description

@hlin117

This surfaces the discussion in #5216. Here are some precision differences than @JPFrancoia and I are looking at:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.utils.testing import assert_array_almost_equal

state = np.random.RandomState(0)
X = state.normal(loc=0, scale=100, size=(40, 20))
y = state.randint(0, 3, size=(40, 1))

# Train the LDA classifier. Use the eigen solver
lda_eigen = LDA(solver='eigen', n_components=5)
lda_eigen.fit(X, y)

lda_svd = LDA(n_components=5, solver='svd')
lda_svd.fit(X, y)
assert_array_almost_equal(lda_eigen.explained_variance_ratio_, lda_svd.explained_variance_ratio_)
AssertionError:
Arrays are not almost equal to 6 decimals

(shapes (20,), (3,) mismatch)
 x: array([  6.03795532e-01,   3.96204468e-01,   5.85621882e-16,
         3.18609950e-16,   2.08378911e-16,   1.21510637e-16,
         7.83079028e-17,   7.58612317e-17,   3.89040436e-17,...
 y: array([  5.52469269e-01,   4.47530731e-01,   7.08925911e-17])

Besides the fact that lda_eigen has the wrong number of components for the explained_variance_ratio_ (according to the docs, it should have only n_components=5), there are numeric differences as well.

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugEasyWell-defined and straightforward way to resolve

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions