cross_decomposition module needs work #4122

@nmayorov

Recently I've spent some time exploring the cross_decomposition module (I haven't paid attention to CCA). I noticed several problems with it:

  1. The module needs much better narrative documentation. Currently it's not clear what the algorithms actually do, where they might be useful, and how they differ.
  2. The layout of files and the import approach are unusual for sklearn. The files are named with trailing underscores, and __all__ is defined in individual files but not in __init__.py.
  3. The implementation is rather obscure (for a short, conventional description of PLS 2, refer here).
    1. _nipals_twoblocks_inner_loop in fact computes both weights and scores, but only the weights are returned, and the scores are then recomputed.
    2. The parameter norm_y_weights crucially determines how the algorithm works (along with deflation_mode and mode), but does so in a tricky way. It took me a lot of time to understand why the PLSRegression implemented here is equivalent to the PLS 2 algorithm described in most sources.
      norm_y_weights is set to False for PLSRegression in order to make the regression coefficient $\hat{c}$ (see link above) between $t$ and $u$ equal to 1. This is done purely by convention and, I suppose, to allow comparison with implementations in R.
    3. An alternative solver, _svd_cross_product, is provided but never used in the code (only in tests). Also, it can't be a substitute for _nipals_twoblocks_inner_loop with norm_y_weights=False.
  4. The required early stop checks are missing. When the X or Y matrix is deflated to zeros, the iterations become invalid and the assumed properties no longer hold. That causes the bug in PLSRegression() when one of the columns in X is constant (#3932).
  5. The computed y_rotations_ is not correct for PLSRegression, i.e. y_scores_ != np.dot(Yc, y_rotations_). I think there is no way to compute these rotations in this case, because Y is deflated on x_scores.
  6. There are several obvious bugs, such as modifying a non-existent variable.
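
To make points 3.1, 3.2 and 4 more concrete, here is a minimal NumPy sketch of a PLS 2 style NIPALS loop. It is not the scikit-learn code: the names nipals_inner_loop, pls2_sketch and rel_tol are hypothetical, and the relative-norm stopping rule is just one assumed way to do the early stop check. The inner loop returns the scores together with the weights so nothing has to be recomputed, the y weights are left unnormalised (the norm_y_weights=False convention), and the outer loop stops once X or Y has been deflated to numerical zero.

```python
import numpy as np


def nipals_inner_loop(X, Y, max_iter=500, tol=1e-6):
    """Hypothetical helper: one PLS 2 component, returning weights AND scores."""
    # Start from the Y column with the largest norm so that u is not a zero vector.
    u = Y[:, [np.argmax(np.linalg.norm(Y, axis=0))]]
    w_old = None
    for _ in range(max_iter):
        w = X.T @ u / (u.T @ u)       # x weights
        w = w / np.linalg.norm(w)     # x weights are normalised
        t = X @ w                     # x scores
        c = Y.T @ t / (t.T @ t)       # y weights, deliberately NOT normalised
        u = Y @ c / (c.T @ c)         # y scores
        if w_old is not None and np.linalg.norm(w - w_old) < tol:
            break
        w_old = w
    return w, c, t, u


def pls2_sketch(X, Y, n_components, rel_tol=1e-10):
    """Hypothetical PLS 2 loop with an early stop check (assumed stopping rule)."""
    Xk = X - X.mean(axis=0)
    Yk = Y - Y.mean(axis=0)
    x_norm0, y_norm0 = np.linalg.norm(Xk), np.linalg.norm(Yk)
    T, U = [], []
    for _ in range(n_components):
        # Early stop: a block deflated to (numerical) zero carries no more components.
        x_done = np.linalg.norm(Xk) <= rel_tol * x_norm0
        y_done = np.linalg.norm(Yk) <= rel_tol * y_norm0
        if x_done or y_done:
            break
        w, c, t, u = nipals_inner_loop(Xk, Yk)
        p = Xk.T @ t / (t.T @ t)      # x loadings
        Xk = Xk - t @ p.T             # deflate X on the x scores
        Yk = Yk - t @ c.T             # regression mode: deflate Y on the x scores too
        T.append(t)
        U.append(u)
    n = X.shape[0]
    T = np.hstack(T) if T else np.empty((n, 0))
    U = np.hstack(U) if U else np.empty((n, 0))
    return T, U


rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
X[:, 2] = 5.0                         # constant column, as in the scenario of point 4
Y = rng.normal(size=(10, 2))
T, U = pls2_sketch(X, Y, n_components=3)
print(T.shape)
# Per-component slope of the regression of u on t.
print((T * U).sum(axis=0) / (T * T).sum(axis=0))
```

With a constant column, the centred X has rank at most 2, so the loop should stop after two components even though three were requested. The last line prints the slope of the regression of u on t for each component, which comes out as 1 when the y weights are left unnormalised, since t'u = t'Yc / (c'c) and Y't = c (t't).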
