FIX DOC MNT Big revamp of cross-decomposition module #17095
NicolasHug merged 33 commits into scikit-learn:master
Conversation
CC @thomasjpfan @amueller you have time for reviews now, right? Welcome back!! :p
Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
Classes included in this module are :class:`PLSRegression` …
Apart from CCA, the PLS estimators are particularly suited when the matrix of predictors has more variables than observations, and when there is multicollinearity among the features. By contrast, standard linear regression …
The fact that it fixes the multicollinearity issue, where linear regression would fail unless regularized, seems like an authoritative argument here. Do we have a ref or an example that demos this?
When features are collinear, the covariance matrix is singular and thus non-invertible.
I don't have a ref at hand, but I'm assuming this is common knowledge (could be wrong?)
are you referring to this https://github.com/scikit-learn/scikit-learn/pull/17095/files#diff-df97917f68917d3a110df30940d771dfR176 ? It is a statement about PLS vs CCA rather than PLS vs linear regression. Maybe I am nitpicking here.
It's mostly a statement about PLS vs non-regularized LR: LR is unstable when there is collinearity among features, PLS is not. However, CCA is unstable too (as described in the link you mentioned).
agramfort left a comment:
thx @NicolasHug for clarifying
Thanks @TomDLT and @agramfort for the reviews! Let me push a what's new entry, and I'll merge when green.
Docs look good, merging.
@NicolasHug I have a question about the documentation. What exactly does the attribute x_loadings_ represent? The correlation, or the coefficients of the linear combinations used to transform X? (See also https://stackoverflow.com/questions/78725061/what-does-the-attribute-x-loadings-represent?noredirect=1#comment138800857_78725061)
Closes #4122
Closes #8392
Probably Fixes #4469, though I can't reproduce.
Fixes #11645
Closes #13521
Closes #16177
This PR is a rework of the cross-decomposition module which had been left for dead for years. Mainly, docs and tests were added, and code was simplified. This hopefully comes with a few bug fixes.
This is a big PR, but it's a lot of docs. You might want to ignore the diff and just review the files from scratch, given the amount of changes. The good news is that there are docs now, so you're off to a much better start than I was.
Other stuff:
- Use 1d shapes for vectors instead of 2d shapes (makes the outer products more obvious), for PLSSVD, CCA, PLSCanonical.
- `n_components` now raises a FutureWarning if it's not in `[1, min(n_samples, n_features, n_targets)]`. This is only for backward compat, and an error will be raised in 2 versions: `n_components` cannot be greater than the rank of the cross-covariance matrix `X.T.dot(Y)`, which is bounded as above. For PLSRegression, the rank is bounded by `n_features`. See comments in code.
- PLSSVD, CCA and PLSCanonical: … X and Y. For PLSRegression, the y_scores are different, so I didn't deprecate them (but I doubt these are useful anyway).
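The bound on `n_components` follows from the rank of the cross-covariance matrix; a quick numerical sketch (my own, not part of the PR) illustrates it:

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features, n_targets = 50, 10, 3
X = rng.randn(n_samples, n_features)
Y = rng.randn(n_samples, n_targets)

# The cross-covariance matrix X.T @ Y has shape (n_features, n_targets),
# and rank(X.T @ Y) <= min(rank(X), rank(Y)), so its rank is at most
# min(n_samples, n_features, n_targets):
C = X.T @ Y
print(np.linalg.matrix_rank(C))  # 3 here, never more than 3
```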