FIX ensure that a 1-D array is returned for a single target in PLS* and CCA (#20355)
Conversation
NicolasHug
left a comment
thanks for the PR @polocorona, this looks good, but instead of changing the imputer I think we should update the PLSRegression estimator, as suggested in #19352 (comment)
@NicolasHug Done. Let me know if I understood your request correctly. 😄
NicolasHug
left a comment
Thanks @polocorona , this looks great!
I made some minor comments, and we could also add a test in test_pls.py to make sure that ndim == 1 for single-target regression.
Pinging @ogrisel @glemaitre @jjerphan as you reviewed #19409, I think this one is more correct.
BTW this doesn't affect just PLSRegression but all the estimators that inherit from _PLS, so we should test and mention these ones as well.
Thanks a lot @polocorona ! This LGTM with some minor comments
jjerphan
left a comment
Thanks @polocorona for your contribution!
I just have one comment before approving.
@NicolasHug should I wait for another review before the merge?

@polocorona: yes, two core devs must give their approval.
ogrisel
left a comment
Although I am not sure many people use PLS and CCA with 1d targets, I am worried about backward compat. See below:
```python
X_test = np.random.randn(10, 3)
Y_pred = pls.predict(X_test)
...
assert Y_pred.shape == (10,)
```
I am worried this is a breaking change for users who already have code running with Y shaped as (n_samples, 1). Would it work for the imputation meta-estimator if we made the estimator record the shape of Y at fit time to reuse it at predict time?
```python
X = np.random.randn(10, 3)
Y = np.random.randn(10, 1)

pls = Estimator(n_components=Y.shape[1])
pls.fit(X, Y)
X_test = np.random.randn(10, 3)
Y_pred = pls.predict(X_test)
assert Y_pred.shape == (10, 1)

pls = Estimator(n_components=Y.shape[1])
pls.fit(X, np.ravel(Y))
X_test = np.random.randn(10, 3)
Y_pred = pls.predict(X_test)
assert Y_pred.shape == (10,)
```
I guess that'd be fine.
We might not need to explicitly record y.shape, as it should already be embedded in self.y_weights_ and self.y_loadings_.
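For illustration, the record-the-shape-at-fit idea can be sketched with a toy least-squares estimator (pure NumPy; `ShapePreservingRegressor` is a hypothetical name, not scikit-learn API):

```python
import numpy as np

class ShapePreservingRegressor:
    """Toy least-squares regressor: records the ndim of y at fit time
    and mirrors it at predict time."""

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        self._y_was_1d = y.ndim == 1              # remember the caller's shape
        Y = y.reshape(-1, 1) if self._y_was_1d else y
        self.coef_, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return self

    def predict(self, X):
        Y_pred = X @ self.coef_                   # always 2d internally
        return Y_pred.ravel() if self._y_was_1d else Y_pred

rng = np.random.RandomState(0)
X = rng.randn(10, 3)
Y = rng.randn(10, 1)

assert ShapePreservingRegressor().fit(X, Y).predict(X).shape == (10, 1)
assert ShapePreservingRegressor().fit(X, Y.ravel()).predict(X).shape == (10,)
```

The same contract ("predict mirrors the shape y had at fit") is what the PR implements inside _PLS.predict.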
But for what it's worth, we're not consistent at all w.r.t. the output shape when passing y with y.shape == (n_samples, 1). Here are a few estimators:
| Estimator | Output shape | Supports multioutput |
| --- | --- | --- |
| HistGradientBoostingClassifier | (100,) | False |
| HistGradientBoostingRegressor | (100,) | False |
| RandomForestClassifier | (100,) | True |
| RandomForestRegressor | (100,) | True |
| LogisticRegression | (100,) | False |
| LinearRegression | (100, 1) | True |
| SGDClassifier | (100,) | False |
| SGDRegressor | (100,) | False |
| SVC | (100,) | False |
| PLSRegression | (100, 1) | True |
Those that don't support multioutput will warn with "DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel()."
It's expected that these ones will output (n_samples,), but for example the RandomForest estimators support multioutput and still ravel the predictions, unlike PLSRegression and LinearRegression.
For reference, our regressor checks ensure that y_pred.shape == y.shape and our classifier checks ensure that y_pred.shape == (n_samples,), but because of the data that these checks use, neither of them can detect such discrepancies.
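The discrepancy between the multioutput-capable estimators can be reproduced directly (a sketch; the small n_estimators is just to keep it fast):

```python
import warnings
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = rng.randn(100, 1)              # column-vector target

with warnings.catch_warnings():
    # RandomForestRegressor raises a DataConversionWarning for column vectors
    warnings.simplefilter("ignore")
    rf_pred = RandomForestRegressor(n_estimators=5).fit(X, y).predict(X)
lr_pred = LinearRegression().fit(X, y).predict(X)

print(rf_pred.shape)  # (100,)   -- multioutput-capable, but ravels
print(lr_pred.shape)  # (100, 1) -- multioutput-capable, keeps 2d
```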
glemaitre
left a comment
I think @NicolasHug is right. We should, however, open a new issue and make sure to correct the common tests as well.
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Uhm, so apparently we have some 1D array that is coming before the

Hi @polocorona, thanks for your patience! Do you mind synchronizing with upstream to rerun the checks? Also, could you please move the changelog entry to 1.0.2?

It is not straightforward indeed. It all depends on whether we agree to introduce a backwards-incompatible change.
Reference Issues/PRs
Supersedes #19409
Fixes #19352
Fixes #19409
What does this implement/fix? Explain your changes.
Fixes #19352 so IterativeImputer works with PLSRegression as estimator.
Any other comments?
test_impute.py coverage is at 97%.