ColumnTransformer input feature name and count validation#14544
ColumnTransformer input feature name and count validation#14544amueller merged 10 commits intoscikit-learn:masterfrom
Conversation
|
now you only need to adjust the tests ;) |
|
The failing test is @NicolasHug do you agree with this version? |
|
My general thought is that we should have gone for a stricter version of #14237 and just error for any discrepancy match, as a bug fix. But OK for doing a deprecation. I'm OK with this but we still need to error when |
|
I agree with all of that @NicolasHug. I lean a little more towards a deprecation. |
|
I added a test for the case where the indexing is negative, I'd appreciate a close review on that part. |
|
@amueller you fine with the changes? |
NicolasHug
left a comment
There was a problem hiding this comment.
a few comments, other than that LGTM
| err_msg = 'Number of features' | ||
| with pytest.raises(ValueError, match=err_msg): | ||
| ct.transform(X_array_fewer) | ||
| with pytest.warns(DeprecationWarning, match=msg): |
There was a problem hiding this comment.
Why warn when passing less features? I thought the only scenario we still accept is passing more features, not less?
This will be confusing for users since the message says it should not error until 0.24
| "for the data given during fit.") | ||
| with pytest.raises(ValueError, match=err_msg): | ||
| tf.transform(X_trans_df) | ||
| with pytest.warns(DeprecationWarning, match=warn_msg): |
There was a problem hiding this comment.
same here, why warn here since we error as well?
|
|
||
| tf = ColumnTransformer([('bycol', Trans(), [0, -1])]) | ||
| tf.fit(df_extra) | ||
| msg = "At least one negative column was used to" |
There was a problem hiding this comment.
can you also check that no warning is raised when n_features matches with negative indexing? Just in case.
|
@NicolasHug done :) |
NicolasHug
left a comment
There was a problem hiding this comment.
Thanks Adrin, LGTM if all goes green
Fixes #14251.
This raises a deprecation warning if the
Xat the time of fit and transform don't match in terms of feature names or count. It implements option (1) of #14251 (comment)