Skip to content

MAINT make sure to test encoders in common tests#26859

Merged
thomasjpfan merged 4 commits intoscikit-learn:mainfrom
glemaitre:encoder_common_test
Jul 24, 2023
Merged

MAINT make sure to test encoders in common tests#26859
thomasjpfan merged 4 commits intoscikit-learn:mainfrom
glemaitre:encoder_common_test

Conversation

@glemaitre
Copy link
Copy Markdown
Member

Just found out that our encoders are not tested via the common test.
I added the necessary tags and solve a couple of issues.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jul 19, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: de845c3. Link to the linter CI: here

estimator.transform(X)

with raises(Exception, match="Unknown label type", may_pass=True):
with raises(Exception, match="(?i)unknown", may_pass=True):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment here maybe?

if "string" not in tags["X_types"]:
X[0, 0] = {"foo": "bar"}
msg = "argument must be a string.* number"
msg = "string.* number"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here as well, we're basically loosening up the tests here, I rather fix the estimator.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I revert back with a slight change in the match. I added a comment to inform from where the error is raised.

Comment on lines +3546 to +3547
# TargetEncoder accepts binary and continuous values. We therefore force
# y to be binary also.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can that be through an estimator tag rather than hard coding here? making encoders pass common tests shouldn't be fixing common tests to accept encoders lol

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, but this is a really weird thing: a transformer that uses y and can be classification and regression but only binary classification. This is just a beast.

However, I special case with another comment because we should be dropping this issue when merging support for multiclass: #26674

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think I should do this in #26674?

Copy link
Copy Markdown
Member Author

@glemaitre glemaitre Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#26674 should modify the code such that the test is passing. I assume I can do the trick of @thomasjpfan using binary_only and you will remove it in your PR.

Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My issue with this PR as it stands is that it's another example of how we're not third party developer friendly. Our tags are not sufficient for our own estimator, and I think we should be fixing the tags instead of special casing. It's not wild to imagine an estimator which only handles regression and binary-only classification.

Comment on lines +3550 to +3554
if estimator.__class__.__name__ == "TargetEncoder":
# TargetEncoder is a special case where a transformer uses `y` but only accept
# binary classification and regression targets.
# TODO: remove this special case when multiclass support is added to
# TargetEncoder.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the issue is our tags are not sufficient. We should have one tag which would be output, and can support a list of values, and in this case it would be regression and binary.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The short term solution is to give TargetEncoder the "binary_only" tag which is true for classification and good enough to run the common test.

Copy link
Copy Markdown
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!

Comment on lines +3550 to +3554
if estimator.__class__.__name__ == "TargetEncoder":
# TargetEncoder is a special case where a transformer uses `y` but only accept
# binary classification and regression targets.
# TODO: remove this special case when multiclass support is added to
# TargetEncoder.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The short term solution is to give TargetEncoder the "binary_only" tag which is true for classification and good enough to run the common test.

Comment on lines -345 to -346
def _more_tags(self):
return {"requires_y": True}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this tag be removed? TargetEncoder does require_y in fit and fit_transform.

@adrinjalali
Copy link
Copy Markdown
Member

Should we also make sure they're actually tested? 😁 I can't find where the change is to make the tests go from not-tested or xfail to tested-now.

@glemaitre
Copy link
Copy Markdown
Member Author

Should we also make sure they're actually tested? grin I can't find where the change is to make the tests go from not-tested or xfail to tested-now.

if 2darray not in X_types then nothing is tested :).

Copy link
Copy Markdown
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan thomasjpfan merged commit d991a19 into scikit-learn:main Jul 24, 2023
punndcoder28 pushed a commit to punndcoder28/scikit-learn that referenced this pull request Jul 29, 2023
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Sep 18, 2023
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants