MAINT make sure to test encoders in common tests by glemaitre · Pull Request #26859 · scikit-learn/scikit-learn

glemaitre · 2023-07-19T12:51:14Z

Just found out that our encoders are not tested via the common tests.
I added the necessary tags and solved a couple of issues.

github-actions · 2023-07-19T12:53:07Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: de845c3. Link to the linter CI: here

adrinjalali · 2023-07-19T21:22:58Z

sklearn/utils/estimator_checks.py

        estimator.transform(X)

-    with raises(Exception, match="Unknown label type", may_pass=True):
+    with raises(Exception, match="(?i)unknown", may_pass=True):


Add a comment here, maybe?
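To see what the change on this line loosens, here is a stdlib-only sketch. `raises_sketch` is a hypothetical re-implementation of how the diff appears to use `raises(..., match=..., may_pass=True)`: a raised exception must match the regex, and with `may_pass=True` raising nothing is also accepted. scikit-learn's own helper may differ in details. The lower-case error message is invented for illustration.

```python
import re
from contextlib import contextmanager


@contextmanager
def raises_sketch(expected_exc_type, match=None, may_pass=False):
    """Hypothetical stand-in for the `raises` helper used in the diff."""
    try:
        yield
    except expected_exc_type as exc:
        # An exception was raised: its message must match the regex.
        if match is not None and not re.search(match, str(exc)):
            raise AssertionError(f"{str(exc)!r} does not match {match!r}") from exc
    else:
        # No exception at all: only acceptable when may_pass=True.
        if not may_pass:
            raise AssertionError(f"did not raise {expected_exc_type.__name__}")


OLD = "Unknown label type"  # pattern before the change
NEW = "(?i)unknown"  # pattern after the change; (?i) = case-insensitive

msg_classifier = "Unknown label type: 'continuous'"
msg_other = "found unknown categories in column 0"  # hypothetical wording

print(bool(re.search(OLD, msg_classifier)), bool(re.search(NEW, msg_classifier)))  # True True
print(bool(re.search(OLD, msg_other)), bool(re.search(NEW, msg_other)))  # False True

with raises_sketch(Exception, match=NEW, may_pass=True):
    raise ValueError(msg_other)  # accepted only thanks to the looser pattern

with raises_sketch(Exception, match=NEW, may_pass=True):
    pass  # no error: accepted because may_pass=True
```

The new pattern accepts any message containing "unknown" in any case, which is why a comment explaining the intent is worth having.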

adrinjalali · 2023-07-19T21:23:20Z

sklearn/utils/estimator_checks.py

    if "string" not in tags["X_types"]:
        X[0, 0] = {"foo": "bar"}
-        msg = "argument must be a string.* number"
+        msg = "string.* number"


Here as well: we're basically loosening the tests. I'd rather fix the estimator.

I reverted it, with a slight change to the match, and added a comment indicating where the error is raised.
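A small sketch of why the loosened pattern was questioned. Putting a dict into `X` makes Python's own `float()` conversion fail with a `TypeError`; both patterns match that built-in message (its exact wording varies slightly across Python versions), but only the loose one also matches an unrelated, hypothetical estimator-side message. This does not reproduce the final pattern chosen in the PR.

```python
import re

STRICT = "argument must be a string.* number"  # pattern before the PR
LOOSE = "string.* number"  # loosened pattern from the first version of the PR

try:
    # The check stores {"foo": "bar"} in X; float conversion of a dict fails.
    float({"foo": "bar"})
except TypeError as exc:
    builtin_msg = str(exc)

# Hypothetical message an estimator could raise itself (not from the PR).
custom_msg = "expected a string or a number"

print(builtin_msg)
print(bool(re.search(STRICT, builtin_msg)), bool(re.search(LOOSE, builtin_msg)))  # True True
print(bool(re.search(STRICT, custom_msg)), bool(re.search(LOOSE, custom_msg)))  # False True
```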

adrinjalali · 2023-07-19T21:24:09Z

sklearn/utils/estimator_checks.py

+    # TargetEncoder accepts binary and continuous values. We therefore force
+    # y to be binary also.


Can that be done through an estimator tag rather than hard-coded here? Making encoders pass the common tests shouldn't mean changing the common tests to accept encoders, lol.

Yep, but this is a really weird case: a transformer that uses y and supports both classification and regression, but only binary classification. This is just a beast.

However, I special-cased it, with another comment, because we should drop this special case when merging multiclass support: #26674
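To make the "beast" concrete, here is a rough, stdlib-only sketch of a target check that accepts binary and continuous targets but rejects multiclass ones. `toy_type_of_target` and `check_target` are hypothetical; the classification rule is deliberately crude and is not scikit-learn's `type_of_target`.

```python
def toy_type_of_target(y):
    """Crude, hypothetical target classification (not sklearn's type_of_target)."""
    values = set(y)
    # Any float with a fractional part -> treat the target as continuous.
    if any(isinstance(v, float) and not v.is_integer() for v in values):
        return "continuous"
    # Otherwise count distinct labels.
    return "binary" if len(values) <= 2 else "multiclass"


# Target types the thread describes for TargetEncoder at the time of the PR.
SUPPORTED = {"binary", "continuous"}


def check_target(y):
    target_type = toy_type_of_target(y)
    if target_type not in SUPPORTED:
        raise ValueError(f"Unsupported target type: {target_type!r}")
    return target_type


print(check_target([0, 1, 1, 0]))  # binary
print(check_target([0.5, 1.25, 3.0]))  # continuous
try:
    check_target([0, 1, 2])
except ValueError as exc:
    print(exc)  # Unsupported target type: 'multiclass'
```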

Do you think I should do this in #26674?

#26674 should modify the code so that the test passes. I assume I can use @thomasjpfan's trick of the binary_only tag, and you will remove it in your PR.
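One way a `binary_only` tag can be honoured by a common test is to collapse `y` to two distinct labels before fitting. The sketch below is a hypothetical, stdlib-only illustration of that idea: the helper name and the plain-dict tags are assumptions, and the actual logic in `sklearn/utils/estimator_checks.py` may differ.

```python
def enforce_binary_only(tags, y):
    """Hypothetical helper: reduce y to at most two labels if tags say binary_only.

    Values equal to the first label are kept; every other value is mapped
    to first + 1.
    """
    if tags.get("binary_only", False) and len(y) > 0:
        first = y[0]
        return [v if v == first else first + 1 for v in y]
    return list(y)


y = [0, 1, 2, 2, 1, 0]
print(enforce_binary_only({"binary_only": True}, y))  # [0, 1, 1, 1, 1, 0]
print(enforce_binary_only({}, y))  # [0, 1, 2, 2, 1, 0]
```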

adrinjalali

My issue with this PR as it stands is that it's another example of how we're not friendly to third-party developers. Our tags are not sufficient even for our own estimators, and I think we should fix the tags instead of special-casing. It's not wild to imagine an estimator that only handles regression and binary-only classification.

adrinjalali · 2023-07-20T15:21:15Z

sklearn/utils/estimator_checks.py

+    if estimator.__class__.__name__ == "TargetEncoder":
+        # TargetEncoder is a special case where a transformer uses `y` but only accept
+        # binary classification and regression targets.
+        # TODO: remove this special case when multiclass support is added to
+        # TargetEncoder.


So the issue is that our tags are not sufficient. We should have a single tag for the output (target) type that can hold a list of values; in this case, it would be regression and binary.
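A minimal sketch of the list-valued tag idea, assuming a hypothetical `target_types` tag (no such tag exists in scikit-learn at the time of this PR). A common test could then pick compatible targets from the tag instead of checking `estimator.__class__.__name__`.

```python
# Hypothetical tags; "target_types" is an assumed name, not a real sklearn tag.
TARGET_ENCODER_TAGS = {"requires_y": True, "target_types": ["binary", "continuous"]}

# Example targets a common test could draw from, keyed by target type.
EXAMPLE_TARGETS = {
    "binary": [0, 1, 0, 1],
    "multiclass": [0, 1, 2, 1],
    "continuous": [0.1, 2.5, 3.7, 1.2],
}
DEFAULT_TARGET_TYPES = ["binary", "multiclass", "continuous"]


def targets_to_test(tags):
    """Yield (target_type, y) pairs compatible with the estimator's tags."""
    for target_type in tags.get("target_types", DEFAULT_TARGET_TYPES):
        yield target_type, EXAMPLE_TARGETS[target_type]


print([name for name, _ in targets_to_test(TARGET_ENCODER_TAGS)])
# ['binary', 'continuous']
```

A third-party estimator that handles only regression and binary classification could declare the same tag without any change to the common tests.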


thomasjpfan

Thank you for the PR!

thomasjpfan · 2023-07-20T20:17:27Z

sklearn/utils/estimator_checks.py

+    if estimator.__class__.__name__ == "TargetEncoder":
+        # TargetEncoder is a special case where a transformer uses `y` but only accept
+        # binary classification and regression targets.
+        # TODO: remove this special case when multiclass support is added to
+        # TargetEncoder.


The short-term solution is to give TargetEncoder the "binary_only" tag, which holds for classification and is good enough to run the common tests.

thomasjpfan · 2023-07-20T20:19:22Z

sklearn/preprocessing/_target_encoder.py

-    def _more_tags(self):
-        return {"requires_y": True}


Should this tag be removed? TargetEncoder does require y in fit and fit_transform.
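The `requires_y` tag is what lets a generic test decide between `fit(X, y)` and `fit(X)`. The stdlib-only sketch below illustrates that; `ToyTargetEncoder`, `DEFAULT_TAGS` and `get_tags` are hypothetical and only mirror the `_more_tags` method shown in the diff, not scikit-learn's tag machinery.

```python
class ToyTargetEncoder:
    """Hypothetical stand-in that only mirrors the tag shown in the diff."""

    def _more_tags(self):
        return {"requires_y": True}

    def fit(self, X, y=None):
        if y is None:
            raise ValueError("y is required")
        self.n_samples_ = len(y)
        return self


DEFAULT_TAGS = {"requires_y": False}


def get_tags(estimator):
    """Merge default tags with the estimator's own tags (hypothetical helper)."""
    tags = dict(DEFAULT_TAGS)
    more_tags = getattr(estimator, "_more_tags", None)
    if more_tags is not None:
        tags.update(more_tags())
    return tags


def fit_for_common_test(estimator, X, y):
    # Only pass y when the estimator declares that it needs it.
    if get_tags(estimator)["requires_y"]:
        return estimator.fit(X, y)
    return estimator.fit(X)


enc = fit_for_common_test(ToyTargetEncoder(), [[1], [2]], [0, 1])
print(enc.n_samples_)  # 2
```

Removing `_more_tags` here would make the sketch call `fit(X)` without a target, which is the failure the question anticipates.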

adrinjalali · 2023-07-21T12:47:23Z

Should we also make sure they're actually tested? 😁 I can't find where the change is to make the tests go from not-tested or xfail to tested-now.

glemaitre · 2023-07-24T12:38:55Z

Should we also make sure they're actually tested? 😁 I can't find where the change is to make the tests go from not-tested or xfail to tested-now.

If "2darray" is not in X_types, then nothing is tested :).
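A sketch of the gate described above: if an estimator's `X_types` tag lacks "2darray", the common checks are skipped entirely, so adding that tag is what turns testing on. `common_checks_to_run` is hypothetical, the check names are placeholders, and the "categorical"-only tags are assumed for illustration.

```python
def common_checks_to_run(tags, checks):
    """Hypothetical gate: no checks at all unless "2darray" is in X_types."""
    if "2darray" not in tags.get("X_types", ["2darray"]):
        return []
    return list(checks)


checks = ["check_fit2d_1sample", "check_transformer_general"]

# Tags listing only "categorical": silently skipped.
print(common_checks_to_run({"X_types": ["categorical"]}, checks))  # []
# Adding "2darray" makes the common checks run.
print(common_checks_to_run({"X_types": ["2darray", "categorical"]}, checks))
# ['check_fit2d_1sample', 'check_transformer_general']
```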

thomasjpfan

LGTM

MAINT make sure to test encoders in common tests

e8a9d58

github-actions bot added module:preprocessing module:utils labels Jul 19, 2023

adrinjalali reviewed Jul 19, 2023

glemaitre added 2 commits July 20, 2023 11:52

Adrin comments

b0dc4cb

iter

f8a0b19

adrinjalali reviewed Jul 20, 2023

thomasjpfan reviewed Jul 20, 2023

iter

de845c3

adrinjalali approved these changes Jul 24, 2023

thomasjpfan approved these changes Jul 24, 2023

thomasjpfan merged commit d991a19 into scikit-learn:main Jul 24, 2023

punndcoder28 pushed a commit to punndcoder28/scikit-learn that referenced this pull request Jul 29, 2023

MAINT make sure to test encoders in common tests (scikit-learn#26859)

159bc63

glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Sep 18, 2023

MAINT make sure to test encoders in common tests (scikit-learn#26859)

7b2fbc4

jeremiedbb pushed a commit that referenced this pull request Sep 20, 2023

MAINT make sure to test encoders in common tests (#26859)

ae941f4

REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023

MAINT make sure to test encoders in common tests (scikit-learn#26859)

8eccc86

4 participants
