FIX Improve checking for string and multilabel inputs for classification metrics#33086
Conversation
| # all of length 3 | ||
| EXAMPLES = [ | ||
| (IND, np.array([[0, 1, 1], [1, 0, 0], [0, 0, 1]])), | ||
| (IND, np.array([[0, 1], [1, 0], [0, 0]])), |
There was a problem hiding this comment.
Changed because this test would now fail for the (IND, IND) combination (as the two IND have different sizes).
I checked and all other combinations involving IND would error (expected is None), thus I think it is safe to amend this.
There was a problem hiding this comment.
Now this example is basically the same as the next entry, so maybe we can remove this line?
| if len(set(isinstance(label, str) for label in ys_labels)) > 1: | ||
| raise ValueError("Mix of label input types (string and number)") | ||
| msg_details = "Got " + " ".join([f"{xp.unique(y)}" for y in ys]) | ||
| raise ValueError(f"Mix of label input types (string and number); {msg_details}") |
There was a problem hiding this comment.
We could raise a TypeError here, like what _union1d does, but I think it is better to keep as ValueError as its more of an input value problem as e.g., the predictions/y2 don't match true labels/y1. Also this makes this backwards compatible.
| ys_types = set(type_of_target(x) for x in ys) | ||
| if ys_types == {"binary", "multiclass"}: | ||
| ys_types = {"multiclass"} | ||
| if ys_types is None: |
There was a problem hiding this comment.
This was just to avoid a duplicate type_of_target (and thus unique) call (again, we cache, but this is only relevant for arrays with metadata attr) - but happy to remove as this is a public function.
There was a problem hiding this comment.
The idea is that passing in ys_types is an optimisation right? Is it really such a slow down/speed up to do this?
There was a problem hiding this comment.
Sorry I missed this. I did not do any benchmarking, and again since we do cache, it is only for specific array inputs where it would actually matter. Probably not a big difference overall, but it also did not seem like such a significant change to incorporate it...
|
Ping @ogrisel 🙏 |
ogrisel
left a comment
There was a problem hiding this comment.
I think there is a problem with unique_labels when array API dispatch is enabled (even with NumPy only inputs). See below:
|
Labeling this as array API because the remaining problems are array API support related. |
|
Thanks @ogrisel ! Tests added! I think this is also a good reminder to add array API tests even for small functions that we convert in PRs, and not rely on the fact that it's tested as part of another function. |
|
Gentle ping @ogrisel @lorentzenchr, I think this is ready for another review 🙏 |
ogrisel
left a comment
There was a problem hiding this comment.
Thanks @lucyleeow. Just minor feedback but otherwise LGTM.
|
Thanks @betatim ! I think this is ready for another look 🙏 thank you! |
|
Gentle ping @betatim |
| "input_kind", | ||
| ["array_of_str_objects", "array_of_fixed_width_strings", "list_of_str_objects"], |
There was a problem hiding this comment.
| "input_kind", | |
| ["array_of_str_objects", "array_of_fixed_width_strings", "list_of_str_objects"], | |
| "y_1", | |
| [ | |
| np.array(["spam"] * 3 + ["eggs"] * 2, dtype=object), # str object | |
| np.array(["spam"] * 3 + ["eggs"] * 2), # fixed width string | |
| ["spam"] * 3 + ["eggs"] * 2, | |
| ], |
Could save the if else logic inside.
| if y_type == "binary": | ||
| if unique_labels_.shape[0] > 2: | ||
| y_type = "multiclass" |
There was a problem hiding this comment.
Should we rename the values of y_type?
I find the logic strange. y_type was detected to be binary but it has more than 2 unique values. Then it is a false positive detection - or just a bad name.
There was a problem hiding this comment.
Looking at the old code and other instances where this type of code occurs e.g., label_binarize:
scikit-learn/sklearn/preprocessing/_label.py
Lines 598 to 607 in 78d1885
I think it would be the (rare) edge case where there are actually >2 classes (e.g., a, b, c) , and y_pred contains 2 classes (e.g., a b) and y_true contains 2 classes but a different 2 classes (e.g., a, c).
|
@lucyleeow Can I merge? |
|
@lorentzenchr yes thanks! |
Reference Issues/PRs
Fixes #33045
What does this implement/fix? Explain your changes.
Mixed string and number inputs do not error for the following thresholded classification metrics:
accuracy_scorehamming_losszero_one_lossmatthews_corrcoefconfusion_matrix- only whenlabelsis setwhen the mixed string and number inputs are:
np.array(['a','b'])Details
Mixed string and number inputs are dealt with via
_union1d(see #18192)scikit-learn/sklearn/metrics/_classification.py
Lines 154 to 165 in 6c084a8
However this only works for numpy arrays, where the string is specifically specified to be object type:
scikit-learn/sklearn/metrics/tests/test_common.py
Lines 1928 to 1933 in 0707e62
This test does not pass if the inputs are list or of dtype object is not forced.
np.union1dfirst concatenates the arrays and then performs unique on the result. If the concatenated array is of dtype object, unique givesTypeError, however if object is not forced to be object, e.g.:np.union1d(np.array([1,2]), np.array(['a','b']))the resulting concatenation is of type 'U' and no error is raised when unique is run. (I am not sure why this case was not considered initially, but I think it would not be unusual/rare for the input to be a string numpy array that is NOT of object type)
The other classification metrics do raise an error, due to a call to
unique_labels(see #33045 for details).This PR enforces:
for all thresholded metrics for all input types by running
unique_labelsinside_check_targets. This is an opinionated change so let me explain:_check_targetsis ONLY used within thresholded metrics in_classification.py- thus a non backwards compatible change is less of an issue.unique_labelsis also called,unique_labelsis always called after_check_targets_check_targetsandunique_labelsraise errors for similar problematic input combinations, except:unique_labelschecks label indicator inputs are of the same size (_check_targetsdoes not)unique_labels) will have this extra check, which I think is a fix._check_targetschecks target length consistency and does not allow continuous input or any multioutput target typesunique_labelsis used elsewhere in the code (e.g.,LinearDiscriminantAnalysis) - thus the changes made are backwards compatible and no changes in functionality was madeCons of this approach:
_check_targetsare more specific for classification metrics vs those inunique_labels. I addedunique_labelsright at the end of_check_targets, so all the current_check_targetswill be raised in priority. The only difference is that for mixed str and int inputs, the error message fromunique_labelsis raised instead, but I have tried to improve this message.Pros:
_union1dfrom_array_api.pyas this was the only place this function was usedytypes tounique_labelsreduces duplicatetype_of_targetcalls (and thus duplicateuniquecalls - I know we cache unique but this is only relevant for numpy arrays, not necessarily other array API supported arrays)I did also consider simply changing the logic in:
scikit-learn/sklearn/metrics/_classification.py
Lines 154 to 165 in 6c084a8
such that list and non object numpy inputs are also handled, but went with the above approach. Happy to take other opinions though.
AI usage disclosure
I used AI assistance for:
Any other comments?