Mixed string/numeric when input is list for classification metrics

> [!WARNING]
> This is not a good first (or second) issue to contribute to. If you are interested in contributing to scikit-learn, please have a look at our [contributing doc](https://scikit-learn.org/dev/developers/contributing.html) and in particular the section [Issues for new contributors](https://scikit-learn.org/dev/developers/contributing.html#new-contributors).

Noticed while working on #32755

Classification metrics (`CLASSIFICATION_METRICS` in `test_common.py`; all of them are defined in `sklearn/metrics/_classification.py`, though not every metric in `_classification.py` is in this list) vary in how they handle mixed string and numeric inputs (e.g., `y_true` is string and `y_pred` is numeric).

* `precision_recall_fscore_support` and friends (`precision_score`, `recall_score`, `f1_score`, `fbeta_score`, and the related `jaccard_score`) - raise `ValueError`:
   * via `_check_set_wise_labels`, which calls [`unique_labels`](https://github.com/scikit-learn/scikit-learn/blob/66200f149ba6c64a3a93dd73e37bdeea87bb5db8/sklearn/utils/multiclass.py#L41), which does not allow a "mix of string and integer labels"
   * note that these metrics do take a `pos_label` parameter, so in theory, when `y_true` is string, we could use `y_true == pos_label` to convert it to 1s and 0s. This would not make sense if `y_pred` were string.
* `confusion_matrix` (and metrics that use it - `balanced_accuracy_score`, `cohen_kappa_score`) - raises `ValueError` ONLY when `labels=None`:
   * `labels=None`: calls [`unique_labels`](https://github.com/scikit-learn/scikit-learn/blob/66200f149ba6c64a3a93dd73e37bdeea87bb5db8/sklearn/utils/multiclass.py#L41) like above
   * `labels` is set - AFAICT no error, but the confusion matrix would contain only zeros. E.g., if `y_true` is string and `labels` is set to all unique values in `y_true`, we would do the following conversion:
  
https://github.com/scikit-learn/scikit-learn/blob/66200f149ba6c64a3a93dd73e37bdeea87bb5db8/sklearn/metrics/_classification.py#L584-L587

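Since none of the string `labels` are present in numeric `y_pred`, `y_pred` would be converted to an array consisting only of the value `n_labels + 1`. These out-of-range indices are then filtered out before counting, so every sample is dropped and every entry of the matrix is 0.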
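
To make the `labels` case concrete, here is a minimal NumPy sketch that re-implements the linked conversion. It is not scikit-learn's code: the function name `sketch_confusion_matrix` is hypothetical, and the masking and counting steps are a simplification, assuming samples mapped to the `n_labels + 1` sentinel are dropped before counting.

```python
import numpy as np


def sketch_confusion_matrix(y_true, y_pred, labels):
    """Hypothetical, simplified confusion matrix with explicit `labels`.

    The index conversion mirrors the linked snippet; the masking and
    counting steps are a simplification, not scikit-learn's code.
    """
    n_labels = len(labels)
    label_to_ind = {label: index for index, label in enumerate(labels)}
    # Any label not found in `labels` is mapped to the sentinel n_labels + 1.
    y_pred = np.array([label_to_ind.get(label, n_labels + 1) for label in y_pred])
    y_true = np.array([label_to_ind.get(label, n_labels + 1) for label in y_true])

    # Assumption: samples whose true or predicted index is out of range are dropped.
    keep = (y_pred < n_labels) & (y_true < n_labels)
    cm = np.zeros((n_labels, n_labels), dtype=np.int64)
    # Count each (true, predicted) index pair that survived the mask.
    np.add.at(cm, (y_true[keep], y_pred[keep]), 1)
    return cm


labels = ["apple", "orange"]

# Numeric predictions all map to the sentinel 3, so nothing is counted.
mixed = sketch_confusion_matrix(["apple", "orange", "apple"], [2, 3, 2], labels)
print(mixed.tolist())  # [[0, 0], [0, 0]]

# String predictions of the same type as y_true are counted normally.
matching = sketch_confusion_matrix(
    ["apple", "orange", "apple"], ["apple", "apple", "apple"], labels
)
print(matching.tolist())  # [[2, 0], [1, 0]]
```

The contrast between the two calls is the point: with mixed types, no error is raised, yet the matrix carries no information.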

* `multilabel_confusion_matrix` - always raises `ValueError` (again via `unique_labels`):
   * unlike `confusion_matrix`, even when `labels` is given, we call `unique_labels(y_true, y_pred)` to get all labels present, and any that are missing from `labels` are added.
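
The erroring paths above all go through `unique_labels`. The hypothetical helper below approximates its mixed-type rule, assuming the check only asks whether each label in the union of `y_true` and `y_pred` is a `str`; the error message is an approximation, not necessarily scikit-learn's exact wording.

```python
def check_label_types(*ys):
    """Hypothetical approximation of the mixed-type rule in `unique_labels`.

    Collects the union of labels across all inputs and raises ValueError if it
    contains both `str` and non-`str` values.
    """
    labels = set()
    for y in ys:
        labels.update(y)
    # The set has one element per distinct answer to "is this label a str?".
    if len({isinstance(label, str) for label in labels}) > 1:
        raise ValueError("Mix of label input types (string and number)")
    return sorted(labels)


print(check_label_types(["apple", "orange"], ["orange", "apple"]))  # ['apple', 'orange']

try:
    check_label_types(["apple", "orange", "apple"], [2, 3, 2])
except ValueError as exc:
    print(f"ValueError: {exc}")
```

In this sketch the union always includes both `y_true` and `y_pred`, so supplying `labels` separately cannot avoid the error, consistent with the `multilabel_confusion_matrix` behaviour described above.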

The following do NOT raise an error:

* `accuracy_score` - no error, but the result would always be 0 (as `y_true == y_pred` is never true for any sample). Not relevant for multilabel cases, as the input there must be a label indicator matrix.
* `hamming_loss` - no error, but the result would always be 1. Again, not relevant for multilabel.
* `zero_one_loss` - no error, but the result would always be 1 with the default `normalize=True` (and `n_samples` with `normalize=False`). Again, not relevant for multilabel.
* `matthews_corrcoef` - no error, but the result would always be 0. Note that, before passing to `confusion_matrix`, we transform `y_true` and `y_pred` with a `LabelEncoder` fit on the concatenation of `y_true` and `y_pred`, so both become integers but share no values.
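
These "no error" results follow from two facts: a string never compares equal to an integer, and encoding the concatenation gives `y_true` and `y_pred` disjoint codes. The sketch below reproduces the values in plain Python/NumPy. It is a simplified stand-in for the scikit-learn functions, not their implementation; `mcc_from_codes` is a hypothetical helper using the standard multiclass MCC formula, and encoding `str()` of each label is an assumption meant to mimic the `LabelEncoder` step.

```python
import numpy as np

y_true = ["apple", "orange", "apple"]
y_pred = [2, 3, 2]

# A str never compares equal to an int in Python, so no sample is "correct".
matches = [t == p for t, p in zip(y_true, y_pred)]
n_samples = len(matches)
accuracy = sum(matches) / n_samples        # analogue of accuracy_score
hamming = 1 - accuracy                     # analogue of hamming_loss (1D input)
zero_one_count = n_samples - sum(matches)  # analogue of zero_one_loss(normalize=False)


def mcc_from_codes(t_codes, p_codes, n_classes):
    """Hypothetical helper: multiclass MCC from a confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=np.float64)
    np.add.at(cm, (t_codes, p_codes), 1)
    t_sum = cm.sum(axis=1)  # samples per true class
    p_sum = cm.sum(axis=0)  # samples per predicted class
    n_correct = np.trace(cm)
    total = cm.sum()
    cov_ytyp = n_correct * total - np.dot(t_sum, p_sum)
    cov_ypyp = total**2 - np.dot(p_sum, p_sum)
    cov_ytyt = total**2 - np.dot(t_sum, t_sum)
    denom = np.sqrt(cov_ytyt * cov_ypyp)
    return 0.0 if denom == 0 else float(cov_ytyp / denom)


# Assumption: mimic a LabelEncoder fit on the concatenation by encoding the
# str() form of every label; for these inputs the codes of y_true ("apple",
# "orange") and y_pred ("2", "3") do not overlap.
classes = sorted({str(v) for v in y_true + y_pred})
codes = {c: i for i, c in enumerate(classes)}
t_codes = np.array([codes[str(v)] for v in y_true])
p_codes = np.array([codes[str(v)] for v in y_pred])
mcc = mcc_from_codes(t_codes, p_codes, len(classes))

print(accuracy, hamming, zero_one_count, mcc)  # 0.0 1.0 3 0.0
```

With disjoint codes the trace is 0 and every product `t_sum[k] * p_sum[k]` is 0, so the MCC numerator `cov_ytyp` is exactly 0 whatever the denominator.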

For these classification metrics, which require `y_pred` to be thresholded predictions, I don't think it makes sense to allow mixed string and numeric input (e.g., how would you match `['apple', 'orange', 'apple']` to `[2, 3, 2]`?). Indeed, for the metrics that do not error, the result is the 'worst' possible value.
For metrics that accept `pos_label`, the way forward is less clear. Note that several continuous classification metrics already use something like `y_true == pos_label`, which allows `y_true` to be string.
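
For the `pos_label` route, here is a hypothetical sketch of the `y_true == pos_label` idea applied to string `y_true` and numeric `y_pred`. This is not current behaviour of `precision_score` and friends; the helper names are invented, and the sketch assumes numeric `y_pred` uses 1 for the positive class and 0 otherwise.

```python
def binarize_true(y_true, pos_label):
    """Hypothetical helper: 1 where y_true equals pos_label, else 0."""
    return [int(label == pos_label) for label in y_true]


def precision_recall_binary(y_true, y_pred, pos_label):
    """Hypothetical sketch: string y_true vs 0/1 numeric y_pred via pos_label.

    Assumes y_pred already uses 1 for the positive class and 0 otherwise.
    """
    t = binarize_true(y_true, pos_label)
    tp = sum(1 for ti, pi in zip(t, y_pred) if ti == 1 and pi == 1)
    fp = sum(1 for ti, pi in zip(t, y_pred) if ti == 0 and pi == 1)
    fn = sum(1 for ti, pi in zip(t, y_pred) if ti == 1 and pi == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = ["spam", "ham", "spam", "ham"]
y_pred = [1, 0, 0, 1]  # numeric predictions, 1 meaning "spam" (assumption)
precision, recall = precision_recall_binary(y_true, y_pred, pos_label="spam")
print(precision, recall)  # 0.5 0.5
```

Treating 1 as the positive prediction is an assumption; deciding how numeric `y_pred` should line up with a string `pos_label` is part of the open question above.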

I will open a separate issue about this for continuous classification metrics (i.e., those in `_ranking.py`).

cc @ogrisel 



For reference, the linked lines (`sklearn/metrics/_classification.py`, L584-L587) read:

```python
if need_index_conversion:
    label_to_ind = {label: index for index, label in enumerate(labels)}
    y_pred = np.array([label_to_ind.get(label, n_labels + 1) for label in y_pred])
    y_true = np.array([label_to_ind.get(label, n_labels + 1) for label in y_true])
```

Mixed string/numeric when input is list for classification metrics #33045
