This is not a good first (or second) issue to contribute. If you are interested in contributing to scikit-learn please have a look at our contributing doc and in particular the section Issues for new contributors.
Classification metrics (those listed in CLASSIFICATION_METRICS in test_common.py; all are defined in sklearn/metrics/_classification.py, though not every metric in _classification.py is in this list) vary in how they handle mixed string and numeric inputs (e.g., y_true is string and y_pred is numeric).
precision_recall_fscore_support and friends (precision_score, recall_score, f1_score, fbeta_score and the related jaccard_score) - raise ValueError:
via _check_set_wise_labels, which calls unique_labels, which does not allow a "mix of string and integer labels"
note that these metrics do take a pos_label parameter, so theoretically when y_true is string, we could use y_true == pos_label to convert to [1,0]. This would not make sense if y_pred was string.
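To make the pos_label idea concrete, here is a minimal pure-Python sketch of the `y_true == pos_label` conversion. The helper name `binarize_against_pos_label` and the toy data are hypothetical and only illustrate the idea; this is not how scikit-learn currently implements these metrics.

```python
def binarize_against_pos_label(y_true, pos_label):
    """Map each label to 1 if it equals pos_label, else 0 (hypothetical helper)."""
    return [1 if label == pos_label else 0 for label in y_true]


# Toy data (assumed for illustration): string y_true, numeric 0/1 y_pred.
y_true = ["spam", "ham", "spam", "ham"]
y_pred = [1, 0, 0, 0]

y_true_bin = binarize_against_pos_label(y_true, pos_label="spam")

# With both sides now numeric, the usual counts can be computed.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true_bin, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true_bin, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true_bin, y_pred))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

The sketch only works because y_pred is already 0/1; it says nothing about what should happen when y_pred holds other numeric values.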
confusion_matrix (and metrics that use it - balanced_accuracy_score, cohen_kappa_score) - raises ValueError ONLY when labels=None:
labels=None: calls unique_labels like above
labels is set - AFAICT no error, but the confusion matrix would contain only 0s. E.g., if y_true is string and labels is set to all possible unique values in y_true, we would do the following conversion:
Since none of the string labels are present in numeric y_pred, y_pred would be converted to an array in which every entry is n_labels + 1
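The effect of that conversion can be sketched in plain Python. This is a simplified paraphrase under stated assumptions, not the library code: the function name `confusion_matrix_sketch` is hypothetical, and the step that drops out-of-range indices is assumed from the description above.

```python
def confusion_matrix_sketch(y_true, y_pred, labels):
    """Simplified illustration of mapping labels to indices with a sentinel."""
    n_labels = len(labels)
    label_to_ind = {label: index for index, label in enumerate(labels)}
    sentinel = n_labels + 1  # used for any value not found in `labels`

    true_idx = [label_to_ind.get(label, sentinel) for label in y_true]
    pred_idx = [label_to_ind.get(label, sentinel) for label in y_pred]

    matrix = [[0] * n_labels for _ in range(n_labels)]
    for t, p in zip(true_idx, pred_idx):
        # Pairs where either index is out of range are skipped (assumption).
        if t < n_labels and p < n_labels:
            matrix[t][p] += 1
    return pred_idx, matrix


labels = ["apple", "orange"]
y_true = ["apple", "orange", "apple"]
y_pred = [2, 3, 2]

pred_idx, matrix = confusion_matrix_sketch(y_true, y_pred, labels)
```

Every entry of `pred_idx` is the sentinel, so no pair is ever counted and the matrix stays all zeros.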
multilabel_confusion_matrix - always raises ValueError (again via unique_labels):
Unlike confusion_matrix, even when labels is given, we call unique_labels(y_true, y_pred) to get all labels present, and any that are not provided in labels are added.
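The erroring metrics above all fail at the point where the union of labels is computed. A rough sketch of that kind of check, assuming a hypothetical helper `check_no_mixed_label_types` and an illustrative error message (neither is taken from scikit-learn), might look like this:

```python
def check_no_mixed_label_types(*ys):
    """Return the sorted union of labels, rejecting a str/non-str mix.

    Hypothetical helper for illustration only.
    """
    labels = set()
    for y in ys:
        labels.update(y)

    # One entry per "kind" of label seen: True for str, False otherwise.
    kinds = {isinstance(label, str) for label in labels}
    if len(kinds) > 1:
        raise ValueError("mixed string and numeric labels")
    return sorted(labels)


same_type = check_no_mixed_label_types(["apple", "orange"], ["orange", "pear"])

try:
    check_no_mixed_label_types(["apple", "orange", "apple"], [2, 3, 2])
    mixed_raised = False
except ValueError:
    mixed_raised = True
```

Any metric that computes the label union this way fails early on mixed inputs, while metrics that skip the step fall through to the silent results listed next.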
The following all DO NOT error:
accuracy_score - no error but result would always be 0 (as score = y_true == y_pred would always be False). Note: not relevant for multilabel cases, as the input needs to be a label indicator matrix.
hamming_loss - no error but result would always be 1. Again not relevant for multilabel.
zero_one_loss - no error but result would always be 1 (with the default normalize=True; with normalize=False it would be the number of samples). Again not relevant for multilabel.
matthews_corrcoef - no error but result would always be 0. Note that we transform y_true and y_pred using a LabelEncoder fit on the concatenation of y_true and y_pred (meaning y_true and y_pred will be numbers, but will not have any numbers in common), before we pass them to confusion_matrix.
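The "worst value" behaviour of the equality-based metrics can be illustrated without scikit-learn. This sketch uses plain Python lists and toy data (both assumptions); array libraries may treat string-versus-number comparison differently.

```python
y_true = ["apple", "orange", "apple"]
y_pred = [2, 3, 2]

# A string never compares equal to a number, so every pair is a mismatch.
matches = [t == p for t, p in zip(y_true, y_pred)]

accuracy = sum(matches) / len(matches)        # fraction of matches
mismatch_fraction = 1 - accuracy              # fraction of mismatches
mismatch_count = len(matches) - sum(matches)  # number of mismatches
```

The fraction of mismatches is 1 and the count of mismatches equals the number of samples, whatever the actual labels are.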
For these classification metrics, which require y_pred to be thresholded predictions, I don't think it makes sense to have mixed string and numeric inputs (e.g., how would you match ['apple', 'orange', 'apple'] to [2, 3, 2]?). Indeed, for the metrics that do not error, the result is the 'worst' value.
For metrics that accept pos_label the way forward is less clear. Note that for several continuous classification metrics we do use something like y_true == pos_label, thus allowing y_true to be string.
Note that I will open a separate issue around this for continuous classification metrics (i.e. those in _ranking.py).
Noticed while working on #32755
The conversion referred to above (for confusion_matrix with labels set) is at scikit-learn/sklearn/metrics/_classification.py, lines 584 to 587 in 66200f1.
cc @ogrisel