Fmax score (or maximum of F1/Fbeta) #26026
Closed as not planned
Labels
Needs Decision - Include Feature, Needs Info, New Feature
Description
Describe the workflow you want to enable
The maximum of F1 (or, more generally, Fbeta) across decision thresholds, often called Fmax, is a well-studied metric that is both robust and valid in binary and multilabel classification problems. For binary problems, it can be computed from the output of precision_recall_curve, as shown below. For multilabel problems, I'm not sure whether there is an efficient way to do this without looping through all labels.
Describe your proposed solution
For binary:
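import numpy as np
from sklearn.metrics import precision_recall_curve

def get_fmax(preds: np.ndarray, ys: np.ndarray, beta=1.0, pos_label=1):
    """
    Radivojac, P. et al. (2013). A Large-Scale Evaluation of Computational Protein Function Prediction. Nature Methods, 10(3), 221-227.
    """
    # Scores are passed positionally because their keyword name differs between scikit-learn versions.
    precision, recall, thresholds = precision_recall_curve(ys, preds, pos_label=pos_label)
    numerator = (1 + beta**2) * (precision * recall)
    denominator = (beta**2 * precision) + recall
    # Treat F-beta as 0 where precision and recall are both 0.
    fbeta = np.divide(numerator, denominator, out=np.zeros_like(numerator), where=(denominator != 0))
    # Skip the final (precision=1, recall=0) point, which has no matching threshold.
    best = np.argmax(fbeta[:-1])
    return fbeta[best], thresholds[best]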
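For multilabel:

A rough sketch of two possible extensions, not a proposed API. It assumes `Y_true` and `Y_score` are arrays of shape (n_samples, n_labels) with 0/1 targets; the helper names `fmax_from_curve`, `fmax_per_label`, and `fmax_micro` are hypothetical. The per-label version still loops over labels, while the micro-averaged version avoids the loop by flattening everything into one binary problem with a single global threshold. Neither is necessarily the averaging used in the cited paper.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def fmax_from_curve(y_true, y_score, beta=1.0):
    """Maximum F-beta over thresholds for one binary problem (hypothetical helper)."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    numerator = (1 + beta**2) * precision * recall
    denominator = beta**2 * precision + recall
    # F-beta is taken to be 0 where precision and recall are both 0.
    fbeta = np.divide(numerator, denominator,
                      out=np.zeros_like(numerator), where=denominator != 0)
    # The last curve point (precision=1, recall=0) has no threshold, so skip it.
    best = np.argmax(fbeta[:-1])
    return fbeta[best], thresholds[best]


def fmax_per_label(Y_true, Y_score, beta=1.0):
    """One (fmax, threshold) pair per label column; loops over labels."""
    return [fmax_from_curve(Y_true[:, j], Y_score[:, j], beta)
            for j in range(Y_true.shape[1])]


def fmax_micro(Y_true, Y_score, beta=1.0):
    """Micro-averaged Fmax: pool all (sample, label) pairs, one global threshold."""
    return fmax_from_curve(np.ravel(Y_true), np.ravel(Y_score), beta)
```

The micro-averaged variant costs a single sort over n_samples * n_labels scores, but it assumes scores are comparable across labels. The per-label variant makes no such assumption and returns a separate threshold for each label.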
Describe alternatives you've considered, if relevant
No response
Additional context
No response