Fmax score (or maximum of F1/Fbeta) #26026

@jasperhyp

Describe the workflow you want to enable

The maximum of F1 across decision thresholds (Fmax) is a well-studied metric that is both robust and valid for binary and multilabel classification problems. For binary problems it can be computed from precision_recall_curve, as shown below. For multilabel problems, I'm not sure whether there is an efficient way to do this without looping through all labels.
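For reference, the quantity being maximized can be written out as follows. The symbols t (decision threshold), P(t) (precision at t) and R(t) (recall at t) are notation introduced here for illustration; setting beta = 1 gives the usual F1.

```latex
F_\beta(t) = \frac{(1 + \beta^2)\, P(t)\, R(t)}{\beta^2\, P(t) + R(t)},
\qquad
F_{\max} = \max_{t} F_\beta(t)
```

Where P(t) and R(t) are both zero, the ratio is undefined and is taken to be 0.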

Describe your proposed solution

For binary:

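import numpy as np
from sklearn.metrics import precision_recall_curve

def get_fmax(preds: np.ndarray, ys: np.ndarray, beta: float = 1.0, pos_label: int = 1):
    """
    Return the maximum F-beta over all thresholds and the threshold that attains it.

    Radivojac, P. et al. (2013). A Large-Scale Evaluation of Computational Protein Function Prediction. Nature Methods, 10(3), 221-227.
    """
    # Labels and scores are passed positionally so the call does not depend on the keyword name of the score argument.
    precision, recall, thresholds = precision_recall_curve(ys, preds, pos_label=pos_label)
    # The final (precision=1, recall=0) point has no matching threshold, so drop it.
    precision, recall = precision[:-1], recall[:-1]
    numerator = (1 + beta**2) * (precision * recall)
    denominator = (beta**2 * precision) + recall
    # Define F-beta as 0 wherever precision and recall are both 0.
    fbeta = np.divide(numerator, denominator, out=np.zeros_like(numerator), where=(denominator != 0))
    best = np.argmax(fbeta)
    return fbeta[best], thresholds[best]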
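For the multilabel case, the straightforward option mentioned above is to apply the binary computation to each label column in turn. The following is a minimal sketch under stated assumptions: the inputs are 2-D arrays of shape (n_samples, n_labels), and `fmax_binary` and `fmax_per_label` are hypothetical helper names, not scikit-learn functions. Averaging the per-label maxima is only one possible aggregation and may differ from the definition used in the cited paper.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def fmax_binary(y_true, y_score, beta=1.0):
    """Max F-beta over thresholds for one binary label (hypothetical helper)."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # The last curve point (precision=1, recall=0) has no threshold.
    precision, recall = precision[:-1], recall[:-1]
    num = (1 + beta**2) * precision * recall
    den = beta**2 * precision + recall
    # F-beta is taken to be 0 where precision and recall are both 0.
    fbeta = np.divide(num, den, out=np.zeros_like(num), where=den != 0)
    best = int(np.argmax(fbeta))
    return float(fbeta[best]), float(thresholds[best])


def fmax_per_label(Y_true, Y_score, beta=1.0):
    """Loop over label columns; returns (scores, thresholds) as arrays."""
    results = [
        fmax_binary(Y_true[:, j], Y_score[:, j], beta)
        for j in range(Y_true.shape[1])
    ]
    scores, thresholds = map(np.array, zip(*results))
    return scores, thresholds


# Toy data: 4 samples, 2 labels (values made up for illustration).
Y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
Y_score = np.array([[0.9, 0.2], [0.1, 0.8], [0.8, 0.3], [0.3, 0.1]])

scores, thresholds = fmax_per_label(Y_true, Y_score)
macro_fmax = scores.mean()  # unweighted mean of the per-label maxima
```

This still loops over labels, so it does not answer the efficiency question; it only shows the baseline that a vectorized implementation would need to match.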

Describe alternatives you've considered, if relevant

No response

Additional context

No response
