Several metrics in scikit-learn use `np.average(..., weights=sample_weight)` under the hood:

- `mean_absolute_error`
- `mean_squared_error`
- `explained_variance_score`
- `r2_score`
- `mean_tweedie_deviance`
When computing the average, NumPy divides by the sum of the weights. Is this the intended behaviour? For instance, `sample_weight = [1, 2, 3, 4]` and `sample_weight = [1/10, 2/10, 3/10, 4/10]` lead to the same error/score. Dividing by the sum of the weights also removes any meaning attached to the unit of the weights (e.g. if `sample_weight` is tied to a business unit).
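A minimal NumPy sketch of the example above. `np.average(x, weights=w)` computes sum(w_i * x_i) / sum(w_i), so multiplying every weight by the same positive constant cancels out. The per-sample absolute errors are made up for illustration, and the sketch assumes, as stated above, that the metrics reduce to this weighted mean:

```python
import numpy as np

# Made-up per-sample absolute errors, for illustration only.
abs_errors = np.array([0.5, 0.5, 0.0, 1.0])

w_int = np.array([1, 2, 3, 4], dtype=float)
w_frac = np.array([1 / 10, 2 / 10, 3 / 10, 4 / 10])

# np.average divides by the sum of the weights, so rescaling them has no effect.
mae_int = np.average(abs_errors, weights=w_int)
mae_frac = np.average(abs_errors, weights=w_frac)

print(mae_int, mae_frac)
print(np.isclose(mae_int, mae_frac))  # True
```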
So I was wondering whether we should multiply the average by the sum of the weights (which amounts to returning the weighted sum instead of the weighted mean) or not.
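For comparison, a sketch of the alternative being asked about: multiplying the weighted mean back by the sum of the weights. `weighted_sum_error` is a hypothetical helper, not a scikit-learn function, and the errors are made-up values:

```python
import numpy as np


def weighted_sum_error(abs_errors, weights):
    """Hypothetical alternative: weighted mean multiplied by sum(weights).

    Mathematically this equals sum(weights * abs_errors).
    """
    abs_errors = np.asarray(abs_errors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.average(abs_errors, weights=weights) * weights.sum()


abs_errors = [0.5, 0.5, 0.0, 1.0]
total_int = weighted_sum_error(abs_errors, [1, 2, 3, 4])
total_frac = weighted_sum_error(abs_errors, [0.1, 0.2, 0.3, 0.4])

# Unlike the weighted mean, the result now scales with the weights.
print(total_int, total_frac)
```

Multiplying back by `weights.sum()` reduces to `np.sum(weights * abs_errors)`, so the result carries the unit of the weights. The trade-off is that it also grows with the number of samples, so scores computed on datasets of different sizes are no longer directly comparable.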