RFC Semantic of sample_weight in regression metrics

Several metrics in scikit-learn use `np.average(..., weights=sample_weight)` under the hood:

* `mean_absolute_error`
* `mean_squared_error`
* `explained_variance_score`
* `r2_score`
* `mean_tweedie_deviance`

When computing the average, `numpy` divides by the sum of the weights. Is this the intended behaviour? For instance:

```python
sample_weight = [1, 2, 3, 4]
```

and

```python
sample_weight = [1/10, 2/10, 3/10, 4/10]
```

will lead to the same error/score. Dividing by the sum of the weights also removes any meaning carried by the unit of the weights (for instance, if `sample_weight` is expressed in a business unit).
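To make this concrete, here is a minimal sketch using plain NumPy rather than the scikit-learn metrics themselves. The `y_true` and `y_pred` values are made up for illustration, and the computation only mimics what a weighted `mean_absolute_error` would do under the assumption stated above:

```python
import numpy as np

# Hypothetical targets and predictions, chosen only for illustration.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
abs_errors = np.abs(y_true - y_pred)

w_raw = np.array([1, 2, 3, 4])
w_scaled = w_raw / 10  # same weights, rescaled by a constant factor

# np.average computes sum(w * e) / sum(w), so the constant factor cancels.
mae_raw = np.average(abs_errors, weights=w_raw)
mae_scaled = np.average(abs_errors, weights=w_scaled)

print(mae_raw, mae_scaled)
```

Both values agree up to floating-point rounding, since any constant factor applied to the weights cancels between numerator and denominator.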

So I was wondering whether we should multiply the average by the sum of the weights or not.
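As a rough sketch of what that alternative would mean: multiplying the weighted average by the sum of the weights is the same as taking the plain weighted sum of the per-sample errors. The helper names `weighted_average` and `weighted_sum` below are hypothetical and not part of scikit-learn, and the error values are made up:

```python
import numpy as np

# Hypothetical per-sample absolute errors, for illustration only.
abs_errors = np.array([0.5, 0.5, 0.0, 1.0])


def weighted_average(errors, weights):
    # Current behaviour: sum(w * e) / sum(w), invariant to rescaling the weights.
    return np.average(errors, weights=weights)


def weighted_sum(errors, weights):
    # Alternative under discussion: average * sum(w) == sum(w * e),
    # which keeps the scale (and unit) of the weights.
    weights = np.asarray(weights, dtype=float)
    return np.average(errors, weights=weights) * weights.sum()


w_raw = np.array([1, 2, 3, 4])
w_scaled = w_raw / 10

print(weighted_average(abs_errors, w_raw), weighted_average(abs_errors, w_scaled))
print(weighted_sum(abs_errors, w_raw), weighted_sum(abs_errors, w_scaled))
```

With the second helper the two weight vectors no longer give the same result: the outputs differ by the same factor of 10 as the weights do.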


RFC Semantic of sample_weight in regression metrics #15651
