A discussion happened in the GLM PR #14300 about what properties we would like `sample_weight` to have.

## Current Versions
First, a short side comment about 3 ways sample weights (`s_i`) are currently used in loss functions with regularized generalized linear models in scikit-learn (as far as I understand).

**Version 1a:** $L_{1a}(\omega) = \sum_i s_i \cdot l(x_i, \omega) + \alpha \lVert \omega\rVert$

For instance: `Ridge` (also `LogisticRegression`, where `C = 1/α`).

**Version 2a:** $L_{2a}(\omega) = \frac{1}{n_{\text{samples}}}\sum_i s_i \cdot l(x_i, \omega) + \alpha \lVert \omega\rVert$

For instance: `SGDClassifier`?

**Version 2b:** $L_{2b}(\omega) = \frac{1}{\sum_i s_i}\sum_i s_i \cdot l(x_i, \omega) + \alpha \lVert \omega\rVert$

For instance:
- currently proposed in the GLM PR for `PoissonRegressor` etc. (edit: meanwhile implemented this way)
- `Lasso`, `ElasticNet`, `PoissonRegressor`, `GammaRegressor`, `TweedieRegressor`
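To make the difference between the three formulations concrete, here is a minimal plain-Python sketch for a one-feature model with a squared-error fit term and a squared-L2 penalty (illustrative choices; the names `x`, `y`, `s`, `w`, `alpha` are assumptions, and this is not scikit-learn's actual code):

```python
def sq_loss(xi, yi, w):
    # l(x_i, w): squared-error loss of one sample, for a single feature
    return (xi * w - yi) ** 2

def L_1a(x, y, s, w, alpha):
    # sum_i s_i * l(x_i, w) + alpha * penalty(w)   (penalty: w^2 here)
    return sum(si * sq_loss(xi, yi, w) for si, xi, yi in zip(s, x, y)) + alpha * w ** 2

def L_2a(x, y, s, w, alpha):
    # (1 / n_samples) * sum_i s_i * l(x_i, w) + alpha * penalty(w)
    n = len(x)
    return sum(si * sq_loss(xi, yi, w) for si, xi, yi in zip(s, x, y)) / n + alpha * w ** 2

def L_2b(x, y, s, w, alpha):
    # (1 / sum_i s_i) * sum_i s_i * l(x_i, w) + alpha * penalty(w)
    return sum(si * sq_loss(xi, yi, w) for si, xi, yi in zip(s, x, y)) / sum(s) + alpha * w ** 2
```

With unit weights, L_2a and L_2b coincide (both normalizations reduce to 1/n); the three only differ in how strongly the data-fit term weighs against α.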
## Properties

For sample weight it's useful to think in terms of invariant properties, as they can be directly expressed in common tests. For instance,
- setting all sample weights to 1 should be equivalent to not providing `sample_weight` at all. All of the above formulations should verify this.

Similarly, paraphrasing #14300 (comment), other properties we might want to enforce are:
- multiplying some sample weight by N is equivalent to repeating the corresponding sample N times. This is verified only by L_1a and L_2b. Example: for L_2a, setting all weights to 2 is equivalent to having 2x more samples only if α is also replaced by α / 2.
- finally, that globally scaling the sample weights has no effect. This is verified only by L_2b; for both L_1a and L_2a, multiplying all sample weights by k is equivalent to setting α = α / k.

This last one is more controversial. Against enforcing this, … In favor,

- we don't want a coupling between using sample weights and regularization. Example: say one has a model without sample weights, and one wants to see if applying sample weights (imbalanced dataset, sample uncertainty, etc.) improves it. Without this property it's difficult to conclude: is the evaluation metric better with sample weights because of those, or simply because we now have a better-regularized model? One has to consider these two factors simultaneously.
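These equivalences are easy to check numerically. Below is a small plain-Python sketch (not scikit-learn's solver) using the closed-form minimizer of a one-feature weighted ridge objective, `argmin_w sum_i s_i (x_i w - y_i)^2 + alpha w^2`:

```python
import math

def ridge_1a(x, y, s, alpha):
    # Closed-form minimizer of sum_i s_i * (x_i * w - y_i)^2 + alpha * w^2
    # (the L_1a formulation, one feature): w = sum(s x y) / (sum(s x^2) + alpha)
    num = sum(si * xi * yi for si, xi, yi in zip(s, x, y))
    den = sum(si * xi * xi for si, xi in zip(s, x)) + alpha
    return num / den

def ridge_2b(x, y, s, alpha):
    # L_2b formulation: the data-fit term is normalized by sum(s), which is
    # the same as solving L_1a with the weights rescaled to sum to 1.
    total = sum(s)
    return ridge_1a(x, y, [si / total for si in s], alpha)

x, y = [1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 3.0, 5.0]
s, alpha, k = [2.0, 1.0, 1.0, 3.0], 0.7, 5.0

# Weight 2 on the first sample == that sample repeated twice with weight 1 (L_1a).
w_rep = ridge_1a([x[0]] + x, [y[0]] + y, [1.0, 1.0] + s[1:], alpha)
assert math.isclose(w_rep, ridge_1a(x, y, s, alpha))

# For L_1a, scaling all weights by k is equivalent to setting alpha to alpha / k ...
assert math.isclose(ridge_1a(x, y, [k * si for si in s], alpha),
                    ridge_1a(x, y, s, alpha / k))

# ... while for L_2b, scaling all weights has no effect at all.
assert math.isclose(ridge_2b(x, y, [k * si for si in s], alpha),
                    ridge_2b(x, y, s, alpha))
```

L_2a, which normalizes by `n_samples`, would fail both checks: repeating a sample changes `n_samples`, and a global weight scaling still rescales the data-fit term relative to α.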
Whether we want/need consistency between the use of sample weight in metrics and in estimators is another question. I'm not convinced we do, since in most cases estimators don't care about the global scaling of the loss function, and these formulations are equivalent up to a scaling of the regularization parameter. So maybe using the L_1a-equivalent expression in metrics could be fine.

In any case, we need to decide the behavior we want. This is a blocker for:
- `ElasticNet` and `Lasso`: [MRG] Sample weights for ElasticNet #15436. Note: `Ridge` actually seems to have different sample weight behavior for dense and sparse input, as reported in #15438. This can wait until after the release.
@agramfort's opinion on this can be found in #15651 (comment) (if I understood correctly).
Please correct me if I missed something (this could also use a more in-depth review of how this is done in other libraries).