A discussion happened in the GLM PR #14300 about what properties we would like `sample_weight` to have.

## Current Versions
First, a short side comment about 3 ways sample weights (`s_i`) are currently used in loss functions with regularized generalized linear models in scikit-learn (as far as I understand).

**Version 1a:** $L_{1a}(\omega) = \sum_i s_i \cdot l(x_i, \omega) + \alpha \lVert \omega\rVert$

For instance: `Ridge` (also `LogisticRegression`, where `C = 1/α`).

**Version 2a:** $L_{2a}(\omega) = \frac{1}{n_{\text{samples}}}\sum_i s_i \cdot l(x_i, \omega) + \alpha \lVert \omega\rVert$

For instance: `SGDClassifier`?

**Version 2b:** $L_{2b}(\omega) = \frac{1}{\sum_i s_i}\sum_i s_i \cdot l(x_i, \omega) + \alpha \lVert \omega\rVert$

For instance:
- currently proposed in the GLM PR for `PoissonRegressor` etc. (edit: meanwhile implemented this way)
- `Lasso`, `ElasticNet`, `PoissonRegressor`, `GammaRegressor`, `TweedieRegressor`
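To make the difference between the three formulations concrete, here is a minimal plain-Python sketch for a one-feature model with a squared-error fit term and a squared-L2 penalty (illustrative choices; the names `x`, `y`, `s`, `w`, `alpha` are assumptions, and this is not scikit-learn's actual code):

```python
def sq_loss(xi, yi, w):
    # l(x_i, w): squared-error loss of one sample, for a single feature
    return (xi * w - yi) ** 2

def L_1a(x, y, s, w, alpha):
    # sum_i s_i * l(x_i, w) + alpha * penalty(w)   (penalty: w^2 here)
    return sum(si * sq_loss(xi, yi, w) for si, xi, yi in zip(s, x, y)) + alpha * w ** 2

def L_2a(x, y, s, w, alpha):
    # (1 / n_samples) * sum_i s_i * l(x_i, w) + alpha * penalty(w)
    n = len(x)
    return sum(si * sq_loss(xi, yi, w) for si, xi, yi in zip(s, x, y)) / n + alpha * w ** 2

def L_2b(x, y, s, w, alpha):
    # (1 / sum_i s_i) * sum_i s_i * l(x_i, w) + alpha * penalty(w)
    return sum(si * sq_loss(xi, yi, w) for si, xi, yi in zip(s, x, y)) / sum(s) + alpha * w ** 2
```

With unit weights, L_2a and L_2b coincide (both normalizations reduce to 1/n); the three only differ in how strongly the data-fit term weighs against α.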
## Properties

For sample weight it's useful to think in terms of invariant properties, as they can be directly expressed in common tests. For instance,
- setting all sample weights to 1 should be equivalent to not providing `sample_weight` at all. All of the above formulations should verify this.

Similarly, paraphrasing #14300 (comment), other properties we might want to enforce are:
- multiplying some sample weight by N is equivalent to repeating the corresponding sample N times. This is verified only by L_1a and L_2b. Example: for L_2a, setting all weights to 2 is equivalent to having 2x more samples only if α is also replaced by α / 2.
- finally, that globally scaling the sample weights has no effect. This is verified only by L_2b; for both L_1a and L_2a, multiplying all sample weights by k is equivalent to setting α = α / k.

This last one is more controversial. Against enforcing this, … In favor,

- we don't want a coupling between using sample weights and regularization. Example: say one has a model without sample weights, and one wants to see if applying sample weights (imbalanced dataset, sample uncertainty, etc.) improves it. Without this property it's difficult to conclude: is the evaluation metric better with sample weights because of those, or simply because we now have a better-regularized model? One has to consider these two factors simultaneously.
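These equivalences are easy to check numerically. Below is a small plain-Python sketch (not scikit-learn's solver) using the closed-form minimizer of a one-feature weighted ridge objective, `argmin_w sum_i s_i (x_i w - y_i)^2 + alpha w^2`:

```python
import math

def ridge_1a(x, y, s, alpha):
    # Closed-form minimizer of sum_i s_i * (x_i * w - y_i)^2 + alpha * w^2
    # (the L_1a formulation, one feature): w = sum(s x y) / (sum(s x^2) + alpha)
    num = sum(si * xi * yi for si, xi, yi in zip(s, x, y))
    den = sum(si * xi * xi for si, xi in zip(s, x)) + alpha
    return num / den

def ridge_2b(x, y, s, alpha):
    # L_2b formulation: the data-fit term is normalized by sum(s), which is
    # the same as solving L_1a with the weights rescaled to sum to 1.
    total = sum(s)
    return ridge_1a(x, y, [si / total for si in s], alpha)

x, y = [1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 3.0, 5.0]
s, alpha, k = [2.0, 1.0, 1.0, 3.0], 0.7, 5.0

# Weight 2 on the first sample == that sample repeated twice with weight 1 (L_1a).
w_rep = ridge_1a([x[0]] + x, [y[0]] + y, [1.0, 1.0] + s[1:], alpha)
assert math.isclose(w_rep, ridge_1a(x, y, s, alpha))

# For L_1a, scaling all weights by k is equivalent to setting alpha to alpha / k ...
assert math.isclose(ridge_1a(x, y, [k * si for si in s], alpha),
                    ridge_1a(x, y, s, alpha / k))

# ... while for L_2b, scaling all weights has no effect at all.
assert math.isclose(ridge_2b(x, y, [k * si for si in s], alpha),
                    ridge_2b(x, y, s, alpha))
```

L_2a, which normalizes by `n_samples`, would fail both checks: repeating a sample changes `n_samples`, and a global weight scaling still rescales the data-fit term relative to α.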
Whether we want/need consistency between the use of sample weight in metrics and in estimators is another question. I'm not convinced we do, since in most cases estimators don't care about the global scaling of the loss function, and these formulations are equivalent up to a scaling of the regularization parameter. So maybe using the L_1a-equivalent expression in metrics could be fine.

In any case, we need to decide the behavior we want. This is a blocker for:
- `ElasticNet` and `Lasso`: [MRG] Sample weights for ElasticNet #15436. Note: `Ridge` actually seems to have different sample weight behavior for dense and sparse input, as reported in #15438. This can wait until after the release.
@agramfort's opinion on this can be found in #15651 (comment) (if I understood correctly).
Please correct me if I missed something (this could also use a more in-depth review of how this is done in other libraries).