Use sample_weight in the metric calculations in model validations#1
Conversation
```python
        Score returned by ``scorer`` applied to ``X`` and ``y`` given
        ``sample_weight``.
        """
        if sample_weight is None or np.all(sample_weight == 1):
```
This `np.all(sample_weight == 1)` check makes it so that if `sample_weight` is all ones, it is treated the same as omitting it. This allows us to use scorers that don't support `sample_weight`.
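A minimal sketch of that short-circuit (the helper name here is hypothetical, not scikit-learn's actual code):

```python
import numpy as np

def score_maybe_weighted(scorer, estimator, X, y, sample_weight=None):
    # All-ones weights are equivalent to no weights, so skip passing them;
    # this lets scorers that don't accept sample_weight keep working.
    if sample_weight is None or np.all(sample_weight == 1):
        return scorer(estimator, X, y)
    return scorer(estimator, X, y, sample_weight=sample_weight)
```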
```python
                if 'sample_weight' in str(e):
                    raise TypeError(
                        (
                            "Attempted to use 'sample_weight' for training "
```
Repackages the error with an informative message.
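The repackaging pattern might look like this sketch (function name and message text are illustrative, not the PR's exact code):

```python
def call_scorer_with_weights(scorer, estimator, X, y, sample_weight):
    try:
        return scorer(estimator, X, y, sample_weight=sample_weight)
    except TypeError as e:
        # A scorer that doesn't accept the keyword raises a TypeError
        # mentioning 'sample_weight'; re-raise with context so users
        # see *why* scoring failed.
        if 'sample_weight' in str(e):
            raise TypeError(
                "Attempted to use 'sample_weight' in scoring, but the "
                "scorer does not support it: {}".format(e)
            ) from e
        raise
```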
zexuan-zhou left a comment
Love the error-handling part that deals with some scorers not supporting `sample_weight`.
```python
    else:
        score = scorer(estimator, X, y)
else:
    try:
```
Is there any benefit to passing `sample_weight` as a keyword argument? E.g.

```python
kwargs = {'sample_weight': sample_weight}
# ...
score = scorer(estimator, X, y, **kwargs)
```

The reason I ask is that the other PR that did importance weighting (https://github.com/scikit-learn/scikit-learn/pull/10806/files#diff-60033c11a662f460e1567effd5faa6f0R584) had this. I don't know if that's important or not.
What does everyone think?
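For what it's worth, the kwargs-dict pattern keeps a single call site whether or not weights are present. A runnable sketch (the scorer here is a dummy used only to demonstrate the call pattern):

```python
def scorer(estimator, X, y, sample_weight=None):
    # Dummy scorer: reports whether weights were received.
    return 2.0 if sample_weight is not None else 1.0

def score(estimator, X, y, sample_weight=None):
    kwargs = {}
    if sample_weight is not None:
        kwargs['sample_weight'] = sample_weight
    # One call site regardless of whether weights were supplied.
    return scorer(estimator, X, y, **kwargs)
```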
```python
    # Unnecessary extra test to illustrate that this is not the desired
    # metric value (but what has historically been returned).
    assert train_metric != np.sum(y[train]) / y[train].shape[0]
```
This is unnecessary, but I like it for illustrative purposes.
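The assertion above checks that the train-fold metric no longer equals the unweighted mean. A self-contained sketch of the weighted fit-and-score flow it exercises (all names are hypothetical stand-ins, not scikit-learn's actual implementation):

```python
import numpy as np

class ConstantClassifier:
    # Hypothetical stand-in estimator: always predicts class 0.
    def fit(self, X, y, sample_weight=None):
        return self

    def predict(self, X):
        return np.zeros(len(X), dtype=int)

def weighted_accuracy(estimator, X, y, sample_weight=None):
    correct = estimator.predict(X) == y
    if sample_weight is None:
        return float(np.mean(correct))
    return float(np.average(correct, weights=sample_weight))

def fit_and_score(estimator, X, y, train, test, scorer, sample_weight=None):
    # Sketch of the proposed default: the same weights used to fit the
    # train fold are also used (test slice) to score the test fold.
    if sample_weight is None:
        estimator.fit(X[train], y[train])
        return scorer(estimator, X[test], y[test])
    estimator.fit(X[train], y[train], sample_weight=sample_weight[train])
    return scorer(estimator, X[test], y[test],
                  sample_weight=sample_weight[test])
```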
Purpose
To correct metric calculations in model validation (and cross-validation) by attempting to use `sample_weight` in the metric calculations when `sample_weight` is supplied in `fit_params`.
Reference Issues/PRs
Fixes scikit-learn#4632. See also scikit-learn#10806.
What does this implement/fix? Explain your changes.
This change always uses the same `sample_weight` used in training to weight the metric calculations in `_fit_and_score` and in functions called by `_fit_and_score`, so it should be thought of as changing the default behavior of things like cross-validation. This is unlike PR scikit-learn#10806 in that it is isolated to `_validation.py` and doesn't touch `_search.py`. This is very intentional.
Any other comments?
This PR provides a simple test showing how unweighted metrics have very unfavorable consequences when models are trained with importance weights but not scored with the same weights. The test shows that the current omission of `sample_weight` in metric calculations considerably overestimates the accuracy (0.5 vs 9.99999 * 10^-7).
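The magnitude of that gap can be reproduced with plain NumPy. This is an illustration with numbers chosen to match the figures above, not the PR's actual test:

```python
import numpy as np

# Half the samples are classified correctly, but the misclassified half
# carries almost all of the importance weight.
y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0])
weights = np.array([1.0, 1.0, 1e6, 1e6])

unweighted_acc = np.mean(y_pred == y_true)                   # 0.5
weighted_acc = np.average(y_pred == y_true, weights=weights) # ~1e-6
print(unweighted_acc, weighted_acc)
```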