Use sample_weight in the metric calculations in model validations#1
Conversation
```python
        Score returned by ``scorer`` applied to ``X`` and ``y`` given
        ``sample_weight``.
        """
        if sample_weight is None or np.all(sample_weight == 1):
```
This `np.all(sample_weight == 1)` check makes it so that if `sample_weight` is all ones, it is treated the same as omitting it. This allows us to use scorers that don't support `sample_weight`.
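A minimal sketch of that short-circuit (the helper name here is hypothetical, not scikit-learn's actual code):

```python
import numpy as np

def score_maybe_weighted(scorer, estimator, X, y, sample_weight=None):
    # All-ones weights are equivalent to no weights, so skip passing them;
    # this lets scorers that don't accept sample_weight keep working.
    if sample_weight is None or np.all(sample_weight == 1):
        return scorer(estimator, X, y)
    return scorer(estimator, X, y, sample_weight=sample_weight)
```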
```python
                if 'sample_weight' in str(e):
                    raise TypeError(
                        (
                            "Attempted to use 'sample_weight' for training "
```
Repackages the error with an informative message.
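The repackaging pattern might look like this sketch (function name and message text are illustrative, not the PR's exact code):

```python
def call_scorer_with_weights(scorer, estimator, X, y, sample_weight):
    try:
        return scorer(estimator, X, y, sample_weight=sample_weight)
    except TypeError as e:
        # A scorer that doesn't accept the keyword raises a TypeError
        # mentioning 'sample_weight'; re-raise with context so users
        # see *why* scoring failed.
        if 'sample_weight' in str(e):
            raise TypeError(
                "Attempted to use 'sample_weight' in scoring, but the "
                "scorer does not support it: {}".format(e)
            ) from e
        raise
```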
zexuan-zhou left a comment
Love the error-handling part that deals with some scorers not supporting `sample_weight`.
```python
    else:
        score = scorer(estimator, X, y)
else:
    try:
```
Is there any benefit to passing `sample_weight` as a keyword argument? E.g.

```python
kwargs = {'sample_weight': sample_weight}
# ...
score = scorer(estimator, X, y, **kwargs)
```

The reason I ask is that the other PR that did importance weighting (https://github.com/scikit-learn/scikit-learn/pull/10806/files#diff-60033c11a662f460e1567effd5faa6f0R584) had this. I don't know if that's important or not.
What does everyone think?
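For what it's worth, the kwargs-dict pattern keeps a single call site whether or not weights are present. A runnable sketch (the scorer here is a dummy used only to demonstrate the call pattern):

```python
def scorer(estimator, X, y, sample_weight=None):
    # Dummy scorer: reports whether weights were received.
    return 2.0 if sample_weight is not None else 1.0

def score(estimator, X, y, sample_weight=None):
    kwargs = {}
    if sample_weight is not None:
        kwargs['sample_weight'] = sample_weight
    # One call site regardless of whether weights were supplied.
    return scorer(estimator, X, y, **kwargs)
```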
```python
    # Unnecessary extra test to illustrate that this is not the desired
    # metric value (but what has historically been returned).
    assert train_metric != np.sum(y[train]) / y[train].shape[0]
```
This is unnecessary, but I like it for illustrative purposes.
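The assertion above checks that the train-fold metric no longer equals the unweighted mean. A self-contained sketch of the weighted fit-and-score flow it exercises (all names are hypothetical stand-ins, not scikit-learn's actual implementation):

```python
import numpy as np

class ConstantClassifier:
    # Hypothetical stand-in estimator: always predicts class 0.
    def fit(self, X, y, sample_weight=None):
        return self

    def predict(self, X):
        return np.zeros(len(X), dtype=int)

def weighted_accuracy(estimator, X, y, sample_weight=None):
    correct = estimator.predict(X) == y
    if sample_weight is None:
        return float(np.mean(correct))
    return float(np.average(correct, weights=sample_weight))

def fit_and_score(estimator, X, y, train, test, scorer, sample_weight=None):
    # Sketch of the proposed default: the same weights used to fit the
    # train fold are also used (test slice) to score the test fold.
    if sample_weight is None:
        estimator.fit(X[train], y[train])
        return scorer(estimator, X[test], y[test])
    estimator.fit(X[train], y[train], sample_weight=sample_weight[train])
    return scorer(estimator, X[test], y[test],
                  sample_weight=sample_weight[test])
```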
Purpose
To correct metric calculations in model validation (and cross-validation) by attempting to use `sample_weight` in the metric calculations when `sample_weight` is supplied in `fit_params`.
Reference Issues/PRs
Fixes scikit-learn#4632. See also scikit-learn#10806.
What does this implement/fix? Explain your changes.
This change always uses the same `sample_weight` used in training to weight the metric calculations in `_fit_and_score` and in functions called by `_fit_and_score`, so it should be thought of as changing the default behavior of things like cross-validation. This is unlike PR scikit-learn#10806 in that it is isolated to `_validation.py` and doesn't touch `_search.py`. This is very intentional.
Any other comments?
This PR provides a simple test showing how unweighted metrics have very unfavorable consequences when models are trained with importance weights but not scored with the same weights. The test shows that the current omission of `sample_weight` in metric calculations considerably overestimates the accuracy (0.5 vs 9.99999 * 10^-7).
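The magnitude of that gap can be reproduced with plain NumPy. This is an illustration with numbers chosen to match the figures above, not the PR's actual test:

```python
import numpy as np

# Half the samples are classified correctly, but the misclassified half
# carries almost all of the importance weight.
y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0])
weights = np.array([1.0, 1.0, 1e6, 1e6])

unweighted_acc = np.mean(y_pred == y_true)                   # 0.5
weighted_acc = np.average(y_pred == y_true, weights=weights) # ~1e-6
print(unweighted_acc, weighted_acc)
```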