Use sample_weight in the metric calculations in model validations and cross validation #2
Conversation
ryan-deak-zefr left a comment
I've flagged a few places that I'm concerned about.
# test_sample_weight_sums.
if self.return_train_score:
    if train_sample_weight_sums is None:
        samples = int(np.sum(test_sample_counts[:n_splits]))
These two lines are the hairiest of all. I'm not 100% certain this is the right way to do it, but unfortunately I don't see a better way, because the total dataset size isn't maintained anywhere inside the function. Any thoughts?
Can we not store the total data set size in some way and pass it in?
I'm not sure that's possible. We can't necessarily just take the dataset's size, because the cross validation iterator (the cv parameter in GridSearchCV) isn't required to iterate over the entire dataset. So the count should be based on the sizes defined by cv.
Should I put a comment to this effect in the code?
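For illustration, here is a hedged sketch of that point (mine, not code from this PR): the cv argument to GridSearchCV accepts any iterable of (train, test) index arrays, and nothing forces those indices to cover every row of X, so a "total" size has to be derived from the splits themselves rather than from len(X).

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X = np.random.RandomState(0).normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)

# Only 60 of the 100 rows ever appear in these hand-built splits.
cv = [(np.arange(0, 40), np.arange(40, 60)),
      (np.arange(20, 60), np.arange(0, 20))]

GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0]}, cv=cv).fit(X, y)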
else:
    out, train_wts, test_wts = ([], [], [])

train_wts = collapse_nones(train_wts)
Turns a list of Nones into a single None.
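A tiny illustration of that behavior, inferred from the comment and the definition quoted later in this thread (not taken verbatim from the PR):

def collapse_nones(xs):
    # Collapse to None if the input is None or contains any None.
    return None if xs is None or any(x is None for x in xs) else xs

assert collapse_nones([None, None, None]) is None
assert collapse_nones([0.5, 1.0, 2.0]) == [0.5, 1.0, 2.0]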
sklearn/model_selection/_search.py (Outdated)
if train_sample_weight_sums is None:
    samples = int(np.sum(test_sample_counts[:n_splits]))
    train_sample_counts = \
        np.array(test_sample_counts[:n_splits], dtype=np.int) - \
Could you explain why this difference is necessary?
@zachary-mcpherson-zefr: Good catch! The arguments are backward. Will fix. In the meantime, here's the reasoning (assuming the code was correct):
Say that you have the following three folds for cross validation:
- fold 1: 50 examples
- fold 2: 51 examples
- fold 3: 49 examples
Cross validation typically works by iterating over the test folds. Say we are on the second loop iteration, so fold 2 is the test fold; folds 1 and 3 are then the training folds, which means there should be 99 training examples for test fold 2.
samples should represent the total number of rows in the entire dataset. To get the number of training rows associated with a test fold, we subtract that fold's number of test rows from the total. Back to our example: if the test fold is fold 2, then samples = 150, fold 2 has 51 rows, and 150 - 51 == 99. Make sense?
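As a quick sanity check, here is a sketch of that arithmetic with NumPy; the variable names mirror the PR's code, but this is illustrative rather than the actual implementation (and with the subtraction in the corrected order):

import numpy as np

test_sample_counts = [50, 51, 49]   # test-fold sizes, as reported by cv
n_splits = 3

samples = int(np.sum(test_sample_counts[:n_splits]))           # 150 rows total
train_sample_counts = samples - np.array(test_sample_counts)   # [100, 99, 101]
# For test fold 2 (51 rows), the training size is 150 - 51 == 99, as above.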
    return x is None


def collapse_nones(xs):
    return None if xs is None or any(map(is_none, xs)) else xs
Perhaps the function can be written with a generator expression instead of a map?
def collapse_nones_g(xs):
    if xs is None or any(x is None for x in xs):
        return None
    return xs
It's already lazy; map returns an iterator in Python 3, so any() short-circuits:
In [32]: %time sum(range(200000000))
CPU times: user 3.86 s, sys: 9.2 ms, total: 3.86 s
Wall time: 3.86 s
In [34]: %timeit any(map(lambda x: x > 2, range(200000000)))
940 ns ± 6.97 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
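For what it's worth, here is a small check (mine, not from the thread) showing the same laziness without timing: the predicate is never applied past the first element that returns True.

calls = []

def pred(x):
    calls.append(x)
    return x > 2

assert any(map(pred, range(200000000)))
assert calls == [0, 1, 2, 3]   # any() stopped right after the first True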
    (precision_score, True, False, [2000000, 1000000, 1, 999999]),
    (log_loss, False, True, [2500000, 500000, 200000, 100000]),
    (brier_score_loss, False, True, [2500000, 500000, 200000, 100000]),
])
I don't think it's necessary. We could, but this is already much better than what currently exists in sklearn. I also have https://gist.github.com/deaktator/94545f807f139eba2c8a15381f2495e0, for which I'd like to submit a future PR to sklearn. It's just test code, not a change to the implementation.
DO NOT MERGE THIS PR WITHOUT ASKING!
Purpose
To correct metric calculations in model validation (and cross validation) by attempting to use sample_weight in the metric calculations when sample_weight is supplied in fit_params. Adds sample weighting for model validation methods like cross validation inside sklearn/model_selection/_validation.py and sklearn/model_selection/_search.py.
Reference Issues/PRs
Fixes scikit-learn#4632. See also scikit-learn#10806.
What does this implement/fix? Explain your changes.
This change always uses the same sample_weight used in training to weight the metric calculations in _fit_and_score and the functions called by _fit_and_score, so it should be thought of as changing the default behavior of things like cross validation. This is similar to PR scikit-learn#10806 but doesn't allow separate test weights to be specified. It changes _validation.py and _search.py. None of the CV search classes (GridSearchCV, etc.) are changed; this operates at the function level.
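As a rough usage sketch (assuming the standard scikit-learn API and not showing any of this PR's internals), the weights flow in through fit_params exactly as they do today for fitting; with this change they would also reach the scoring:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
w = np.where(y == 1, 10.0, 1.0)   # importance weights used during fit

scores = cross_validate(LogisticRegression(), X, y, cv=3,
                        fit_params={"sample_weight": w})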
Any other comments?
This PR provides some simple tests showing how unweighted metrics have very unfavorable consequences when models are trained with importance weights but not scored with the same weights. The tests show that the current omission of sample_weight in metric calculations considerably overestimates the accuracy.
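To make the overestimation concrete, here is a minimal example (mine, not the PR's test code) of how much an unweighted metric can flatter a model when the weights are ignored at scoring time:

import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 0, 0, 1])
y_pred = np.array([0, 0, 0, 0])       # the model ignores the rare class
w = np.array([1.0, 1.0, 1.0, 97.0])   # but that class carries most of the weight

print(accuracy_score(y_true, y_pred))                   # 0.75 unweighted
print(accuracy_score(y_true, y_pred, sample_weight=w))  # 0.03 weighted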