
[MRG] Fix MultinomialDeviance not using average logloss (#10055) #10081

Closed

rempfler wants to merge 9 commits into scikit-learn:master from rempfler:multinomial-deviance-mean

Conversation

@rempfler commented Nov 6, 2017

Fixes #10055

Changes

Changed MultinomialDeviance from total logloss to average logloss.
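For reference, a simplified sketch of the before/after behavior (the helper `multinomial_deviance` is a stand-in for the loss's `__call__`; `Y` is the one-hot encoding of `y` and `pred` the raw predictions, as in the snippets quoted later in this thread):

    import numpy as np
    from scipy.special import logsumexp

    def multinomial_deviance(Y, pred, sample_weight=None):
        # per-sample multinomial logloss
        per_sample = -1 * (Y * pred).sum(axis=1) + logsumexp(pred, axis=1)
        # before the fix: return np.sum(per_sample)  -- total logloss,
        # which scales with n_samples
        # after the fix: average logloss
        return np.average(per_sample, weights=sample_weight)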

@jnothman (Member) commented Nov 7, 2017

Please add a test.

Re the codecov failure: it seems we never run these losses with sample_weight=None (there is a FIXME in the file suggesting it might be a good idea to support sample_weight=None more explicitly). I find it strange that such cases are implemented separately in the loss functions when they should provide negligible computational improvement, and when they are never called. IMO, doing `if sample_weight is None: sample_weight = np.array(1)` in this case should suffice...

@j-xiao commented Nov 8, 2017

Not sure if the computation of the gradient also needs to change. I did not check the details, but most likely the gradient should be updated as well.

@massich (Contributor) commented Nov 9, 2017

> IMO, doing `if sample_weight is None: sample_weight = np.array(1)` in this case should suffice...

Using numpy.average avoids the trouble altogether, since it has a weights parameter that can be None. See:

    return np.average(sample_score, weights=sample_weight)
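For reference, `np.average` with `weights=None` reduces to the plain mean, so a single code path covers both cases:

    import numpy as np

    np.average([1.0, 2.0, 4.0])                     # 2.333..., plain mean
    np.average([1.0, 2.0, 4.0], weights=[1, 1, 2])  # (1 + 2 + 8) / 4 = 2.75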

@rempfler (Author) commented

> IMO, doing `if sample_weight is None: sample_weight = np.array(1)`

This wouldn't take the mean in the unweighted case, since `sample_weight.sum()` would be 1 rather than the number of elements.
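A quick illustration of the point, with hypothetical scores:

    import numpy as np

    scores = np.array([1.0, 2.0, 3.0])
    w = np.array(1)
    np.sum(w * scores) / w.sum()  # 6.0 -- the total, since w.sum() == 1
    scores.mean()                 # 2.0 -- what the unweighted case should give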

> Using numpy.average avoids the trouble altogether.

Good idea.

Having looked at [1, Sect. 10.6], I actually believe the call with weights was not correct so far, since the sample_weights are not multiplied with the logsumexp term:

     else:
        return np.sum(-1 * sample_weight * (Y * pred).sum(axis=1) +
                      logsumexp(pred, axis=1))

[1] Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. The Elements of Statistical Learning. Vol. 1. New York: Springer Series in Statistics, 2001.
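Schematically, the contrast (a sketch; each per-sample term, including its logsumexp part, should be scaled by its weight before normalizing by the total weight):

    # wrong: weight applied only to the (Y * pred) term
    np.sum(-1 * sample_weight * (Y * pred).sum(axis=1) +
           logsumexp(pred, axis=1))
    # right: weight applied to the whole per-sample term
    np.average(-1 * (Y * pred).sum(axis=1) + logsumexp(pred, axis=1),
               weights=sample_weight)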

@codecov bot commented Nov 13, 2017

Codecov Report

Merging #10081 into master will increase coverage by 0.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #10081      +/-   ##
==========================================
+ Coverage   96.19%   96.21%   +0.01%     
==========================================
  Files         336      337       +1     
  Lines       62739    62899     +160     
==========================================
+ Hits        60353    60518     +165     
+ Misses       2386     2381       -5
| Impacted Files | Coverage Δ |
| --- | --- |
| ...ble/tests/test_gradient_boosting_loss_functions.py | 95.62% <100%> (+0.57%) ⬆️ |
| sklearn/ensemble/gradient_boosting.py | 96.5% <100%> (+0.29%) ⬆️ |
| sklearn/utils/estimator_checks.py | 93.29% <0%> (-0.02%) ⬇️ |
| sklearn/cluster/_feature_agglomeration.py | 100% <0%> (ø) ⬆️ |
| sklearn/datasets/tests/test_samples_generator.py | 100% <0%> (ø) ⬆️ |
| sklearn/ensemble/bagging.py | 96.61% <0%> (ø) ⬆️ |
| sklearn/feature_selection/base.py | 94.79% <0%> (ø) ⬆️ |
| sklearn/neighbors/regression.py | 100% <0%> (ø) ⬆️ |
| sklearn/metrics/ranking.py | 98.8% <0%> (ø) ⬆️ |
| ...klearn/cluster/tests/test_feature_agglomeration.py | 100% <0%> (ø) |
... and 11 more

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update abb43c1...4970b02.

@massich (Contributor) commented Nov 14, 2017

good catch!

LGTM

Actually the test is just checking that something is computed. Can we find an example where we know the expected output for a given input? Something that serves as a regression test for the previous wrong computation. In other words, the previous wrong computation shows that we were missing this test.

@jnothman (Member) commented

I don't think your edited comment came through on email, @massich, requesting a better test...

@rempfler (Author) commented

@massich: technically, the previously wrong computation would have been caught by

    loss_w_sw = loss(y, p, 0.5 * np.ones(p.shape[0], dtype=np.float32))
    assert_almost_equal(loss_wo_sw, loss_w_sw)

where the logsumexp terms would have had increased weight in the wrong case.

Nonetheless, I added one more small test with a manually pre-computed input-output pair.

    loss_w_sw = loss(y, p, 0.5 * np.ones(p.shape[0], dtype=np.float32))
    assert_almost_equal(loss_wo_sw, loss_w_sw)

    # second check

Review comment (Contributor):

I would make this its own test:

    def test_loss_computation(pred=np.array([[1.0, 0, 0],
                                             [0, 0.5, 0.5]]),
                              y=np.array([0, 1]),
                              weights=np.array([1, 3]),
                              expected_loss=0.85637):
        assert_almost_equal(loss(y, pred, weights), expected_loss, decimal=4)

Maybe even add a test for weights=None.
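For reference, the expected value 0.85637 can be reproduced by hand from the deviance formula (a sketch using scipy's logsumexp):

    import numpy as np
    from scipy.special import logsumexp

    # per-sample deviance: -pred[i, y[i]] + logsumexp(pred[i])
    l1 = -1.0 + logsumexp([1.0, 0, 0])    # ~0.551444
    l2 = -0.5 + logsumexp([0, 0.5, 0.5])  # ~0.958019
    (1 * l1 + 3 * l2) / (1 + 3)           # ~0.856375, the weighted average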

Review comment (Contributor):

I would also split up the rest of the test. I think that testing the error raising should go on its own, etc.

@massich (Contributor) commented Nov 30, 2017

LGTM



    def test_mdl_exception():
        # Check that MultinomialDeviance throws when n_classes <= 2

Review comment (Member):

throws an error


    def test_mdl_exception():
        # Check that MultinomialDeviance throws when n_classes <= 2
        assert_raises(ValueError, MultinomialDeviance, 2)
Review comment (Member):

Use assert_raises_regex and check which message is raised. We tend to check the error message rather than only checking that something is raised ;)
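For reference, a minimal sketch of that suggestion (the exact message text is an assumption):

    from sklearn.utils.testing import assert_raises_regex

    assert_raises_regex(ValueError, "requires more than 2 classes",
                        MultinomialDeviance, 2)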

Review comment (Contributor):

Shall we use the context manager here?

Review comment (Member):

I would not, since sklearn already has helper functions. If we want to use the context manager, we need to change all the other assert_raises_* calls.
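For reference, the context-manager form under discussion would look like this (a sketch; the match pattern is an assumption):

    import pytest

    with pytest.raises(ValueError, match="requires more than 2 classes"):
        MultinomialDeviance(2)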

    assert_almost_equal(loss_wo_sw, loss_w_sw)


    def test_mdl_computation_unweighted(pred=np.array([[1.0, 0, 0],

Review comment (Member):

Could you factorize this test and the next one using pytest.mark.parametrize?

Review comment (Contributor):

@glemaitre I proposed such a parametrization to avoid importing pytest, since I am not sure whether @lesteve was trying to avoid the pytest dependency.

    assert deviance_wo_w == deviance_w_w


    def test_multinomial_deviance():

Review comment (Member):

I think that you can use pytest.mark.parametrize as well here.
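For reference, a sketch of the suggested parametrization, reusing the input/output pair from the earlier review comment (the unweighted expected value follows from averaging the two per-sample terms computed by hand above):

    import numpy as np
    import pytest

    # assumes MultinomialDeviance and assert_almost_equal are imported
    # as in the test module under review
    @pytest.mark.parametrize(
        "weights, expected_loss",
        [(None, 0.75473),             # plain mean of the per-sample terms
         (np.array([1, 3]), 0.85637)])
    def test_mdl_computation(weights, expected_loss):
        pred = np.array([[1.0, 0, 0], [0, 0.5, 0.5]])
        y = np.array([0, 1])
        loss = MultinomialDeviance(3)
        assert_almost_equal(loss(y, pred, weights), expected_loss, decimal=4)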

@glemaitre (Member) commented Nov 30, 2017 via email

@massich (Contributor) commented Nov 30, 2017

OK, that changes things!

@lesteve (Member) commented Dec 4, 2017

> @glemaitre I proposed such a parametrization to avoid importing pytest, since I am not sure whether @lesteve was trying to avoid the pytest dependency.

For reference, I think we do not want a pytest dependency in our modules, but it is fine to use pytest in test_*.py files. Basically, you should not be required to have pytest installed if you just want to use scikit-learn. This is what CHECK_PYTEST_SOFT_DEPENDENCY is trying to enforce in .travis.yml.

About whether we should import pytest directly in test_*.py files, I don't know. Personally I would say it is fine. One argument for never importing pytest directly would be in case we want to switch to a different testing framework than pytest. You could argue that we should keep all test-related stuff in sklearn.utils.testing and import from sklearn.utils.testing in test_*.py files. This is what we did when we wanted to make the move away from nose easier. Personally I think worrying about whether one day we will move away from pytest is YAGNI. Having said that, the nose-to-pytest move was a bit of a pain, so maybe we do want to make a similar move easier in the future.

@jnothman (Member) commented Dec 4, 2017 via email

@lesteve (Member) commented Dec 5, 2017

> I don't think there is any good reason to apprehend a future in which we stop using pytest. By all means, import it in test files, use its built-in fixtures, etc.

Thanks, this is what I felt as well, but it is always appreciated to have a second opinion.

@jnothman (Member) commented

@glemaitre, @massich: your thoughts on the latest changes?

@glemaitre (Member) commented

I would personally use pytest.approx or assert_allclose (I would probably use the latter since it is wrapped in the testing module) instead of assert_almost_equal.
Apart from that, LGTM.
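For reference, the two suggested alternatives would read like this (a sketch, reusing the earlier expected value; `loss`, `y`, `pred`, and `weights` as in the test above):

    import pytest
    from numpy.testing import assert_allclose

    assert_allclose(loss(y, pred, weights), 0.85637, atol=1e-4)
    # or, with pytest:
    assert loss(y, pred, weights) == pytest.approx(0.85637, abs=1e-4)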

@cmarmo (Contributor) commented Jun 18, 2020

@glemaitre, @amueller, @lesteve, is the decision needed here about whether or not to import pytest in the tests? I suppose this is no longer a question...

@lesteve (Member) commented Jun 19, 2020

> is the decision needed here about whether or not to import pytest in the tests? I suppose this is no longer a question...

Yeah, importing pytest in the tests is a no-brainer. It looks like this one would need its conflicts fixed and some reviews.



Development

Successfully merging this pull request may close these issues:

MultinomialDeviance in GradientBoostingClassifier should use average logloss instead of total logloss.

8 participants