
[MRG] TST Unskip test_importances in forest and loop over 64/32 bit for testing#9242

Closed
raghavrv wants to merge 3 commits into scikit-learn:master from raghavrv:fix_feature_importance_test

Conversation

@raghavrv
Member

@raghavrv raghavrv commented Jun 28, 2017

Reference Issue

Attempts to fix #7656

What does this implement/fix? Explain your changes.

Right now I'm unskipping the test and waiting for Travis to throw errors.

@@ -228,18 +231,20 @@ def check_importances(name, criterion, X, y):
assert_less(np.abs(importances - importances_bis).mean(), 0.001)
Member Author


@ogrisel @jnothman All the builds fail with this error message at this line:

AssertionError: 0.0067716825379038909 not less than 0.001

I'm not sure this has anything to do with the dataset per se.

@glouppe @jmschrei is it safe to relax the tolerance of this test to 0.01 so that it passes?

Member


Apparently only the regressors with "mae" fail here; the classifiers and the other regression criteria pass. It's probably the case that MAE is more sensitive to changes in the scale of the weights. I think it's OK to increase the threshold. I did some experiments in an interactive shell, and the first three features (the informative ones) are always more important than the others.

Maybe you can replace this check with the following:

# The forest estimator can detect that only the first 3 features of the dataset
# are informative:
expected_mask = np.zeros(20, dtype=bool)
expected_mask[:3] = True
assert_array_equal(est.feature_importances_ > 0.1, expected_mask)

You could also add a check that rescales X instead of the sample_weight array and verifies that the feature importances are exactly equal to those of the reference model.
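A minimal standalone sketch of that rescaling check, assuming a RandomForestClassifier on a synthetic dataset (the actual test would use the module's shared X and y instead):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset: the first 3 of 10 features are informative.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, n_redundant=0,
                           shuffle=False, random_state=0)

# Reference model on the raw data.
est = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Rescaling X by a positive constant only rescales the split thresholds;
# the tree structures, and hence the importances, should be unchanged.
est_scaled = RandomForestClassifier(n_estimators=10, random_state=0)
est_scaled.fit(3.0 * X, y)

assert np.allclose(est.feature_importances_,
                   est_scaled.feature_importances_)
```

With a fixed random_state the bootstrap samples and candidate feature subsets are identical, so the scaled fit should reproduce the same trees up to threshold values.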

Member Author


Ah okay. In addition to your proposed test, is it okay if I check the criterion and, if it's "mae", change the tolerance to 0.01?

Member


Yes, let's do that.

Member Author


Why not simply

assert np.all(importances[:3] > 0.1)

@raghavrv raghavrv added this to the 0.19 milestone Jun 29, 2017
@raghavrv
Member Author

(I'm marking this a blocker since the issue was marked blocker as well. Please feel free to remove the label if this need not make it into 0.19.)



def check_importances(name, criterion, X, y):
def check_importances(name, criterion, X, y, dtype):
Member


To make the test reports easier to read, please put dtype before X and y. Or even make X and y global variables at the beginning of the module.

Member Author


Sorry, I don't follow. I thought that since these are check_* functions, their parameters show up in the pytest report, which makes failures easier to debug; that's why I included them as parameters rather than looping over them.

Member


It's just that the arrays are long, so the end of the parameter list is truncated. Putting dtype before X would make the report more informative; the actual contents of X and y are not very informative.
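A toy illustration of the ordering point (a hypothetical, stripped-down check_importances; only the signature order matters here):

```python
import numpy as np

# Short scalar parameters first, long arrays last, so a truncated
# test report still shows name, criterion, and dtype.
def check_importances(name, criterion, dtype, X, y):
    X = np.asarray(X, dtype=dtype)
    # ... fit the estimator named by `name` with `criterion` here ...
    return X.dtype

X = np.ones((4, 3))
y = np.zeros(4)
assert check_importances("RandomForestRegressor", "mse",
                         np.float32, X, y) == np.float32
```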

Member Author


Okay thanks

Member

@ogrisel ogrisel left a comment


LGTM once my comments are addressed and CI is green.

@ogrisel
Member

ogrisel commented Jun 30, 2017

The RandomForest + mae tests are really slow (more than 4 s each). It would be great to find a way to speed them up while still having the existing checks pass, for instance by reducing the dataset size (maybe the number of features) or the number of trees in the forest.

@ogrisel
Member

ogrisel commented Jun 30, 2017

But actually, maybe this reveals a performance issue in our code. If so, it would be great to open a new issue to track it and skip the MAE case in this test.

@ogrisel
Member

ogrisel commented Jul 5, 2017

Closing in favor of #9282.

@ogrisel ogrisel closed this Jul 5, 2017