Fixed issue where yield test wrapped with skip_if_32bit was not run #7656

Closed

Erotemic wants to merge 2 commits into scikit-learn:master from Erotemic:fix_wrapped_gen_tests
Conversation

@Erotemic
Contributor

In #7654 I discovered that a yield test was not being run by nosetests because it was wrapped in skip_if_32bit.

Previously, nosetests sklearn.ensemble.tests.test_forest:test_importances --verbose would output:

sklearn.ensemble.tests.test_forest.test_importances ... ok
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK

However, this success was simply because nothing was being run.
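For context, a nose "yield test" is a generator that yields (callable, *args) tuples, each of which nose runs as a separate test case. The sketch below (illustrative only; check_something and naive_wrapper are made-up names, not the actual test_forest or sklearn.utils.testing code) shows how a decorator that is not itself a generator function swallows all the sub-tests:

```python
import types


# A nose-style yield test: each yielded (callable, *args) tuple becomes
# one test case that nose runs as check_something(name).
def check_something(name):
    assert isinstance(name, str)


def test_importances_style():
    for name in ["RandomForestClassifier", "ExtraTreesClassifier"]:
        yield check_something, name


# A naive decorator swallows the sub-tests: `inner` is not a generator
# function, so nose calls it once, receives an unconsumed generator
# object as the return value, ignores it, and reports "ok".
def naive_wrapper(func):
    def inner(*args, **kwargs):
        return func(*args, **kwargs)
    return inner


swallowed = naive_wrapper(test_importances_style)()
assert isinstance(swallowed, types.GeneratorType)  # nothing has actually run
```

This is why the wrapped test "passed" in 0.000s: nose saw an ordinary function, called it, discarded the returned generator, and ran zero sub-tests.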

With the change, all 10 of the tests embedded in this one function are run. Unfortunately, they are all failing.

(venv2) joncrall@hyrule:~/code/scikit-learn$ nosetests sklearn.ensemble.tests.test_forest:test_importances --verbose
sklearn.ensemble.tests.test_forest.test_importances('ExtraTreesClassifier', 'gini', array([[ 0.31727627,  1.05972592, -0.1446832 , ...,  0.5589773 , ... FAIL
sklearn.ensemble.tests.test_forest.test_importances('ExtraTreesClassifier', 'entropy', array([[ 0.31727627,  1.05972592, -0.1446832 , ...,  0.5589773 , ... FAIL
sklearn.ensemble.tests.test_forest.test_importances('RandomForestClassifier', 'gini', array([[ 0.31727627,  1.05972592, -0.1446832 , ...,  0.5589773 , ... FAIL
sklearn.ensemble.tests.test_forest.test_importances('RandomForestClassifier', 'entropy', array([[ 0.31727627,  1.05972592, -0.1446832 , ...,  0.5589773 , ... FAIL
sklearn.ensemble.tests.test_forest.test_importances('ExtraTreesRegressor', 'mse', array([[ 0.31727627,  1.05972592, -0.1446832 , ...,  0.5589773 , ... FAIL
sklearn.ensemble.tests.test_forest.test_importances('ExtraTreesRegressor', 'friedman_mse', array([[ 0.31727627,  1.05972592, -0.1446832 , ...,  0.5589773 , ... FAIL
sklearn.ensemble.tests.test_forest.test_importances('ExtraTreesRegressor', 'mae', array([[ 0.31727627,  1.05972592, -0.1446832 , ...,  0.5589773 , ... FAIL
sklearn.ensemble.tests.test_forest.test_importances('RandomForestRegressor', 'mse', array([[ 0.31727627,  1.05972592, -0.1446832 , ...,  0.5589773 , ... FAIL
sklearn.ensemble.tests.test_forest.test_importances('RandomForestRegressor', 'friedman_mse', array([[ 0.31727627,  1.05972592, -0.1446832 , ...,  0.5589773 , ... FAIL
sklearn.ensemble.tests.test_forest.test_importances('RandomForestRegressor', 'mae', array([[ 0.31727627,  1.05972592, -0.1446832 , ...,  0.5589773 , ... FAIL

======================================================================
FAIL: sklearn.ensemble.tests.test_forest.test_importances('ExtraTreesClassifier', 'gini', array([[ 0.31727627,  1.05972592, -0.1446832 , ...,  0.5589773 ,
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/joncrall/venv2/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/joncrall/code/scikit-learn/sklearn/ensemble/tests/test_forest.py", line 209, in check_importances
    assert_equal(n_important, 3)
AssertionError: 2 != 3
    '2 != 3' = '%s != %s' % (safe_repr(2), safe_repr(3))
    '2 != 3' = self._formatMessage('2 != 3', '2 != 3')
>>  raise self.failureException('2 != 3')


[... the same "AssertionError: 2 != 3" traceback repeats for the remaining nine parameterizations ...]

----------------------------------------------------------------------
Ran 10 tests in 1.109s

FAILED (failures=10)

To fix the initial problem, I made skip_if_32bit check whether the input function is a generator function. If it is not, the function runs as it used to. If it is, a different wrapper re-yields the generated sub-tests when the test is enabled, and yields functions that raise SkipTest when it is disabled.

I also raised NotImplementedError in the other wrappers if a generator function is passed in.
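The approach can be sketched as follows (a minimal standalone sketch, not the actual sklearn.utils.testing code; _is_32bit and the unittest.SkipTest stand-in are illustrative choices):

```python
import inspect
import struct
from functools import wraps
from unittest import SkipTest


def _is_32bit():
    # A pointer is 4 bytes on a 32-bit interpreter.
    return struct.calcsize("P") == 4


def skip_if_32bit(func):
    """Skip `func` on 32-bit platforms, including generator (yield) tests."""
    if inspect.isgeneratorfunction(func):
        # A plain wrapper would swallow the yielded sub-tests: nose calls
        # the wrapper, receives a generator object, and never iterates it.
        # Instead, re-yield each sub-test, substituting a skipping callable
        # on 32-bit platforms.
        @wraps(func)
        def run_gen(*args, **kwargs):
            def skip(*a, **kw):
                raise SkipTest("skipped on 32 bit")
            for item in func(*args, **kwargs):
                params = item if isinstance(item, tuple) else (item,)
                if _is_32bit():
                    yield (skip,) + params[1:]
                else:
                    yield params
        return run_gen

    @wraps(func)
    def run_test(*args, **kwargs):
        if _is_32bit():
            raise SkipTest("skipped on 32 bit")
        return func(*args, **kwargs)
    return run_test
```

Because run_gen is itself a generator function, nose recognizes it as a yield test and iterates it, so the sub-tests are actually collected.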

@jnothman
Member

Thanks @Erotemic.

Does one of the tree hackers (@nelson-liu?) want to fix the underlying bug, which lies in the calculation of feature_importances_ or in the assumptions the test makes about that calculation (with respect to data generated by make_classification)?

Review comment on the changed code:

    else:
        return func(*args, **kwargs)
    return run_test
    if inspect.isgeneratorfunction(func):

Member
I think you can get rid of some of the code duplication.

Contributor Author
I changed it so that if skip_if_32bit is called with a generator function, it re-applies skip_if_32bit to each of the functions it generates. This keeps the original structure of the function and removes the duplicated code. It does technically introduce a recursive call, but in practice the recursion depth should never exceed 1.
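The deduplicated, recursive version can be sketched like this (again an illustrative stand-in, not the actual patch; the yielded sub-tests are assumed to be (callable, *args) tuples, as nose expects):

```python
import inspect
import struct
from functools import wraps
from unittest import SkipTest


def skip_if_32bit(func):
    """Skip `func` on 32-bit platforms; handles yield-test generators."""
    if inspect.isgeneratorfunction(func):
        # Re-apply this same decorator to each yielded sub-test callable.
        # Recursion depth never exceeds 1, because the yielded callables
        # are plain functions, not generator functions.
        @wraps(func)
        def wrap_gen(*args, **kwargs):
            for item in func(*args, **kwargs):
                if isinstance(item, tuple):
                    yield (skip_if_32bit(item[0]),) + item[1:]
                else:
                    yield skip_if_32bit(item)
        return wrap_gen

    @wraps(func)
    def run_test(*args, **kwargs):
        if struct.calcsize("P") == 4:  # 32-bit interpreter
            raise SkipTest("skipped on 32 bit")
        return func(*args, **kwargs)
    return run_test
```

The skip logic now lives in one place (run_test), and the generator branch merely re-wraps each sub-test with it.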

@amueller amueller added this to the 0.18.1 milestone Oct 14, 2016
@amueller
Member

Can you maybe print the importances in the failing test, so we can see whether this is just a slight numeric error that could be fixed with 30 trees?

@Erotemic
Contributor Author

Erotemic commented Oct 15, 2016

@amueller I sorted the importances in descending order and then printed them out

 sorted_importances = [ 0.42  0.21  0.07  0.04  0.04  0.04  0.04  0.04  0.04  0.04]
Fsorted_importances = [ 0.4   0.2   0.08  0.05  0.05  0.05  0.05  0.05  0.04  0.04]
Fsorted_importances = [ 0.47  0.22  0.05  0.04  0.04  0.04  0.04  0.03  0.03  0.03]
Fsorted_importances = [ 0.46  0.23  0.05  0.04  0.04  0.04  0.04  0.04  0.04  0.03]
Fsorted_importances = [ 0.54  0.23  0.05  0.03  0.03  0.03  0.03  0.02  0.02  0.02]
Fsorted_importances = [ 0.54  0.23  0.05  0.03  0.03  0.03  0.03  0.02  0.02  0.02]
Fsorted_importances = [ 0.56  0.25  0.05  0.03  0.02  0.02  0.02  0.02  0.02  0.02]
Fsorted_importances = [ 0.6   0.22  0.03  0.03  0.02  0.02  0.02  0.02  0.02  0.02]
Fsorted_importances = [ 0.6   0.22  0.03  0.03  0.02  0.02  0.02  0.02  0.02  0.02]
Fsorted_importances = [ 0.74  0.12  0.02  0.02  0.02  0.02  0.02  0.02  0.01  0.01]
F

It does not seem to be just a numerical error: the third-best feature is not close to 0.1.

Perhaps this is an error in datasets.make_classification? Maybe there really are only 2 informative features in this dataset?

@jnothman
Member

No, the tests make false assumptions, because not all informative features are necessary to make a decision. Use a concentric-spheres dataset to fix the test: make all ~10 features random, but design y such that classification depends on the magnitude of the first three features together, so that y=1 corresponds to a sphere.
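A minimal sketch of such a dataset (the sample count and the 0.8 radius threshold are arbitrary illustrative choices): all 10 features are uniform noise, but the label depends only on the norm of the first three, so exactly three features are informative by construction.

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features = 1000, 10

# All features are random noise...
X = rng.uniform(-1, 1, size=(n_samples, n_features))

# ...but the label depends only on whether the point falls inside a
# sphere in the first three coordinates, so features 0-2 are the only
# informative ones.
y = (np.linalg.norm(X[:, :3], axis=1) < 0.8).astype(int)

# A cube works the same way:
# y = (np.abs(X[:, :3]).max(axis=1) < 0.8).astype(int)
```

Unlike make_classification, no subset of two features can separate the classes here, so a correct feature_importances_ should rank all three sphere coordinates highly.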


@jnothman
Member

Even a cube will do.


@nelson-liu
Contributor

@jnothman I can try to take a look at this over the weekend, sorry for the long response time.

@amueller
Member

amueller commented Nov 11, 2016

This test also fails on many systems: https://travis-ci.org/MacPython/scikit-learn-wheels/jobs/175145582. I thought those were 64-bit, but they are actually 32-bit. I'm silly.

@amueller
Member

Yeah so either we fix this for the release or we skip the test :-/

@amueller
Member

OK, actually I have no idea what's happening here, and I need to get dinner. But I'm very confused: why does a test decorated with skip_if_32bit fail only on 32-bit?

@amueller
Member

Ah, the tests are only skipped for the ensembles, not the trees... if this test is fixed, the tree test needs to be fixed, too.

@raghavrv
Member

raghavrv commented Nov 11, 2016

I can look into that tomorrow and try fixing it... :)

@ogrisel
Member

ogrisel commented Jul 5, 2017

Fixed by #9282.
