[MRG+1] Stronger tests for variable importances#5261
[MRG+1] Stronger tests for variable importances#5261glouppe merged 5 commits intoscikit-learn:masterfrom
Conversation
9ad805b to
6290fcf
Compare
6290fcf to
78974de
Compare
There was a problem hiding this comment.
Not that I disagree, but what is the reason for this change?
There was a problem hiding this comment.
I was trying to make the tree construction more stable by automatically rescaling the sample weights. Nothing worked with much success (it just moved the issue to someplace else...), but this test failed a few times because of this. Using more samples made that test more stable.
|
I've made some comments, but I support this. |
58fafac to
d1371e5
Compare
|
Thanks for your review @jmschrei ! I believe I addressed all your suggestions. |
d1371e5 to
bcc6f1b
Compare
There was a problem hiding this comment.
Thanks for including my suggestion. I apologize for being pedantic, but I meant a scale of more like [1e-8, 1e-4, 1e-1, 1e1, 1e4, 1e8]. Do you think this test is sufficient?
There was a problem hiding this comment.
Unfortunately, which such larger/smaller scale factors, tests dont pass anymore. Numerical discrepancies creep in, leading to slightly different impurity values... I am not sure we can do anything about that. (My idea was to internally scale down sample_weight by 1. / sample_weight.max(), but this just moves the numerical issues somewhere else...)
So for me this test is more a safeguard that nothing obviously wrong is happening, rather than a bulletproof test.
There was a problem hiding this comment.
Ah, okay. Thanks for checking!
There was a problem hiding this comment.
but 1e-4 would still work, right?
|
Other than my one comment, LGTM. |
|
With the last comment resolved, this LGTM, +1. |
There was a problem hiding this comment.
This line confuses me.
X_new.shape[1] == 3
So he first expression evaluates to True, which is cast to 1, which is less than X.shape[1] == 10
So the assert would hold even if X_new.shape[1] == 11. Actually it succeeds for all values of X_new.shape[1] as long as X.shape[1] > 1
I guess remove the 0 <?
There was a problem hiding this comment.
woops :) This one has been there for long without anybody noticing
|
thanks for more tests :) looks good though I didn't check the math in the theoretical importances. |
|
@amueller Thanks for the comments! I fixed those |
There was a problem hiding this comment.
Why do you use assert_less over assert_array_almost_equal?
There was a problem hiding this comment.
I find it easier to control the quality of the approximation, rather than having all single importance values match up to some digit (but doing this, I only require their mean to do so).
|
I would add a test to check the parallel computation of importances. |
|
Otherwise looks good. |
This PR adds stronger tests for variable importances in forests, including: