[MRG + 1] fix test_20news_vectorized#7431
Conversation
|
Ouch this isn't great :(. It seems to be instant for me if the data has already been downloaded though. Maybe you can use the same strategy as the previous test: try:
data = datasets.fetch_20newsgroups(
subset='all', download_if_missing=False, shuffle=False)
except IOError:
raise SkipTest("Download 20 newsgroups to run this test")This way it will at least be run by developers on their machine ... |
Actually not that instant 1.7s if the data has already been downloaded. |
|
this is a bug when loading pre 0.17 bunches with >=0.17 code. Please remove the file from the |
I don't think so, I removed the 20news stuff from ~/scikit_learn_data just to double-check. Note this is not a bug per se. It's just a test that is always skipped. If you are curious enough to check whether it would pass if the |
|
Whoops. I thought it was the number of samples that was changed, not the number of features. Sorry. |
|
No worries, I am actually amazed that you remember stuff like this almost as well as it happened yesterday :). |
|
yeah, it's just a test that is never run, failing (presumably because it is never run 😉 ) |
|
@nelson-liu do you want to modify the tests as per #7431 (comment), i.e. only run the test if the data has already been downloaded. |
|
oops, I must have missed your suggestion there. I will go ahead and update that. |
|
LGTM |
It was there for historical reason to explain the raise SkipTest
|
@lesteve i left the comment in there since 1.7 seconds is pretty slow by test standards. In human-time standards, I guess it's ok though :) thanks for the fix |
It's kind of slow but I still don't think there is much value in the comment. Since the test is not run by Travis, I checked out your PR locally and found a problem which I fixed. I am getting a bit over-enthusiastic with the recent Github feature where maintainers can push into the PR branch directly (in my case I just edited the file via the Github UI), hope you don't mind ;-). For small things I feel it reduces greatly the overhead of asynchronous communication. |
|
no problem, go for it :) i agree with your changes, and I trust you to have good judgement. |
|
Thanks for patching this up @lesteve |
Run test if dataset has already been downloaded rather than always skipping it
Run test if dataset has already been downloaded rather than always skipping it
Run test if dataset has already been downloaded rather than always skipping it
Reference Issue
None
What does this implement/fix? Explain your changes.
Was mucking around in the test cases for
20newsgroupsand noticed that this test (which is not run due to being too slow) was failing on latest master. Maybe something changed in the vectorization that caused the number of features to change?Alternatively, if we don't want to maintain something like this, perhaps we should remove the test?
This change is