Skip to content

[MRG + 1] fix test_20news_vectorized#7431

Merged
lesteve merged 7 commits intoscikit-learn:masterfrom
nelson-liu:fix_20news_vectorized_test
Oct 5, 2016
Merged

[MRG + 1] fix test_20news_vectorized#7431
lesteve merged 7 commits intoscikit-learn:masterfrom
nelson-liu:fix_20news_vectorized_test

Conversation

@nelson-liu
Copy link
Copy Markdown
Contributor

@nelson-liu nelson-liu commented Sep 15, 2016

Reference Issue

None

What does this implement/fix? Explain your changes.

Was mucking around in the test cases for 20newsgroups and noticed that this test (which is not run due to being too slow) was failing on latest master. Maybe something changed in the vectorization that caused the number of features to change?

Alternatively, if we don't want to maintain something like this, perhaps we should remove the test?


This change is Reviewable

@lesteve
Copy link
Copy Markdown
Member

lesteve commented Sep 16, 2016

Ouch this isn't great :(. It seems to be instant for me if the data has already been downloaded though. Maybe you can use the same strategy as the previous test:

    try:
        data = datasets.fetch_20newsgroups(
            subset='all', download_if_missing=False, shuffle=False)
    except IOError:
        raise SkipTest("Download 20 newsgroups to run this test")

This way it will at least be run by developers on their machine ...

@lesteve
Copy link
Copy Markdown
Member

lesteve commented Sep 16, 2016

It seems to be instant

Actually not that instant 1.7s if the data has already been downloaded.

@amueller
Copy link
Copy Markdown
Member

this is a bug when loading pre 0.17 bunches with >=0.17 code. Please remove the file from the scikit_learn_data folder.

@lesteve
Copy link
Copy Markdown
Member

lesteve commented Sep 16, 2016

this is a bug when loading pre 0.17 bunches with >=0.17 code. Please remove the file from the scikit_learn_data folder.

I don't think so, I removed the 20news stuff from ~/scikit_learn_data just to double-check. Note this is not a bug per se. It's just a test that is always skipped. If you are curious enough to check whether it would pass if the raise SkipTest was removed then you realise, as @nelson-liu did, that it doesn't pass.

@amueller
Copy link
Copy Markdown
Member

Whoops. I thought it was the number of samples that was changed, not the number of features. Sorry.

@lesteve
Copy link
Copy Markdown
Member

lesteve commented Sep 16, 2016

No worries, I am actually amazed that you remember stuff like this almost as well as it happened yesterday :).

@nelson-liu
Copy link
Copy Markdown
Contributor Author

yeah, it's just a test that is never run, failing (presumably because it is never run 😉 )

@lesteve
Copy link
Copy Markdown
Member

lesteve commented Oct 5, 2016

@nelson-liu do you want to modify the tests as per #7431 (comment), i.e. only run the test if the data has already been downloaded.

@nelson-liu
Copy link
Copy Markdown
Contributor Author

oops, I must have missed your suggestion there. I will go ahead and update that.

@amueller
Copy link
Copy Markdown
Member

amueller commented Oct 5, 2016

LGTM

@amueller amueller changed the title [MRG] fix test_20news_vectorized [MRG + 1] fix test_20news_vectorized Oct 5, 2016
It was there for historical reason to explain the raise SkipTest
@nelson-liu
Copy link
Copy Markdown
Contributor Author

@lesteve i left the comment in there since 1.7 seconds is pretty slow by test standards. In human-time standards, I guess it's ok though :) thanks for the fix

@lesteve
Copy link
Copy Markdown
Member

lesteve commented Oct 5, 2016

@lesteve i left the comment in there since 1.7 seconds is pretty slow by test standards. In human-time standards, I guess it's ok though :) thanks for the fix

It's kind of slow but I still don't think there is much value in the comment. Since the test is not run by Travis, I checked out your PR locally and found a problem which I fixed.

I am getting a bit over-enthusiastic with the recent Github feature where maintainers can push into the PR branch directly (in my case I just edited the file via the Github UI), hope you don't mind ;-). For small things I feel it reduces greatly the overhead of asynchronous communication.

@nelson-liu
Copy link
Copy Markdown
Contributor Author

no problem, go for it :) i agree with your changes, and I trust you to have good judgement.

@lesteve lesteve merged commit fa6fafc into scikit-learn:master Oct 5, 2016
@nelson-liu
Copy link
Copy Markdown
Contributor Author

Thanks for patching this up @lesteve

amueller pushed a commit to amueller/scikit-learn that referenced this pull request Oct 14, 2016
Run test if dataset has already been downloaded rather than always skipping it
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
Run test if dataset has already been downloaded rather than always skipping it
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
Run test if dataset has already been downloaded rather than always skipping it
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants