[MRG] Download and test datasets in cron job #16348
thomasjpfan merged 20 commits into scikit-learn:master from
Conversation
With this update to master, PRs may show a negative coverage result (when compared to master), because they will not run the network tests. This would be a false positive.
One alternative is to disable uploading of codecov results from the nightly builds. It is already confusing that they do not run for each commit. I feel that exercising currently untested code is more important than having 100% accurate coverage results (which are not trustworthy as it is).
Thanks for the feedback. In addition, I realized that the tests might again be skipped if their order changes at some point. Therefore, I added wrappers to fetch the datasets, similar to how
EDIT: I don't know about the failing doc-min-dependencies job; it seems unrelated.
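For context, a minimal sketch of the environment-variable gating such wrappers rely on. `SKLEARN_SKIP_NETWORK_TESTS` is the variable discussed in this thread; the helper name is hypothetical, not an existing scikit-learn function:

```python
# Minimal sketch, assuming the SKLEARN_SKIP_NETWORK_TESTS convention
# ('1' or unset means "skip network tests"). The helper name is illustrative.
import os

import pytest


def skip_if_no_network():
    """Skip the calling test unless network tests are explicitly enabled."""
    if os.environ.get("SKLEARN_SKIP_NETWORK_TESTS", "1") != "0":
        pytest.skip("Set SKLEARN_SKIP_NETWORK_TESTS=0 to run network tests")
```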
Merging with master should resolve the CircleCI issue.
thomasjpfan
left a comment
How do you feel about putting the logic in `sklearn/datasets/tests/conftest.py` as a fixture?

```python
# sklearn/datasets/tests/conftest.py
from os import environ

import pytest

from sklearn.datasets import fetch_kddcup99 as _fetch_kddcup99


def _wrapped_fetch(f, dataset_name):
    """Call fetcher f; skip the test if the dataset is missing and downloads are disabled."""
    download_if_missing = environ.get('SKLEARN_SKIP_NETWORK_TESTS', '1') == '0'

    def wrapped(*args, **kwargs):
        kwargs['download_if_missing'] = download_if_missing
        try:
            return f(*args, **kwargs)
        except IOError:
            pytest.skip("Download {} to run this test".format(dataset_name))

    return wrapped


@pytest.fixture
def fetch_kddcup99():
    # Wrap the aliased fetcher import, not this fixture.
    return _wrapped_fetch(_fetch_kddcup99, dataset_name='kddcup99')

# Define the other fixtures here as well.
```

And then in the test we can do:

```python
def test_percent10(fetch_kddcup99):
    data = fetch_kddcup99()
```
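For completeness, a sketch of what the remaining fixtures might look like under the same pattern, continuing the conftest module above (the `_fxt` fixture names are assumptions; both fetchers do accept `download_if_missing`):

```python
# Sketch: more fixtures following the same pattern, reusing _wrapped_fetch
# from above. The _fxt suffix avoids shadowing the real fetcher names.
import pytest

from sklearn.datasets import fetch_california_housing as _fetch_california_housing
from sklearn.datasets import fetch_covtype as _fetch_covtype


@pytest.fixture
def fetch_california_housing_fxt():
    return _wrapped_fetch(_fetch_california_housing,
                          dataset_name='california_housing')


@pytest.fixture
def fetch_covtype_fxt():
    return _wrapped_fetch(_fetch_covtype, dataset_name='covtype')
```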
Gladly, much cleaner this way. Thanks! Some tests were still skipped because pandas is available (e.g.
I think it is out of scope. I would be happy to include this feature in another PR. On our CI, we have instances that do not have pandas installed, to test checks that require pandas not to be installed.
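As a sketch of what such a check might look like (illustrative test, not from this PR; it assumes `sklearn.utils.check_pandas_support`, the helper that raises ImportError when pandas is unavailable):

```python
# Illustrative sketch: a check that only makes sense when pandas is NOT
# installed, so it skips itself otherwise. Assumes
# sklearn.utils.check_pandas_support exists and raises ImportError
# when pandas is unavailable.
import pytest


def test_helpful_error_without_pandas():
    try:
        import pandas  # noqa: F401
        pytest.skip("pandas is installed; this test requires it to be absent")
    except ImportError:
        pass

    from sklearn.utils import check_pandas_support
    with pytest.raises(ImportError):
        check_pandas_support("this test")
```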
I reverted the pandas-related changes and created a new PR for that.
rth
left a comment
Thanks @VarIr! I'm not fully convinced this is the right use case for fixtures. For me, fixtures are a way to set up an environment (e.g. create test data) in which a test can run. Having tests that test fixtures is more confusing, as it means the pytest machinery will show up in the traceback for errors.
Still, since it was suggested in an earlier review, let's go forward with it. LGTM, aside from one comment below.
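To illustrate the distinction rth is drawing (a sketch, not code from this PR): with a plain helper, the skip logic is called from inside the test body, so failures point at the test rather than at fixture machinery. All names here are illustrative:

```python
# Sketch of the non-fixture alternative: a plain helper called inside the
# test body. Names are illustrative, not from this PR.
import pytest

from sklearn.datasets import fetch_kddcup99


def fetch_or_skip(fetch, dataset_name, **kwargs):
    """Return the dataset if cached locally, otherwise skip the test."""
    try:
        return fetch(download_if_missing=False, **kwargs)
    except IOError:
        pytest.skip("Download {} to run this test".format(dataset_name))


def test_percent10():
    data = fetch_or_skip(fetch_kddcup99, 'kddcup99', percent10=True)
    assert data.data.shape[0] > 0
```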
Renamed the fixtures. Unfortunately, the PR labeler still fails after merging master.
Thanks. Never mind that: cf. #16520
This PR is already quite advanced, so please feel free to ignore this comment. Still:
In the long run, I would rather consolidate everything under the Azure Pipelines CI and stop using Travis (and maybe even CircleCI). That would spare maintainers from having to deal with multiple logins on multiple CI systems, each with its own permission model.
Azure Pipelines allows scheduled jobs, so we could run daily/weekly checks there.
I think we should still merge this PR as is: most of the changes are unrelated to the CI provider. We can then migrate to Azure Pipelines once #16603 is merged.
Thank you all for reviewing!
Reference Issues/PRs
Fixes #16340
What does this implement/fix? Explain your changes.
Download and test datasets in a Travis cron job; otherwise, these tests are never run automatically.
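For illustration only (a sketch, not the actual diff in this PR): Travis sets `TRAVIS_EVENT_TYPE` to `cron` for scheduled builds, so a build script can enable dataset downloads just for those runs:

```python
# Sketch: enable network tests only in scheduled (cron) builds.
# TRAVIS_EVENT_TYPE is set by Travis CI; SKLEARN_SKIP_NETWORK_TESTS follows
# the convention in the conftest snippet above ('0' enables downloads).
import os

is_cron = os.environ.get("TRAVIS_EVENT_TYPE") == "cron"
os.environ["SKLEARN_SKIP_NETWORK_TESTS"] = "0" if is_cron else "1"
```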
Any other comments?