Description
When astropy downloads remote files, it can store them in a global cache; it can then avoid re-downloading them unless explicitly requested. For normal use this cache persists between astropy runs and can save time and load on remote servers (only some of which are associated with Astropy).
Currently when we run the test suite, we run the entire thing with the cache in a temporary location, so all remote files must be re-downloaded for every test run. For testing astropy.utils.data, which actually implements the caching mechanism, as of #9182 this is unnecessary, as it uses its own temporary caches for tests where they matter. For any other tests that require remote data, we simply don't run them unless explicitly requested by the user; if the user does request them, all files are re-downloaded every time.
I suggest that astropy not reset its cache to point somewhere temporary on every run. This will not affect astropy.utils.data, whose test suite demonstrates that astropy's caching and downloading mechanisms work correctly. All other tests that require remote data can then be set (by adding flags to their download_file calls) to:
- use cached data if available, not touching the server at all in this case,
- never use the cache because the requested data is different every time, or
- update the data in the cache if it is excessively old.
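The three cases above could be sketched as a small helper that picks the value passed to `download_file`'s `cache` argument. This is only an illustration: the policy names, `cache_argument`, and the 30-day default are hypothetical, and it assumes (as I understand the API after #9182) that `cache` accepts `True`, `False`, or `"update"`. Working out the age of a cached file is left to the caller.

```python
from typing import Optional, Union

# Hypothetical policy names for the three cases listed above.
USE_CACHE = "use-cache"            # use cached data if available
NEVER_CACHE = "never-cache"        # data differs on every request
REFRESH_IF_OLD = "refresh-if-old"  # re-download when the cached copy is stale


def cache_argument(
    policy: str,
    cached_age_days: Optional[float] = None,
    max_age_days: float = 30.0,  # assumed threshold, not an astropy default
) -> Union[bool, str]:
    """Return a value intended for the ``cache`` argument of download_file."""
    if policy == USE_CACHE:
        return True
    if policy == NEVER_CACHE:
        return False
    if policy == REFRESH_IF_OLD:
        # Unknown age (nothing cached yet) or too old: ask for a refresh.
        if cached_age_days is None or cached_age_days > max_age_days:
            return "update"
        return True
    raise ValueError(f"unknown cache policy: {policy!r}")
```

A test would then call something like `download_file(url, cache=cache_argument(REFRESH_IF_OLD, cached_age_days=age))`, with `age` computed however the test sees fit.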
This will immediately improve the user experience when running the astropy test suite with --remote-data=astropy or --remote-data=any. Travis jobs that require remote data can then also be set to persist the Astropy cache, making them both more reliable and faster (for all tests in the first and third categories above).
This will have the additional advantage that running the astropy test suite will populate the user's cache with things like the IERS tables; they can use this as a first pass at preparing for disconnected operation.
It is probably a good idea to add testing flags to use a temporary cache or to clear the cache (instead of running tests) so that users can easily cope with strange behaviour that comes about if cached files are changed online (though currently users simply stumble over such situations without guidance in normal operation).
One early step of any PR would be to measure how much data the cache holds after a full --remote-data=any run, which would help decide whether keeping the cache around is a good idea.
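A rough way to take that measurement is to sum the file sizes under the cache directory after the run. The sketch below uses only the standard library; `cache_size_bytes` is a hypothetical helper, and the directory to pass in is an assumption (I believe `astropy.config.paths.get_cache_dir()` returns it, but any path works).

```python
import os


def cache_size_bytes(cache_dir: str) -> int:
    """Total size in bytes of all regular files below cache_dir."""
    total = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

For example, `print(cache_size_bytes(path_to_cache) / 1e6, "MB")` before and after the test run gives the volume added by the tests.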