Check mirror cache before attempting download #6987

Merged
eteq merged 2 commits into astropy:master from pllim:fix-utils-cache
Dec 23, 2017

Conversation

@pllim
Member

@pllim pllim commented Dec 14, 2017

Fix #6982

@pllim pllim added this to the v2.0.4 milestone Dec 14, 2017
@pllim pllim requested a review from adrn December 14, 2017 19:44
@astropy-bot

astropy-bot bot commented Dec 14, 2017

Hi there @pllim 👋 - thanks for the pull request! I'm just a friendly 🤖 that checks for issues related to the changelog and making sure that this pull request is milestoned and labeled correctly. This is mainly intended for the maintainers, so if you are not a maintainer you can ignore this, and a maintainer will let you know if any action is required on your part 😃.

Everything looks good from my point of view! 👍

If there are any issues with this message, please report them here.

        return url2hash[url_key]
    # If there is a cached copy from mirror, use it.
    else:
        for cur_url in (conf.dataurl, conf.dataurl_alias):
Member Author

Note: I am open to a more elegant way to do this.

Member

You can decide if it's really "elegant", but this at least has the advantage of not requiring that dataurl_alias be specified:

dataurls_to_alias = {}  # this could probably just sit at module-level global scope?

# do this the first time the dataurl is actually *accessed*
from urllib.request import urlopen

u = urlopen(conf.dataurl)
u.close()
dataurls_to_alias[conf.dataurl] = u.geturl()

# then at the line this comment is at, do this:
urls_to_try = [conf.dataurl]
if conf.dataurl in dataurls_to_alias:
    urls_to_try.append(dataurls_to_alias[conf.dataurl])
for cur_url in urls_to_try:
    ...

Member Author


Okay, I'll give it a try. Thank you!

Member Author


Looking at this more, this solution will be harder to backport. I'll implement Python 3-only code in this PR and comment on how to make it Python 2 compatible.



@pytest.mark.remote_data('astropy')
def test_get_cached_urls():
Member Author


Note: This is now indirectly tested in test_download_parallel above. No need to have duplicate tests that do the same thing.

Member


Maybe just note in a comment in test_download_parallel that it does this? I just want to make sure that someone who might decide to edit test_download_parallel in the future knows that it should also be checking this.

@adrn
Member

adrn commented Dec 14, 2017

Nice! Thanks for this. I have to wrap my head around whether the test you have is sufficient...but looks good otherwise.

_get_download_cache_locs, get_cached_urls)

main_url = 'http://data.astropy.org/intersphinx/README'
mirror_url = 'http://www.astropy.org/astropy-data/intersphinx/README'
Member


Is astropy.org really a mirror? Isn't it more foolproof to link directly to GitHub as the mirror here?

Member Author


It is the mirror listed in astropy.utils.data.conf. I could use GitHub, but it would be inconsistent with the actual config -- @astrofrog?

Member


Oh, OK, then never mind my comment.

Member

@eteq eteq left a comment


I like the general idea, @pllim, but I don't like requiring the user to know what the alias is. In the inline comments I offer an idea for how to determine it, though. I tested and it seems to work just as you'd like for the astropy data case!

Configuration parameters for `astropy.utils.data`.
"""

# NOTE: Make sure all the dataurl values have trailing "/".
Member


Maybe put this in the "help" string (on line 48) instead? While unlikely, a random user might decide to re-assign the data site to some local copy, and they should know the trailing / is necessary.



Member Author

@pllim pllim left a comment


@eteq , I think I have addressed your comments.

astropy.utils
^^^^^^^^^^^^^

- ``download_file`` function will check for cache downloaded from mirror URL
Member Author


NOTE: If we decide not to backport, this milestone needs to be moved.

    # Check if URL is Astropy data server, which has alias, and cache it.
    if (url_key.startswith(conf.dataurl) and
            conf.dataurl not in _dataurls_to_alias):
        with urllib.request.urlopen(conf.dataurl, timeout=timeout) as remote:
Member Author


NOTE: Use contextlib for PY2 compat (see example in 2.x code just a few lines below this one).
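For reference, a minimal sketch of that contextlib pattern, which works on Python 2 because `contextlib.closing` wraps any object that has a `close()` method. The `FakeResponse` class below is a hypothetical stand-in for the response object `urlopen` returns, so the sketch runs without network access:

```python
import contextlib


class FakeResponse:
    """Hypothetical stand-in for the response object urlopen returns."""

    def __init__(self, url):
        self._url = url
        self.closed = False

    def geturl(self):
        # The real response reports the final URL after any redirects.
        return self._url

    def close(self):
        self.closed = True


# closing() guarantees close() is called on exit, mirroring the
# 2.x-compatible pattern suggested for the real urlopen call, whose
# return value on Python 2 is not a context manager.
with contextlib.closing(FakeResponse('http://data.astropy.org/')) as remote:
    resolved = remote.geturl()
```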

# Now test that download_file looks in mirror's cache before download.
# https://github.com/astropy/astropy/issues/6982
dldir, urlmapfn = _get_download_cache_locs()
with shelve.open(urlmapfn) as url2hash:
Member Author


NOTE: Use _open_shelve for PY2 compat (see example in 2.x code in data.py).
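The same `contextlib.closing` trick covers the shelve case, since `shelve.Shelf` only became a context manager in Python 3. A sketch under made-up paths (a temporary directory stands in for the real download cache location):

```python
import contextlib
import os
import shelve
import tempfile

# Hypothetical stand-in paths; the real code uses the download cache dir
# returned by _get_download_cache_locs().
tmpdir = tempfile.mkdtemp()
urlmapfn = os.path.join(tmpdir, 'urlmap')

# Write a URL-to-hash entry, then read it back.  contextlib.closing makes
# shelve usable in a with-statement even where Shelf is not a context
# manager (as on Python 2).
with contextlib.closing(shelve.open(urlmapfn)) as url2hash:
    url2hash['http://data.astropy.org/file'] = 'abc123'

with contextlib.closing(shelve.open(urlmapfn)) as url2hash:
    cached = url2hash['http://data.astropy.org/file']
```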

@eteq eteq merged commit 2203fa5 into astropy:master Dec 23, 2017
@eteq
Member

eteq commented Dec 23, 2017

Merged!

@pllim - I think we do want to backport this, if we can. Do you think you can write a PR that does the backporting since it requires significant code changes?

What I'm not sure about is the best way to do the backporting given that it involves substantial changes. Probably the best thing is to make a branch from 2.0.x, starting with a git cherry-pick 2203fa5f4c1868555f156929d1c06290d3548989, and then add a second commit and create a PR against the 2.0.x branch? But perhaps @bsipocz or @astrofrog know if this will break the release-related scripts?

@bsipocz
Member

bsipocz commented Dec 23, 2017

Yes, if we don't cherry-pick directly to the branch but go through a feature branch, then the scripts don't recognize it (currently; hopefully they will one day). But we can always flag this for a manual backport, so the script skips the checks.

@pllim
Member Author

pllim commented Dec 23, 2017

If it's easier to do a manual backport, I don't mind opening another PR, but it'll have to be next year. ;)

bsipocz pushed a commit that referenced this pull request Dec 31, 2017
Check mirror cache before attempting download
@pllim pllim deleted the fix-utils-cache branch January 8, 2018 15:41
@pllim
Member Author

pllim commented Jan 10, 2018

Looks like @bsipocz beat me to it (481b67c). Thank you!
