Cache name resolve results in astropy.coordinates#9162
Conversation
|
I wanted to say this seems dangerous... But honestly when I think on it I realize this would indeed be super useful for me in my own science some of the time - often I grumble that I can't keep going with something using I do think it needs a way to bust the cache easily though. I.e., "I know sesame was updated last week and I want the newest coordinates". Right now that requires knowing how to clear the cache via a url the user never sees... Maybe just have an option for |
|
Good point! I added an |
d87a526 to
0d95f33
Compare
|
Why is it necessary to manually handle locking and caching? Would astropy.utils.data.download_file do what you want if it had an |
|
Like the idea and wonder if we can at least make a session-local cache the default... But agree with @aarchiba that the actual cache handling should be in |
|
Needs a rebase and review being addressed. Most likely also a milestone change. |
0d95f33 to
994253d
Compare
|
I rebased and switched to using the new and In the previous set of commits (overwritten with the rebase), this is why I was manually interfacing with the cache: I needed to set the hash to the URL instead of the file content. So, in order to get this PR to work, we would need to add functionality to |
|
I think you already can. See "If they are concerned that the primary source of the data..." in https://astropy.readthedocs.io/en/latest/utils/data.html |
Indeed, the result is going to be different every time, and if the cache works, you will get the same result as the first successful try (unless you choose to use cache="update" and re-download). Why is this a problem?
Why? The hash need not be user-visible.
Why? I don't understand. If I understand what you want (maybe I don't?) you should be able to write: response = get_file_contents(download_file("http://cds.query/goes?here", cache=True))This will get you a fresh request the first time and a copy of that result every later time. Isn't that what you want? The hash value appears only in the filename returned from |
|
This seems worthwhile, so just wanted to give a 👍 on the idea! |
994253d to
03c246e
Compare
|
Oh right, this PR! Updated and rebased. |
03c246e to
ee51f5b
Compare
|
RE: @aarchiba's comments - I don't remember why I was confused before. On revisiting this, it seemed pretty straightforward to get this working, so I think the new commits address the remaining issues. The test right now is pretty "weak," so open to suggestions there, but don't think it's critical. |
There was a problem hiding this comment.
I'm basically happy on this, but one minor consideration: I think overwrite is a bit ambiguous of a keyword on its own, which is why I was proposing cache='overwrite'. So what about overwrite->overwrite_cache?
I don't want this to block merging, though, so I'm approving since everything else looks good. And the test failure is a coverage red-herring because it's tested via remote_data.
|
OK, sure! |
astropy/coordinates/name_resolve.py
Outdated
| download_file(url, cache=cache, show_progress=False, | ||
| timeout=data.conf.remote_timeout)) |
There was a problem hiding this comment.
| download_file(url, cache=cache, show_progress=False, | |
| timeout=data.conf.remote_timeout)) | |
| download_file(url, cache=cache, show_progress=False)) |
timeout setting here is redundant, as it is already the default behavior:
Lines 1138 to 1139 in 986d5b7
| "you are using caching, i.e. if `cache=True`.") | ||
|
|
||
| if overwrite_cache and cache: | ||
| overwrite_cache = "update" |
There was a problem hiding this comment.
A bug?
| overwrite_cache = "update" | |
| cache = "update" |
| sub-arcsecond accuracy for coordinates. | ||
| cache : bool, optional | ||
| Determines whether to cache the results or not. | ||
| overwrite_cache : bool, optional |
There was a problem hiding this comment.
I don't understand the need to have two different keywords to do this. Why not ask user to use cache='update' directly like in get_icrs_coordinates above?
There was a problem hiding this comment.
Sigh, fine. I just hate string arguments like this so much!!
There was a problem hiding this comment.
Maybe we can slowly move to Enum - https://docs.python.org/3/library/enum.html
|
|
||
| icrs1 = get_icrs_coordinates("castor", cache=True) | ||
|
|
||
| # This is a weak test: we just check to see that a url is added to the |
There was a problem hiding this comment.
Why not have a stronger test with astropy.utils.data.get_cached_urls?
There was a problem hiding this comment.
Because I didn't know about that :) - I'll add a new check
| with shelve.open(urlmapfn) as url2hash: | ||
| assert len(url2hash) == 1 | ||
|
|
||
| # Try reloading coordinates, now should just reload cached data: |
There was a problem hiding this comment.
I wonder if you can temporarily disable internet with no_internet to really test this. 🤔
|
Question: Do we want the default for |
739f825 to
1f76352
Compare
|
Thanks @pllim - I addressed your comments. |
f5ebb48 to
000bec4
Compare
|
Note to self: rebase when #10274 is merged (assuming that fixes the failing test) |
|
@adrn - on whether to cache or not by default - my preference would be something that expires, but I don't think we have that yet! Failing that, I think |
pass through cache key test caching name resolve add an overwrite option for name resolve caching changelog hash on URL not response content actual testing of caching funcitonality simplify download using new caching machinery
000bec4 to
0e9644c
Compare
|
Thanks @mhvk - I also think that |
mhvk
left a comment
There was a problem hiding this comment.
The changelog text is out of date so I fear you'll need a ci skip update, but otherwise it all looks great.
|
@bsipocz This is not critical and could be moved to 4.2. On the other hand, I think it is ready to be merged now. I leave it up to you! |
|
Well, my goal this morning was to empty the slate of 4.1-milestoned issues and PRs, so I'm going to continue to do so and merge... (hoping that I do more good than bad - though my inner dictator is having a blast, with the part of me that stifles it distracted by the beautiful weather - on a morning like this, with the sky pure blue, without brown haze on the horizon and unspoiled by con trails, the air pure and the ears filled with bird song and wind blowing throw new, light green leaves, and the noise smelling flowers rather than traffic - I am hoping some things of the lockdown will stay...). |
|
@mhvk OK, thanks. Yes, about to head outside myself - my eyes aren't focusing at infinity as much as they need to be! BTW: that message should be archived under #githubpoetry! |
|
@adrn -- I created a |
This PR adds support for caching results from the name resolve (Sesame) functionality in astropy.coordinates (primarily used via
SkyCoord.from_name).The motivator here was that, in astropy-tutorials (and in my own scripts), we use
.from_name()to retrieve coordinates of some objects every time the build is run, but this unnecessarily hammers the Sesame servers and occasionally fails because of connection issues.Before I add a changelog entry, I figured I would open the PR to see if anyone has strong opinions about this.