Get a Coordinates object from an astronomical name #556
|
This is very cool! Now to the main comment I have: I wonder whether we should host an intermediate page that will ensure that there is no breakage if the query URL or the format of the output changes. This is an issue with any other online querying (e.g. the survey querying functionality we discussed with @demitri at the meeting). Maybe this requires having a page on astropy.org with some JavaScript acting as the intermediate? Then if the Sesame page moves or changes output, we just update the JavaScript. Otherwise users would have to wait for the next stable release for things to work again. |
|
It may be more work, but I'm still a strong advocate of hosting our own data and database. Unless you are using a solid, stable, and funded(!) API service, the only way you can be sure to serve data reliably is to do it yourself. It's easy to do, and I think web scraping should not be considered as an option. (If that's what you're doing - I've not looked.) Also, yes, very cool idea. I'd argue there's no reason for the method name: a = coord.ICRSCoordinates("M42"). It's just another string we can parse. |
|
@demitri - sesame is not web-scraping, it is a proper API, so I think it would be safe enough to rely on them. I'm just suggesting adding a middle-man that can translate if their API ever changes, which would be sufficient, and could probably be done with a simple intermediate page. What you are suggesting would be great, but it's not something we can set up overnight. |
|
@astrofrog Yea, I agree -- just wanted an initial implementation to get the idea out there :) @demitri Not web scraping, it just provides a GET interface to querying these databases and returns a structured response. Actually I see now that one of the options is XML, which may be better than what this currently does (regex matching) |
|
OK, that's better. I do worry about depending on others, and I'm worried that a (particularly JavaScript) middleware would be a bottleneck. And no, such a database can't be set up overnight. But pretty close to it - it's a very simple thing. For things like common names (even many thousands of them), the external service is not really providing any added value. |
|
Just a thought - we could always have the middleware as a fallback, then no performance issues by default. |
|
I'd feel much better about that. |
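A minimal sketch of the fallback idea discussed above: query the primary service directly and only contact the intermediate page if that fails. The function name `resolve_with_fallback` and the resolver callables are hypothetical and are not part of the pull request.

```python
def resolve_with_fallback(name, resolvers):
    """Try each resolver in order and return the first successful result.

    `resolvers` is a sequence of callables that take an object name.
    Each one is assumed (hypothetically) to raise an exception when its
    service is unreachable or its output cannot be parsed.
    """
    errors = []
    for resolver in resolvers:
        try:
            return resolver(name)
        except Exception as err:  # broad on purpose: any failure triggers the fallback
            errors.append(err)
    raise RuntimeError(
        "all resolvers failed for {!r}: {}".format(name, errors)
    )
```

With the direct query listed first and the middleware second, the middleware sees traffic only when the direct query breaks, so it adds no cost in the normal case.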
|
This is a really nice demonstration! As for middleware -- fallback sounds like a good idea, given that you don't know when someone is going to decide to run a search on several thousand objects at once -- how much bandwidth and traffic is this hypothetical middleware server going to handle? @demitri -- by querying via Sesame, you're not running queries on "thousands of names", you're gaining access to tens of millions of names. (Well, NED claims to have ~ 190 million unique objects, so "tens of millions" is an underestimate.) I think it's a little overly ambitious to try duplicating all of that functionality. |
|
@perwin Thousands of names, a few tens of millions... with today's databases and hardware, there's hardly a difference. Can Sesame handle queries of several thousand objects at once? If we point to many different databases, how do we communicate the different limitations of each? At the moment, we're only talking about name translation. I don't know all of what Sesame does and certainly don't want to replicate all that it does, but I think there is low-hanging fruit. |
|
@demitri - I think you are vastly underestimating the effort required to maintain a name server like what is provided through Sesame. It's not just a one-time ingest, the names and data are continually being updated as well. |
|
@demitri @taldcroft Yea, I agree -- that is extremely ambitious and seems unnecessary. I'll see if I can massage this into the string parsing stuff (get rid of the classmethod), but at first glance it seems like it might be kind of kludgy to get right. Will look into it more tonight. |
|
I feel like |
|
|
|
I'm fine with either |
|
Using a separate method allows additional arguments without interfering with the main |
|
Ah, right, that's why I made it a classmethod -- so the user could specify a search database. |
|
That looks really cool - however I would separate this from |
|
@wkerzendorf - I agree that the core sesame functionality can live in e.g. |
|
Yea I agree this should probably go elsewhere, but the API doesn't have to change if we move it somewhere else. |
|
I'll just add: I really like the existing classmethod approach. But I'm not the intended audience :) |
|
+1 on class method, -1 on |
|
One suggestion--wrap your tests in a check to see whether sesame is up. I wasted almost an hour a couple weeks ago when a similar lookup I wrote had tests that suddenly started to fail because sesame wasn't responding to requests. |
|
@mwcraig Good idea! Thanks for the tip |
|
If Sesame is down, would the idea be to just mark the tests as skipped? |
|
I think you can control an output message, for example see here: http://pytest.org/latest/skipping.html#evaluation-of-skipif-xfail-conditions I think I just have to add something like this to the test: I've added that to the tests, and pushed it up. |
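A sketch of what such an availability check could look like, under stated assumptions: the endpoint URL below is assumed rather than taken from this thread, and `service_is_up` is a hypothetical helper name. The pytest usage is shown only as a comment and assumes a pytest version that accepts a boolean condition with a `reason`.

```python
import urllib.request

# Assumed Sesame endpoint; the thread only links the documentation page.
SESAME_URL = "http://cdsweb.u-strasbg.fr/cgi-bin/nph-sesame/"


def service_is_up(url, opener=urllib.request.urlopen, timeout=5.0):
    """Return True if `url` answers within `timeout` seconds.

    `opener` is injectable so the check itself can be tested offline.
    URLError and socket timeouts are both subclasses of OSError.
    """
    try:
        with opener(url, timeout=timeout):
            return True
    except OSError:
        return False


# Possible use in a pytest test module (assumes pytest is installed):
#
#   @pytest.mark.skipif(not service_is_up(SESAME_URL),
#                       reason="Sesame is not responding")
#   def test_name_lookup():
#       ...
```

The `reason` string is what appears in the test report, so a skipped test says why it was skipped.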
|
Great--I think that's important for something like this, where I could hypothetically run the tests multiple times in a row without changing a thing and see different numbers of tests being skipped. I would want to know why so that I don't go crazy. |
|
Huh. The build seems to be failing for Python 3.2, but not where I expected. I didn't realize that |
|
I'm pretty sure 2to3 will convert urllib2 -> urllib. |
|
The error you're getting is because reading from a web site returns bytes by default--you have to decode them before passing them through a regular expression meant for text. |
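A small illustration of that point, assuming Python 3 and a UTF-8 response (the encoding is an assumption, as is the simplified pattern, which is not the one used in the pull request):

```python
import re

# A text (str) pattern; illustrative only.
COORD_PATTERN = re.compile(r"%J\s*([0-9.]+)")


def search_response(raw, encoding="utf-8"):
    """Decode bytes to text if needed, then apply the text pattern."""
    if isinstance(raw, bytes):
        raw = raw.decode(encoding)
    return COORD_PATTERN.search(raw)


raw_response = b"%J 83.82208 -5.39111"  # what reading a URL yields on Python 3

# Applying a str pattern directly to bytes raises TypeError on Python 3.
try:
    COORD_PATTERN.search(raw_response)
    mismatch_raised = False
except TypeError:
    mismatch_raised = True

match = search_response(raw_response)  # succeeds after decoding
```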
|
Yea, I see that, this must be a 3.0 thing? |
modification from eteq/coordinates-ned)
|
The tests are passing except for one that hung, so I went ahead and merged this (with some minor modifications blessed by @adrn). Backport when ready, @iguananaut! |
|
Oh, and I reassigned this to v0.2 so that your script will see it, @iguananaut |
I threw this together in ~20 minutes, but I thought it'd be killer to have a feature that would let us get a Coordinates object from an astronomical name without any extra work. Right now, this looks like:
It just does a query to Sesame, and parses the returned text (see: http://cdsweb.u-strasbg.fr/doc/sesame.htx). The service lets you specify a database to search, e.g. SIMBAD, NED, Vizier, or all, so you can specify that like:
but the default is to just search all.
Anyway, it's a pretty dumb-simple implementation so if anyone has ideas, let me know.
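For readers who want to see the general shape of the approach described above, here is a rough sketch of building a query URL and pulling coordinates out of the returned text with a regular expression. It is not the code from this pull request. The endpoint URL, the single-letter database codes, the `%J` line format, and the sample response are all assumptions, and the function names are hypothetical.

```python
import re
import urllib.parse

# Assumed base URL of the Sesame CGI endpoint (not stated in the text above).
SESAME_BASE = "http://cdsweb.u-strasbg.fr/cgi-bin/nph-sesame/"

# Assumed single-letter codes for the searchable databases.
DATABASES = {"all": "A", "simbad": "S", "ned": "N", "vizier": "V"}

# Assumed: J2000 coordinates appear on a "%J <ra_deg> <dec_deg> ..." line.
J2000_LINE = re.compile(r"%J\s*([0-9.]+)\s*([+\-]?[0-9.]+)")


def build_query_url(name, database="all"):
    """Build a GET URL for `name`, searching all databases by default."""
    code = DATABASES[database.lower()]
    return "{}{}?{}".format(SESAME_BASE, code, urllib.parse.quote(name))


def parse_icrs_degrees(text):
    """Return (ra, dec) in degrees from already-decoded response text."""
    match = J2000_LINE.search(text)
    if match is None:
        raise ValueError("no %J coordinate line found in response")
    return float(match.group(1)), float(match.group(2))


# Illustrative response text, not captured from the live service.
SAMPLE = "# M42\t#Q1\n%J 83.82208 -5.39111 = 05:35:17.30 -05:23:28.0\n"
ra_deg, dec_deg = parse_icrs_degrees(SAMPLE)
```

Keeping URL construction and parsing in separate functions means the parser can be tested against stored text when the service is unreachable.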