Skip to content

Get a Coordinates object from an astronomical name#556

Merged
eteq merged 36 commits into
astropy:masterfrom
adrn:coordinates-ned
Jan 25, 2013
Merged

Get a Coordinates object from an astronomical name#556
eteq merged 36 commits into
astropy:masterfrom
adrn:coordinates-ned

Conversation

@adrn

@adrn adrn commented Dec 12, 2012

Copy link
Copy Markdown
Member

I threw this together in ~20 minutes, but I thought it'd be killer to have a feature that would let us get a Coordinates object from an astronomical name without any extra work. Right now, this looks like:

import astropy.coordinates as coord
m42_coords = coord.ICRSCoordinates.resolve_name("M42")
m42_gal_coords = coord.GalacticCoordinates.resolve_name("M42")

It just does a query to Sesame, and parses the returned text (see: http://cdsweb.u-strasbg.fr/doc/sesame.htx). The service lets you specify a database to search, e.g. SIMBAD, NED, Vizier, or all, so you can specify that like:

castor_coords = coord.ICRSCoordinates.resolve_name("castor", database="simbad")

but the default is to just search all.

Anyway, it's a pretty dumb-simple implementation so if anyone has ideas, let me know.

@astrofrog

Copy link
Copy Markdown
Member

This is very cool!

Now to the main comment I have: I wonder whether we should host an intermediate page that will ensure that there is no breakage if the query URL or the format of the output changes? This is an issue with any other online querying (e.g. the survey querying functionality we discussed with @demitri) at the meeting. Maybe this requires having a page on astropy.org with some javascript acting as the intermediate? Then if the Sesame page moves or changes output, we just update the javascript. Otherwise users would have to wait for the next stable release for things to work again.

@demitri

demitri commented Dec 12, 2012

Copy link
Copy Markdown
Contributor

It may be more work, but I'm still a strong advocate of hosting our own data and database. Unless you are using a solid, stable, and funded(!) API service, the only way you can be sure to serve data reliably is to do it yourself. It's easy to do, and I think web scraping should not be considered as an option. (If that's what you're doing - I've not looked.)

Also, yes, very cool idea. I'd argue there's no reason for the method name from_name:

a = coord.ICRSCoordinates("M42")
a = coord.ICRSCoordinates("12h45m23s 23.4313")

It's just another string we can parse.

@astrofrog

Copy link
Copy Markdown
Member

@demitri - sesame is not web-scraping, it is a proper API, so I think it would be safe enough to rely on them. I'm just suggesting adding a middle-man that can translate if their API ever changes, which would be sufficient, and could probably be done with a simple intermediate page. What you are suggesting would be great, but it's not something we can set up overnight.

@adrn

adrn commented Dec 12, 2012

Copy link
Copy Markdown
Member Author

@astrofrog Yea, I agree -- just wanted an initial implementation to get the idea out there :)

@demitri Not web scraping, it just provides a GET interface to querying these databases and returns a structured response. Actually I see now that one of the options is XML, which may be better than what this currently does (regex matching)

@demitri

demitri commented Dec 12, 2012

Copy link
Copy Markdown
Contributor

OK, that's better. I do worry about depending on others, and I'm worried that a (particularly javascript) middleware would be a bottleneck.

And no, such a database can't be set up overnight. But pretty close to it - it's a very simple thing. For things like common names (even many thousands of them), the external service is not really proving any added value.

@astrofrog

Copy link
Copy Markdown
Member

Just a thought - we could always have the middleware as a fallback, then no performance issues by default.

@demitri

demitri commented Dec 12, 2012

Copy link
Copy Markdown
Contributor

I'd feel much better about that.

@perwin

perwin commented Dec 12, 2012

Copy link
Copy Markdown

This is a really nice demonstration!

As for middleware -- fallback sounds like a good idea, given that you don't know when someone is going to decide to run a search on several thousand objects at once -- how much bandwidth and traffic is this hypothetical middleware server going to handle?

@demitri -- by querying via Sesame, you're not running queries on "thousands of names", you're gaining access to tens of millions of names. (Well, NED claims to have ~ 190 million unique objects, so "tens of millions" is an underestimate.) I think it's a little overly ambitious to try duplicating all of that functionality.

@demitri

demitri commented Dec 12, 2012

Copy link
Copy Markdown
Contributor

@perwin Thousands of names, a few tens of millions... with today's databases and hardware, there's hardly a difference. Can Sesame handle queries of several thousand objects at once? If we point to many different databases, how do we communicate the different limitations of each? At the moment, we're only talking about name translation. I don't know all of what Sesame does and certainly don't want to replicate all that it does, but I think there are low-hanging fruit.

@taldcroft

Copy link
Copy Markdown
Member

@demitri - I think you are vastly underestimating the effort required to maintain a name server like what is provided through Sesame. It's not just a one-time ingest, the names and data are continually being updated as well.

@adrn

adrn commented Dec 12, 2012

Copy link
Copy Markdown
Member Author

@demitri @taldcroft Yea, I agree -- that is extremely ambitious and seems unnecessary.

I'll see if I can massage this in to the string parsing stuff (get rid of the classmethod), but at first glance it seems like it might be kind of kludgy to get right. Will look in to it more tonight..

@astrofrog

Copy link
Copy Markdown
Member

I feel like a = coord.ICRSCoordinates("M42") may be slightly too magical, especially since there may be cases where source names start to look a lot more like coordinates. So I'm ok with the current class method approach - though maybe we could call it resolve_name instead of from_name?

@demitri

demitri commented Dec 12, 2012

Copy link
Copy Markdown
Contributor

a = coord.ICRSCoordinates(name="M42")

@adrn

adrn commented Dec 12, 2012

Copy link
Copy Markdown
Member Author

I'm fine with either WhateverCoordinates.resolve_name("m42") or WhateverCoordinates(name="M42") -- anyone else have an opinion?

@astrofrog

Copy link
Copy Markdown
Member

Using a separate method allows additional arguments without interfering with the main __init__ ones. I would personally go with the class method for now.

@adrn

adrn commented Dec 12, 2012

Copy link
Copy Markdown
Member Author

Ah, right, that's why I made it a classmethod -- so the user could specify a search database.

@wkerzendorf

Copy link
Copy Markdown
Member

That looks really cool - however I would separate this from coordinates. Sesame is not only linked to coordinates, but other datatypes like object type, reference count, .... . How about the other way round: you resolve an astronomical-object and you get back a python-object with one of the properties a coordinate object (which can then be converted to anything). Maybe it can live in vo.

@astrofrog

Copy link
Copy Markdown
Member

@wkerzendorf - I agree that the core sesame functionality can live in e.g. utils or vo, but on the other hand there's no reason there can't be a shortcut in the coordinates sub-package.

@adrn

adrn commented Dec 12, 2012

Copy link
Copy Markdown
Member Author

Yea I agree this should probably go elsewhere, but the API doesn't have to change if we move it somewhere else.

@embray

embray commented Dec 12, 2012

Copy link
Copy Markdown
Member

I'll just add: I really like the existing classmethod approach. But I'm not the intended audience :)

@taldcroft

Copy link
Copy Markdown
Member

+1 on class method, -1 on name in the coordinates __init__. There will very likely need to be additional args which will clutter up the coordinates __init__. @adrn - this is a nice bit of shiny-ness!

@mwcraig

mwcraig commented Dec 13, 2012

Copy link
Copy Markdown
Member

One suggestion--wrap your tests in a check to see whether sesame is up. I wasted almost an hour a couple weeks ago when a similar lookup I wrote had tests that suddenly started to fail because sesame wasn't responding to requests.

@adrn

adrn commented Dec 13, 2012

Copy link
Copy Markdown
Member Author

@mwcraig Good idea! Thanks for the tip

@embray

embray commented Dec 13, 2012

Copy link
Copy Markdown
Member

If Sesame is down, would the idea be to just mark the tests as skipped?
I forget, but when py.test skips a test is there a way to output a message as to why it was skipped? I would want to know that too. Or at least output a warning.

@adrn

adrn commented Dec 14, 2012

Copy link
Copy Markdown
Member Author

I think you can control an output message, for example see here: http://pytest.org/latest/skipping.html#evaluation-of-skipif-xfail-conditions

I think I just have to add something like this to the test:

if urllib.urlopen("http://cdsweb.u-strasbg.fr/cgi-bin/nph-sesame").getcode() != 200:
    pytest.skip("SESAME appears to be down, skipping test_database_specify.py:test_names()...")

I've added that to the tests, and pushed it up.

@embray

embray commented Dec 14, 2012

Copy link
Copy Markdown
Member

Great--I think that's important for something like this, were I could hypothetically run the tests multiple times in a row without changing a thing and see different numbers of tests being skipped. I would want to know why so that I don't go crazy.

@adrn

adrn commented Dec 14, 2012

Copy link
Copy Markdown
Member Author

Huh. The build seems to be failing for Python 3.2, but not where I expected. I didn't realize that urllib2 got merged into urllib in 3.2, so why isn't the build failing at the import urllib2 line?

@embray

embray commented Dec 14, 2012

Copy link
Copy Markdown
Member

I'm pretty sure 2to3 will convert urllib2 -> urllib.

@embray

embray commented Dec 14, 2012

Copy link
Copy Markdown
Member

The error you're getting is because reading from a web site returns bytes by default--you have to decode them before passing them through a regular expression meant for text.

@adrn

adrn commented Dec 15, 2012

Copy link
Copy Markdown
Member Author

Yea, I see that, this must be a 3.0 thing?

eteq added a commit that referenced this pull request Jan 25, 2013
modification from eteq/coordinates-ned)
@eteq eteq merged commit b5616d7 into astropy:master Jan 25, 2013
@eteq

eteq commented Jan 25, 2013

Copy link
Copy Markdown
Member

The tests are passing except for one that hung, so I went ahead and merged this (with some minor modifications blessed by @adrn). backport when ready, @iguananaut!

@eteq

eteq commented Jan 25, 2013

Copy link
Copy Markdown
Member

Oh, and I reassigned this to v0.2 so that your script will see it, @iguananaut

eteq added a commit that referenced this pull request Jan 25, 2013
modification from eteq/coordinates-ned)
keflavich pushed a commit to keflavich/astropy that referenced this pull request Oct 9, 2013
jeffjennings pushed a commit to jeffjennings/astropy that referenced this pull request Jul 2, 2025
Remove affiliated package decisions from Coco
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants