Skip to content

Connect astropy.io.ascii to astropy.table#359

Merged
astrofrog merged 11 commits into
astropy:masterfrom
astrofrog:io/connect-ascii-table
Aug 29, 2012
Merged

Connect astropy.io.ascii to astropy.table#359
astrofrog merged 11 commits into
astropy:masterfrom
astrofrog:io/connect-ascii-table

Conversation

@astrofrog

Copy link
Copy Markdown
Member

Since astropy.io.ascii works natively with Table objects, it is straightforward to connect it to the I/O routines in astropy.table. This means it's easy to do:

t1 = Table.read('daophot.dat', format='daophot')
t2 = Table.read('cds.dat', format='cds')
t3 = Table.read('latex1.tex')
t4 = Table.read('short.rdb')
t5 = Table.read('ipac.dat', format='ipac')

etc.

@taldcroft, if you would be happy for this to go ahead, I can add docs and tests.

@astrofrog

Copy link
Copy Markdown
Member Author

Actually, at the moment I have to add:

from astropy.io.ascii import connect

at the top of my script to force the registrations to happen. I need to think about how to avoid that. If you have suggestions, let me know. Maybe connect will have to be imported from astropy.table.

@ghost ghost assigned taldcroft Aug 24, 2012
@astrofrog

Copy link
Copy Markdown
Member Author

I added an import to astropy.io.ascii.connect from astropy.table, which seems to do the trick.

@taldcroft

Copy link
Copy Markdown
Member

@astrofrog - looks great, please do continue with tests and docs!

@embray

embray commented Aug 24, 2012

Copy link
Copy Markdown
Member

For that matter, though I haven't played with asciitables/astropy.io.ascii, could it be hooked up to read/process FITS ASCII tables? Once io.fits starts moving over to using the table class, using io.ascii for ASCII tables could also enable killing some duplicate logic, maybe. Just a thought, though one I haven't explored the details of yet.

@astrofrog

Copy link
Copy Markdown
Member Author

@taldcroft - I've added tests and docs. A couple of points/questions:

  • Is there a better way to refer to the filenames in test_connect.py?
  • I'm not sure how to clarify the docs so that users see the difference between using astropy.io.ascii directly, and using it via astropy.table. Ultimately, I find the astropy.table API simpler for users, though they will need to look into astropy.io.ascii if they want to add their own format. I guess at the moment it's not obvious where a user should go if they want to read e.g. an IPAC table. Should they read astropy.io.ascii, or astropy.table first? (my opinion is the second, but how do we make that more obvious?). Once they have a table object and want to write to e.g. RDB, do they do write(table, ...) or Table.write(...)? The same question will arise when we do that for FITS and VO. There's a bit of duplication.
  • There's also some duplication in terms of the 'guessing', i.e. both astropy.io.ascii and astropy.table have their own frameworks.

Anything else that you think is missing from this PR?

Comment thread astropy/io/ascii/connect.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a requirement for IPAC table files? From http://irsa.ipac.caltech.edu/applications/DDGEN/Doc/ipac_tbl.html it seems that any extension is considered valid. The .tbl extension could easily be used for other formats. Will these be issues if the is_ipac identifier is registered?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IPAC table files often use .tbl, but I agree it's a bit generic. I can remove it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@taldcroft

Copy link
Copy Markdown
Member
  • Is there a better way to refer to the filenames in test_connect.py?

I can't think of anything better at the moment.

  • I'm not sure how to clarify the docs so that users see the difference between using astropy.io.ascii directly, and using it via astropy.table. Ultimately, I find the astropy.table API simpler for users, though they will need to look into astropy.io.ascii if they want to add their own format. I guess at the moment it's not obvious where a user should go if they want to read e.g. an IPAC table. Should they read astropy.io.ascii, or astropy.table first? (my opinion is the second, but how do we make that more obvious?). Once they have a table object and want to write to e.g. RDB, do they do write(table, ...) or Table.write(...)? The same question will arise when we do that for FITS and VO. There's a bit of duplication.

Ultimately both methods are going to be useful, and users might have their own preferences in any case. It depends to some extent on whether users have simple cases where all the defaults work (use Table) or more complicated cases where you really need to read the io.ascii docs (at which point using io.ascii.read might feel more natural). Or for example with FITS, using Table.read is fine if you have just one extension that you want to read, but in other cases having the full set of HDUs is the right answer. So I would just cross reference from each side and maybe include something like the discussion above.

  • There's also some duplication in terms of the 'guessing', i.e. both astropy.io.ascii and astropy.table have their own frameworks.

I'm not sure I follow. astropy.table does not have any sort of guessing framework and it doesn't do any splitting of string inputs. If you are talking about Table.read() then that doesn't guess (as I understand) it just requires format or stops at the first identified format based on filename. astropy.io.ascii does indeed have a somewhat complicated guessing mechanism based on the file contents.

Anything else that you think is missing from this PR?

Looks good to me.

@taldcroft

Copy link
Copy Markdown
Member

@iguananaut - I'm not sure astropy.io.ascii will be a good tool for ASCII FITS. It has a lot of machinery to allow for flexibility in dealing with a wide variety of formats. That means it is relatively slow. For ASCII FITS tables the data format is very straightforward so you can optimize the parsing a lot.

@astrofrog

Copy link
Copy Markdown
Member Author

@taldcroft - regarding the 'duplication' in astropy.io.ascii and astropy.table, I do agree we need both, I guess I wonder how we can improve the docs to make it clearer to users. But in any case, it's not something that has to be solved in this release - the utility of Table.read and Table.write will be greatest once it also reads VO and FITS tables.

Regarding the guessing, what I meant was that astropy.io.ascii could tecnically register its complicated guessers with astropy.table so that if there are multiple options, the astropy.table exception is shown. But this would only make sense if we got rid of the astropy.io.ascii read/write functions, which is not a good idea at this time.

@astrofrog

Copy link
Copy Markdown
Member Author

I think I've addressed all your comments above - thanks for reviewing this!

@eteq

eteq commented Aug 28, 2012

Copy link
Copy Markdown
Member

I haven't looked too closely at this, but will the write part of this address #330, or is it using the same io.ascii machinery that doesn't use the Table.format attribute?

@astrofrog

Copy link
Copy Markdown
Member Author

@eteq - I don't think it will, as Table.write just calls the write function from astropy.io.ascii.

@eteq

eteq commented Aug 28, 2012

Copy link
Copy Markdown
Member

Ok, that's what I figured - just wanted to check.

@astrofrog

Copy link
Copy Markdown
Member Author

@taldcroft - do you think this can merged as is, or should we clarify the docs further?

Comment thread docs/table/io.rst Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Ipac class does not support writing, so this example won't work. Better to use Rdb perhaps.

(BTW I made this comment earlier but maybe forgot to actually hit the "Comment on this line" button. Sorry.)

@taldcroft

Copy link
Copy Markdown
Member

After the one final comment above I think the docs are fine.

On the general philosophy of table = package.read(...) vs. table = Table.read(...), one factor in favor of the package.read() side is in API documentation and accessibility. The Table.read/write docstring is necessarily generic and so for anything requiring some twiddling the user needs to go to the package docs. At that point the user needs to understand that some arguments in package.read() are being provided through the registered Table reader function.

For instance in io.ascii the file name and Reader class are being provided. That sort of mental API translation (drop the first arg and the Reader kwarg) automatically occurs for us but might not be completely transparent to users. This isn't a huge issue or show stopper, but just to raise the point that a user can enter astropy.io.ascii.read? and see the exact API that is available. Maybe the Table.read/write functions could somehow support getting the backend API, perhaps through a help=True arg or something.

@astrofrog

Copy link
Copy Markdown
Member Author

@taldcroft - I agree, and I just want to add that I think one of the things about Table is that users can also register their own readers/writers, so in those cases it can be advantageous to use astropy.table. In addition, for most simple cases, users don't actually need to know the astropy.io.ascii API, e.g:

t = Table('2mass.tbl', format='ipac')

Anyway, do you think this is something we should clarify now, or should we just open an issue to remind us to improve the docs?

@astrofrog

Copy link
Copy Markdown
Member Author

By the way, I fixed the example in the docs.

@astrofrog

Copy link
Copy Markdown
Member Author

Just to add to what I said above - I guess that the high-level API that is useful is:

t = Table('file.tbl', format='ipac')
t = Table('table.tex', format='tex')
t = Table('2mass.rdb')

because it saves a lot of typing and the user doesn't need to know about reader classes. On the other hand:

t = Table('file.txt', format='ascii')

is less useful unless it guesses the format, because then it would be just as easy to use astropy.io.ascii. So maybe the key is to not register the ascii format, or to do it but only mention it in an 'advanced' section in the docs, so that most users don't need to use it?

@taldcroft

Copy link
Copy Markdown
Member

@astrofrog - I think it's good to go. If you are ready then go ahead and merge.

@taldcroft

Copy link
Copy Markdown
Member

In the example t = Table.read('file.txt', format='ascii') it will try to guess the format, so I think that it useful. I would not take out ascii since that is the one I would choose most often!

@astrofrog

Copy link
Copy Markdown
Member Author

Ok - but maybe in the docs (not necessarily in this PR) we could give a little more details about what to do when things go wrong when selecting ascii. I'll merge this for now, and will work on the docs to try and clarify all this!

astrofrog added a commit that referenced this pull request Aug 29, 2012
Connect astropy.io.ascii to astropy.table
@astrofrog astrofrog merged commit 607e938 into astropy:master Aug 29, 2012
keflavich pushed a commit to keflavich/astropy that referenced this pull request Oct 9, 2013
Connect astropy.io.ascii to astropy.table
@astrofrog astrofrog deleted the io/connect-ascii-table branch July 5, 2016 18:46
jeffjennings pushed a commit to jeffjennings/astropy that referenced this pull request Jul 2, 2025
Add subsection to Contribute page about justifying effort
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants