Skip to content

Add format arg to io.ascii read/write and overhaul io.ascii.connect#961

Merged
taldcroft merged 13 commits into
astropy:masterfrom
taldcroft:ascii-format-arg
Jul 6, 2013
Merged

Add format arg to io.ascii read/write and overhaul io.ascii.connect#961
taldcroft merged 13 commits into
astropy:masterfrom
taldcroft:ascii-format-arg

Conversation

@taldcroft

Copy link
Copy Markdown
Member

This PR makes io.ascii.read() / write() more consistent with other parts of astropy by adding a format keyword (as a string) and deprecating the Reader and Writer keywords (which accept only an io.ascii reader classes).

# OLD
dat = ascii.read('ipac.dat', Reader=ascii.Ipac)  
ascii.write(dat, 'fixed.dat', Writer=ascii.FixedWidth)
# NEW
dat = ascii.read('ipac.dat', format='ipac')
ascii.write(dat, 'fixed.dat', format='fixed_width')
ascii.write(dat, 'out.dat', format='commented_header')

In conjunction the I/O registration of io.ascii readers/writers has been automated so that every format class automatically registers via a base metaclass and class attributes. This removed a bunch of boilerplate code in io.connect. There is a new routine that (mostly) generates the sphinx table output which summarizes the IO registry ascii formats that are available.

I'm putting this PR out before it is fully baked in order to ask for feedback on the format names for the IO registry. Going in to this I was hesitant to about putting all of the ASCII formats into the registry with their native format name. For instance, with no context this would seem odd to me:

dat = Table.read(filename, format='commented_header')

With that in mind I made every ASCII format in the registry default to being prefixed by ascii:, so it would be

dat = Table.read(filename, format='ascii:commented_header')

Of course some of the formats do make sense in a standalone fashion, like

dat = Table.read(filename, format='ipac')

This is required anyway for back-compatibility with 0.2. But now when I look at the registry table it all looks a bit messy. So this is a call for advice / opinions about how to make this tidy.

cc: @hamogu

@mdboom

mdboom commented Apr 9, 2013

Copy link
Copy Markdown
Contributor

This is the first time I've really looked at the io.ascii readers, but I can see how this is an improvement.

OT, but would it make sense to update the table readers to work in a similar way, i.e. to use classes with special attributes rather than free-floating functions?

@taldcroft

Copy link
Copy Markdown
Member Author

@mdboom - yeah, I can imagine it might clean things up to have some sort of IO connector class that encapsulates the read, write, identifier functions, handles aliases, suffixes and whatnot.

@mdboom

mdboom commented Apr 9, 2013

Copy link
Copy Markdown
Contributor

Ok -- obviously not urgent, but I open #962 for that idea.

@hamogu

hamogu commented Apr 10, 2013

Copy link
Copy Markdown
Member

This looks like a good idea.
Two thoughts to your suggestion:

  • It looks messy because there are many options. However, that's only an issue of documentation: Since the guessing mechanism works so well, the 90% use case for reading should be just the plain old ascii reader. All other formats are already special cases, thus (for the purposes of table reading), they can be hidden in a special page of the documentation. For writing, many of those formats are unsuitable (no writer implemented) or at least unusual, e.g. why write in ascii:basic? All functionality is available in plain ascii as well. Again, this can be hidden in a deeper layer of the documentation. I suggest to say "Reading tries all formats automatically, these are the most common cases for writing: ascii, aastex, fixed_width, latex, noheader and see THIS link for full listing".
  • I think you can remove the aliases that are only there for backwards compatibility from the documentation (but keep them in the code). Old code will most likely "just work" and new code should use the new mechanism anyway. Astropy.version << 1, so I think that's acceptable.

@eteq

eteq commented Apr 23, 2013

Copy link
Copy Markdown
Member

@hamogu - I'm not comfortable with "hiding" some readers. I find the basic reader works for me at best ~50% of the time. A lot of tables in the wild (at least, in my sub-fields) seem to be imporoperly formatted with spaces mixed with tabs or headers with spaces, or the like. In that case I usually need one of the specialized ones.

But you are right that the basic one is still most common - so I guess mainly what I'm suggesting is instead "Reading tries all formats automatically, these are the most common cases for writing: ascii, aastex, fixed_width, latex, noheader and see the table below (but on this page) for the full listing".

@taldcroft - one of the concerns I have with this is that it makes it harder to find the appropriate reader/writer class, and hence the relevant specific documentation. That's one thing that was nice about Reader and Writer accepting classes. I'm sure there's some way to have our cake and eat it too, though (e.g., use strings and still make it clear how to get the Reader classes.) Perhaps putting some version of the string-to-class table actually in the docstrings? (see #1011 for a slightly more specific version of this).

@hamogu

hamogu commented May 31, 2013

Copy link
Copy Markdown
Member

One disadvantage of this scheme is that is is harder to subclass a Reader or Writer and use it. Before, I just make a Reader and pass it with Reader = .... In the new scheme, I would have to register it and to do so I might even have to wrap it in a special IOConnector class (see comments above).
I don't know if anyone except me subclasses ascii.Reader classes, but I find myself doing that quite frequently.
This is not a show-stopper, but we should make sure to document how to register new ascii Readers.

@taldcroft

Copy link
Copy Markdown
Member Author

Ah, but you didn't read the fine print. :-) By use of metaclasses, whenever you inherit from BaseReader then your new class is automatically registered and connected to the Table reader. This happens at the time your derived class is defined. The only special thing you need to do is put _format_name and _format_description as class attributes so it knows how to register it.

@hamogu

hamogu commented May 31, 2013

Copy link
Copy Markdown
Member

That addresses my concern!
+1

@astrofrog

Copy link
Copy Markdown
Member

I wonder whether in the long term, we should have a BaseReader class that lives in astropy.table and auto-connects to the Table I/O, and then have astropy.io.ascii define a BaseASCIIReader class that inherits from BaseReader and adds anything needed by default by ASCII readers? Then the BaseReader is the base class for #962?

@taldcroft

Copy link
Copy Markdown
Member Author

I would go one level further and say that there should be a BaseConnector class that sits below the Table readers and the NDData readers and future Thingy readers. Then certainly each subclass can specialize as needed, but the basic registration would be in BaseConnector.

@astrofrog

Copy link
Copy Markdown
Member

@taldcroft - ah yes of course, the I/O infrastructure is also there for NDData

@taldcroft

Copy link
Copy Markdown
Member Author

Closes #385.

@taldcroft

Copy link
Copy Markdown
Member Author

@hamogu @astrofrog @eteq @mdboom

This one got stuck for a while but I think it's ready now (at least from my perspective). I did a significant rework of what was there before and now have (nominally) written the required tests and done all the doc updates. Although not strictly part of the PR to add a format arg to ascii.read/write, I ended doing a bunch of reworking the unified interface documentation since I discovered big chunks that were redundant.

Preview of docs at: http://hea-www.harvard.edu/~aldcroft/tmp/astropy-format/html/index.html

I see now there is a merge conflict again, sigh. (Third time now that I have had to rebase this branch because of conflicts).

@taldcroft

Copy link
Copy Markdown
Member Author

Rebased and fixed merge conflict in io/ascii/write.rst documentation.

@astrofrog

Copy link
Copy Markdown
Member

Looks great, thanks for working on this!

The documentation is already much improved with this, though I think we can try and do even better once we work on #962 - I think that ideally, someone using only the unified I/O shouldn't really have to go to the docstring of the e.g. Daophot class to know what arguments to pass, but we should somehow be able to centralize all this information in one place. For example, if I am a user and follow the docs for Daophot, I end up here:

http://hea-www.harvard.edu/~aldcroft/tmp/astropy-format/html/_generated/astropy.io.ascii.daophot.Daophot.html#astropy.io.ascii.daophot.Daophot

and I have no idea what options can be passed for reading (I just noticed that the read method is missing from the API). Anyway, as I mentioned, this is more something for #962.

Before this can be merged:

Failed test due to the use of {} (which doesn't work in Python 2.6) - use {0} instead. It also looks like there's a docs error:

/home/travis/build/astropy/astropy/docs/table/io.rst:16: ERROR: Unknown target name: "built-in table readers/writers".

@taldcroft

Copy link
Copy Markdown
Member Author

Yes, the online docs situation needs work, but as you said that's out of scope for this PR. I did just add explicit references to the io.ascii read and write sections that provide long descriptions of every parameter, so if anyone is reading the web docs they'll have a reasonably clear idea of what's available:

http://hea-www.harvard.edu/~aldcroft/tmp/astropy-format/html/io/unified.html#ascii-formats

@taldcroft

Copy link
Copy Markdown
Member Author

OK, travis is passing now.

@astrofrog

Copy link
Copy Markdown
Member

Ok - this looks good to me! Feel free to merge (unless you want to wait for someone else to review too)

taldcroft added a commit that referenced this pull request Jul 6, 2013
Add format arg to io.ascii read/write and overhaul io.ascii.connect
@taldcroft taldcroft merged commit b308445 into astropy:master Jul 6, 2013
taldcroft added a commit that referenced this pull request Jul 7, 2013
Update for PR #1171 and #961.
@eteq

eteq commented Jul 8, 2013

Copy link
Copy Markdown
Member

@taldcroft - this looks great. I agree with @astrofrog's comments about clearing up the online and interactive docs, but that is indeed a separate issue.

FYI, I noticed a whitespace typo in the docs that was mucking up the format of the formats table, which I fixed directly in 374883d

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants