Support table metadata in io.ascii#683

Closed
taldcroft wants to merge 2 commits into astropy:master from taldcroft:ascii/asciitable-format

Conversation

@taldcroft

Member

This is an initial feature request to more fully support table metadata in io.ascii. This was discussed in #664 and #659.

This issue is meant to address just reading and writing metadata into an ASCII-formatted table.

There is the related issue of providing a framework for manipulating metadata beyond the current concept of a simple ordered dict. This has a much larger scope and potentially touches nddata, io.fits and others.

cc: @kbarbary

@eteq

eteq commented Jan 29, 2013

Member

yep, that would be great. But I think it's important that we keep at least the NDData and Table meta objects in sync in terms of capabilities. io.fits would be nice, but that's probably a lot more work.

@taldcroft

Member Author

Attached is an initial effort at an ASCII file format that fully preserves a Table object. The new io.ascii reader class is called AsciiTable, which hopefully isn't too confusing or generic (or maybe it is?).

For a quick look at this in action see: http://nbviewer.ipython.org/4742168/

cc: @astrofrog @eteq @iguananaut @mdboom

@eteq

eteq commented Feb 10, 2013

Member

Are you thinking this should be the default ascii output? That is, this comes out when I do tab.write(format='ascii')? I can see the virtue of the default having all the metadata, but it might be good to have a 'plainascii' format or something like that which preserves the current (simpler) output. I've found that useful for tables where the person I'm sending it to might not know what to do with even a JSON-like header.

It would also be nice to have the units in the column headers if such a plain option continues. E.g.,

# time len
# seconds meters
1.3 2.1
2.1 3.4

That might be best treated as a separate PR, but that's contingent on whether or not the 'plain' version remains.
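A minimal sketch of the units-in-header layout suggested above, using only the standard library. The function names are hypothetical and are not part of io.ascii; the layout (one commented line of names, one commented line of units, then space-delimited rows) is taken from the example in this comment.

```python
def write_plain_with_units(names, units, rows):
    """Return a space-delimited table with commented name and unit lines."""
    lines = ["# " + " ".join(names), "# " + " ".join(units)]
    for row in rows:
        lines.append(" ".join(str(value) for value in row))
    return "\n".join(lines)


def read_plain_with_units(text):
    """Parse the layout above: two comment lines, then numeric data rows."""
    lines = text.splitlines()
    names = lines[0].lstrip("# ").split()
    units = lines[1].lstrip("# ").split()
    rows = [
        [float(token) for token in line.split()]
        for line in lines[2:]
        if line.strip()
    ]
    return names, units, rows


text = write_plain_with_units(
    ["time", "len"], ["seconds", "meters"], [(1.3, 2.1), (2.1, 3.4)]
)
print(text)
```

A reader that simply skips lines starting with `#` would still see only the two data rows.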

@taldcroft

Member Author

@eteq - I was not planning to have this new format be the default output just because it has so much cruft (if you don't care about fully storing a table). I think most people (both of us included) will prefer the simple output as the default and be able to choose format='asciitable' to get the full JSON version.

Your second suggestion is definitely worth discussion, in a separate PR as you say.

@astrofrog

Member

Just a thought - why not simply define a format json and store the whole table in JSON, which would be a format guaranteed to round-trip? This looks great, but my only minor comment is that asciitable is quite a generic name for this format.

@taldcroft

Member Author

@astrofrog - The point is to stay within the confines of the "ASCII table" format, which is (for these purposes) a row-oriented, character-delimited, human-readable table. The big benefit of writing a table out in this way is that it will still be readable by every other CSV-style reader, like np.genfromtxt, IDL readcol, etc. The value-added with AsciiTable is that if you read it back with io.ascii then it round-trips.
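To illustrate the round-trip argument, here is a rough sketch assuming the metadata is serialized as JSON inside `#` comment lines ahead of the data. The exact header layout used in this PR is not shown in the thread, so the layout, the function names, and the metadata keys below are all assumptions made for illustration.

```python
import io
import json

COMMENT = "#"


def write_with_json_header(names, rows, meta):
    """Write JSON metadata as comment lines, then a plain delimited table."""
    out = io.StringIO()
    for line in json.dumps(meta, indent=1, sort_keys=True).splitlines():
        out.write(f"{COMMENT} {line}\n")
    out.write(" ".join(names) + "\n")
    for row in rows:
        out.write(" ".join(str(value) for value in row) + "\n")
    return out.getvalue()


def read_plain(text):
    """What a generic reader does: drop comment lines and split the rest."""
    body = [ln for ln in text.splitlines() if ln and not ln.startswith(COMMENT)]
    names = body[0].split()
    rows = [[float(token) for token in ln.split()] for ln in body[1:]]
    return names, rows


def read_with_meta(text):
    """A metadata-aware reader: also decode the commented JSON header."""
    header = [ln[len(COMMENT):] for ln in text.splitlines() if ln.startswith(COMMENT)]
    meta = json.loads("\n".join(header)) if header else {}
    names, rows = read_plain(text)
    return names, rows, meta


example_meta = {"columns": {"time": {"unit": "s"}, "len": {"unit": "m"}}}
example_text = write_with_json_header(
    ["time", "len"], [(1.3, 2.1), (2.1, 3.4)], example_meta
)
```

The same text serves both readers: `read_plain` ignores the header, while `read_with_meta` recovers the metadata that was written.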

There is certainly nothing wrong with making a JSON reader/writer as well, but that would live in its own package separate from io.ascii and basically analogous to votable.

I struggled with the name and I'm open to suggestions. I was thinking about the Ascii representation of a Table, which gets you to AsciiTable. I thought about putting Astro or Astropy in there somewhere, but there is nothing that has to be specific to astronomy or even astropy. I guess the distinguishing characteristic is that the header is stored as JSON, but I think something like JsonHeader would be completely unintuitive for most. That's sort of why I came back to the generic AsciiTable, since it's natural and to some extent we have a claim to that generic label by heritage with the asciitable package.

@astrofrog

Member

@taldcroft - I see what you mean. By the way, could you register this format with the I/O registry?

@eteq

eteq commented Feb 11, 2013

Member

hmm, so that means there will be both an 'ascii' format and an 'asciitable' format? I think that will be very confusing... what about AsciiWithExtraInfo or AsciiWithHeader or similar? That still gets across what distinguishes it from the plain format while not being too technical.

@kbarbary

Member

AsciiMeta or AsciiWithMeta since the distinguishing feature is metadata?

@taldcroft

Member Author

Thanks for the feedback, which has given me a new idea that I think is much better. Instead of having an entirely new io.ascii format class that supports metadata, how about adding a new option to the read and write funcs, perhaps called meta_format, which controls what to do about available metadata. Options:

  • None : no extra header info (default)
  • 'units' : column name, units in tabular style, same as #756 (Include units in Table column descriptions)
  • 'all' : everything in JSON, as in this PR
  • 'columns': column name, units, format, description in a tabular style

This could of course support additional or user-specified meta formats. The default could be set as a configuration item so if @eteq always wants to see units, then he can. These meta formats will get used in the reading process, thereby allowing a configurable degree of round-tripping. This also allows an easy way to create data files by hand with limited metadata already included.
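A rough sketch of how a writer might dispatch on such an option. Everything here is hypothetical: `format_header`, the dict-based column description, the ` | ` separator used for the 'columns' style, and the JSON payload keys are illustrative choices, not an existing io.ascii interface.

```python
import json


def format_header(columns, table_meta, meta_format=None):
    """Return commented header lines for a hypothetical meta_format value.

    `columns` is a list of dicts with a "name" key and optional "unit",
    "format" and "description" keys.
    """
    if meta_format is None:
        # Default: no extra header information.
        return []
    if meta_format == "units":
        return [
            "# " + " ".join(col["name"] for col in columns),
            "# " + " ".join(col["unit"] for col in columns),
        ]
    if meta_format == "columns":
        keys = ("name", "unit", "format", "description")
        header = ["# " + " | ".join(keys)]
        for col in columns:
            header.append("# " + " | ".join(str(col.get(key, "")) for key in keys))
        return header
    if meta_format == "all":
        payload = {"columns": columns, "meta": table_meta}
        return ["# " + line for line in json.dumps(payload, indent=1).splitlines()]
    raise ValueError(f"unknown meta_format: {meta_format!r}")


example_columns = [
    {"name": "time", "unit": "seconds"},
    {"name": "len", "unit": "meters"},
]
print("\n".join(format_header(example_columns, {}, meta_format="units")))
```

User-specified meta formats could be supported by replacing the `if` chain with a registry that maps names to formatter callables.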

The guessing process within read() would cycle through known meta formats (just like it now does for delimiter etc), so for the most part users would not need to worry about providing meta_format for reading.
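The guessing step could look something like the sketch below, which tries each known meta format in turn and keeps the first one that parses. The parser functions and `GUESS_ORDER` are hypothetical stand-ins and do not reflect how read() is actually implemented; the input is assumed to be the header comment lines with the leading `#` already stripped.

```python
import json


def parse_json_header(comment_lines):
    """'all' style: the whole comment block is one JSON object."""
    meta = json.loads("\n".join(comment_lines))  # raises ValueError on bad JSON
    if not isinstance(meta, dict):
        raise ValueError("JSON header must be an object")
    return meta


def parse_units_header(comment_lines):
    """'units' style: one line of names followed by one line of units."""
    if len(comment_lines) != 2:
        raise ValueError("expected exactly two header lines")
    names, units = comment_lines[0].split(), comment_lines[1].split()
    if len(names) != len(units):
        raise ValueError("names and units differ in length")
    return {"names": names, "units": units}


def parse_no_header(comment_lines):
    """None style: there must be no header comment lines at all."""
    if comment_lines:
        raise ValueError("unexpected header lines")
    return {}


# Most specific format first, plain (no metadata) last.
GUESS_ORDER = [
    ("all", parse_json_header),
    ("units", parse_units_header),
    (None, parse_no_header),
]


def guess_meta_format(comment_lines):
    """Return (meta_format, parsed_meta) for the first parser that succeeds."""
    for name, parser in GUESS_ORDER:
        try:
            return name, parser(comment_lines)
        except ValueError:
            continue
    raise ValueError("no known meta format matched the header")
```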

Because this now uses a new option, all the io.connect machinery works exactly as before with format='ascii', so there is no struggle to find a decent name for a new format.

I try hard to avoid new kwargs in read and write, but I think this is useful enough to warrant an exception because it brings a lot of new functionality.

One slight wart - in order to make detection of these meta formats reliable, I would want to use some sort of start / stop markers in the header. It should be easy enough that people can create it by hand but unambiguous.
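One possible shape for such markers, as a sketch only. The marker strings `# %META-BEGIN` and `# %META-END` are invented for this example and were not proposed in the thread; the point is that a marked block is easy to type by hand and unambiguous to detect.

```python
# Hypothetical marker strings, chosen only for illustration.
START = "# %META-BEGIN"
STOP = "# %META-END"


def extract_meta_block(text):
    """Return (meta_format, block_lines), or (None, []) if no marked block exists.

    The meta format name follows the start marker on the same line. Block
    lines are returned with the leading '#' and surrounding spaces removed.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith(START):
            continue
        meta_format = line[len(START):].strip()
        for j in range(i + 1, len(lines)):
            if lines[j].strip() == STOP:
                block = [ln.lstrip("#").strip() for ln in lines[i + 1:j]]
                return meta_format, block
        raise ValueError("metadata block has a start marker but no stop marker")
    return None, []


marked_text = (
    "# %META-BEGIN units\n"
    "# time len\n"
    "# seconds meters\n"
    "# %META-END\n"
    "1.3 2.1\n"
    "2.1 3.4\n"
)
print(extract_meta_block(marked_text))
```

Because the markers are themselves comment lines, readers that ignore comments are unaffected by them.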

Another issue - in some cases there is redundant information, for instance where the column names are encoded both in the metadata and in the table itself (e.g. as the first uncommented line in the default format). This can lead to conflicts that need to be resolved.

@eteq

eteq commented Feb 11, 2013

Member

I like your new idea @taldcroft - a few comments on it:

I share your wish to not add new kwargs - what about doing exactly what you suggest here but using the format keyword? E.g. 'ascii' would give the default, 'asciiunits' would give the 'units' option you have, 'asciiwithheader' the 'header' format, 'asciicolumns', etc. Or is that too awkward to do with the way the format keyword currently works? It's not a disaster to have new keywords, I just think it might be confusing if e.g. someone gives a meta_format and a format that is not 'ascii'.

Can't the stop/start markers just be the beginning and end of the first comment block or something? (Or say that if you want additional comments, you have to add a special start/stop marker?)

And I think redundant info is ok if you have the reader just fail if there's a conflict. I think that's the right thing to do rather than try to guess at how the user wants it resolved.
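The fail-on-conflict behaviour could be as small as the following sketch, assuming the reader has already extracted the column names from both places. `resolve_column_names` is a hypothetical helper, not an io.ascii function.

```python
def resolve_column_names(meta_names, table_names):
    """Pick the column names, refusing to guess when the two sources disagree.

    Either argument may be None when that source carries no column names.
    """
    if meta_names is None and table_names is None:
        raise ValueError("no column names found in metadata or table")
    if meta_names is None:
        return list(table_names)
    if table_names is None:
        return list(meta_names)
    if list(meta_names) != list(table_names):
        raise ValueError(
            f"column names in metadata {list(meta_names)} conflict with "
            f"column names in table {list(table_names)}"
        )
    return list(table_names)
```

Raising immediately keeps the reader's behaviour predictable and leaves the decision about which source to trust to the user.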

@taldcroft

Member Author

Overtaken by APE6 and #2319.

@taldcroft taldcroft closed this May 2, 2014
@taldcroft taldcroft deleted the ascii/asciitable-format branch August 25, 2015 12:54