Implement support for the ECSV format proposed in APE6#2319
DTIF is not dead yet; see http://nbviewer.ipython.org/gist/taldcroft/a13b670ab15db5684f49. This iteration of the DTIF reader/writer now uses YAML and is simplified from the original APE6 idea. Some points:
|
|
👍 to the format! Definite improvement for sending some small table to collaborators. One small item: would it be possible to ensure that in the output file, the column name always is the first entry? Since it is an unordered dict, I would guess it does not matter for reading, but for human viewing it is good. Indeed, a fixed order for output is probably best, say EDIT: well, probably |
|
Maintaining the ordering would probably require using Possibly there is a clever way to compactify this, but I'm not sure. Note that in the YAML output the keys are in alphabetical (not random) order, so for most use cases (when there is no format or description) name will be first. |
|
That output looks substantially less nice... But is there a requirement to have the items be in alphabetical order? I.e., could one just postprocess the column lines and put name first? |
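The post-processing idea raised above (put `name` first, keep the remaining keys in a fixed order) can be sketched as follows. This is only an illustration, not the approach the PR ended up using; the helper `order_column_attrs` and the sample attribute names are hypothetical.

```python
def order_column_attrs(col, first=("name",)):
    """Return a copy of `col` with the keys in `first` leading.

    The remaining keys follow in alphabetical order, so the output is
    deterministic. Relies on dicts preserving insertion order
    (guaranteed since Python 3.7).
    """
    lead = [key for key in first if key in col]
    rest = sorted(key for key in col if key not in first)
    return {key: col[key] for key in lead + rest}


# Hypothetical column description, for illustration only.
col = {
    "unit": "m / s",
    "description": "Radial velocity",
    "name": "rv",
    "datatype": "float64",
}
ordered = order_column_attrs(col)
print(list(ordered))
```

If the result is then dumped with a YAML library, that library's own key sorting would also have to be switched off (for instance `sort_keys=False` in recent PyYAML versions), otherwise the alphabetical order comes back.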
OK, I figured out a clean way to do this. As for the question of order, I think I prefer your original of having |
|
Just a quick comment - what is a way to unambiguously identify a file as DTIF? I'm thinking maybe we could consider using the first line as a file format signature, optionally with a format version? The nice thing about e.g. HDF5 is that if you read the first 8 bytes, you know it's an HDF5 file. So having a format signature would be nice. |
|
I'm thinking something like: |
|
Just another comment - if we go ahead with this, I think we should straight away provide |
|
Another request - I think DTIF should be very clear on how to mask values and the output in the file should preferably be e.g. |
|
@astrofrog: YAML has a standard for specifying the file type, which is a line starting with |
|
@mdboom - perfect! I'd highly recommend doing this. |
|
@mdboom - when I put in the Do I need to register this or something? I couldn't find anything in a quick scan of the pyyaml docs, but maybe I didn't look hard enough. |
|
OK, got the ordering fixed in the last commit. |
|
@taldcroft: It seems these metadata lines only work if you have a "document start marker" ( |
|
As suggested, I have added a DTIF header line; the reader checks for its presence manually, then strips it before YAML parsing. @mdboom - now that this is YAML, what do you think should be done to make DTIF integrate most closely with ASDF? One idea was to make it very easy to drop a DTIF file in as a supported data block format. In the current ASDF-standard docs I don't see anything defining how data column metadata (type, unit, format, etc.) are going to be encoded. DTIF does more or less the simplest possible thing, so do you think that will be a legal subset of what ASDF defines? Plan B is to purposely keep DTIF as a simple and somewhat specialized "standard" that doesn't necessarily follow the ASDF conventions. It still should be straightforward to write a DTIF encoder/decoder outside of the Python reference implementation (io.ascii.dtif). |
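For readers following along, the "check the header line, then strip it before YAML parsing" step might look roughly like the sketch below. The signature text `# %DTIF` and the function `split_signature` are assumptions made for illustration; the actual header string used in the PR is not shown in this thread.

```python
# Hypothetical signature text; the real header line in the PR may differ.
SIGNATURE = "# %DTIF"


def split_signature(text, signature=SIGNATURE):
    """Check the first line for the format signature and strip it.

    Returns (version, body): `version` is whatever follows the signature
    on the first line (None if nothing does), and `body` is the rest of
    the text, ready to hand to a YAML parser.
    """
    first, _, body = text.partition("\n")
    if not first.startswith(signature):
        raise ValueError("missing format signature on first line")
    version = first[len(signature):].strip() or None
    return version, body


# Made-up file content, only to exercise the function.
sample = "# %DTIF 1.0\ncolumns:\n- name: a\n"
version, body = split_signature(sample)
print(version)
print(body)
```

Because only the first line is inspected, a tool can identify the file type without parsing anything else, which is the property asked for earlier in the thread.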
|
The notebook has been updated accordingly: http://nbviewer.ipython.org/gist/taldcroft/a13b670ab15db5684f49 |
|
BTW, what about rebranding DTIF as ASCII Table with Meta (ATM)? Maybe "Data Table Interchange Format" overstates the scope of what this really is. |
|
👍 from me on this, with a rebranding like you suggested, @taldcroft. On the rebranding: it's actually not necessarily limited to ASCII, right? That is, unicode is also possible for column names? So maybe instead "Text Table with Meta" (TTM)? That also has the advantage of being a less overloaded acronym, while still being 3 characters so it looks good as a file extension. |
|
Unicode is not possible for column names because numpy doesn't accept them. The fact that astropy Table accepts unicode column names is trickery on our part. They are encoded to ascii. |
|
@eteq - on further reflection you are completely right that the format |
|
+1 to ECSV :) |
|
👍 to ECSV from me too. (And it looks like either ".esv" or ".ecv" are currently unused extensions.) |
…ters The change to default_converters is to make it an empty list so that no guessing of data type is ever allowed.
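The effect described in that commit message, converting each column strictly by its declared type with no fallback guessing, can be sketched in plain Python. This is not the astropy implementation; the `CONVERTERS` table and `convert_column` are hypothetical names used only to show the idea.

```python
# Hypothetical mapping from declared datatype to a converter function.
CONVERTERS = {"int64": int, "float64": float, "string": str}


def convert_column(values, datatype):
    """Convert string values using only the declared datatype.

    There is deliberately no fallback chain (int, then float, then str):
    if the declared type does not fit the data, the error propagates.
    """
    try:
        func = CONVERTERS[datatype]
    except KeyError:
        raise ValueError(f"unsupported datatype {datatype!r}") from None
    return [func(value) for value in values]


ints = convert_column(["1", "2", "3"], "int64")
floats = convert_column(["1.5", "2"], "float64")
print(ints, floats)
```

With this approach a column declared as `int64` but containing `"1.5"` fails loudly instead of silently becoming a float column.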
|
APE6 has been accepted, so I'm merging this. Thanks @taldcroft ! |
|
Thanks @taldcroft! 🎉 |
|
Long live the meta! |
APE6 proposes a new standard Data-table Text Interchange Format for storing data tables in a text-only format. This PR provides a demonstration implementation of that in astropy.io.ascii. This is by no means complete and should not be merged.