Implement support for the ECSV format proposed in APE6 by taldcroft · Pull Request #2319 · astropy/astropy

taldcroft · 2014-04-13T11:28:28Z

APE6 proposes a new standard Data-table Text Interchange Format for storing data tables in a text-only format. This PR provides a demonstration implementation of that in astropy.io.ascii. This is by no means complete and should not be merged.

taldcroft · 2014-09-05T01:33:38Z

DTIF is not dead yet... see http://nbviewer.ipython.org/gist/taldcroft/a13b670ab15db5684f49

This iteration of the DTIF reader/writer now uses YAML and is simplified from the original APE6 idea. Some points:

This is all still proof of concept.
For simple tables without much meta, the header definition is quite simple and is reasonably close to what @eteq requested in the venerable Include units in Table column descriptions #756.
Not to state the obvious, but PyYaml is a dependency for this.
It uses custom Loader and Dumper classes that handle OrderedDict nicely by using the !!omap tag. I got this from a gist via http://pyyaml.org/ticket/29. This should also (I think) allow using a safe loader.
One decision would be whether to keep this as a new format or make this extra header an optional argument to the io.ascii read/write routines.
The hope is that by going to YAML it should be easier to integrate an ASCII table as a data block in ASDF (@mdboom, @embray).

mhvk · 2014-09-05T13:09:05Z

👍 to the format! Definite improvement for sending some small table to collaborators.

One small item: would it be possible to ensure that in the output file, the column name always is the first entry? Since it is an unordered dict, I would guess it does not matter for reading, but for human viewing it is good. Indeed, a fixed order for output is probably best, say name, unit, type, format, anything else.

EDIT: well, probably format before type, since it often implies it anyway.

taldcroft · 2014-09-05T14:34:09Z

Maintaining the ordering would probably require using !!omap for the column attributes. By default this would serialize to something like:

columns:
- !!omap
  - {name: a}
  - {unit: m / s}
  - {type: float32}
- !!omap
  - {name: b}
  - {unit: km}
  - {type: float64}

Possibly there is a clever way to compactify this, but I'm not sure. Note that in the YAML output the keys are in alphabetical (not random) order, so for most use cases (when there is no format or description) name will be first.

mhvk · 2014-09-05T14:49:12Z

That output looks substantially less nice... But is there a requirement to have the items be in alphabetical order? I.e., could one just postprocess the column lines and put name first?

taldcroft · 2014-09-05T15:31:18Z

Indeed, a fixed order for output is probably best, say name, unit, type, format, anything else.

OK, I figured out a clean way to do this. As for the question of order, I think I prefer your original of having type before format. First, type will always be there, while format is somewhat rare, so the ordering will be more consistent. Also, I think of type being a more fundamental property, so it should be higher priority (more to the left).
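The ordering discussed above can be sketched in plain Python without PyYAML. This is only an illustration under stated assumptions: the `PRIORITY` tuple and the helper names `ordered_keys` and `format_column` are hypothetical and not taken from the PR's code, and the output is a simple flow-style mapping with no quoting or escaping of special characters.

```python
# Hypothetical priority order following the discussion above:
# name first, then unit, type, format; any other keys follow alphabetically.
PRIORITY = ("name", "unit", "type", "format")


def ordered_keys(attrs):
    """Return the keys of attrs with the priority keys first."""
    first = [key for key in PRIORITY if key in attrs]
    rest = sorted(key for key in attrs if key not in PRIORITY)
    return first + rest


def format_column(attrs):
    """Render one column definition as a YAML-like flow mapping.

    Illustration only: values are written as-is, without quoting.
    """
    body = ", ".join(f"{key}: {attrs[key]}" for key in ordered_keys(attrs))
    return "- {" + body + "}"


column = {"type": "float32", "description": "speed", "unit": "m / s", "name": "a"}
print(format_column(column))
```

Keeping the priority list separate from the formatting step means the order can be changed in one place if the preferred order is revisited.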

astrofrog · 2014-09-05T15:34:25Z

Just a quick comment - what is a way to unambiguously identify a file as DTIF? I'm thinking maybe we could consider using the first line as a file format signature, optionally with a format version? The nice thing about e.g. HDF5 is that if you read the first 8 bytes, you know it's an HDF5 file. So having a format signature would be nice.

astrofrog · 2014-09-05T15:35:52Z

I'm thinking something like:

# format: DTIF1
# columns:
# - {name: a, type: float32, unit: m / s}
# - {name: b, type: uint8}
a b
1.0 2

astrofrog · 2014-09-05T15:39:07Z

Just another comment - if we go ahead with this, I think we should straight away provide dtiflint, a command-line tool to validate DTIF tables, to make sure that anyone else writing custom writers can test it straight away. Note that a linter can be stricter than the reader.

astrofrog · 2014-09-05T15:41:31Z

Another request - I think DTIF should be very clear on how to mask values and the output in the file should preferably be e.g. - rather than the usual paradigm of 'fill' values and null values in the header.
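As a rough illustration of that idea, a reader could map a dedicated marker in the data rows to a masked entry instead of consulting fill values in the header. The marker string `-`, the function name `parse_row`, and the use of `None` to stand in for a masked entry are all assumptions made for this sketch; nothing here is defined by the PR.

```python
MASK_MARKER = "-"  # assumed marker for a masked value


def parse_row(line, converters, marker=MASK_MARKER):
    """Split a whitespace-delimited row and convert each field.

    Fields equal to the marker become None (standing in for "masked").
    """
    fields = line.split()
    if len(fields) != len(converters):
        raise ValueError(f"expected {len(converters)} fields, got {len(fields)}")
    return [None if field == marker else convert(field)
            for field, convert in zip(fields, converters)]


print(parse_row("1.0 -", [float, int]))   # second field is masked
print(parse_row("-2.5 3", [float, int]))  # a negative number is not the marker
```

Because the marker is compared against the whole field, a negative number such as `-2.5` is still converted normally.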

mdboom · 2014-09-05T15:41:48Z

@astrofrog: YAML has a standard for specifying the file type, which is a line starting with %. So in this case:

# %DTIF-1.0

astrofrog · 2014-09-05T15:48:39Z

@mdboom - perfect! I'd highly recommend doing this.

taldcroft · 2014-09-05T16:54:02Z

@mdboom - when I put in the %DTIF-1.0 at the front it gave:

ScannerError: while scanning for the next token
found character '%' that cannot start any token
  in "<string>", line 1, column 2:
     %DTIF-1.0
     ^

Do I need to register this or something? I couldn't find anything in a quick scan of the pyyaml docs, but maybe I didn't look hard enough.

taldcroft · 2014-09-05T16:54:59Z

OK, got the ordering fixed in the last commit.

mdboom · 2014-09-05T17:11:01Z

@taldcroft: It seems these metadata lines only work if you have a "document start marker" (---) following them. I don't know if you want to require that for such a simple format. This may just have to strip off that first line before passing to pyyaml instead. Ideally, it should be something that can be tested without doing a full YAML parse anyway, so that's not necessarily a bad thing.

taldcroft · 2014-09-05T18:04:46Z

As suggested, I have added a DTIF header line and check for its presence manually, then strip it before YAML parsing.
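A minimal sketch of that manual check, assuming the signature takes the `# %DTIF-1.0` form suggested earlier in the thread. The regular expression and the function name `split_signature` are hypothetical and are not the PR's actual code.

```python
import re

# Assumed signature form, following the "# %DTIF-1.0" line suggested above.
SIGNATURE = re.compile(r"^#\s*%DTIF-(\d+)\.(\d+)\s*$")


def split_signature(lines):
    """Return ((major, minor), remaining_lines) or raise ValueError."""
    if not lines:
        raise ValueError("empty input")
    match = SIGNATURE.match(lines[0])
    if match is None:
        raise ValueError("first line is not a DTIF signature")
    version = (int(match.group(1)), int(match.group(2)))
    return version, lines[1:]


text = """# %DTIF-1.0
# columns:
# - {name: a, type: float32, unit: m / s}
# - {name: b, type: uint8}
a b
1.0 2
"""
version, rest = split_signature(text.splitlines())
print(version)
print(rest[0])
```

The remaining lines no longer contain the `%` directive, so the header can be handed to a YAML parser once the comment prefix is removed, and the file type can be recognized without a full YAML parse.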

@mdboom - now that this is YAML, what do you think should be done to make DTIF most closely integrate with ASDF? One idea was to make it very easy to drop a DTIF file in as a support data block format. In the current ASDF-standard docs I don't see anything defining how data column meta (type, unit, format, etc.) are going to be encoded. DTIF does kind of the simplest possible thing, so do you think that will be a legal subset of what ASDF defines?

Plan B is to purposely keep DTIF as a simple and somewhat specialized "standard" that doesn't necessarily follow the ASDF conventions? It still should be straightforward to write a DTIF encoder/decoder outside of the Python reference implementation (io.ascii.dtif).

taldcroft · 2014-09-05T18:05:09Z

The notebook has been updated accordingly: http://nbviewer.ipython.org/gist/taldcroft/a13b670ab15db5684f49

taldcroft · 2014-09-05T21:54:52Z

BTW, what about rebranding DTIF as ASCII Table with Meta (ATM)? Maybe "Data Table Interchange Format" overstates the scope of what this really is.

eteq · 2014-09-11T21:05:54Z

👍 from me on this, with a rebranding like you suggested, @taldcroft.

On the rebranding: it's actually not necessarily limited to ASCII, right? That is, unicode is also possible for column names? So maybe instead "Text Table with Meta" (TTM)? That also has the advantage of being a less overloaded acronym, while still being 3 characters so it looks good as a file extension.

taldcroft · 2014-09-11T21:43:52Z

Unicode is not possible for column names because numpy doesn't accept them.

In [35]: np.array([(1,)], dtype=[('a', int)])
Out[35]: 
array([(1,)], 
      dtype=[('a', '<i8')])

In [36]: np.array([(1,)], dtype=[(u'a', int)])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-36-3200fb1aa3d7> in <module>()
----> 1 np.array([(1,)], dtype=[(u'a', int)])

TypeError: data type not understood

The fact that astropy Table accepts unicode column names is trickery on our part. They are encoded to ascii.

taldcroft · 2014-09-13T12:32:20Z

@eteq - on further reflection you are completely right that the format definition shouldn't be limited by our current implementation. There is no reason to limit the format to ascii. TTM could work but I am starting to think that the "meta" reference might be lost on many users and not quite sink in. Just to toss out an idea I had (on an overnight flight...), what about ECSV for extended (or enhanced) CSV? It kinda rolls off the tongue and brings some known context to make the concept more immediately understandable. I also imagine proposing a PR to pandas and/or numpy to implement read_ecsv as a way of promoting adoption.

astrofrog · 2014-09-13T14:02:05Z

+1 to ECSV :)

eteq · 2014-09-24T22:58:10Z

👍 to ECSV from me too. (And it looks like either ".esv" or ".ecv" are currently unused extensions.)

eteq · 2015-01-26T21:52:32Z

APE6 has been accepted, so I'm merging this. Thanks @taldcroft !

astrofrog · 2015-01-26T22:01:15Z

Thanks @taldcroft! 🎉

taldcroft · 2015-01-26T22:52:00Z

Long live the meta!

cdeil mentioned this pull request Apr 13, 2014

APE6 - Enhanced Character Separated Values table format astropy/astropy-APEs#7

Merged

taldcroft mentioned this pull request May 2, 2014

Support table metadata in io.ascii #683

Closed

astrofrog added this to the Future milestone May 9, 2014

astrofrog added the io.ascii label May 9, 2014

astrofrog assigned taldcroft May 9, 2014

embray force-pushed the master branch from cd9dfdf to 5d7acd6 Compare August 29, 2014 21:15

taldcroft force-pushed the ascii-dtif branch from da0ec3d to 35097fc Compare September 3, 2014 22:17

taldcroft mentioned this pull request Sep 5, 2014

Added ability to show unit in fixed width tables #2869

Closed

taldcroft added 19 commits January 21, 2015 10:24

Python 3 compatibility

4b691e1

Inherit from basic reader instead of base

54688d5

Remove float128 type in testing for Windows

b192721

Put ECSV at the top of the guess list

081e2c8

Handle delimiters properly and test

a9732d3

Update CHANGES.rst

e7667a3

Add index doc writing example of ECSV

3efce29

Skip ECSV doctests because of optional pyyaml dependency

d0d8e4c

Implement comments from @mwcraig and @cdeil and change default conver…

1c87312

…ters The change to default_converters is to make it an empty list so that no guessing of data type is ever allowed.

Add PyYAML as optional dependency [skip ci]

54d15e5

Raise exception when writing table with multi-dim column

60643cf

Update doc example for python 3 compatibility [skip ci]

c3a066a

Various:

e7f0c8a

- Fix issues related to default comments input/output
- Change table_meta key to meta in accord with APE-6
- Improve code readability by calling "meta" variable "header"

Change 'columns' and 'type' to 'datatype' for flexibility and ASDF co…

5f2268c

…mpatibility

Change ECSV version number to 0.9

ddd07ff

Allow for non-existent column 'ndim' attribute

856f9d8

Make ECSV writer be mixin-compliant

aa64a2d

Skip ECSV mixin writing test if no pyyaml

99a8794

Install pyyaml by default in appveyor build

29913ca

taldcroft force-pushed the ascii-dtif branch from 693d76c to 29913ca Compare January 21, 2015 16:45

eteq added a commit that referenced this pull request Jan 26, 2015

Merge pull request #2319 from taldcroft/ascii-dtif

9585a8b

Implement support for the ECSV format proposed in APE6

eteq merged commit 9585a8b into astropy:master Jan 26, 2015

embray added Affects-release and removed Ready-for-final-review labels Jan 26, 2015

taldcroft mentioned this pull request Jan 27, 2015

Add APE-6 references to io.ascii docs #3365

Merged

taldcroft mentioned this pull request Mar 4, 2015

WIP Store table and column meta as JSON in Table HDF5 interface #3568

Closed

taldcroft deleted the ascii-dtif branch February 25, 2019 20:23

8 participants