Skip to content

Refactor I/O framework#659

Merged
embray merged 11 commits into
astropy:masterfrom
astrofrog:refactor-io-framework
Jan 28, 2013
Merged

Refactor I/O framework#659
embray merged 11 commits into
astropy:masterfrom
astrofrog:refactor-io-framework

Conversation

@astrofrog

Copy link
Copy Markdown
Member

Astropy has two main data containers, Table, and NDData, which are equivalent to structured and normal arrays in Numpy. It makes sense for these two objects to behave in a similar way - yet, the current I/O functionality is only implemented for Table.

This PR takes the I/O code out of Table and instead puts it in astropy.io.registry. A DataIO class is defined, and both NDData and Table then inherit from it (as @taldcroft suggested off list, I think we need a better name than DataIO). This made it very easy to extend the I/O stuff to NDData, although no readers/writers are defined inside Astropy currently. Here's an example of defining a (dumb) FITS reader:

import numpy as np

from astropy.nddata import NDData
from astropy.io import fits, registry

def read_fits_data(filename, hdu=0):
    hdu = fits.open(filename)[hdu]
    return NDData(hdu.data, meta=hdu.header)

registry.register_reader('fits', NDData, read_fits_data)

d = NDData.read('MSX_E.fits', format='fits')

print(np.mean(d))
print(d.shape)
print(d.meta['TELESCOP'])

with the result:

1.09424543196e-05
(599, 599)
MSX

The documentation still needs to be updated (the I/O page would be moved out of Table and moved to io/registry.rst) but I wanted to first get some feedback on this.

@eteq @taldcroft @iguananaut @mdboom - what do you think?

@iguananaut - I've tagged this 0.3. Of course, if it gets finalized soon, I don't see a reason it can't be put in 0.2 instead.

@ghost ghost assigned astrofrog Jan 19, 2013
@embray

embray commented Jan 22, 2013

Copy link
Copy Markdown
Member

Ooh, I do like this. In the long term I intend to build this functionality into pyfits/io.fits directly. But in the meantime this could serve as a workaround.

This could also add a format recognition method. So that register_reader takes (perhaps optionally) an argument which takes a function that, given access to the raw data, can recognize the data format so that it doesn't have to be specified explicitly with .read().

@embray

embray commented Jan 22, 2013

Copy link
Copy Markdown
Member

Also, personally I might prefer to call the read() classmethod something like readfrom() instead. But that's just me.

@astrofrog

Copy link
Copy Markdown
Member Author

@iguananaut - actually there is already register_identifier which you can use to register format identifiers that can use the filename or contents of the file to identify the format. We're already using this for the Table class, I just factored it out in this case.

I don't think this should be a workaround for now in terms of the infrastructure (the FITS example I showed isn't actually in the code). This is intended to be long term code so that Table.read and NDData.read work. I think what you mean is that we can add preliminary FITS readers/writers that would then get replaced by what you add in io.fits? That does make sense - and it's like what we are doing for VO tables - @mdboom implemented to_table and from_table in the VO table objects, so then we just connect these to the 'readers' and 'writers' in the I/O infrastructure. Does that make sense and address your concerns?

@astrofrog

Copy link
Copy Markdown
Member Author

By the way, here's the docs for the current infrastructure (which lives in astropy.table):

http://docs.astropy.org/en/latest/table/io.html#creating-a-custom-reader-writer

it shows you can define 'identifier' functions.

@astrofrog

Copy link
Copy Markdown
Member Author

I've moved the docs for the I/O registry to io/registry.rst, but I'm not sure where to include it in the TOC. It isn't going to be of interest to most users. Is it time to start separating out an 'advanced' section from the main TOC tree? (which would also include logging and config).

@eteq

eteq commented Jan 23, 2013

Copy link
Copy Markdown
Member

+1 from me on the main idea of this PR!

A few organizational thoughts, though: I definitely think it makes sense to have Table and NDData inherit from the same thing. However, It's a little confusing that this thing lives in astropy.io - after all, I can do NDData([1,2,3]) and there was no I/O involved at all (at least, not disk/remote I/O, which is what this is mostly referring to). So maybe just call it DataContainer or even just Data, and put it in a module astropy/dataobj.py that only contains that? If you go that route, perhaps it should also have meta? My memory is that the original intent for meta was that it be in all data containing classes (although I'm not sure if it's in Table, and certainly is not widely used).

Regardless, DataIO or whatever it ends up called should probably be abstract, with __init__ an abstract method.

@astrofrog - I'm not so sure about a separate "advanced" TOC... It's not clear to me what should go where. After all, users may well need to know some things about the configuration system (e.g., it might be useful to know runtime tricks once #556 gets merged). I had been thinking we indicate "advanced" by implication with something being lower down in the TOC (advanced or just niche, which is basically the same thing).

@astrofrog

Copy link
Copy Markdown
Member Author

@eteq - I agree with the idea of having a more-general top-level class, though I think for clarity there would be at least two files (dataobj.py and registry.py). Should those live in an astropy.data module? I'm not too keen on adding a whole new top-level heading though, so if you have a better solution, let me know.

Of course, maybe having astropy.data would make sense instead of having astropy.table and astropy.nddata (then we actually reduce the number of top-level headings). So then it would be:

from astropy.data import Table, NDData

and this module could also contain the I/O. But is it too late to change that? (if we're going to do it, we should do it for 0.2). And would it be too messy inside that module? We could still keep most of the code inside say astropy.data.table and astropy.data.nddata sub-modules, but import all the user classes to the top-level astropy.data.

@eteq @wkerzendorf @taldcroft - what do you think?

@embray

embray commented Jan 23, 2013

Copy link
Copy Markdown
Member

@eteq The DataIO class that they're both using should be viewed as just a mix-in class. It makes sense to me that it lives in astropy.io.

@astrofrog

Copy link
Copy Markdown
Member Author

@eteq @iguananaut - maybe the name IOManager would be better than DataIO and less prone to confusion?

@taldcroft

Copy link
Copy Markdown
Member

Thinking of DataIO as a mix-in class makes good sense to me. In that context the name DataIO seems good.

@eteq

eteq commented Jan 24, 2013

Copy link
Copy Markdown
Member

@taldcroft @iguananaut - I agree that it can be a mixin, in which case this layout is fine, but I was suggesting instead that we re-organized into a more traditional class hierarchy. With just what's here now that doesn't necessarily give us much, which is why I was suggesting adding the bit about meta. That way we can ensure that meta is consistent across data objects. (If that's what we want... @taldcroft, what do you think about meta in Table?)

I'm also concerned that a mixin is actually a somewhat advanced way of using OO, and a lot of our poor users who aren't quite as up-to-snuff on their design patterns might not understand what it's doing (and users will be interacting with these classes). That's not a reason on its own not to do this, but if we already have a reason (as I've implied above with meta), to me its further weight.

So with that in mind, I like @astrofrog's suggestion of subsuming nddata and table into one package.

If we opt not to do this and stick with a mixin, it should be noted as such in the DataIO docstring, though. Perhaps even rename it DataIOMixin? That would at least immediately clue me in that it's not intended to be a "true" superclass. And if a mixin, it's probably even more useful that it be an abstract class so that things are completely clear.

@embray

embray commented Jan 24, 2013

Copy link
Copy Markdown
Member

It is a true superclass, it's just not a strict superclass. I see no harm in educating users on multiple inheritance :) In any case most of them wouldn't even be aware of it--how many users are checking what base classes something has?

I think everyone here would enjoy it, if you can find a spare 30 minutes at some point, to watch Raymond Hettinger's talk on subclassing from last year's PyCon: http://pyvideo.org/video/879/the-art-of-subclassing It seems very basic, and it is, but I but I found it very eye-opening nonetheless. It was definitely one of the most memorable for me.

@eteq

eteq commented Jan 24, 2013

Copy link
Copy Markdown
Member

(note that @astrofrog has wisely moved the discussion of reorganizing nddata and table back to the mailing list)

@astrofrog

Copy link
Copy Markdown
Member Author

I've now rebased this, and got rid of the base class for now. I think this is ready for a final review.

@embray

embray commented Jan 28, 2013

Copy link
Copy Markdown
Member

This will have to be rebased again it looks like.

@astrofrog

Copy link
Copy Markdown
Member Author

Will do!

@astrofrog

Copy link
Copy Markdown
Member Author

I've rebased this!

embray added a commit that referenced this pull request Jan 28, 2013
@embray embray merged commit 824b0e2 into astropy:master Jan 28, 2013
@embray

embray commented Jan 28, 2013

Copy link
Copy Markdown
Member

Okay. Finally ready to merge it looks like :)

@astrofrog

Copy link
Copy Markdown
Member Author

Awesome, thanks!

embray added a commit that referenced this pull request Jan 29, 2013
Refactor I/O framework
Conflicts:

	CHANGES.rst
keflavich pushed a commit to keflavich/astropy that referenced this pull request Oct 9, 2013
Refactor I/O framework
Conflicts:

	CHANGES.rst
@astrofrog astrofrog deleted the refactor-io-framework branch July 5, 2016 15:51
jeffjennings pushed a commit to jeffjennings/astropy that referenced this pull request Jul 2, 2025
Astropy Strategic Planning Committee Beta-Year Members
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants