Skip to content

Refactoring NDData#2905

Merged
astrofrog merged 59 commits into
astropy:masterfrom
mwcraig:nddata-reorg
Dec 24, 2014
Merged

Refactoring NDData#2905
astrofrog merged 59 commits into
astropy:masterfrom
mwcraig:nddata-reorg

Conversation

@mwcraig

@mwcraig mwcraig commented Aug 29, 2014

Copy link
Copy Markdown
Member

This PR is a companion to #2855 and also came out of the NDData telecon. The goal is to simplify the base NDData class so that the base class becomes a simple container for data interchange.

The arithmetic methods on the current NDData are factored out into an NDarithmetic mixin class. The class NDDataArithmetic is include as a convenience and works as a replacement for the current NDData (I've check that the current ccdproc tests pass if I make make one change to the code: subclass from NDDataArithmetic instead of NDData).

I've also moved the NDData arithmetic tests into a separate file.

edit 9/30/2014: the flags property is also removed from NDData.
edit 10/2/2014: Uncertainty and mask implementation moved from NDData to NDDataArithmetic and removed automatic conversion of NDData to a numpy array. Updated task list.

A rendered version of the documentation changes is here.

Still not done:

  • Remove checks for FlagCollection shape (should this be done?)
  • Attempt to make mask, unit point to the mask of the data (if it has one) instead of separating the mask from the data, and the same for the unit.
  • Fix doctest failures
  • Update docs to reflect

@Cadair -- please check to see whether this revision has wcs working the way you wanted.

cc @astrofrog @cdeil @perrygreenfield @crawfordsm @larrybradley @adamginsburg @bsipocz

@Cadair

Cadair commented Aug 29, 2014

Copy link
Copy Markdown
Member

@mwcraig wcs looks great, I had a quick look over the rest of it.

Comment thread astropy/nddata/nddata.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point I think we're going to have to reconsider how units in NDData are handled or if it's even appropriate for an NDData object to have a unit. If NDData is going to be a common data container object it's going to also have to support cases where different axes represented by the array dimensions have different units. The obvious example is, say, long slit spectra where one axis will have some dimensions either in pixels or in length (depending on how the WCS is used) while another axis will have some spectral unit (possibly also length in the case of wavelength but possibly also frequency, etc.).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add--of course an equivalency could be possible too, but I don't necessarily think it should be that every axis have equivalent units...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@embray - i think that unit here refers to the unit of the data, not of the axes, so I think it should be appropriate for most situations to have a single unit.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct -- it is like the BUNIT from a FITS header.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a little confusing, and maybe not useful for "generic" data. I don't know though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's reasonable to assume that array data might have no unit or a single unit, right? What would it mean for the data to have more than one unit? I know we are trying to make sure NDData is generic, but I would say that saying that the values can have a unit is acceptable. Of course one could have a 3-d cube where each slice is a different thing that has different units, but in that case I would argue that a single NDData is not the right container. We don't want to be that generic :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Units should be consistent for the whole array.

@astrofrog

Copy link
Copy Markdown
Member

@mwcraig - regarding FlagCollection - I guess it depends if we want to assume anything about flags now we are moving to a more generic data container. It sounded like we agreed that flags doesn't even have to be a property in the base class?

@astrofrog

Copy link
Copy Markdown
Member

Basically I think we had agreed that we should strip everything out of NDData apart from the property definitions, correct? That is, there should basically be no functionality in the core NDData (apart from using data.mask and data.unit if available).

@mwcraig

mwcraig commented Sep 9, 2014

Copy link
Copy Markdown
Member Author

That was my recollection too -- I'll update tonight or tomorrow.

@perrygreenfield

Copy link
Copy Markdown
Member

Other than eventually supporting slicing, I would agree.

@mwcraig

mwcraig commented Oct 1, 2014

Copy link
Copy Markdown
Member Author

Finally got around to pulling out flags tonight.

I've left in slicing and the methods (like __array_prepare__) that allow an NDData object to be treated like a numpy array. That is primarily so that recovering a masked numpy array from a masked NDData object is easy. The properties like dtype, etc., make sense to keep with slicing.

An alternative is to moving slicing and those properties into NDDataArithmetic.

@astrofrog

Copy link
Copy Markdown
Member

@mwcraig - are you also planning on simplifying the content of the uncertainty and mask properties? I think we had decided that we are going to strip these down to something like:

@property
def uncertainty(self):
    return self._uncertainty

@uncertainty.setter
def uncertainty(self, value):
    self._uncertainty = value

so that NDData is truly just defining the API but not really making any assumptions about checking, etc. For masking the only difference is that the getter would also check the mask property on the data.

Then for slicing I would simply make it that the __getitem__ will loop over the different properties, and for each one will try and slice them if they are not None (so then it's up to things like WCS to define how to be sliced). If any of them cannot be sliced, the operation fails.

For clarity, it might be nice to move the arithmetic classes to a separate arithmetic.py file.

@astrofrog astrofrog added this to the v1.0.0 milestone Oct 2, 2014
@mwcraig

mwcraig commented Oct 2, 2014

Copy link
Copy Markdown
Member Author

@astrofrog --

For mask the line setting mask to a numpy array could certainly go. Are there reasons not to check the shape, though? I see that as something which may catch user errors more than limiting what can be done with the mask.

The bits about np.ma.nomask only need to stay if base NDData keeps its __array__ and __array_prepare__ methods. It would make sense to move those "math-y" aspects to arithmetic.py.

The bulk of uncertainty is handling the possibility that the uncertainty is specified with units. It could be vastly simplified by simply raising an error in the uncertainty setter if the value has a units property. I think the check for units is there because of the expectation that a user passing in data with units is also likely to be setting an uncertainty that has units, and if the user does then NDData should make sure the units are consistent.

Looking over the code that handles that I think it could be simplified and the restriction that the uncertainty be a NDUncertainty can be removed (or at least imposed only in the NDDataArithmetic subclass).

I'll update PR this evening, then we can continue the discussion from there.

@astrofrog

Copy link
Copy Markdown
Member

@mwcraig - thanks for the comments. Just to be clear, what I'm suggesting here and in the draft APE that you saw is that we completely remove all uncertainty code from the base class. Of course, we could have a mix-in class that enables uncertainty stuff but in the default NDData class, this would not be present (according to the APE). We should discuss these points as part of the APE since the APE makes the implementation path very clear for the base class.

@astrofrog

Copy link
Copy Markdown
Member

@mwcraig - regarding the shape of the mask, again it depends. For our SpectralCube package, we actually are not using arrays for the mask but lazily evaluated functions - for example we can do cube.mask = np.inf (or something similar at any rate) and it gets evaluated as needed. This is an example of why the base class should not even be assuming the mask is an array. One could also imagine having sub-classes where it might make sense to allow masks to be broadcast.

@mwcraig

mwcraig commented Oct 2, 2014

Copy link
Copy Markdown
Member Author

@astrofrog -- that makes sense...I keep forgetting that in the new arrangement NDData wouldn't be used directly. I'll make more extensive modifications, then, so people have something concrete to look at and try out if they want to see how the changes might affect them.

I'll include a subclass (maybe in arithmetic, maybe somewhere else) that acts like the old NDData to make the transition as easy as possible for anyone who hasn't been involved in this.

@embray

embray commented Oct 3, 2014

Copy link
Copy Markdown
Member

I don't know that NDData by itself couldn't be used directly. I think it can and should still be useful as a container for arrays and metadata (and uncertainties? I forget if that is still on). It's just that it would be more useful as a base class for more sepcific data.

@astrofrog

Copy link
Copy Markdown
Member

@mwcraig - now that the APE has been accepted, can you finalize this PR to implement the decisions in the APE? On my side I'll finish up the implementation of the decorator stuff.

@mwcraig

mwcraig commented Dec 18, 2014

Copy link
Copy Markdown
Member Author

@astrofrog - Working on it, will see how far I get today.

@mwcraig

mwcraig commented Dec 18, 2014

Copy link
Copy Markdown
Member Author

@astrofrog -- mostly done, still need doc updates and an I/O mixin (though I suppose that could come later)

@mwcraig

mwcraig commented Dec 18, 2014

Copy link
Copy Markdown
Member Author

Any suggestions for this? [Am currently thinking "wait til later" is the best approach for now]

In [4]: class Foo(Quantity, NDDataBase):
   ...:     pass
   ...:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-bf917ed358f2> in <module>()
----> 1 class Foo(Quantity, NDDataBase):
      2     pass
      3

TypeError: Error when calling the metaclass bases
    metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

@embray

embray commented Dec 18, 2014

Copy link
Copy Markdown
Member

I think you need to define a metaclass for Foo that subclasses both InheritDocstrings and ABCMeta. I feel like there might be a way to avoid having to do this in the future, but for now that's what would be necessary. I know I've had to do things like that before when I've run into this error, but I forget the exact context.

@astrofrog

Copy link
Copy Markdown
Member

@mwcraig - I can work on the IO mix-in on Friday since I'm familiar with that part. I'll review the rest of what's here tomorrow. Thanks!

Comment thread astropy/nddata/nddatabase.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The try...except is a little confusing. Why not simply do:

if not hasattr(value, 'uncertainty_type') or not isinstance(value.uncertainty_type, six.string_types):
    raise TypeError(...)

?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply do:

Don't remember...done in any event.

@astrofrog

Copy link
Copy Markdown
Member

@mwcraig - once that is fixed, an you also rename this issue to remove the WIP?

@mwcraig mwcraig changed the title WIP: Refactoring NDData Refactoring NDData Dec 22, 2014
@mwcraig

mwcraig commented Dec 22, 2014

Copy link
Copy Markdown
Member Author

@astrofrog -- fyi, am making progress on I/O mixin, will push today.

@mwcraig

mwcraig commented Dec 23, 2014

Copy link
Copy Markdown
Member Author

@astrofrog -- my attempt at an i/o mixin has been pushed.

Some caveats/obvious limitations:

  • It only implements a FITS reader/writer
  • I may have used classmethods unnecessarily...I got fairly confused at one point.
  • If you read from a FITS file then the metadata is kept as a FITS header. Probably that should be controllable by an optional argument, since I can image cases where either that behavior or one in which the metadata is converted to a dict with some cruff removed, like in Table, is desirable.

Longer term, I think there is some refactoring to be done to reduce duplicate code between this and io.fits.connect, which implements readers/writers for Tables but contains some stuff that is not Table-specific (that I copied and pasted here).

@astrofrog

Copy link
Copy Markdown
Member

@mwcraig - thanks, I will do a PR to your branch to bring the I/O stuff in line with what Table does.

@astrofrog

Copy link
Copy Markdown
Member

@mwcraig - see mwcraig#1 - however, unless you agree with my changes there, I think this may actually take some more discussion so it might make sense to remove the I/O commit from this PR (first let's see if you agree with mwcraig#1)

@astrofrog

Copy link
Copy Markdown
Member

@mwcraig - if you disagree with the changes in my pull request, then in order to make sure we merge this PR today, please remove the latest IO changes and instead simply add a very simple IOMixin class:

class NDIOMixin(object):
    """
    Mixin class to connect NDData to the astropy input/output registry.

    This mixin adds two methods to its subclasses, ``read`` and ``write``.
    """

    @classmethod
    def read(cls, *args, **kwargs):
        """
        Read and parse gridded N-dimensional data and return as an
        NDData-derived object.

        This function provides the NDDataBase interface to the astropy unified
        I/O layer.  This allows easily reading a file in the supported data
        formats.
        """
        return io_registry.read(cls, *args, **kwargs)

    def write(self, *args, **kwargs):
        """
        Write a gridded N-dimensional data object out in specified format.

        This function provides the NDDataBase interface to the astropy unified
        I/O layer.  This allows easily writing a file in the supported data
        formats.
        """
        io_registry.write(self, *args, **kwargs)

and nothing else (no auto-registration).

For Astropy 1.0, affiliated packages will then be able to define and register their own readers/writers for their NDData subclasses if needed.

We can then postpone to later the decision of which built-in readers/writers to provide in the core, what their behavior should be, and whether NDData should inherit from IOMixin.

@astrofrog

Copy link
Copy Markdown
Member

Sorry for the multiple messages - actually I would suggest that regardless of whether you agree with mwcraig#1 you follow the suggestion in my previous comment and remove the FITS reader/writer to keep this PR simple. Then later today I can open a PR with the changes I am suggesting but we can decide to defer to 1.1 if needed.

@embray

embray commented Dec 23, 2014

Copy link
Copy Markdown
Member

I agree with @astrofrog that if at all possible it would be best to bring in any FITS-specific stuff in a separate change.

Comment thread astropy/nddata/io.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stuff does look a bit bungled (as you suspected), but I'll have to take a closer look later.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, looks like @astrofrog already addressed this.

@mwcraig

mwcraig commented Dec 23, 2014

Copy link
Copy Markdown
Member Author

@astrofrog -- just pushed the minimal version of the mixin you proposed, with no readers/writers registered. Agree with you and @embray that it makes sense to punt on this for now.

I'm fine with the rearrange in mcraig#1 but didn't merge because we went simple here. Please ping me if you do a PR and I''ll take a closer look.

@astrofrog

Copy link
Copy Markdown
Member

@mwcraig - thanks for making the change back to a simpler I/O class.

If anyone else has any comments on this, please review this now, otherwise I will merge this later this evening (I will do a final review now).

Comment thread docs/nddata/index.rst Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think gridded should be removed here? (I think when writing the APE we agreed that some day this may be used as a parent class for ungridded data too?)

@astrofrog

Copy link
Copy Markdown
Member

These are my only comments - other than that this is good to go. Thanks for your work on this @mwcraig!

@mwcraig

mwcraig commented Dec 23, 2014

Copy link
Copy Markdown
Member Author

My pleasure; the comments are addressed.

@astrofrog

Copy link
Copy Markdown
Member

@mwcraig - thanks! I will merge this tomorrow morning if there are no other objections. This implements APE7 so the API has already been discussed, and the implementation looks good to me.

astrofrog added a commit that referenced this pull request Dec 24, 2014
@astrofrog astrofrog merged commit 76c7d90 into astropy:master Dec 24, 2014
@mwcraig mwcraig deleted the nddata-reorg branch July 16, 2015 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants