-
-
Notifications
You must be signed in to change notification settings - Fork 2k
Introduce a Masked class, apply to Quantity, attempt Column, Representation #10423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b2e2212 to
1f7b14f
Compare
Also refactor tests a little.
And try to assume less that data is an ndarray.
So that they are useful even for a version of masking that does not have such a method.
Plus a number of rebase fixes.
|
@mhvk - without actually looking at this, can you talk to the rationale for trying to apply this to MaskedColumn in a single PR? Offhand this seems like a huge PR and maybe it would be best to try to break it into smaller bits starting with only Quantity. |
|
Yes, I think the reason I pushed this here was just to show that it could be done, but would involve some API changes. But I kept separate branches even locally. I am in the process of rebasing and will push the quantity-only one separately. There's a bit of a chicken-and-egg problem in that ideally there be more documentation than the tests, but to find the energy to write that, I'd like to get a sense that this is actually going to get used... |
|
@taldcroft - see #11127. Most important would be to decide on the points noted there... |
|
@taldcroft - I'm wondering if going the route of an APE would be good to get our thoughts properly organized and checked. Would you be willing to think along? We could perhaps start with a wiki page or a google docs or so. E.g., another design decision I made was to keep masking close to |
Makes almost all of test_column.py pass. The exception is a structured dtype, which MaskedNDArray cannot yet handle.
1f7b14f to
a84aeb9
Compare
|
@mhvk - looking again at this and #11127, I realize there is a lot of value in this PR that shows both the base implementation and how it gets used in practice. That (with the tests) is the effective first version of documentation for developers, and it proves that the infrastructure supports the intended use cases within astropy. Some points / questions: At this point it seems the only way to make a MaskedQuantity is with something like If I look at the MRO of About My suggestion would be to get to the point where the new infrastructure is passing our tests (already there, I guess?), and then back out changes to For me the next step here would be to modify table/operations.py to support all operations for all mixins! If you are somewhat careful about commits and keep the "applications" separate from the infrastructure (basically the new There is of course the elephant in the room of whether this functionality is so widely useful that it should be released as a separate package. 😄 Anyway, thanks for the amazing work. |
|
Can we have this work: where this accepts all the |
|
One thing that isn't quite working now: |
|
@taldcroft - thanks for the broad review and thoughts! I agree with the plan to start with just the infrastructure, so that initially nothing truly depends on it and we can experiment without feeling we've locked in the API (including e.g. structured We probably should discuss in #11127 what is still missing - looking at it after nearly half a year myself it would be good to have a basic design description for developers (can be just in the module docstring) as well as perhaps a very brief overview in the docs (which may include a short list of differences for people used to Explicit I very much like the idea of I also agree with the suggestion to drop the idea of making a direct replacement for On more complicated mixins: One nice thing is that since the |
|
Going to close this one, since now that the base implementation is merged, it is better to just slowly add pieces. |
EDIT: the base implementation is in #11127 - this one tries to use
Maskedfor columns and gives a sense how it would work for representations.@astrofrog - as per your request in #10119, here's what I have. I got things to working quite well with
Quantityand all its subclasses; the first thing to look at may be the quite detailed tests (which should be useful for whatever we end up implementing).I had started on
MaskedColumnand had also tried to move the information up in representations (to go for coordinates) but that is not finished. Part of the difficulty withMaskedColumnis that I made some design choices which are different fromMaskedArray, so it would need a kind of compatibility layer if we want to keep exact one-to-one matches, and/or some way to decide which type one wants.It may be worth pointing out two specific things:
MaskedArray- any time there is any items in the reduction masked, the result is masked as well. So, e.g., for.sum(), masked elements are not implicitly replaced by zero. The logic is that one cannot interpret the sum meaningfully without knowing how many elements actually contributed. Conversely, following that same logic, for methods where skipping a masked element does make sense (like.mean(), where one divides by the number of unmasked elements), I do use it.MaskedArraydoes the latter and it is quite a hassle to keep track, but perhaps better.Anyway, suggestions very welcome. Indeed, this is definitely at a point where some further design choices have to be made, and it would be good to step back a bit and decide what exactly we want masking to mean.
cc @taldcroft
EDIT: Masked column was very much a work in progress. Tests still passed without the 3 last commits (i.e., at 379a881)