Expand JsonCustomEncoder #5471

Merged
taldcroft merged 2 commits into astropy:master from profxj:jsonify
Sep 28, 2017

Conversation

@profxj
Contributor

@profxj profxj commented Nov 14, 2016

Adds support for several astropy-specific objects including
units and Quantity.

Also includes a simple method to use in place of json.dumps
for those less accustomed to its syntax: astropy.utils.misc.jsondumps

Includes new tests (passing tests on Travis)
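For readers less accustomed to the `json.dumps` syntax mentioned above, this is roughly how a custom encoder is plugged in. A minimal stdlib-only sketch with a stand-in class (`FakeQuantity` and `SketchEncoder` are illustrative names, not astropy code):

```python
import json


class FakeQuantity:
    """Illustrative stand-in for an astropy Quantity."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit


class SketchEncoder(json.JSONEncoder):
    """Hypothetical encoder following the JsonCustomEncoder pattern."""

    def default(self, obj):
        # Map unsupported objects onto plain JSON-friendly structures.
        if isinstance(obj, FakeQuantity):
            return {"value": obj.value, "unit": obj.unit}
        # Anything else falls through to the base class, which raises
        # TypeError for non-serializable objects.
        return json.JSONEncoder.default(self, obj)


out = json.dumps({"a": FakeQuantity(5, "cm")}, cls=SketchEncoder)
```

A `jsondumps` convenience helper would then just be a one-line wrapper around this `json.dumps(..., cls=...)` call.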

Member

@pllim pllim left a comment

Huh, I thought I was the only one using it. Good to see other kinds of usage, but I have some comments. Probably @mhvk also wants to review regarding the Unit/Quantity logic.

from .. import units as u
import numpy as np
if isinstance(obj, (np.ndarray, np.number)):
    if isinstance(obj, np.number):
Member

Is there a reason why you didn't just put Quantity as the first check and retain isinstance(obj, (np.ndarray, np.number)) as-is as the second check?
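For context on why the order matters here: `Quantity` is itself a subclass of `np.ndarray`, so a base-class check placed first intercepts every `Quantity` before a later subclass check can run. A stdlib-only illustration with stand-in classes (not the astropy ones):

```python
class NdarrayLike:
    """Plays the role of np.ndarray."""


class QuantityLike(NdarrayLike):
    """Plays the role of Quantity, which subclasses np.ndarray."""


obj = QuantityLike()

# With the base class tested first, the subclass branch is unreachable:
if isinstance(obj, NdarrayLike):
    branch = "ndarray"
elif isinstance(obj, QuantityLike):
    branch = "quantity"  # never reached

# Testing the subclass first gives each type its own branch:
if isinstance(obj, QuantityLike):
    fixed = "quantity"
elif isinstance(obj, NdarrayLike):
    fixed = "ndarray"
```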

return json.JSONEncoder.default(self, obj)


def jsondumps(obj, **kwargs):
Member

Is this really necessary?

Member

I would say not. This just obscures the calling code (effectively hiding the astropy JsonCustomEncoder).

assert json.dumps({1: 2},
cls=misc.JsonCustomEncoder) == '{"1": 2}' # default

assert json.dumps({1: u.m}, cls=misc.JsonCustomEncoder) == '{"1": "m"}'
Member

What about composite unit (e.g., fnu) and log unit (e.g., abmag)?

@pllim
Member

pllim commented Nov 14, 2016

Also, this will need a change log once the milestone is set.

if isinstance(obj, np.number):
    return obj.tolist()
elif isinstance(obj, u.Quantity):
    return dict(value=obj.value, unit=obj.unit.to_string())
Contributor

Is this dict called recursively? If so, there would seem to be no need to use to_string on the unit -- that would automatically get done in the recursion.
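A quick stdlib-only check confirms the recursion @mhvk describes: whatever `default()` returns is itself re-encoded, so nested custom objects route back through `default()` automatically (the `Tagged` class below is purely illustrative):

```python
import json


class Tagged:
    """Toy object; default() may return structures that still contain it."""

    def __init__(self, payload):
        self.payload = payload


class RecursiveEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Tagged):
            # The returned dict is re-encoded, so a nested Tagged inside
            # `payload` will route through default() again.
            return {"payload": obj.payload}
        return json.JSONEncoder.default(self, obj)


out = json.dumps(Tagged(Tagged(42)), cls=RecursiveEncoder)
```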

    return obj.decode()
elif isinstance(obj, u.UnitBase):
    return obj.name
elif obj is u.dimensionless_unscaled:
Contributor

Since isinstance(u.dimensionless_unscaled, u.UnitBase) is true, this needs to be rewritten (and tested!). Since it also is not a unique unit, I would write:

elif isinstance(obj, (u.UnitBase, u.FunctionUnitBase)):
    if obj == u.dimensionless_unscaled:
        obj = 'dimensionless_unit'
    else:
        obj = obj.to_string()

(note the inclusion of function units here -- we do need #3650... -- as well as the use of to_string instead of name, which I think is more reliable).

@mhvk
Contributor

mhvk commented Nov 14, 2016

Beyond the specific comments: is there something that reads in the json as well? It would be good to check round-tripping. I also slightly worry about being too specific. In particular, for a unit one could also just rely on str or repr doing the right thing. But I don't really know the purpose of this package...

@pllim
Member

pllim commented Nov 14, 2016

As far as the Astropy code base is concerned, JsonCustomEncoder is used to read/write the Cone Search service catalog (see astropy/vo/client/vos_catalog.py). It is important to check the remote data tests to make sure this does not break any Cone Search tests, as Travis can still report a "pass" when remote data tests fail.

@pllim
Member

pllim commented Nov 14, 2016

Along the line of @mhvk's question... @profxj , why do you need this functionality?

@mhvk
Contributor

mhvk commented Nov 14, 2016

@pllim - but does conesearch read the data generated? I'd think you'd need a custom decoder for that...

@profxj - sorry to be a bit pedantic, but if we expand the serialisation, it would be good to try to ensure it is done such that the output is unambiguous.

@pllim
Member

pllim commented Nov 14, 2016

@mhvk , for Cone Search, yes, what I write out can be loaded back simply using json.load() (see VOSDatabase.from_json()). However, that does not mean that the new stuff added here does not need a custom decoder as well. So you did raise a good point.

@profxj
Contributor Author

profxj commented Nov 15, 2016

Before I dig into the detailed comments (which are most appreciated), let me explain
the motivation:

I have a number of applications which generate nested dicts that include astropy Quantity objects and the like. I have thus far used my own approach to the custom encoding, but adding these few lines will achieve it, and may be quite useful to others.

ps. I do have code that reads in the dumps output. Happy to add that in too.

@mhvk
Contributor

mhvk commented Nov 15, 2016

@profxj - I think this would be very useful to have more generally (which is indeed why I think it is important to think it through somewhat carefully). E.g., Table.meta also is a dict which can and (if used) likely will contain other astropy objects. Let me cc @taldcroft here, since he has thought more about how to serialise such information in ascii files (via yaml for ecsv). I also think it might end up being quite relevant for fits files and nddata so cc @MSeifert04.

@MSeifert04
Contributor

MSeifert04 commented Nov 15, 2016

I actually don't know if it really applies to FITS or "big data" in any way, because JSON (like ASCII) is quite inefficient compared to binary formats (FITS, HDF5, or even numpy.save).

You just have to check what it does to numpy arrays: array.tolist()!

It would certainly be handy to have some more support in JSON encoders and decoders, but the really nice step forward would be to use something like HDF5 instead. But that's very subjective, and I have to admit I don't even know what this JsonCustomEncoder is for and where it is used; utils.misc was always a mystery to me. 😄

@mhvk
Contributor

mhvk commented Nov 15, 2016

@MSeifert04 - I mostly felt it was relevant since fits does store metadata in ascii headers. I'm not quite sure what hdf5 does, but at some point we'll need to address that too.

Aside: I definitely don't want to hold this simple PR hostage to these discussions, just think it is good to have a few more eyes look over the general direction.

@pllim
Member

pllim commented Nov 15, 2016

Depending on how involved the decoding stuff is, this might actually need to be moved to somewhere less obscure than misc.

@taldcroft
Member

This is timely as I am actively working on lossless serialization of key astropy classes in support of serializing arbitrary Table objects (including mixins) to ECSV. This is done with YAML and includes SkyCoord, Time, np.ndarray, Quantity, Unit, Location (and whatever else might pop up as a dependency).

The key thing here for lossless round-tripping is that you need to identify the class of the object. This is conveniently done with a YAML tag, e.g. !!astropy.time.core.Time. Note that I am maintaining a hard requirement to use safe loading, i.e. do not leverage the capability of pyyaml to serialize arbitrary objects by running arbitrary code. Instead there will be a limited set of safe classes that can be serialized, with trusted load functions.

So I think this will all work and be ready for 1.3 based on my initial proof of concept work.

This particular solution using JSON does not allow unique round-tripping. I.e. a u.dimensionless_unscaled object just turns into the string 'dimensionless_unit' and there is no way for the loader to know if that is a plain string or u.dimensionless_unscaled. So I'm a bit ambivalent about including the additional dumping capability in this PR into core astropy since it is incomplete (at least in the current form).
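The ambiguity is easy to demonstrate with the stdlib alone: once a unit has been reduced to a bare string, the decoded payload is identical to user data that happens to contain the same string, so no loader can tell them apart.

```python
import json

# A hypothetical encoder reduces the dimensionless unit to a bare string:
from_unit = json.dumps({"q": "dimensionless_unit"})

# A user independently stores that same literal string as plain data:
from_text = json.dumps({"q": "dimensionless_unit"})

# After decoding, the two payloads are indistinguishable, so a loader
# cannot know whether to reconstruct a unit object or keep the string.
decoded = json.loads(from_unit)
```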

@astrofrog
Member

Depending on how involved the decoding stuff is, this might actually need to be moved to somewhere less obscure than misc

I wonder whether classes should maybe have a serialize method that could potentially take different arguments, e.g. YAML, JSON, etc? That would be a nice way to expose it.

@taldcroft
Member

I wonder whether classes should maybe have a serialize method that could potentially take different arguments, e.g. YAML, JSON, etc? That would be a nice way to expose it.

If we were to take this route then we might want to stick with load/dump, which are used (with slight variations) by pickle, yaml, and json. But there isn't quite a perfect symmetry. This operation is always meaningful:

q_json = my_quantity.dump(method='json')

But in the loading step there is no reason to expect the input to contain only members of one class, and Quantity.load(q_json) doesn't really make sense unless it is a limited loader that only works for Quantity. This would be not very useful.

So you always need to load with an astropy-wide (not-class-specific) loader, e.g.

from astropy.utils.serialize import load_json
out = load_json(skycoord_json)

This might have a SkyCoord which has Time objects in the metadata. So in the interest of just one way of doing things, it would probably make sense to have a converse dump_json function (or whatever) that is always the way of serializing any supported astropy object or arbitrary structure of astropy objects.
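The API shape sketched above might look something like this; every name here (`dump_json`, `load_json`, `HypotheticalEncoder`) is made up for illustration and does not exist in astropy:

```python
import json


class HypotheticalEncoder(json.JSONEncoder):
    """Stand-in for a library-wide encoder covering all supported classes."""

    def default(self, obj):
        # Real code would dispatch on Quantity, Time, SkyCoord, ...
        return json.JSONEncoder.default(self, obj)


def dump_json(obj, **kwargs):
    """Single library-wide dumper for any supported object."""
    return json.dumps(obj, cls=HypotheticalEncoder, **kwargs)


def load_json(text, **kwargs):
    """Matching library-wide loader, deliberately not class-specific."""
    return json.loads(text, **kwargs)


out = load_json(dump_json({"a": [1, 2, 3]}))
```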

@astrofrog
Member

@taldcroft - regarding dump_json, we should consider whether it wouldn't make more sense to simply recommend the 'proper' way to do it which is json.dump[s](cls=AstropyEncoder) or something like this? I don't have strong feelings, but fitting into the current Python API probably makes sense?

@taldcroft
Member

@astrofrog - agreed that we should stick with the recommended Python API as you said. Sorry for the distraction of just making something up.

In terms of this PR and the broader context of JSON support, should we impose a requirement that there be complementary and unambiguous encode / decode functionality for any supported object? Unfortunately the current code does not meet that requirement...

@pllim
Member

pllim commented Nov 30, 2016

@taldcroft , if by current code, you include the stuff I originally added for Cone Search, whatever you do, please make sure it is backward compatible. I don't feel like messing with what is already working for Cone Search.

@profxj
Contributor Author

profxj commented Jan 11, 2017

Sorry for the delay in responding to the extensive comments.

I believe that JSON is not a good way to serialize astropy objects in a manner
that can be fully round-tripped. YAML is best for that and you've implemented
a solid approach. Indeed, I'm likely to start using it.

There are times, however, when one wants a simple ASCII file to view and
JSON is a fine solution. I've put in edits to the comments and a new method
that I've used to convert a dict made with this JSON encoder.

If you decide it isn't worth the bother, that's ok. I just think it makes sense to
modestly extend the encoder you already need.

@taldcroft
Member

@profxj - I agree that the new dump functionality in default() has utility and it is in keeping with existing code. So 👍 on that.

However, I would propose removing the convert_quantity_in_dict function so there is a clear understanding that round-trip serialization with JSON is not supported. That function is a specialized loader for a narrow use-case, and would probably make more sense in your own personal library.

@taldcroft taldcroft added this to the v2.0.0 milestone Jan 11, 2017
@taldcroft taldcroft self-assigned this Jan 11, 2017
return json.JSONEncoder.default(self, obj)


def convert_quantity_in_dict(idict):
Member

Per comment in main section, remove this function and associated testing.

Contributor Author

Removed. I agree that it is quite specialized.

Member

@pllim pllim left a comment

I would also recommend squashing all your commits into one. Otherwise, LGTM since remote tests passed. Thanks!

assert json.dumps({1: u.m}, cls=misc.JsonCustomEncoder) == '{"1": "m"}'
# Quantities
tmp = json.dumps({'a': 5*u.cm}, cls=misc.JsonCustomEncoder)
assert '"unit": "cm"' in tmp # Order of dict keys is unknown here
Member

@pllim pllim Jan 11, 2017

Nitpicking: two spaces between code and the start of a comment (PEP 8). You might want to use pycodestyle or pyflakes or something similar to ensure PEP 8 adherence. Not critical, but would be nice, especially if someone decides to make the code style check stricter on Travis CI in the future.

Contributor Author

Adding the white space.

If someone can teach me how to squash, I'll gladly do so. ;)

Member

http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html is a good resource for me. But if you have trouble with it, one of the maintainers can also do it for you as long as "allow edits from maintainers" box is checked.

assert json.dumps({1: u.m}, cls=misc.JsonCustomEncoder) == '{"1": "m"}'
# Quantities
tmp = json.dumps({'a': 5*u.cm}, cls=misc.JsonCustomEncoder)
assert '"unit": "cm"' in tmp # Order of dict keys is unknown here
Member

You should be able to do these tests in a more transparent and complete way, something like:

out = json.loads(json.dumps({'a': 5 * u.cm}, cls=misc.JsonCustomEncoder))
assert out == {'a': {'unit': 'cm', 'value': 5.0}}

Even though the key order might vary, two dicts can be compared like shown. So the test requirement is not that some particular text is emitted but rather that the emitted text represents the JSON encoding of a specific Python data structure.
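The point generalizes: decoding both sides reduces the assertion to a dict comparison, which ignores key order entirely. A stdlib-only sketch:

```python
import json

# The same structure emitted with two different key orders:
text1 = '{"unit": "cm", "value": 5.0}'
text2 = '{"value": 5.0, "unit": "cm"}'

# The texts differ as strings, but the decoded structures compare equal:
same_text = (text1 == text2)
same_data = (json.loads(text1) == json.loads(text2))
```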

Contributor Author

Done. Am committing without squashing as I am worried I'd
get that wrong. Need to practice a bit with my own work first..

@gregreen

gregreen commented Apr 7, 2017

I had to implement JSON encoding/decoding of Quantity and SkyCoord for a project of mine, so I would be happy to share what I came up with. The code is here, and the basic idea is to encode each item as a dictionary with a _type key. For example, here's what a numpy.ndarray might serialize to:

{'_type': 'np.ndarray', 'value': [1.0, 2.0, 3.0], 'dtype': 'f8'}

That way, the decoder unambiguously knows to turn this back into a numpy array.

Anyways, I implemented numpy.dtype, numpy.ndarray, Quantity and SkyCoord (the latter definitely only works for a subset of SkyCoords, since I don't understand the intricacies of the SkyCoord class). There were some subtleties in dealing with more complicated datatypes in numpy.ndarray, but I think my solution for that datatype should work pretty generally.
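@gregreen's tagged-dict idea can be sketched with the stdlib alone. The `_type` dispatch below illustrates the pattern using `complex` as a stand-in for Quantity/SkyCoord; it is not his actual code:

```python
import json


def encode_tagged(obj):
    # Hypothetical: tag each custom object with its type name.
    if isinstance(obj, complex):  # stand-in for Quantity, SkyCoord, ...
        return {"_type": "complex", "real": obj.real, "imag": obj.imag}
    raise TypeError(f"not serializable: {obj!r}")


def decode_tagged(d):
    # object_hook runs on every decoded JSON object; dispatch on "_type".
    if d.get("_type") == "complex":
        return complex(d["real"], d["imag"])
    return d


text = json.dumps({"z": 3 + 4j}, default=encode_tagged)
out = json.loads(text, object_hook=decode_tagged)
```

Because the tag names the class, the decoder reconstructs the original object unambiguously, which is exactly the round-trip property the plain string encoding lacks.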

@taldcroft
Member

@gregreen - thanks for the offer! One question I have is if you absolutely need JSON instead of YAML. Serializing the core astropy objects to YAML is available in 1.3:

http://astropy.readthedocs.io/en/latest/io/misc.html#module-astropy.io.misc.yaml

To include the analogous JSON serialization in the astropy core, it should have a similar level of functionality to the YAML code. For instance a SkyCoord can have Time and EarthLocation metadata.

@pllim
Member

pllim commented Jun 15, 2017

I don't think this is going to make it to v2. I'm removing the milestone.

@pllim pllim removed this from the v2.0.0 milestone Jun 15, 2017
@astropy-bot

astropy-bot bot commented Sep 27, 2017

Hi humans 👋 - this pull request hasn't had any new commits for approximately 8 months. I plan to close this in a month if the pull request doesn't have any new commits by then.

In lieu of a stalled pull request, please close this and open an issue instead to revisit in the future. Maintainers may also choose to add keep-open label to keep this PR open but it is discouraged unless absolutely necessary.

If you believe I commented on this issue incorrectly, please report this here.

@pllim
Member

pllim commented Sep 27, 2017

I think this discussion is veering towards "let's close this PR and try another implementation". Am I correct? Can we close this?

@pllim pllim added the Close? Tell stale bot that this issue/PR is stale label Sep 27, 2017
@astropy-bot

astropy-bot bot commented Sep 27, 2017

Hi there @profxj 👋 - thanks for the pull request! I'm just a friendly 🤖 that checks for issues related to the changelog and making sure that this pull request is milestoned and labelled correctly. This is mainly intended for the maintainers, so if you are not a maintainer you can ignore this, and a maintainer will let you know if any action is required on your part 😃.

Everything looks good from my point of view! 👍

If there are any issues with this message, please report them here

@profxj
Contributor Author

profxj commented Sep 27, 2017

At one point, the only reason this wasn't ingested was because I didn't know how to 'squash' commits. I would be rather annoyed to have spent quite a few hours on this only to see it deleted for that.

It was never intended to have the same functionality as the YAML code
nor does it need to.

@pllim pllim removed the Close? Tell stale bot that this issue/PR is stale label Sep 27, 2017
@pllim
Member

pllim commented Sep 27, 2017

@profxj , did you check the "allow edits from maintainers" box here? If not, could you please check it?

@profxj
Contributor Author

profxj commented Sep 27, 2017

@pllim -- just have.

@pllim
Member

pllim commented Sep 27, 2017

Argh sorry, rebase problem. I'll fix it.

@pllim pllim added this to the v3.0.0 milestone Sep 27, 2017
@pllim
Member

pllim commented Sep 27, 2017

@profxj -- I have squashed, rebased, and added a change log for you. Please review.

@taldcroft , @mhvk -- Do you think this PR is acceptable as-is, moving any extra "nice to have" items to a new issue? Please re-review.

@profxj
Contributor Author

profxj commented Sep 27, 2017

All looks fine to me. Thanks.

@mhvk
Contributor

mhvk commented Sep 27, 2017

I'll leave the final call to @taldcroft, but my overall sense is that this is an easy improvement we might as well make; ideally, we'd use something more like the yaml encoder, so that other astropy classes can be included as well, but that will take quite a bit more time to get right (for larger arrays, etc.), and it is not obvious we need it (and if we do, the very worst that could happen is that we create a new encoder and slowly deprecate this one).

@taldcroft
Member

Agreed with @mhvk that this modest enhancement should be merged.

But on the bigger point I would argue against any further move toward expanding JSON serialization for more astropy classes. YAML is working out quite well and so there would need to be a pretty strong argument for making two similar, but not exactly the same, serialization systems in astropy.

@taldcroft taldcroft merged commit 945c9d2 into astropy:master Sep 28, 2017
@taldcroft
Member

Thanks @profxj !

@profxj
Contributor Author

profxj commented Sep 28, 2017 via email


7 participants