Skip to content

Allow for easier construction of serializable objects#37

Closed
LivInTheLookingGlass wants to merge 13 commits intovsergeev:masterfrom
LivInTheLookingGlass:patch-3
Closed

Allow for easier construction of serializable objects#37
LivInTheLookingGlass wants to merge 13 commits intovsergeev:masterfrom
LivInTheLookingGlass:patch-3

Conversation

@LivInTheLookingGlass
Copy link
Copy Markdown
Contributor

@LivInTheLookingGlass LivInTheLookingGlass commented Aug 16, 2018

This PR adds the ability to usefully inherit from Ext. It changes four things:

  1. Serializes Ext before tuple, to enable inheritance from both NamedTuple and Ext
  2. Ensures that Ext inherits from object, so that we can detect its subclasses
  3. Detects subclasses of Ext at unpack time, so that these classes do not need ext_handlers
  4. Makes ExtType an alias for Ext to mirror msgpack-python API

In python3, you can have the pretty NamedTuple version by inheriting from an initially defined class. In python2, you can inherit from a function call to namedtuple(). In both cases, all that needs to happen is for the class to define a type/code variable, to define a data variable/property, and to define an _unpackb method for umsgpack to hook onto.

This PR enables the following in Python 3

from typing import NamedTuple
from umsgpack import Ext, packb, unpackb


class _Serializable3(NamedTuple):
    foo: str
    bar: int


class Serializable3(Ext, _Serializable3):
    type = 1

    def __init__(self, *args, **kwargs):
        super().__init__(Serializable3.type, packb(self[:]))

    @classmethod
    def _unpackb(cls, ext: Ext) -> 'Serializable3':
        return cls(*unpackb(ext.data))


class Serializable3Alt(Ext, NamedTuple('_Serializable3Alt', [('foo', str), ('bar', int)])):
    type = 2

    def __init__(self, *args, **kwargs):
        super().__init__(Serializable3Alt.type, packb(self[:]))

    @classmethod
    def _unpackb(cls, ext: Ext) -> 'Serializable3Alt':
        return cls(*unpackb(ext.data))

And the following in Python 2 or 3:

from collections import namedtuple
from umsgpack import Ext, packb, unpackb


class Serializable2(Ext, namedtuple('_Serializable2', ['foo', 'bar'])):
    type = 3

    def __init__(self, *args, **kwargs):
        super(Serializable2, self).__init__(Serializable2.type, packb(self[:]))

    @classmethod
    def _unpackb(cls, ext):
        return cls(*unpackb(ext.data))

This also enables for more flexible serialization through the use of dynamically calculated data:

from umsgpack import Ext, packb, unpackb


class SerializableStub(Ext):
    type = 4

    def __init__(self, foo):
        self.foo = foo
        # note that Ext is not initialized for this style

    @property
    def data(self):
        return packb((self.foo, ))

    @classmethod
    def _unpackb(cls, ext):
        return cls(*unpackb(ext.data))

@LivInTheLookingGlass LivInTheLookingGlass changed the title Allow for easier construction of serialized objects Allow for easier construction of serializable objects Aug 16, 2018
LivInTheLookingGlass added a commit to LivInTheLookingGlass/msgpack-python that referenced this pull request Sep 6, 2018
@LivInTheLookingGlass
Copy link
Copy Markdown
Contributor Author

Is there any interest in merging this feature? It would be fairly useful.

@vsergeev
Copy link
Copy Markdown
Owner

vsergeev commented Oct 4, 2019

I'll take a look in more detail soon and might incorporate it in upcoming v2.6.0.

Say you have a bit of code like this:

```python
class Message(Ext):
    type = 0  # for MessagePack serialization purposes

    def __new__(cls, msg_type: MessageType, *args, **kwargs):
        if cls is Message:
            return msg_type.associated_class(msg_type, *args, **kwargs)
        return object.__new__(cls)
```

This is designed for things to inherit from Message, but if anything does, then you get
conflicts in auto_handlers as originally designed. This patch fixes that.
@LivInTheLookingGlass
Copy link
Copy Markdown
Contributor Author

Any news on this? I'm working on a project that would probably benefit from not having to maintain this patch in my own codebase. (More than willing to help maintain, it's just a pain to do downstream)

@vsergeev
Copy link
Copy Markdown
Owner

I'll take a look this week. It's been on my todo list (for some time admittedly) to get this feature into v2.6.0.

@vsergeev
Copy link
Copy Markdown
Owner

vsergeev commented Feb 28, 2020

I like the Ext subclassing, the dynamic lookup without ext_handlers, and handling Ext before tuples. Those would help a lot with building more natural serializable custom data types. Crawling __subclasses__() is clever (I wasn't aware that was possible), but I'm concerned about the overhead of calling it in every unpack call and if it has any pitfalls with more complicated inheritance hierarchies. As for the subclassing itself, I think we may want the packing to take place outside of the constructor so that you can still construct, mutate, and use a serializable application data type on its own before serialization.

Ideally, there would be some sort of registration interface at the class definition level that's only called during class creation. This way umsgpack can build a dictionary once with all the classes associated with Ext types. I looked into a metaclass approach that took keyword arguments to try to do something like:

class MyType(metaclass=umsgpack.ExtInterface, ext_type=123):
    def pack(self):
        pass

    @classmethod
    def unpack(cls, data):
        pass

and while this is doable, it appears that passing the arguments to the metaclass like that is Python 3 only. Also, any metaclass approach interacts very poorly with the typing.NamedTuple, which is itself using the typing.NamedTupleMeta metaclass under the hood. Mixing in another metaclass with typing.NamedTuple raises a metaclass conflict TypeError, and working around this basically requires creating a separate wrapper class that inherits from both metaclasses, which is pretty ugly.

After a few other dead ends, it seems like a class decorator might be the easiest solution:

import typing

# umsgpack

ext_classes = {}

def ext_serializable(ext_type):
    def wrapper(cls):
        # assert ext_type is unique
        # assert cls has pack() and unpack()
        ext_classes[ext_type] = cls
        ext_classes[cls] = ext_type
        return cls
    return wrapper

def pack(obj):
    if obj.__class__ in ext_classes:
        return obj.pack()

def unpack(ext_type, data):
    return ext_classes[ext_type].unpack(data)

# user code

@ext_serializable(5)
class MySerializableTuple(typing.NamedTuple):
    foo: str
    bar: str

    def pack(self):
        return self.foo + "," + self.bar

    @classmethod
    def unpack(cls, data):
        return cls(*data.split(","))

# example

x = MySerializableTuple("abc", "def")

print(x) # -> MySerializableTuple(foo='abc', bar='def')

print(pack(x)) # -> abc,def

print(unpack(5, pack(x))) # -> MySerializableTuple(foo='abc', bar='def')

This has the advantages of being simple, interacting well with typing.NamedTuple, and being backwards compatible to Python 2.6. Its disadvantages are that it's a little bit trickier to enforce the definitions of pack() and unpack(). Let me know what you think.

@LivInTheLookingGlass
Copy link
Copy Markdown
Contributor Author

I'm not a huge fan of the decorator approach, though I can see why it would be useful here. If we were wanting to support python 3 only, my instinct would be to use init_subclass to handle this, but it seems like you want to support 2.* well past its end of life, which I mostly get.

Just to plot out my pro/cons here:

Pros:

  • decorators a cleaner to implement
  • might be faster
  • doesn't add type to class def

Cons:

  • harder to use type hinting (can't say isinstance(x, Ext)
  • looks less like inheritance, may be confusing

A middle of the road approach might be to change how it handles the ext_handlers instead, so instead of parsing through that each time it would:

  1. Proceed as normal unless it needs to unpack an ext
  2. If so, consult its ext_handlers
  3. If not found, consult the cached list of subclasses
  4. If not found, recompute the list of subclasses
  5. If not found, error

As for the packing part, the other way I have been doing that is with a property for the data field. It would be trivial to set up a template that caches data and clears it when a property is mutated.

Would that address your concerns?

@vsergeev
Copy link
Copy Markdown
Owner

vsergeev commented Mar 4, 2020

Mostly I don't see the need of visiting/caching the subclass list if we can just register the ext type at definition time and do a simple lookup during unpacking.

I agree that inheritance semantics would be less confusing and more intuitive, but I haven't found a clean way to handle NamedTuples with them, and I think the approach will generally require a special ext_type property (unless we use a function that returns a class, or the Python 3 only __init_subclass__). Here is an example of the metaclass version which works with both Python 2 and 3 with minor modifications:

# umsgpack

ext_classes = {}

class ExtInterfaceMeta(type):
    def __new__(cls, name, bases, dct):
        new_cls = type.__new__(cls, name, bases, dct)
        if 'ext_type' in dct:
            ext_classes[dct['ext_type']] = new_cls
        return new_cls

class ExtInterface(metaclass=ExtInterfaceMeta):
    def pack(self):
        raise NotImplementedError('Pack not implemented')

    @classmethod
    def unpack(cls, data):
        raise NotImplementedError('Unpack not implemented')

def pack(obj):
    if isinstance(obj, ExtInterface):
        return obj.pack()

def unpack(ext_type, data):
    return ext_classes[ext_type].unpack(data)

# user code

class MySerializableTuple(ExtInterface):
    ext_type = 5

    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar

    def pack(self):
        return self.foo + "," + self.bar

    @classmethod
    def unpack(cls, data):
        return cls(*data.split(","))

    def __repr__(self):
        return "MySerializableTuple(foo='{}', bar='{}')".format(self.foo, self.bar)

# example

x = MySerializableTuple("abc", "def")

print(x) # -> MySerializableTuple(foo='abc', bar='def')

print(pack(x)) # -> abc,def

print(unpack(5, pack(x))) # -> MySerializableTuple(foo='abc', bar='def')

@LivInTheLookingGlass
Copy link
Copy Markdown
Contributor Author

LivInTheLookingGlass commented Apr 10, 2020

Okay, so I've sat on this for a while now, and I think I have a way to make it work. We do a two-stage thing.

  1. Make the decorator, like you said
  2. Provide a dummy class to use for type hinting

This would basically end up being something like

from abc import ABCMeta
from itertools import chain
try:
    from typing import Any, Dict, List, Tuple, Type, Union, TYPE_CHECKING
except ImportError:
    TYPE_CHECKING = False


class MessagePackableMeta(ABCMeta):
    def __instancecheck__(self, instance):
        if isinstance(instance, (int, float, bool, str, bytes, type(None))):
            return True
        if isinstance(instance, (list, tuple)):
            return all(isinstance(x, self) for x in instance)
        if isinstance(instance, dict):
            return all(isinstance(x, self) for x in chain(instance, instance.values()))
        return super().__instancecheck__(instance)


class _MessagePackable(metaclass=MessagePackableMeta):
    pass


for type_ in (int, float, bool, str, bytes, type(None)):
    _MessagePackable.register(type_)
del type_

if TYPE_CHECKING:
    MessagePackable = Union[_MessagePackable, float, str, bytes, None, List[Any], Tuple[Any], Dict[Any, Any]]
else:
    MessagePackable = _MessagePackable
        

ext_classes = {}

def pack(obj):
    if obj.__class__ in ext_classes:
        return obj.pack()

def unpack(ext_type, data):
    return ext_classes[ext_type].unpack(data)

def ext_serializable(ext_type):
    def wrapper(cls):
        # assert ext_type is unique
        # assert cls has pack() and unpack()
        MessagePackable.register(cls)
        ext_classes[ext_type] = cls
        ext_classes[cls] = ext_type
        return cls
    return wrapper

@LivInTheLookingGlass
Copy link
Copy Markdown
Contributor Author

LivInTheLookingGlass commented Apr 10, 2020

This doesn't work perfectly on the type-hinting front, but it would at least take care of primitive and custom types, and I think this even works on python 2, to some extent. If not, it would be easy to simply not export it in python 2.

The only errors this should give are some false positives on type hinting, since mypy doesn't support cyclic definitions yet.

@LivInTheLookingGlass
Copy link
Copy Markdown
Contributor Author

Okay, after digging into it more, the approach I posted will fail for type hinting. I no longer think it's realistically achievable to get type hinting to work here. The fact that isinstance() works at all is kinda surprising, so we should just roll with that. I'll leave the failed code there in the hopes that someone later might make it work better.

@LivInTheLookingGlass
Copy link
Copy Markdown
Contributor Author

We also can't do something clever like give them a common base class as a dummy, as the following produces incorrect type errors on mypy

class Shim:
    pass


def modify_class_def(cls):
    class ret(cls, Shim):
        pass

    return ret


@modify_class_def
class Test:
    pass


def test(a: Shim):
    pass


test(Test())  # error

@vsergeev
Copy link
Copy Markdown
Owner

I've opted to go with the decorator approach to get a more user friendly API out there for now, and we can revisit the metaclass approach and type hinting down the line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants