Proposal for structure extensions

Open • clwi opened this issue 8 years ago • 1 comment

MessagePack has a lot of different scalar value types and also an extension mechanism for adding scalar types in the future. However, there are only 2 structures (fixed-size array and fixed-size map), and there is no extension mechanism for structures.

Problems

Problem 1: No dynamic structures

This has been brought up several times before, and it is definitely a problem when the producer wants to use a streaming model and doesn't know in advance how big a structure is.

Problem 2: No user-defined structure types

Let us take a counted set as an example. You might pack it as a map and rely on the receiver knowing how to interpret the map. But by the same argument there would be no need for different scalar types: all scalars could be represented by a blob that the receiver knows how to interpret. Not so good.

Another example: you need a type for a coordinate (3 reals: x, y and z). Again, you can solve it by packing the coordinates in an array with 3 elements, but then you have lost all type information.
If, on the other hand, you define your own EXT scalar with 3 doubles in the data, you always use 24 bytes of data and cannot take advantage of MessagePack's compact encoding (for example, small integers packed into a single byte).
A third way is to define your own EXT and put a MessagePack message in its data. This adds complexity, because the parser has to be called recursively on the EXT data. Also, a general MessagePack browser/editor can't know about it and can only handle it as a blob.

Solution

I will propose a solution that solves both problems mentioned above.

First, we reserve a system EXT type (e.g. -2) for structure extensions.

All structure extensions are dynamic and they use the following schema:
EXT(-2,data) Item1 Item2 ... EXT(-2,0)
The structures can be nested, and each EXT(-2,0) item closes the innermost open structure.
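To make the schema concrete, here is a minimal producer-side sketch in plain Python (no MessagePack library). It assumes the reserved type is -2 as in the example, that the start marker's data is the structure type as one signed byte (-1 for a dynamic array, from the list below), and that EXT(-2,0) means an ext value with a zero-length payload. That last point is my reading, not something the proposal pins down. Only the handful of MessagePack formats the example needs are implemented.

```python
import struct

STRUCT_EXT = -2   # hypothetical: the system EXT type reserved for structure extensions
DYN_ARRAY = -1    # system-defined structure type "dynamic array"


def ext_bytes(ext_type: int, data: bytes) -> bytes:
    """Encode a MessagePack ext value (only the two formats this sketch needs)."""
    t = struct.pack(">b", ext_type)
    if len(data) == 1:
        return b"\xd4" + t + data                       # fixext 1
    if len(data) < 256:
        return b"\xc7" + bytes([len(data)]) + t + data  # ext 8
    raise ValueError("payload too long for this sketch")


def start_marker(struct_type: int) -> bytes:
    # EXT(-2, data): here the data is the structure type as one signed byte
    return ext_bytes(STRUCT_EXT, struct.pack(">b", struct_type))


def end_marker() -> bytes:
    # EXT(-2, 0): assumed here to be an ext value with a zero-length payload
    return ext_bytes(STRUCT_EXT, b"")


def pack_scalar(value) -> bytes:
    """Tiny MessagePack subset: small non-negative ints and short strings."""
    if isinstance(value, int) and 0 <= value <= 0x7F:
        return bytes([value])                           # positive fixint
    if isinstance(value, str) and len(value.encode("utf-8")) <= 31:
        raw = value.encode("utf-8")
        return bytes([0xA0 | len(raw)]) + raw           # fixstr
    raise TypeError(f"not supported in this sketch: {value!r}")


def pack_dynamic(value) -> bytes:
    """Encode Python lists as (possibly nested) dynamic arrays."""
    if isinstance(value, list):
        body = b"".join(pack_dynamic(v) for v in value)
        return start_marker(DYN_ARRAY) + body + end_marker()
    return pack_scalar(value)


def stream_dynamic_array(items) -> bytes:
    """Producer side: the item count is never needed up front."""
    out = bytearray(start_marker(DYN_ARRAY))
    for item in items:   # items may be a generator of unknown length
        out += pack_scalar(item)
    out += end_marker()
    return bytes(out)
```

`stream_dynamic_array` addresses Problem 1: the producer writes the start marker, then items as they arrive, then the end marker, so the length is never needed. In `pack_dynamic([1, [2, "a"]])` the inner end marker closes the inner array and the last one closes the outer array.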

The data field in the EXT item defines the type of the structure. Small negative values (-1 .. -128) are system-defined types, e.g.

-1 = Dynamic array  
-2 = Dynamic map  
-3 = Dynamic set  
-4 = Dynamic counted set
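A consumer can rebuild these system types with a stack-free recursive reader: a start marker opens a structure, and values are collected until the matching end marker arrives. The sketch below is hypothetical and self-contained. It makes the same assumptions as before (reserved type -2, one signed byte of type data, zero-length payload as end marker). It also assumes a dynamic map is sent as alternating keys and values and a counted set as a plain list of possibly repeated elements, since the proposal doesn't specify either payload.

```python
import struct
from collections import Counter

STRUCT_EXT = -2   # hypothetical: the reserved system EXT type from the proposal
END = object()    # sentinel for the EXT(-2, 0) end marker


class Reader:
    """Minimal MessagePack reader: fixint, fixstr, fixext 1 and ext 8 only."""

    def __init__(self, buf: bytes):
        self.buf, self.pos = buf, 0

    def take(self, n: int) -> bytes:
        chunk = self.buf[self.pos:self.pos + n]
        if len(chunk) != n:
            raise ValueError("truncated input")
        self.pos += n
        return chunk

    def read_value(self):
        tag = self.take(1)[0]
        if tag <= 0x7F:                           # positive fixint
            return tag
        if 0xA0 <= tag <= 0xBF:                   # fixstr
            return self.take(tag & 0x1F).decode("utf-8")
        if tag in (0xD4, 0xC7):                   # fixext 1 / ext 8
            length = 1 if tag == 0xD4 else self.take(1)[0]
            ext_type = struct.unpack(">b", self.take(1))[0]
            return self.read_ext(ext_type, self.take(length))
        raise ValueError(f"format 0x{tag:02x} not handled in this sketch")

    def read_ext(self, ext_type: int, data: bytes):
        if ext_type != STRUCT_EXT:
            return ("ext", ext_type, data)        # ordinary scalar EXT stays opaque
        if not data:
            return END                            # closes the innermost structure
        items = []
        while (item := self.read_value()) is not END:   # recursion handles nesting
            items.append(item)
        code = struct.unpack(">b", data)[0] if len(data) == 1 else data
        return build(code, items)


def build(code, items):
    if code == -1:                                # dynamic array
        return items
    if code == -2:                                # dynamic map: key, value, key, value...
        return dict(zip(items[0::2], items[1::2]))
    if code == -3:                                # dynamic set
        return set(items)
    if code == -4:                                # dynamic counted set (assumed encoding)
        return Counter(items)
    return ("user", code, items)                  # user-defined: left to the application
```

For example, the bytes `D4 FE FF 01 D4 FE FD 02 C7 00 FE C7 00 FE` decode to `[1, {2}]`: a dynamic array containing 1 and a dynamic set.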

Positive values or long data (> 1 byte) are user-defined types. The user may for example store the class name in the EXT data, and then a generic unpacker is able to generate objects of the correct class.
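The class-name idea could look roughly like this on the unpacking side. It's a hypothetical sketch: `REGISTRY`, `register` and `make_user_object` are illustrative names, and it assumes long EXT data is a UTF-8 class name and that the structure's items map positionally onto the constructor's arguments. The `Coordinate` class is the x, y, z example from Problem 2.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Hypothetical registry: key taken from the start marker's EXT data -> constructor
REGISTRY: Dict[Union[str, int], Callable[..., object]] = {}


def register(cls):
    REGISTRY[cls.__name__] = cls      # register under the class name
    return cls


@register
@dataclass
class Coordinate:                     # the x, y, z example from Problem 2
    x: float
    y: float
    z: float


def make_user_object(ext_data: bytes, items: List[object]) -> object:
    """Build an object for a user-defined structure type."""
    # Long data (> 1 byte) is read as a UTF-8 class name; a single byte is a numeric code
    key = ext_data.decode("utf-8") if len(ext_data) > 1 else ext_data[0]
    cls = REGISTRY.get(key)
    if cls is None:
        # Unknown type: a generic browser can still show the type key and the items
        return {"unknown_type": key, "items": items}
    return cls(*items)
```

The fallback branch is the point of the proposal over opaque EXT blobs: even without the class, a generic tool still sees the type name and a normal list of decoded items.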

Backward compatibility issues

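An old parser will fail if a new structure extension is nested inside another structure: the enclosing array or map declares a fixed element count, but the old parser treats the start marker, each item and the end marker as separate elements.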
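A small illustration of the miscount, assuming the same marker encoding as in the earlier examples (fixext 1 start marker, zero-length ext 8 end marker). `old_parse` is a hypothetical minimal parser that, like any pre-proposal parser, treats every ext value as one opaque scalar.

```python
def old_parse(buf: bytes, pos: int = 0):
    """Parse one value as a pre-proposal parser would; returns (value, next_pos)."""
    tag = buf[pos]
    if tag <= 0x7F:                                  # positive fixint
        return tag, pos + 1
    if 0x90 <= tag <= 0x9F:                          # fixarray: element count is fixed
        items, pos = [], pos + 1
        for _ in range(tag & 0x0F):
            item, pos = old_parse(buf, pos)
            items.append(item)
        return items, pos
    if tag == 0xD4:                                  # fixext 1: type + 1 data byte, opaque
        return ("ext", buf[pos + 1:pos + 3]), pos + 3
    if tag == 0xC7:                                  # ext 8: length, type, data, opaque
        length = buf[pos + 1]
        return ("ext", buf[pos + 2:pos + 3 + length]), pos + 3 + length
    raise ValueError(f"format 0x{tag:02x} not handled in this sketch")


# The producer means [[1, 2], 3]: a 2-element fixarray whose first element is a
# dynamic array written as EXT(-2, -1) 1 2 EXT(-2, 0).
message = bytes([0x92, 0xD4, 0xFE, 0xFF, 0x01, 0x02, 0xC7, 0x00, 0xFE, 0x03])
value, end = old_parse(message)
```

The old parser takes the start marker and the first item as the array's two elements and stops after 5 of the 10 bytes. The remaining bytes are then misread as further top-level values. At the top level (not nested) the same stream would merely show up as a sequence of separate values.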

Conclusion

This proposal will broaden the usability of MessagePack with minimal changes to the specification.

Mar 15 '18 18:03 clwi

It seems to me like you can accomplish most of these things with what the spec already provides. Is there really any gain to spelling out specific arguments for dynamic types? A fixext16 already seems like a good representation of the tuple type you describe, since three f32 values (12 bytes) fit in its 16-byte payload. Or, just an array with a type indicator:

["_pt", 123, 456, 789]
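For the fixext16 option, a sketch of packing three float32 values by hand could look like this. The ext type code 10 is an arbitrary hypothetical choice (application types are 0..127), and the 4 unused bytes of the fixed 16-byte payload are simply zero-padded.

```python
import struct

POINT_EXT_TYPE = 10   # hypothetical application-defined ext type (0..127)


def pack_point(x: float, y: float, z: float) -> bytes:
    payload = struct.pack(">3f", x, y, z)           # 12 bytes: three big-endian float32
    payload += b"\x00" * (16 - len(payload))        # pad to fixext 16's 16-byte payload
    return b"\xd8" + struct.pack(">b", POINT_EXT_TYPE) + payload


def unpack_point(buf: bytes):
    if buf[0] != 0xD8 or struct.unpack(">b", buf[1:2])[0] != POINT_EXT_TYPE:
        raise ValueError("not a point")
    return struct.unpack(">3f", buf[2:14])          # ignore the 4 padding bytes
```

This always costs 18 bytes and float32 keeps only about 7 significant digits. For comparison, the tagged array `["_pt", 123, 456, 789]` encodes to 12 bytes with standard MessagePack (fixarray header, 4-byte fixstr, one fixint, two uint16), which is the compactness trade-off the original post refers to.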

Achieving dynamic typing with dunder (double-underscore) names is fairly common:

[
    {"__class": "temperature", "value": 25},
    {"__class": "location", "latitude": 0.000, "longitude": 0.000},
    {"__class": "blob", "__size": 12345678, "blob": "ab24..."}
]
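On the receiving side, the dunder convention is just a dispatch on the tag after a normal decode. A hypothetical sketch: `Temperature`, `Location` and `CLASSES` are illustrative names, and tagged maps with an unknown `__class` (like the blob above) pass through unchanged.

```python
from dataclasses import dataclass


@dataclass
class Temperature:
    value: float


@dataclass
class Location:
    latitude: float
    longitude: float


# Hypothetical mapping from the "__class" tag to a Python type
CLASSES = {"temperature": Temperature, "location": Location}


def decode_tagged(obj):
    if isinstance(obj, dict) and "__class" in obj:
        cls = CLASSES.get(obj["__class"])
        if cls is not None:
            # Drop reserved dunder keys and pass the rest as constructor fields
            fields = {k: v for k, v in obj.items() if not k.startswith("__")}
            return cls(**fields)
    return obj  # untagged maps and unknown tags are returned unchanged
```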

But really, this all seems like a job for whoever decides on the schema rather than a change to the specification, since the spec can already represent dynamic types.

NoSQL databases (which are 100% dynamic) use a single underscore or a $ to indicate that a key is reserved in some way:

[
    {
        "_id": "2934f",
        "salesDate": "2022-05-02",
        "items": [
            "12",
            "35",
            "86"
        ]
    },
    {
        "_id": "12",
        "name": "Pothos",
        "price": {
            "$numberDecimal": "8.00"
        },
        "orders": [
            "2934f",
            "1b2df",
            "43de9"
        ]
    }
]
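A reserved `$` key like this can be restored after decoding with a hook that runs on every map. The sketch below uses the standard library's `json.loads(..., object_hook=...)` on the second document above. Treating a map whose only key is `"$numberDecimal"` as a decimal number is an assumption about how such a convention would be applied.

```python
import json
from decimal import Decimal


def restore_reserved(obj: dict):
    # A map whose only key is "$numberDecimal" stands for a decimal number
    if obj.keys() == {"$numberDecimal"}:
        return Decimal(obj["$numberDecimal"])
    return obj


doc = json.loads(
    '{"_id": "12", "name": "Pothos", "price": {"$numberDecimal": "8.00"}}',
    object_hook=restore_reserved,
)
```

`object_hook` is called for inner maps first, so `doc["price"]` is already a `Decimal` when the outer map is built.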

Jan 07 '23 15:01 tgross35