
Some thoughts and considerations while evaluating MsgPack

Open mlsomers opened this issue 10 years ago • 2 comments

I am considering using MsgPack as the format for a new cross-platform project, but looking at the specs, I have a few questions and doubts. I don't need immediate answers, but these thoughts might be helpful, and maybe you have some answers ready.

It looks like there are no typed collections. Strongly typed generic collections (arrays, hash tables, queues, dictionaries, etc.) of a specific type could be more compact by specifying the type once. And if it is a collection of a fixed-length type, the length could also be specified only once (or left out completely for native types like ints and bools). I don't think there would be much benefit in specifying the type of the container (array / hashtable / etc.); that could be left to the programmer's preference, or allowed to change between versions.
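To make the size argument concrete, here is a minimal sketch (this encoding is hypothetical and not part of the MsgPack spec) comparing standard per-element type markers with a homogeneous array that states the element type once:

```python
import struct

# Hypothetical "typed array" sketch (NOT part of the MsgPack spec):
# write the element type tag once, then raw fixed-width payloads.
# Standard MsgPack repeats a type marker before every element.

UINT32_TAG = 0xCE  # MsgPack's uint32 marker, reused here as the element type

def pack_standard_uint32_array(values):
    """Per-element tags, as plain MsgPack encodes uint32 values."""
    out = bytearray([0x90 | len(values)])         # fixarray header (len < 16)
    for v in values:
        out += struct.pack(">BI", UINT32_TAG, v)  # tag + big-endian payload
    return bytes(out)

def pack_typed_uint32_array(values):
    """Hypothetical encoding: one type tag, then raw payloads only."""
    out = bytearray([0x90 | len(values), UINT32_TAG])  # header + single tag
    for v in values:
        out += struct.pack(">I", v)
    return bytes(out)

vals = [1, 2, 3, 4, 5]
std = pack_standard_uint32_array(vals)    # 26 bytes
typed = pack_typed_uint32_array(vals)     # 22 bytes: saves 1 tag byte/element
```

The saving is one byte per element here, but it grows with element count and with more verbose type descriptors.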

Another idea along the same lines concerns complex or compound types. If such a type could be defined once, before an array of that type, then the subtypes would not need to be repeated for each instance; only a few extra bytes to indicate the length of any variable-length parts would be needed.
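A rough sketch of that idea (again hypothetical; the template format and field names are invented for illustration): a compound-type template is declared once, and each record afterwards is just payload bytes with no per-field type tags.

```python
import struct

# Sketch of "define the compound type once" (hypothetical, not in the
# MsgPack spec): a template describes the fields up front, and each
# record is then packed as raw payload bytes only.

TEMPLATE = [("id", ">I"), ("score", ">d")]  # fixed-width fields, declared once

def pack_records(records):
    out = bytearray([len(records)])              # record count
    for rec in records:
        for name, fmt in TEMPLATE:
            out += struct.pack(fmt, rec[name])   # no per-field type tags
    return bytes(out)

def unpack_records(data):
    count, offset, recs = data[0], 1, []
    for _ in range(count):
        rec = {}
        for name, fmt in TEMPLATE:
            (rec[name],) = struct.unpack_from(fmt, data, offset)
            offset += struct.calcsize(fmt)
        recs.append(rec)
    return recs

recs = [{"id": 1, "score": 9.5}, {"id": 2, "score": 7.25}]
assert unpack_records(pack_records(recs)) == recs
```

Variable-length fields would additionally need a length prefix per instance, as the comment above notes, but the type information itself still appears only once.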

Things get interesting when an array of a base type can contain any mixture of derived types. In that case, extra type-reference information inside the array would be needed. A header dictionary of type info, to which references could be made, might be handy.
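The "header dictionary of type info" could look something like this (a hypothetical sketch; the type names and one-byte reference scheme are invented): the header lists the type descriptors once, and each heterogeneous element carries only a small index into that table.

```python
# Sketch of a header type table (hypothetical, not part of MsgPack):
# the file starts with a table of type descriptors; each element in a
# mixed array carries a one-byte index into that table plus its length.

type_table = ["Shape.Circle", "Shape.Square"]    # written once in a header

def pack_mixed(elements):
    """elements: list of (type_name, payload_bytes) pairs."""
    out = bytearray([len(elements)])
    for type_name, payload in elements:
        out.append(type_table.index(type_name))  # 1-byte type reference
        out.append(len(payload))                 # payload length
        out += payload
    return bytes(out)

packed = pack_mixed([("Shape.Circle", b"\x00\x05"),
                     ("Shape.Square", b"\x00\x03")])
# Each element costs 2 bytes of overhead instead of a full type descriptor.
```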

An external schema (like the XSD idea) might also be an option, but I'd strongly prefer integrating it into each instance of a file. That prevents version mismatches and the need to reference external schemas from within the file, which always seem to be missing, unreachable, or pointing at a default tempuri.org namespace that was never helpful.

Yet another thought: maybe a standard way of defining an index at the start or end of a file would be a good idea, allowing random access to large files. A map that specifies where certain collections or portions of the file start and their length.
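One way such a trailing index could work (purely a sketch under assumed conventions; none of this is in the MsgPack spec): after writing each named section, record its offset and length, append the index, and end the file with a fixed-width pointer to the index so a reader can seek directly to any section.

```python
import struct, io

# Hypothetical trailing-index layout: sections, then an index of
# (name, offset, length) entries, then 4 bytes pointing at the index.

def write_file(sections):
    buf, index = io.BytesIO(), {}
    for name, payload in sections.items():
        index[name] = (buf.tell(), len(payload))
        buf.write(payload)
    index_pos = buf.tell()
    for name, (off, length) in index.items():
        encoded = name.encode()
        buf.write(struct.pack(">B", len(encoded)) + encoded)
        buf.write(struct.pack(">II", off, length))
    buf.write(struct.pack(">I", index_pos))   # last 4 bytes locate the index
    return buf.getvalue()

def read_section(data, wanted):
    (index_pos,) = struct.unpack_from(">I", data, len(data) - 4)
    pos = index_pos
    while pos < len(data) - 4:
        name_len = data[pos]; pos += 1
        name = data[pos:pos + name_len].decode(); pos += name_len
        off, length = struct.unpack_from(">II", data, pos); pos += 8
        if name == wanted:
            return data[off:off + length]     # random access, no full parse
    raise KeyError(wanted)

data = write_file({"header": b"HDR", "body": b"payload-bytes"})
assert read_section(data, "body") == b"payload-bytes"
```

Putting the index at the end keeps writing single-pass; a reader only needs to seek to the last four bytes first.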

Ok, I’ll stop rambling on the details, now for the practical stuff.

The only real advantage (for me) of using MsgPack instead of hand-rolling my own specific format would be tooling: the pre-made libraries and the generic parsers/editors that understand the format and handle endianness out of the box. What I have not yet seen are tools that can validate the files and enable inspecting and editing them outside the context of the target application. Do any of these exist yet?

In my opinion, the Extensions part of the format is a killer. An extension would be a black box in an editor and might only be editable with a hex editor, the same as Raw Binary. The strong part of MsgPack is having type info, which enables us to build code generators like we had for SOAP, or to dynamically build data trees in memory the way many JSON parsers do these days. I love the concept! Keep up this work! Who knows, I might be able to contribute if I find the time.

mlsomers avatar Nov 23 '15 23:11 mlsomers

Well, I've been working hard to get around my biggest issue, the lack of tooling. I've just uploaded a first alpha version of MsgPack Explorer. Check out some screenshots at http://www.infotopie.nl/open-source/msgpack-explorer and download the source or the alpha binary release at https://github.com/mlsomers/LsMsgPack. It's probably not production-ready yet, but I'd love to get some feedback. Cheers, Louis

mlsomers avatar Jan 03 '16 01:01 mlsomers

> Yet another thought: maybe a standard way of defining an index at the start or end of a file would be a good idea, allowing random access to large files. A map that specifies where certain collections or portions of the file start and their length.

For legacy reasons I've been looking for a fast JSON parser with minimal memory usage. Such parsers exist, but none offer an API for lazy loading / random access (at least I haven't found one). Seeing that MsgPack is faster and smaller still, I stumbled over this issue. Unless there has been some progress on this, I still have to go with a manually extended JSON parser that preserves the start positions of items (their lengths are known), which can then be loaded / parsed on demand.
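The workaround described above could be sketched like this (an assumed approach, shown with Python's stdlib `json` for illustration): scan a top-level JSON array once to record each element's start offset, then fully parse an element only when it is requested.

```python
import json

# Lazy-access sketch: index element start offsets in one pass, then
# parse individual elements on demand with JSONDecoder.raw_decode.

decoder = json.JSONDecoder()

def index_top_level_array(text):
    """Return the start offset of every element in a top-level JSON array."""
    offsets, pos = [], text.index("[") + 1
    while True:
        while text[pos] in " \t\r\n,":
            pos += 1
        if text[pos] == "]":
            return offsets
        offsets.append(pos)
        _, pos = decoder.raw_decode(text, pos)   # skip over the element

def load_item(text, offsets, i):
    value, _ = decoder.raw_decode(text, offsets[i])  # parse on demand
    return value

doc = '[{"id": 1}, {"id": 2}, {"id": 3}]'
offs = index_top_level_array(doc)
assert load_item(doc, offs, 2) == {"id": 3}
```

The indexing pass still walks the whole document once, but after that any element can be materialized independently, which is the random-access property the comment is after.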

andreygursky avatar Jul 15 '19 18:07 andreygursky