Skip to content

Add json_serialize_sql and first step of new Format(De)Serialization infrastructure.#6647

Merged
Mytherin merged 46 commits intoduckdb:masterfrom
Maxxen:core/json-serialization
Mar 14, 2023
Merged

Add json_serialize_sql and first step of new Format(De)Serialization infrastructure.#6647
Mytherin merged 46 commits intoduckdb:masterfrom
Maxxen:core/json-serialization

Conversation

@Maxxen
Copy link
Member

@Maxxen Maxxen commented Mar 9, 2023

This adds the API for the new "formatted" serialization code to the parsed syntax tree, logical types and values.
This new api allows for easy "structural" serialization by separating the structure of serializeable classes and structs to the behaviour and format of the underlying serializer.

While I'd like to replace the current binary Serialize()/Deserialize() using FieldWriters with this system in the future, there is only one FormatSerializer implemented so far in the JSON extension, which is exposed through a json_serialize_sql(VARCHAR, ...opts) function which allows users to run SQL through DuckDBs parser and get a JSON document with statements (or errors) back. The function also takes up to three optional arguments:

  • skip_null := boolean: Don't include any properties that are set to null
  • skip_empty := boolean: Don't include any properties that are set to "empty" values, e.g. empty lists (vectors), or objects (unordered_maps)
  • format := boolean: Pretty-print the json document using proper indentation. Does not work that well with .mode box but ok in .mode ascii. I Imagine it could be useful for debugging.

In order to keep this PR somehow manageable I had to slash the implementation of any FormatDeserializers, but that's next on my todo, so you could actually send a modified json AST back and execute it.

Some notes and thoughts about the implementation:
This is probably some of the most template-heavy c++ code I've ever written and after much work I've managed to get it to compile even on GCC4.8, but I'm not sure if this is the most "efficient" or straightforward way to accomplish what I'm trying to do. That said, the beauty of this api is that internals and the template trait based "destructuring" logic can change without touching the AST.

  • We can't use a pure virtual FormatSerializer base class since we need to be able to handle templated structs (like vectors and maps) and virtual functions cant be templated.
  • We can't extend the STL types (like vectors and maps) with "declarative" traversal logic like we do for our own classes.
  • We can't use a CRTP style FormatSerializer base class because then we can't make parent nodes in the AST require that their children implement a serialization function, (since virtual functions cant be templated). We could make these traversal methods static (like we already do in the Deserialisation case) and dispatch from the parent class, but that requires the entire inheritance hierarchy to be known at compile time, which is not the case for e.g. table functions and scans implemented in extensions. I'm really tempted to try to work around this anyway but not sure if this is a dealbreaker in other aspects as well if this subsystem is to be eventually extended to other parts of the system?
  • I've also moved a ton of enum<->string conversion functions into a dedicated header, if an enum is to be serialized a string it has to have an overload/specialization there.

So I've ended up with the current implementation where common templated STL types are traversed by the base class which calls "hooks" for subclasses to implement in order to be aware of their current position in the tree and manage their own state. It's not great, but it should cover our primary use cases (json and binary serialization).

@Maxxen Maxxen requested a review from Mytherin March 10, 2023 13:59
Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looks great. Could we add a verifier similar to the DeserializedStatementVerifier for this (de)serialization pass?

@Mytherin
Copy link
Collaborator

Thanks for the fixes! Looks great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants