Add json_serialize_sql and first step of new Format(De)Serialization infrastructure.#6647
Merged
Mytherin merged 46 commits intoduckdb:masterfrom Mar 14, 2023
Merged
Conversation
Mytherin
reviewed
Mar 13, 2023
Collaborator
Mytherin
left a comment
There was a problem hiding this comment.
Thanks for the PR! Looks great. Could we add a verifier similar to the DeserializedStatementVerifier for this (de)serialization pass?
Collaborator
|
Thanks for the fixes! Looks great. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This adds the API for the new "formatted" serialization code to the parsed syntax tree, logical types and values.
This new api allows for easy "structural" serialization by separating the structure of serializeable classes and structs to the behaviour and format of the underlying serializer.
While I'd like to replace the current binary Serialize()/Deserialize() using FieldWriters with this system in the future, there is only one FormatSerializer implemented so far in the JSON extension, which is exposed through a
json_serialize_sql(VARCHAR, ...opts)function which allows users to run SQL through DuckDBs parser and get a JSON document with statements (or errors) back. The function also takes up to three optional arguments:skip_null := boolean: Don't include any properties that are set to nullskip_empty := boolean: Don't include any properties that are set to "empty" values, e.g. empty lists (vectors), or objects (unordered_maps)format := boolean: Pretty-print the json document using proper indentation. Does not work that well with.mode boxbut ok in.mode ascii. I Imagine it could be useful for debugging.In order to keep this PR somehow manageable I had to slash the implementation of any FormatDeserializers, but that's next on my todo, so you could actually send a modified json AST back and execute it.
Some notes and thoughts about the implementation:
This is probably some of the most template-heavy c++ code I've ever written and after much work I've managed to get it to compile even on GCC4.8, but I'm not sure if this is the most "efficient" or straightforward way to accomplish what I'm trying to do. That said, the beauty of this api is that internals and the template trait based "destructuring" logic can change without touching the AST.
CRTPstyle FormatSerializer base class because then we can't make parent nodes in the AST require that their children implement a serialization function, (since virtual functions cant be templated). We could make these traversal methods static (like we already do in the Deserialisation case) and dispatch from the parent class, but that requires the entire inheritance hierarchy to be known at compile time, which is not the case for e.g. table functions and scans implemented in extensions. I'm really tempted to try to work around this anyway but not sure if this is a dealbreaker in other aspects as well if this subsystem is to be eventually extended to other parts of the system?So I've ended up with the current implementation where common templated STL types are traversed by the base class which calls "hooks" for subclasses to implement in order to be aware of their current position in the tree and manage their own state. It's not great, but it should cover our primary use cases (json and binary serialization).