Making tag_invoke to support ondemand::document as well + docs#2228
Conversation
No. Please do not touch the singleheader files by hand. They are automatically generated. If you must updated them, use the script. But you probably should not touch them. |
|
What's the story with |
|
@the-moisrex That has to do with the amalgamation script: simdjson/singleheader/amalgamate.py Line 165 in ccf8694 You should not worry too much about it. |
That is failing all the tests though! |
Don't worry about that. Your PR is very good. |
Co-authored-by: Daniel Lemire <daniel@lemire.me>
Co-authored-by: Daniel Lemire <daniel@lemire.me>
| noexcept | ||
| #endif | ||
| { | ||
| return static_cast<document&>(*this).get<T>(); |
There was a problem hiding this comment.
That's potentially a problem, see #2229
It might allow you to hold an ondemand::array or ondemand::object while the ondemand::document is out of scope.
The workaround might be to check whether T is either ondemand::array or ondemand::object and error in these cases. That could be achieved with a static_assert?
What do you think?
There was a problem hiding this comment.
Honestly I freaked out when I saw this:
simdjson/include/simdjson/generic/ondemand/document-inl.h
Lines 170 to 176 in a33fdc2
- we're returning string_views from a moved type (which I assume it points to the parser not document itself)
- I don't understand why
std::forwardis being used, and I also why all of them don't have it
If the problem is only array and object, then static_assert would be better than the code that I mentioned that we're simply not providing specializations for array and object which the user may add themselves later!
There was a problem hiding this comment.
we're returning string_views from a moved type (which I assume it points to the parser not document itself)
That is fine because the std::string_view is owned by the parser.
I don't understand why std::forward is being used, and I also why all of them don't have it
I suspect that the std::forward<document>(*this).get_int64(); is overcomplicated code. It could be simplified.
There was a problem hiding this comment.
@the-moisrex I suspect that code simplification is certainly possible. The API itself ought to be fine.
The users do not see this code. :-)
|
@the-moisrex Don't worry too much about the failing CI. I am going to have a look soon. Let us get the design right first. |
|
@the-moisrex I think we should deprecate the rvalue get on documents. I suspect that it saw little use. |
|
@lemire should we preserve backward-compatibility when specialized directly and completely disable when a |
|
Oh, I was just starting to work on this. Happy @the-moisrex is already at it, thank you! |
|
@FranciscoThiesen Don't hesitate to chip/review, etc. |
That sounds fantastic. We would prefer not to break existing code if at all possible. Warnings are ok though. |
I'm disabling `document::get() &&` if the user has provided a `tag_invoke`d version; otherwise, we retain the compatibility.
|
Great work. |
| Suppose your custom types are `std::unique_ptr<any type>`, you could: | ||
|
|
||
| ```C++ | ||
| namespace simdjson { |
There was a problem hiding this comment.
@lemire once C++20 (and maybe even reflecton in C++26) become heavily used. Do you think we should add these powerful generic tag_invoke definitions? Or that should always be up to the user?
We could provide:
- Support for PushableContainers (https://github.com/simdjson/experimental_json_builder/blob/using_reflection_for_simdjson_ondemand/examples/example3.cpp)
- Support for unique_ptr as provided in this PR
- And even support of user defined types (:
There was a problem hiding this comment.
@the-moisrex @FranciscoThiesen Actually, I was thinking the same. My main concern is that we could be conservative because once we define something in the simdjson library, we make it difficult for users to do something different. So I don't know about "PushableContainers". That seems too general.
There was a problem hiding this comment.
Hm, what about having a out-of-the-box specialization for user defined structs that use basic types?
There was a problem hiding this comment.
Hm, what about having a out-of-the-box specialization for user defined structs that use basic types?
I have no strong opinion.
| @@ -0,0 +1,100 @@ | |||
| #ifndef SIMDJSON_ONDEMAND_DESERIALIZE_H | |||
| #define SIMDJSON_ONDEMAND_DESERIALIZE_H | |||
There was a problem hiding this comment.
| #define SIMDJSON_ONDEMAND_DESERIALIZE_H | |
| #ifndef SIMDJSON_CONDITIONAL_INCLUDE | |
| #define SIMDJSON_ON_DEMAND_DESERIALIZE_H | |
| #include "simdjson/error.h" | |
| #endif // SIMDJSON_CONDITIONAL_INCLUDE |
| #ifndef SIMDJSON_ONDEMAND_DESERIALIZE_H | ||
| #define SIMDJSON_ONDEMAND_DESERIALIZE_H | ||
|
|
||
| #include "simdjson/error.h" |
There was a problem hiding this comment.
| #include "simdjson/error.h" |
|
Btw, @the-moisrex I am leveraging your changes here (simdjson/experimental_json_builder#38). This work is a pretty important work to enable reflection for values/docuements later on) |
|
@the-moisrex @FranciscoThiesen What about adding the following (or something similar)... this would give a good basis without overdoing it. namespace simdjson {
namespace concepts {
template <typename C> struct is_vector : std::false_type {};
template <typename T> struct is_vector<std::vector<T>> : std::true_type {};
template <typename C>
concept std_vector = is_vector<C>::value;
} // namespace simdjson::concepts
template <typename T>
requires std::unsigned_integral<T>
simdjson_result<T> tag_invoke(deserialize_tag, std::type_identity<T>, ondemand::value &val) noexcept {
uint64_t x;
SIMDJSON_TRY(val.get_uint64().get(x));
if(x > std::numeric_limits<T>::max() || x < std::numeric_limits<T>::min()) {
return NUMBER_OUT_OF_RANGE;
}
return static_cast<T>(x);
}
template <typename T>
requires std::floating_point<T>
simdjson_result<T> tag_invoke(deserialize_tag, std::type_identity<T>, auto &val) noexcept{
double x;
SIMDJSON_TRY(val.get_double().get(x));
return static_cast<T>(x);
}
template <typename T>
requires std::signed_integral<T>
simdjson_result<T> tag_invoke(simdjson::deserialize_tag, std::type_identity<T>, auto &val) noexcept {
int64_t x;
SIMDJSON_TRY(val.get_int64().get(x));
if(x > std::numeric_limits<T>::max() || x < std::numeric_limits<T>::min()) {
return NUMBER_OUT_OF_RANGE;
}
return static_cast<T>(x);
}
template <>
simdjson_result<std::string> tag_invoke(deserialize_tag, std::type_identity<std::string>, auto &val) noexcept{
std::string s;
SIMDJSON_TRY(val.get_string(s));
return s;
}
template <typename T>
requires concepts::std_vector<T>
simdjson_result<T> tag_invoke(deserialize_tag, std::type_identity<T>, auto &val) {
T vec;
ondemand::array array;
SIMDJSON_TRY(val.get_array().get(array));
for (auto v : array) {
typename T::value_type value;
SIMDJSON_TRY(v.get<typename T::value_type>().get(value));
vec.push_back(value);
}
return vec;
}
}
@the-moisrex Pay attention to the serialization work we are doing. The goal in the short term is to have convention read/write to/from JSON at high speed. |
|
We could also add a few simple types like |
@lemire There's a lot more types to deserialize into; I've opened an issue #2230 |
|
I am going to merge and fix. |
|
See #2232 |
* tag_invoke based custom types (#2219) * tag_invoke based custom types Now you can use tag_invoke to add a custom type or a group of custom types. * Fixing macro usage + Fixing noexcept * Fixing the usage of #include We don't need <concepts> at all seems like it * Fixing tag_invoke impl for MSVC * Making `tag_invoke` to support `ondemand::document` as well + docs (#2228) * Making `tag_invoke` to support `ondemand::document` as well + docs * Fix typos and doc update by @lemire Co-authored-by: Daniel Lemire <daniel@lemire.me> * Better docs by @lemire Co-authored-by: Daniel Lemire <daniel@lemire.me> * Preserving the old, disallowing in the new I'm disabling `document::get() &&` if the user has provided a `tag_invoke`d version; otherwise, we retain the compatibility. --------- Co-authored-by: Daniel Lemire <daniel@lemire.me> * fix: correct small issues with deserialize (#2232) * Extending the deserialization code with more defaults + docs (#2233) * Make custom types easier with some predefined cases + docs * missing include * adding Ubuntu 24 CXX 20 * using concepts all the way * minor tweak * tiny tweak * tweaks * more tweaking * saving --------- Co-authored-by: Daniel Lemire <dlemire@lemire.me> * Making `tag_invoke` a "put" as opposed to a "get" (#2256) * fix: add tests related to issue 2227 (#2229) * fix: add tests related to issue 2227 * avoiding name clash * pedantic fix * deprecate rvalue get on document * selectively deprecating * Fix ndjson spec link (#2234) * fix ndjson spec link The link in the readme of parse_many links to a casino spam site * fix link * [no-ci] Update README.md * Make simdjson compile again * Enable SIMDJSON_SINGLEHEADER=OFF in VS Code With singleheader on, clangd can't find the right include files. * Add missing include directives to static build targets of simdjson. (#2240) * adding a warning * adding warning regarding SIMDJSON_BUILD_STATIC_LIB * release candidate * pedantic viable size * Making tag_invoke a feeder instead of a producer * adding missing undef silencer (#2253) * Ignore pragma once when amalgamating source files (#2248) With gcc it causes an error in `simdjson.cpp`: ``` simdjson.cpp:548:9: warning: #pragma once in main file 548 | #pragma once | ^~~~ ``` It had previously been commented out in: 6ef555e However, this was lost in an upgrade: 2a4ff73 * Update CI (#2254) * adding missing undef silencer * Updating CI * more fixes * fix * big endian fix * Moving to the new tag_invoke signature * Fix nlohmann ambiguity on C++23-enabled clang * Revert "Merge branch 'master' of https://github.com/simdjson/simdjson into builder_development_branch_extra" This reverts commit 3eeecba, reversing changes made to 6858b20. --------- Co-authored-by: Daniel Lemire <daniel@lemire.me> Co-authored-by: Sasha Lopoukhine <superlopuh@gmail.com> Co-authored-by: John Keiser <john@johnkeiser.com> Co-authored-by: Tan Li Boon <undisputed-seraphim@users.noreply.github.com> Co-authored-by: tobil4sk <tobil4sk@outlook.com> * update CI on the builder_development_branch (no code change) (#2262) * typo * General madness simpler, no simpler!!! (#2267) * Minimal tag_invokes for STL types * simpler madness * adding a comment * missing file * minor tweaks to style * fixing incorrect max/min usage * updating single * simplify * validating the idea * putting back the concept * moving the include * guarding * Cheap General Madness (#2268) * Some General Concepts and their deserializations * Resolving ambiguity * Add missing #include * C++20 custom deserializer: better documentation (#2269) * mostly a documentation update. * missing cpp * [no-ci] fix comment * various minor fixes --------- Co-authored-by: Daniel Lemire <dlemire@lemire.me> --------- Co-authored-by: M. Bahoosh <12122474+the-moisrex@users.noreply.github.com> Co-authored-by: Daniel Lemire <dlemire@lemire.me> Co-authored-by: M. Bahoosh <moisrex@gmail.com> * minor update * More documentation regarding builder (#2270) * minor update * more improvment to our documentation (builder branch) * putting back missing functions * merge candidate --------- Co-authored-by: M. Bahoosh <moisrex@gmail.com> Co-authored-by: Daniel Lemire <dlemire@lemire.me> Co-authored-by: Sasha Lopoukhine <superlopuh@gmail.com> Co-authored-by: John Keiser <john@johnkeiser.com> Co-authored-by: Tan Li Boon <undisputed-seraphim@users.noreply.github.com> Co-authored-by: tobil4sk <tobil4sk@outlook.com> Co-authored-by: M. Bahoosh <12122474+the-moisrex@users.noreply.github.com>
I did mention a problem with the design of
document::get() &&(in #2227) that is a bit weird for me; I can't go back and just simply use non-constdocument::get()anymore because that might break user's codes.Now we instead of this:
We would do this:
So we can both support
documentandvaluein one move.Also, I created a
deserialize.hfile, is the file name okay, and do we need to add it anywhere? And should I do something aboutsingleheaderafter updates?