Skip to content

Making tag_invoke to support ondemand::document as well + docs#2228

Merged
lemire merged 4 commits intosimdjson:builder_development_branchfrom
the-moisrex:builder_development_branch
Aug 9, 2024
Merged

Making tag_invoke to support ondemand::document as well + docs#2228
lemire merged 4 commits intosimdjson:builder_development_branchfrom
the-moisrex:builder_development_branch

Conversation

@the-moisrex
Copy link
Member

I did mention a problem with the design of document::get() && (in #2227) that is a bit weird for me; I can't go back and just simply use non-const document::get() anymore because that might break user's codes.

Now we instead of this:

namespace simdjson {

// unique_ptr<T>
template <typename T>
auto tag_invoke(deserialize_tag, std::type_identity<std::unique_ptr<T>>, ondemand::value &val) {
  return simdjson_result{std::make_unique<T>(val.template get<T>())};
}

}

We would do this:

namespace simdjson {

// unique_ptr<T>
template <typename T>
auto tag_invoke(deserialize_tag, std::type_identity<std::unique_ptr<T>>, 
                auto &val  // <---------- could be a `document` or `value` now
) {
  return simdjson_result{std::make_unique<T>(val.template get<T>())};
}

}

So we can both support document and value in one move.

Also, I created a deserialize.h file, is the file name okay, and do we need to add it anywhere? And should I do something about singleheader after updates?

@lemire
Copy link
Member

lemire commented Aug 7, 2024

And should I do something about singleheader after updates?

No. Please do not touch the singleheader files by hand. They are automatically generated.

If you must updated them, use the script. But you probably should not touch them.

@the-moisrex
Copy link
Member Author

What's the story with SIMDJSON_CONDITIONAL_INCLUDE? Do I need it in the new deserialize.h file (and why?) or should I move the file?

@lemire
Copy link
Member

lemire commented Aug 7, 2024

@the-moisrex That has to do with the amalgamation script:

assert include.is_conditional_include, f"{self} cannot include {include} without #ifndef SIMDJSON_CONDITIONAL_INCLUDE."

You should not worry too much about it.

@the-moisrex
Copy link
Member Author

@the-moisrex That has to do with the amalgamation script:

assert include.is_conditional_include, f"{self} cannot include {include} without #ifndef SIMDJSON_CONDITIONAL_INCLUDE."

You should not worry too much about it.

That is failing all the tests though!

@lemire
Copy link
Member

lemire commented Aug 7, 2024

@the-moisrex

That is failing all the tests though!

Don't worry about that.

Your PR is very good.

the-moisrex and others added 2 commits August 7, 2024 08:52
Co-authored-by: Daniel Lemire <daniel@lemire.me>
Co-authored-by: Daniel Lemire <daniel@lemire.me>
noexcept
#endif
{
return static_cast<document&>(*this).get<T>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's potentially a problem, see #2229

It might allow you to hold an ondemand::array or ondemand::object while the ondemand::document is out of scope.

The workaround might be to check whether T is either ondemand::array or ondemand::object and error in these cases. That could be achieved with a static_assert?

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I freaked out when I saw this:

template<> simdjson_inline simdjson_result<raw_json_string> document::get() && noexcept { return get_raw_json_string(); }
template<> simdjson_inline simdjson_result<std::string_view> document::get() && noexcept { return get_string(false); }
template<> simdjson_inline simdjson_result<double> document::get() && noexcept { return std::forward<document>(*this).get_double(); }
template<> simdjson_inline simdjson_result<uint64_t> document::get() && noexcept { return std::forward<document>(*this).get_uint64(); }
template<> simdjson_inline simdjson_result<int64_t> document::get() && noexcept { return std::forward<document>(*this).get_int64(); }
template<> simdjson_inline simdjson_result<bool> document::get() && noexcept { return std::forward<document>(*this).get_bool(); }
template<> simdjson_inline simdjson_result<value> document::get() && noexcept { return get_value(); }

  1. we're returning string_views from a moved type (which I assume it points to the parser not document itself)
  2. I don't understand why std::forward is being used, and I also why all of them don't have it

If the problem is only array and object, then static_assert would be better than the code that I mentioned that we're simply not providing specializations for array and object which the user may add themselves later!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're returning string_views from a moved type (which I assume it points to the parser not document itself)

That is fine because the std::string_view is owned by the parser.

I don't understand why std::forward is being used, and I also why all of them don't have it

I suspect that the std::forward<document>(*this).get_int64(); is overcomplicated code. It could be simplified.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@the-moisrex I suspect that code simplification is certainly possible. The API itself ought to be fine.

The users do not see this code. :-)

@lemire
Copy link
Member

lemire commented Aug 7, 2024

@the-moisrex Don't worry too much about the failing CI. I am going to have a look soon.

Let us get the design right first.

@lemire
Copy link
Member

lemire commented Aug 7, 2024

@the-moisrex I think we should deprecate the rvalue get on documents. I suspect that it saw little use.

@the-moisrex
Copy link
Member Author

@lemire should we preserve backward-compatibility when specialized directly and completely disable when a tag_invoke of that type is present?

@FranciscoThiesen
Copy link
Member

Oh, I was just starting to work on this. Happy @the-moisrex is already at it, thank you!

@lemire
Copy link
Member

lemire commented Aug 7, 2024

@FranciscoThiesen Don't hesitate to chip/review, etc.

@lemire
Copy link
Member

lemire commented Aug 7, 2024

@the-moisrex

should we preserve backward-compatibility when specialized directly and completely disable when a tag_invoke of that type is present?

That sounds fantastic.

We would prefer not to break existing code if at all possible. Warnings are ok though.

I'm disabling `document::get() &&` if the user has provided a `tag_invoke`d version; otherwise, we retain the compatibility.
@the-moisrex
Copy link
Member Author

I retained backward-compatibility but disallowed the use of document::get()&& when there's a tag_invoke custom type is provided.

I did not mark it as deprecated because @lemire is doing that in #2229 which we're going to have a merge-conflict.

@lemire
Copy link
Member

lemire commented Aug 7, 2024

@the-moisrex

Great work.

Suppose your custom types are `std::unique_ptr<any type>`, you could:

```C++
namespace simdjson {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lemire once C++20 (and maybe even reflecton in C++26) become heavily used. Do you think we should add these powerful generic tag_invoke definitions? Or that should always be up to the user?

We could provide:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@the-moisrex @FranciscoThiesen Actually, I was thinking the same. My main concern is that we could be conservative because once we define something in the simdjson library, we make it difficult for users to do something different. So I don't know about "PushableContainers". That seems too general.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, what about having a out-of-the-box specialization for user defined structs that use basic types?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, what about having a out-of-the-box specialization for user defined structs that use basic types?

I have no strong opinion.

@@ -0,0 +1,100 @@
#ifndef SIMDJSON_ONDEMAND_DESERIALIZE_H
#define SIMDJSON_ONDEMAND_DESERIALIZE_H
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#define SIMDJSON_ONDEMAND_DESERIALIZE_H
#ifndef SIMDJSON_CONDITIONAL_INCLUDE
#define SIMDJSON_ON_DEMAND_DESERIALIZE_H
#include "simdjson/error.h"
#endif // SIMDJSON_CONDITIONAL_INCLUDE

#ifndef SIMDJSON_ONDEMAND_DESERIALIZE_H
#define SIMDJSON_ONDEMAND_DESERIALIZE_H

#include "simdjson/error.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include "simdjson/error.h"

@FranciscoThiesen
Copy link
Member

Btw, @the-moisrex I am leveraging your changes here (simdjson/experimental_json_builder#38). This work is a pretty important work to enable reflection for values/docuements later on)

@lemire
Copy link
Member

lemire commented Aug 8, 2024

@the-moisrex @FranciscoThiesen

What about adding the following (or something similar)... this would give a good basis without overdoing it.

namespace simdjson {

namespace concepts {
template <typename C> struct is_vector : std::false_type {};
template <typename T> struct is_vector<std::vector<T>> : std::true_type {};
template <typename C>
concept std_vector = is_vector<C>::value;
} // namespace simdjson::concepts



template <typename T>
  requires std::unsigned_integral<T>
simdjson_result<T> tag_invoke(deserialize_tag, std::type_identity<T>, ondemand::value &val) noexcept {
  uint64_t x;
  SIMDJSON_TRY(val.get_uint64().get(x));
  if(x > std::numeric_limits<T>::max() || x < std::numeric_limits<T>::min()) {
    return NUMBER_OUT_OF_RANGE;
  }
  return static_cast<T>(x);
}

template <typename T>
  requires std::floating_point<T>
simdjson_result<T> tag_invoke(deserialize_tag, std::type_identity<T>, auto &val) noexcept{
  double x;
  SIMDJSON_TRY(val.get_double().get(x));
  return static_cast<T>(x);
}

template <typename T>
  requires std::signed_integral<T>
simdjson_result<T> tag_invoke(simdjson::deserialize_tag, std::type_identity<T>, auto &val) noexcept {
  int64_t x;
  SIMDJSON_TRY(val.get_int64().get(x));
  if(x > std::numeric_limits<T>::max() || x < std::numeric_limits<T>::min()) {
    return NUMBER_OUT_OF_RANGE;
  }
  return static_cast<T>(x);
}


template <>
simdjson_result<std::string> tag_invoke(deserialize_tag, std::type_identity<std::string>, auto &val) noexcept{
  std::string s;
  SIMDJSON_TRY(val.get_string(s));
  return s;
}

template <typename T>
  requires concepts::std_vector<T>
simdjson_result<T> tag_invoke(deserialize_tag, std::type_identity<T>, auto &val) {
  T vec;
  ondemand::array array;
  SIMDJSON_TRY(val.get_array().get(array));
  for (auto v : array) {
    typename T::value_type value;
    SIMDJSON_TRY(v.get<typename T::value_type>().get(value));
    vec.push_back(value);
  }
  return vec;
}
}

@the-moisrex Pay attention to the serialization work we are doing. The goal in the short term is to have convention read/write to/from JSON at high speed.

@lemire
Copy link
Member

lemire commented Aug 8, 2024

We could also add a few simple types like std::array, std::tuple... since they are simple enough that a generic implementation cannot be very much wrong. I suspect that std::map<std::string,T> could be added as well... without much harm.

@the-moisrex
Copy link
Member Author

We could also add a few simple types like std::array, std::tuple... since they are simple enough that a generic implementation cannot be very much wrong. I suspect that std::map<std::string,T> could be added as well... without much harm.

@lemire There's a lot more types to deserialize into; I've opened an issue #2230

@lemire
Copy link
Member

lemire commented Aug 9, 2024

I am going to merge and fix.

@lemire lemire merged commit 6e61b7f into simdjson:builder_development_branch Aug 9, 2024
@lemire
Copy link
Member

lemire commented Aug 9, 2024

See #2232

lemire added a commit that referenced this pull request Oct 19, 2024
* tag_invoke based custom types (#2219)

* tag_invoke based custom types

Now you can use tag_invoke to add a custom type or a group of custom types.

* Fixing macro usage + Fixing noexcept

* Fixing the usage of #include

We don't need <concepts> at all seems like it

* Fixing tag_invoke impl for MSVC

* Making `tag_invoke` to support `ondemand::document` as well + docs (#2228)

* Making `tag_invoke` to support `ondemand::document` as well + docs

* Fix typos and doc update by @lemire

Co-authored-by: Daniel Lemire <daniel@lemire.me>

* Better docs by @lemire

Co-authored-by: Daniel Lemire <daniel@lemire.me>

* Preserving the old, disallowing in the new

I'm disabling `document::get() &&` if the user has provided a `tag_invoke`d version; otherwise, we retain the compatibility.

---------

Co-authored-by: Daniel Lemire <daniel@lemire.me>

* fix: correct small issues with deserialize (#2232)

* Extending the deserialization code with more defaults + docs (#2233)

* Make custom types easier with some predefined cases + docs

* missing include

* adding Ubuntu 24 CXX 20

* using concepts all the way

* minor tweak

* tiny tweak

* tweaks

* more tweaking

* saving

---------

Co-authored-by: Daniel Lemire <dlemire@lemire.me>

* Making `tag_invoke` a "put" as opposed to a "get" (#2256)

* fix: add tests related to issue 2227 (#2229)

* fix: add tests related to issue 2227

* avoiding name clash

* pedantic fix

* deprecate rvalue get on document

* selectively deprecating

* Fix ndjson spec link (#2234)

* fix ndjson spec link

The link in the readme of parse_many links to a casino spam site

* fix link

* [no-ci] Update README.md

* Make simdjson compile again

* Enable SIMDJSON_SINGLEHEADER=OFF in VS Code

With singleheader on, clangd can't find the right
include files.

* Add missing include directives to static build targets of simdjson. (#2240)

* adding a warning

* adding warning regarding SIMDJSON_BUILD_STATIC_LIB

* release candidate

* pedantic viable size

* Making tag_invoke a feeder instead of a producer

* adding missing undef silencer (#2253)

* Ignore pragma once when amalgamating source files (#2248)

With gcc it causes an error in `simdjson.cpp`:
```
simdjson.cpp:548:9: warning: #pragma once in main file
  548 | #pragma once
      |         ^~~~
```

It had previously been commented out in:
6ef555e

However, this was lost in an upgrade:
2a4ff73

* Update CI (#2254)

* adding missing undef silencer

* Updating CI

* more fixes

* fix

* big endian fix

* Moving to the new tag_invoke signature

* Fix nlohmann ambiguity on C++23-enabled clang

* Revert "Merge branch 'master' of https://github.com/simdjson/simdjson into builder_development_branch_extra"

This reverts commit 3eeecba, reversing
changes made to 6858b20.

---------

Co-authored-by: Daniel Lemire <daniel@lemire.me>
Co-authored-by: Sasha Lopoukhine <superlopuh@gmail.com>
Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Tan Li Boon <undisputed-seraphim@users.noreply.github.com>
Co-authored-by: tobil4sk <tobil4sk@outlook.com>

* update CI on the builder_development_branch (no code change) (#2262)

* typo

* General madness simpler, no simpler!!! (#2267)

* Minimal tag_invokes for STL types

* simpler madness

* adding a comment

* missing file

* minor tweaks to style

* fixing incorrect max/min usage

* updating single

* simplify

* validating the idea

* putting back the concept

* moving the include

* guarding

* Cheap General Madness (#2268)

* Some General Concepts and their deserializations

* Resolving ambiguity

* Add missing #include

* C++20 custom deserializer: better documentation (#2269)

* mostly a documentation update.

* missing cpp

* [no-ci] fix comment

* various minor fixes

---------

Co-authored-by: Daniel Lemire <dlemire@lemire.me>

---------

Co-authored-by: M. Bahoosh <12122474+the-moisrex@users.noreply.github.com>
Co-authored-by: Daniel Lemire <dlemire@lemire.me>
Co-authored-by: M. Bahoosh <moisrex@gmail.com>

* minor update

* More documentation regarding builder (#2270)

* minor update

* more improvment to our documentation (builder branch)

* putting back missing functions

* merge candidate

---------

Co-authored-by: M. Bahoosh <moisrex@gmail.com>
Co-authored-by: Daniel Lemire <dlemire@lemire.me>
Co-authored-by: Sasha Lopoukhine <superlopuh@gmail.com>
Co-authored-by: John Keiser <john@johnkeiser.com>
Co-authored-by: Tan Li Boon <undisputed-seraphim@users.noreply.github.com>
Co-authored-by: tobil4sk <tobil4sk@outlook.com>
Co-authored-by: M. Bahoosh <12122474+the-moisrex@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants