Adding extract_into functionality + test (targets simdjson >= 4.0 as it relies on reflection) #2471

Merged
FranciscoThiesen merged 3 commits into master from francisco/extract_into
Sep 26, 2025

@FranciscoThiesen
Member

No description provided.

@FranciscoThiesen
Member Author

taking a shot at #2456

@the-moisrex
Member

Cool. I guess we're going with extract_into out of all of those ideas.

Then let me expand on the idea and see if any of these ideas are any good:

// call `setMake` and `set_model` member functions
obj.extract_into<"make:setMake", "model:set_model">(car);

// extract "user"."name" and put it into "username" field
obj.extract_into<"user.name:username">(user);

// extract the first driver and call "setDriver" with that
obj.extract_into<"drivers.0:setDriver">(car);

// extract the first driver and call "setDriver" on `drivers` member field with that
obj.extract_into<"drivers.0:drivers.setDriver">(car);

You get the gist: this is pretty much a custom DSL. We could generalize it further by replacing the string with something more structured, or even with a function that returns something structured; the string would then simply be converted to that structure before anything else happens.
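
To make the "string gets converted to a struct first" idea concrete, here is a minimal sketch of just that conversion step. It is not part of this PR: `field_mapping` and `parse_mapping` are hypothetical names, and the `source:target` syntax is only the one floated above. In `extract_into` the spec would arrive as a template argument; a plain `constexpr` function is used here so the sketch stays self-contained.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical parsed form of one "source:target" spec (not a simdjson type).
struct field_mapping {
  std::string_view source; // JSON key or dotted path, e.g. "user.name"
  std::string_view target; // member or setter name, e.g. "username"
};

// Splits the spec at the first ':'. Without a ':', the target defaults to
// the source (an assumption of this sketch, not something the PR defines).
constexpr field_mapping parse_mapping(std::string_view spec) {
  const std::size_t colon = spec.find(':');
  if (colon == std::string_view::npos) {
    return {spec, spec};
  }
  return {spec.substr(0, colon), spec.substr(colon + 1)};
}

// Everything is evaluated at compile time.
static_assert(parse_mapping("user.name:username").source == "user.name");
static_assert(parse_mapping("user.name:username").target == "username");
static_assert(parse_mapping("make").target == "make");
```

Resolving a dotted `source` against the JSON, or deciding whether `target` names a field or a setter, is exactly the part that would turn this into a real DSL, so it is deliberately left out.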

But I kinda don't like the idea of designing a custom DSL or even a full query language for SIMDJSON just yet.

But I do see the potential of this, and I don't want to waste this potential by under-generalizing it, and at the same time, over-generalizing it may backfire just as easily.

What do you guys think?


Also, question: will this work with custom deserializable types? If Car had a tag_invoke, we could just use it in User{ Car car; } for example, and it would work, right?

// Try to find and extract the field
if constexpr (concepts::optional_type<decltype(out.[:mem:])>) {
  // For optional fields, it's ok if they're missing
  auto field_result = (*this)[key];

This is fine, but I think it could also be...

auto field_result = find_field_unordered(key);
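
For context, the pattern in the hunk above (optional members tolerate a missing key, other members do not) can be sketched without simdjson or reflection. Everything here is a stand-in: `toy_object`, `toy_error`, `toy_result`, `find_field`, and `extract_field` are hypothetical, and `is_optional` only plays the role of `concepts::optional_type`. The toy handles string values only.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string_view>
#include <type_traits>
#include <utility>

// Detects std::optional<T>; stands in for concepts::optional_type.
template <typename T> struct is_optional : std::false_type {};
template <typename T> struct is_optional<std::optional<T>> : std::true_type {};

enum class toy_error { success, no_such_field };

// Stand-in for a JSON object: a flat list of key/value pairs.
template <std::size_t N>
using toy_object = std::array<std::pair<std::string_view, std::string_view>, N>;

// Lookup that ignores key order: scans every pair.
template <std::size_t N>
constexpr std::optional<std::string_view> find_field(const toy_object<N>& obj,
                                                     std::string_view key) {
  for (const auto& kv : obj) {
    if (kv.first == key) return kv.second;
  }
  return std::nullopt;
}

template <typename Field>
struct toy_result {
  Field value{};
  toy_error error = toy_error::success;
};

template <typename Field, std::size_t N>
constexpr toy_result<Field> extract_field(const toy_object<N>& obj,
                                          std::string_view key) {
  const std::optional<std::string_view> found = find_field(obj, key);
  if constexpr (is_optional<Field>::value) {
    // Optional member: a missing key is fine, the value just stays empty.
    return {found, toy_error::success};
  } else {
    // Required member: a missing key is reported as an error.
    if (!found) return {Field{}, toy_error::no_such_field};
    return {*found, toy_error::success};
  }
}

constexpr toy_object<2> sample{{{"make", "Toyota"}, {"model", "Corolla"}}};

static_assert(extract_field<std::string_view>(sample, "make").value == "Toyota");
static_assert(extract_field<std::string_view>(sample, "vin").error ==
              toy_error::no_such_field);
static_assert(extract_field<std::optional<std::string_view>>(sample, "vin").error ==
              toy_error::success);
static_assert(!extract_field<std::optional<std::string_view>>(sample, "vin")
                   .value.has_value());
```

The toy lookup scans all pairs regardless of position, in the spirit of the `find_field_unordered` suggestion; the real code dispatches on the reflected member type rather than on an explicit `Field` argument.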

@lemire
Member

lemire commented Sep 25, 2025

> Also, question: will this work with custom deserializable types? If Car had a tag_invoke, we could just use it in User{ Car car; } for example, and it would work, right?

It should! Of course, it might be beneficial to include a test, but I don't see why it would not work.

> Then let me expand on the idea and see if any of these ideas are any good:

That's cool. I really like:

// extract "user"."name" and put it into "username" field
obj.extract_into<"user.name:username">(user);

Could even be...

// extract "SomeField" and put it into "username" field
obj.extract_into<"SomeField:username">(user);

This seems quite elegant and could even be used more generally in the generic get<>() method, and maybe in the simdjson::from construction (although, this would require some engineering).

I think it is much more elegant to put these instructions in the function call than to annotate one's objects.

I am less certain about the function calls that you document. Currently, rolling your own custom serialization/deserialization code is not pretty. We could/should make it more elegant... but until we have real-world feedback, I am not sure we should invest a lot of effort trying to guess what is useful.

So my question at this stage is... do we expect that what @FranciscoThiesen implemented blocks any of these future ideas?

I think not, right? If so, I think that we should go ahead and adopt Francisco's idea.

@lemire
Member

lemire commented Sep 25, 2025

I think that we should merge this and release it, but let us give it a bit of time.

Member

@the-moisrex the-moisrex left a comment

Since it has a different name than the .extract idea, @lemire is correct: this should not prevent any future iterations of this idea.

Good work.

@lemire
Member

lemire commented Sep 25, 2025

We will want to extend it from object to document and add documentation. I can do this later.

@FranciscoThiesen at your convenience, please review the comments. (No rush.)

@FranciscoThiesen
Member Author

I tried the custom setters and also nested fields, but that didn't work out well on a first attempt. I'm also not sure whether the p2996 implementation supports every predicate described in the P2996 paper (R13). I believe it's worth keeping an issue open for that and trying again later, maybe with more mature compiler support (or on a day with divine inspiration).

@FranciscoThiesen FranciscoThiesen merged commit 88a1b3e into master Sep 26, 2025
156 checks passed
@FranciscoThiesen FranciscoThiesen deleted the francisco/extract_into branch September 26, 2025 08:29