Adding on-demand support for a subset of json path.#2127
Adding on-demand support for a subset of json path.#2127lemire merged 18 commits intosimdjson:masterfrom
Conversation
- Not yet tested - Not yet compiled - Not yet used in ondemand array/object/etc...
Pending: - Building - Adding tests for each of the on-demand classes usage of at_path - Properly documenting the subset of json path that is currently supported.
|
Will check the failures tomorrow |
…n function. Now json_path_to_pointer_conversion returns a std::string.
|
@FranciscoThiesen Looks very good to me. Please consider my comments. (All minor stuff so far.) |
|
@mbasmanova Would you have a look? At least at the documentation? There is still time to clarify the documentation, adjust the API and so forth. |
|
Thanks for the review @lemire, I didn't know about the string suffix format so I learned something new (: Will make the adjustments later today! |
Look at my PR, it should become clear... Let me repeat myself: we can't use it inside the library because we require that the library be compatible with C++11... but I fully expect nearly everyone that uses simdutf to be at C++17 right now... or better. |
mbasmanova
left a comment
There was a problem hiding this comment.
@FranciscoThiesen @lemire Francisco, thank you for working on this. Daniel, thank you for enabling this work. I read the docs and found them very nice and easy to understand. I made a few tiny comments.
| A JSON Path is a sequence of segments each starting with the '/' character. Within arrays, an integer | ||
| A JSON Pointer path is a sequence of segments each starting with the '/' character. Within arrays, an integer | ||
| index allows you to select the indexed node. Within objects, the string value of the key allows you to | ||
| select the value. If your keys contain the characters '/' or '~', they must be escaped as '~1' and |
There was a problem hiding this comment.
If your keys contain the characters '/' or '~'
I understand that / is used as a separator, hence, needs escaping, but I wonder why ~ needs escaping.
Curious, do we allow unicode characters?
There was a problem hiding this comment.
This comes from the json pointer spec: https://datatracker.ietf.org/doc/html/rfc6901#section-4
Since we use ~1 to escape '/', there has to be a way for the user to specify ~1 as part of a valid string. Imagine that we want to specify the following path "/someField~1/...", without having the ~1 being replaced by "/". For this to happen, we need to escape the ~ as ~0 and then the string can be represented as /someField~01/
@lemire can you clarify regarding unicode character support? (I see it in the JSON pointer spec, but you have a lot more context here)
There was a problem hiding this comment.
Since we use '~1' to escape '/', there has to be a way for the user to specify...
Got it. Thanks.
There was a problem hiding this comment.
Oh, the markdown text for the message got super weird from using ~. Just updated the comment and it seems that the strikethrough effect is now gone from the markdown format.
There was a problem hiding this comment.
@lemire can you chime in here regarding @mbasmanova question on unicode characters?
There was a problem hiding this comment.
(Regarding my comment, I know we mean JSON Path, but I was answering regarding what exists currently, prior to this PR.)
There was a problem hiding this comment.
@FranciscoThiesen @mbasmanova My statement was incorrect. We will support unicode (UTF-8) just fine, but not if there are escaped Unicode sequences.
There was a problem hiding this comment.
So the key "école" would be matched. No problem. But only if it appears byte-by-byte identical in the path and in the JSON document.
It has to be UTF-8 throughout. We don't do normalization. We don't escape.
There was a problem hiding this comment.
@lemire Goti it. Thank you for clarifying. If not too much trouble, it would be nice to document this.
There was a problem hiding this comment.
It will be documented for the next release.
|
@FranciscoThiesen I suggest you have a look at @mbasmanova's comments. If needed, I can chime in/help. |
|
Thank you @mbasmanova for the review! Will address all the comments later today. |
|
@FranciscoThiesen Do you recommend merging? |
|
@lemire yes Regarding improvements that I can do in a follow-up PR, so far I have:
Also, do you think that adding at_path support for DOM makes sense, or it would be more interesting for me to work on any other issues? |
Sure !!! If you have lots of talent and energy, the most sought after issue is #1386 |
|
Merging. I will improve the documentation before the release. |
|
The documentation regarding Unicode characters is coming: #2132 Note that I changed "JSON Path" to "JSONPath" throughout as the in-progress RFC uses JSONPath and we tend to follow the conventions present in the RFCs. |
Implementation for #2070
After some reviews I can also introduce this in the DOM (since it won't be much different)