Add JSON_EXTRACT ES|QL scalar function#141507
Add JSON_EXTRACT ES|QL scalar function#141507quackaplop wants to merge 1 commit intoelastic:mainfrom
Conversation
|
❌ Author of the following commits did not sign a Contributor Agreement: Please, read and sign the above mentioned agreement if you want to contribute to this project |
🔍 Preview links for changed docs |
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
Implements a new ES|QL function that extracts values from JSON strings using dot-notation path expressions. Supports keyword, text, and _source input types.
272314c to
3cd47c6
Compare
alex-spies
left a comment
There was a problem hiding this comment.
Small drive by as I was looking into something related.
Is this ready for review? If we add the :Analytics/ES|QL label, the team will be pinged :)
| "ROW json = \"{\\\\\"name\\\\\":\\\\\"Alice\\\\\",\\\\\"age\\\\\":30}\"\n| EVAL name = JSON_EXTRACT(json, \"name\")", | ||
| "ROW json = \"{\\\\\"user\\\\\":{\\\\\"address\\\\\":{\\\\\"city\\\\\":\\\\\"London\\\\\"}}}\"\n| EVAL city = JSON_EXTRACT(json, \"user.address.city\")" | ||
| ], | ||
| "preview" : false, |
There was a problem hiding this comment.
Straight to GA intended?
There was a problem hiding this comment.
Probably not: FN_JSON_EXTRACT(Build.current().isSnapshot())
There was a problem hiding this comment.
Fixed in the new PR #142375. Added preview = true to the @FunctionInfo annotation and moved registration to snapshotFunctions(). The generated JSON now correctly shows "preview": true, "snapshot_only": true.
There was a problem hiding this comment.
I think we should add cases for when a property is duplicated.
This is the current behavior:
row x = "{\"foo\":1, \"foo\":2}" | eval y = json_extract(x, "foo")::integer
x | y
------------------+---------------
{"foo":1, "foo":2}|1
Should this give a warning?
Also, this is different from both jq and psql. Both use the last definition of a property, not the first.
There was a problem hiding this comment.
Added test cases for duplicate keys in #142375. Current behavior is first-match (streaming parser stops at the first matching key). This is documented in both the csv-spec and unit tests. We can revisit switching to last-match semantics (to align with jq/psql) as a follow-up if desired — it would require continuing past the first match in the streaming parser rather than returning immediately.
|
|
||
| jsonExtractJsonNull | ||
| required_capability: fn_json_extract | ||
| ROW json = "{\"value\":null}" |
There was a problem hiding this comment.
Another interesting case is a null inside a JSON array.
There was a problem hiding this comment.
Added in #142375. Test case extracts index 1 from [1, null, 3] — returns ES|QL null without a warning, consistent with how we handle JSON null values elsewhere. Also updated the @FunctionInfo description and generated docs to document this behavior.
|
Not yet. But I will publish it this week. |
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
| * Support for requesting the "_size" metadata field when the mapper-size plugin is enabled. | ||
| */ | ||
| METADATA_SIZE_FIELD, | ||
| FN_JSON_EXTRACT(Build.current().isSnapshot()), |
There was a problem hiding this comment.
Was PERIODIC_EMIT_PARTIAL_AGGREGATION_RESULTS intentionally removed?
There was a problem hiding this comment.
No, that was accidental — our commit replaced it instead of appending. Fixed in #142375 by rebasing onto latest main. FN_JSON_EXTRACT is now appended at the end of the capabilities list, all existing capabilities are preserved.
| } | ||
| // Convert bracket notation to dot notation: "orders[1].item" -> "orders.1.item" | ||
| String converted = path.replace("[", ".").replace("]", ""); | ||
| return converted.split("\\."); |
There was a problem hiding this comment.
String.split is fairly expensive considering its argument is a regexp pattern that needs to be parsed.
Additionally I expect most of the cases the actual path used with this function is going to be a foldable constant (opposed to be something derived per document).
With this I suggest we have a specialized version that only process path once.
You could find examples for constant specialization in org.elasticsearch.xpack.esql.expression.function.scalar.string.Hash (process and processConstant)
There was a problem hiding this comment.
Oh, this code is a nightmare. I am not done with it yet - there are SO MANY memory copies there
There was a problem hiding this comment.
Fixed in #142375. Added a processConstant evaluator with a @Fixed ParsedPath parameter (following the Hash.process/Hash.processConstant pattern). When the path is a foldable constant, it's parsed once into a ParsedPath record and reused across all rows. Also replaced String.split(regex) with manual character iteration to avoid regex compilation overhead in the non-constant case as well.
|
This is a great functionality. But as mentioned in the issue, can it be developed in similar to 'jq' that way developers don't need to learn another format of usage? |
|
Superseded by #142375 (moved to a clean branch with correct authorship). |
|
Correct, fixed in #142375. Now properly marked as |
Superseded by #142375