Skip to content

Add JSON_EXTRACT ES|QL scalar function#141507

Closed
quackaplop wants to merge 1 commit intoelastic:mainfrom
quackaplop:pr/json-extract
Closed

Add JSON_EXTRACT ES|QL scalar function#141507
quackaplop wants to merge 1 commit intoelastic:mainfrom
quackaplop:pr/json-extract

Conversation

@quackaplop
Copy link
Copy Markdown
Contributor

@quackaplop quackaplop commented Jan 29, 2026

Superseded by #142375

@cla-checker-service
Copy link
Copy Markdown

cla-checker-service bot commented Jan 29, 2026

❌ Author of the following commits did not sign a Contributor Agreement:
3cd47c6

Please, read and sign the above mentioned agreement if you want to contribute to this project

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Jan 29, 2026

🔍 Preview links for changed docs

@github-actions
Copy link
Copy Markdown
Contributor

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

Expand for a quick overview

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.4.0 labels Jan 29, 2026
Implements a new ES|QL function that extracts values from JSON strings
using dot-notation path expressions. Supports keyword, text, and _source
input types.
Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small drive by as I was looking into something related.

Is this ready for review? If we add the :Analytics/ES|QL label, the team will be pinged :)

"ROW json = \"{\\\\\"name\\\\\":\\\\\"Alice\\\\\",\\\\\"age\\\\\":30}\"\n| EVAL name = JSON_EXTRACT(json, \"name\")",
"ROW json = \"{\\\\\"user\\\\\":{\\\\\"address\\\\\":{\\\\\"city\\\\\":\\\\\"London\\\\\"}}}\"\n| EVAL city = JSON_EXTRACT(json, \"user.address.city\")"
],
"preview" : false,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Straight to GA intended?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not: FN_JSON_EXTRACT(Build.current().isSnapshot())

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope! Thank you

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the new PR #142375. Added preview = true to the @FunctionInfo annotation and moved registration to snapshotFunctions(). The generated JSON now correctly shows "preview": true, "snapshot_only": true.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add cases for when a property is duplicated.

This is the current behavior:

row x = "{\"foo\":1, \"foo\":2}" | eval y = json_extract(x, "foo")::integer

        x         |       y       
------------------+---------------
{"foo":1, "foo":2}|1 

Should this give a warning?

Also, this is different from both jq and psql. Both use the last definition of a property, not the first.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test cases for duplicate keys in #142375. Current behavior is first-match (streaming parser stops at the first matching key). This is documented in both the csv-spec and unit tests. We can revisit switching to last-match semantics (to align with jq/psql) as a follow-up if desired — it would require continuing past the first match in the streaming parser rather than returning immediately.


jsonExtractJsonNull
required_capability: fn_json_extract
ROW json = "{\"value\":null}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another interesting case is a null inside a JSON array.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in #142375. Test case extracts index 1 from [1, null, 3] — returns ES|QL null without a warning, consistent with how we handle JSON null values elsewhere. Also updated the @FunctionInfo description and generated docs to document this behavior.

@quackaplop
Copy link
Copy Markdown
Contributor Author

Not yet. But I will publish it this week.

@idegtiarenko idegtiarenko added >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Feb 2, 2026
@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Feb 2, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

* Support for requesting the "_size" metadata field when the mapper-size plugin is enabled.
*/
METADATA_SIZE_FIELD,
FN_JSON_EXTRACT(Build.current().isSnapshot()),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was PERIODIC_EMIT_PARTIAL_AGGREGATION_RESULTS intentionally removed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that was accidental — our commit replaced it instead of appending. Fixed in #142375 by rebasing onto latest main. FN_JSON_EXTRACT is now appended at the end of the capabilities list, all existing capabilities are preserved.

}
// Convert bracket notation to dot notation: "orders[1].item" -> "orders.1.item"
String converted = path.replace("[", ".").replace("]", "");
return converted.split("\\.");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String.split is fairly expensive considering its argument is a regexp pattern that needs to be parsed.
Additionally I expect most of the cases the actual path used with this function is going to be a foldable constant (opposed to be something derived per document).
With this I suggest we have a specialized version that only process path once.

You could find examples for constant specialization in org.elasticsearch.xpack.esql.expression.function.scalar.string.Hash (process and processConstant)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this code is a nightmare. I am not done with it yet - there are SO MANY memory copies there

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #142375. Added a processConstant evaluator with a @Fixed ParsedPath parameter (following the Hash.process/Hash.processConstant pattern). When the path is a foldable constant, it's parsed once into a ParsedPath record and reused across all rows. Also replaced String.split(regex) with manual character iteration to avoid regex compilation overhead in the non-constant case as well.

@quackaplop quackaplop marked this pull request as draft February 2, 2026 15:50
@getkub
Copy link
Copy Markdown

getkub commented Feb 8, 2026

This is a great functionality. But as mentioned in the issue, can it be developed in similar to 'jq' that way developers don't need to learn another format of usage?

@quackaplop
Copy link
Copy Markdown
Contributor Author

Superseded by #142375 (moved to a clean branch with correct authorship).

@quackaplop
Copy link
Copy Markdown
Contributor Author

Correct, fixed in #142375. Now properly marked as preview = true and registered in snapshotFunctions().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants