Skip to content

ESQL: introduce support for mapping-unavailable fields#139417

Closed
bpintea wants to merge 37 commits intoelastic:mainfrom
bpintea:feat/set_unmapped_nullify
Closed

ESQL: introduce support for mapping-unavailable fields#139417
bpintea wants to merge 37 commits intoelastic:mainfrom
bpintea:feat/set_unmapped_nullify

Conversation

@bpintea
Copy link
Copy Markdown
Contributor

@bpintea bpintea commented Dec 12, 2025

This introduces support for mapping-unavailable fields (present and not mapped or just missing). The behaviour is controlled through a new SET setting unmapped_fields, which can take the values "FAIL", "NULLIFY", "LOAD".

An optional field behaves just like a "normal", mapped field, with regards to how it flows through the commands chain: it can be simply used in the commands, as if present in the source, but can no longer be referenced once dropped - explicitly, with DROP, or not selected by a KEEP, or RENAME that doesn't reference it -, or past a STATS reduction.

However, unlike a mapped field, if it's not reference at all, it won't show up in the output of a simple FROM index.

Currently, the schema difference between nullified fields and the loaded ones is in the type: nullified ones are of data type NULL, while the loaded ones are KEYWORD.
The implementation difference w.r.t. logical plan building is that the nullified fields are created as null value aliasing on top of the data source, while the loaded one are pushed as extractors into the source (this leverages the INSIST work).

The partially mapped fields are also covered: when the setting is "load", these fields will be extracted from those indices that have the field, but isn't mapped. In case there's a conflict between the loaded KEYWORD field and the mapped type in the fields that have this field mapped, an explicit conversion is needed, just like with union types.

Related: #138888

@github-actions
Copy link
Copy Markdown
Contributor

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

Expand for a quick overview

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @bpintea, I've created a changelog YAML for you.

ROW x = 1
| EVAL language_code = does_not_exist::INTEGER
| LOOKUP JOIN languages_lookup ON language_code
;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about stuff like:

SET unmapped_fields="nullify"\;
ROW x = 1
| EVAL language_code = does_not_exist::INTEGER
SET unmapped_fields="load"\;
| LOOKUP JOIN languages_lookup ON language_code
;

?

Is that allowed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not, there's just one SET allowed per statment/query.

Copy link
Copy Markdown
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some very good work in this PR, especially the attention to the different use cases and commands that can use "unmapped fields". Good job with this and the tests; I learned a bit about this feature by looking and trying out different queries from tests.

I am seeing difficulties in adding such a broad feature with different tricky/special commands to account for (fork, sub-queries, _insist and union types) and I am seeing some inconsistent behavior and outcome (see my comments). I don't think I have enough context to understand the full list of functionality for this feature, but I am seeing some things missing and I am curious on the plan for those missing bits/limitations (maybe?).

I will go over the PR one more time, at least. But, so far it is looking good, ignoring the missing bits/issues I mentioned above.

Copy link
Copy Markdown
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left some comments exploratory in nature and, if they are valid, a potential set of improvements. But can be addressed later with further work on this feature. Great work in this PR!


A comment for later discussions (if any), not a show stopper, more like a heads up for some potential confusion amongst users and weird hard-to-track-down bugs later in the development cycles.

The unmapped_message is materialized whenever KEEP is used. If I do FROM partial_mapping_sample_data | SORT unmapped_message I get an error about the field not being found. That's fair and this shows that there should be a mechanism for some "special" fields to be made available in a query command. I am also aware that this should improve with follow-up PRs.

If these "special" fields (unmapped, not visible etc) need a separate mechanism to be made available, why are we treating them equal to other fields, the "normal" ones? IMHO, to have a clean story around these unmapped fields, we need to "signal" their presence in a different way that shouldn't lead users to think that they are treated equally. With _insist command (that is snapshot only currently) this intention is very clear: _insist is a separate mechanism by which a user tells ESQL that whatever field name _insist refers to, that field is "special" in some way or another.

At some point, I think this transparent way of loading unmapped fields with keep will come back and bite us because these fields are not like the regular fields. We shouldn't treat them as regular fields, to eliminate any sort of confusion. This could also cripple our work on optimizations where we have several mechanisms to add evals to account for possible shadowed fields, where we move (potentially eliminate) commands around. This will make it easier to introduce bugs hard to track down and fix later in the development cycle.

Looking at how these fields are used and loaded and because they depend on being KEEPed in some way or another, I am wondering why the explicit reference to a potentially unmapped field is not part of FROM.

Just a humble 2c.

private static LogicalPlan nullify(LogicalPlan plan, List<UnresolvedAttribute> unresolved) {
var nullAliases = nullAliases(unresolved);

// insert an Eval on top of every LeafPlan, if there's a UnaryPlan atop it
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why you treat these two branches (unary and nary) differently? Why not in the same transformUp check?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, thanks, this is now being merged.


var transformed = load ? load(plan, unresolved) : nullify(plan, unresolved);

return transformed.equals(plan) ? plan : refreshPlan(transformed, unresolved);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be enough with transformed == plan?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, now it does, thanks.

return insisted;
}

// TODO: would an alternative to this be to drop the current Fork and have ResolveRefs#resolveFork re-resolve it. We might need
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having to re-resolve Fork (or better said, update whatever output Fork is creating) is something I encountered as well. It's a clash between the Fork's need to jump in the plan very early in the process and resolve the output columns and align them properly between its branches and the fact that some plans change/add/remove references from the plan and whatever Fork already resolved needs a "refresh".

Unfortunately, this is not the only tricky place in the ESQL planner that needs special treatment. Union types, inline stats, TS are other features that need special hand-holding.

Realistically, imho, we might be better off documenting these special treatments somewhere so that future us know how to approach stuff at that time and if needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we should probably refactor Fork for more robustness (a similar comment).

Comment on lines +205 to +221
var refreshed = refreshUnresolved(plan, unresolved);
return refreshChildren(refreshed);
}

/**
* The UAs that haven't been resolved are marked as unresolvable with a custom message. This needs to be removed for
* {@link Analyzer.ResolveRefs} to attempt again to wire them to the newly added aliases. That's what this method does.
*/
private static LogicalPlan refreshUnresolved(LogicalPlan plan, List<UnresolvedAttribute> unresolved) {
return plan.transformExpressionsOnlyUp(UnresolvedAttribute.class, ua -> {
if (unresolved.contains(ua)) {
unresolved.remove(ua);
// Besides clearing the message, we need to refresh the nameId to avoid equality with the previous plan.
// (A `new UnresolvedAttribute(ua.source(), ua.name())` would save an allocation, but is problematic with subtypes.)
ua = (ua.withId(new NameId())).withUnresolvedMessage(null);
}
return ua;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious why these two steps (refreshUnresolved for clearing UAs unresolved messages and refreshChildren) can't be performed in a single tree pass, I am clearly missing a detail that was probably obvious in some of your tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They could probably have been, though it might not have been a clear code. But refreshing the plan has been factored away.

Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave it another go. Need to do at least another round, this has become huge.

Ideas for follow-up work, summarized from my comments below:

  • We may want to issue a warning if a user sets unmapped_fields:"load" but the source for one of the indices is disabled.
  • We need to figure out how to avoid having both UnresolvedNamePattern and UnresolvedPattern. I think there's a way, and in the current state, the attribute/named expression hierarchy is pretty confusing with these two co-existing.
  • Our generative tests should probably randomly prepend one of the 3 settings for unmapped_fields.
  • We want more tests for unmapped_fields=\"load\". E.g.
    • Tests for the behavior of union types when an index has an unmapped/missing field that is multi-typed based on the mapping of 2 other indices.
    • More tests that do something with the unmapped/missing fields (e.g. join or enrich on them!)
    • More tests that do something before using the unmapped fields, esp. join on an unrelated field.
    • Subqueries using an unmapped/missing field

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's add comments to the two files to inform folks that whatever's added here should probably also be reflected in unmapped_fields.csv-spec and vice versa?

@alex-spies alex-spies added test-release Trigger CI checks against release build auto-backport Automatically create backport pull requests when merged labels Jan 9, 2026
@alex-spies
Copy link
Copy Markdown
Contributor

Closing this in favor of #140463.

We can't push to this PR's branch due to permissions, so we re-created it. The copy includes all of the commits from here.

@alex-spies alex-spies closed this Jan 9, 2026
@alex-spies
Copy link
Copy Markdown
Contributor

@astefan , to answer your question in Bogdan's absence: the use case we'd like to cover is "I have a query and I'm pointing it at different index patterns". E.g. in O11y, we have fields that are used in many indices, but may be missing in others; or a roll-over in a data stream may add a field, but we want a query to still work when we run against the pre-rolled over index. Another use case are pre-defined dashboards based on ESQL queries, where you may want to pop in a different index pattern but re-use the same query.

For these cases, you'd have to INSIST essentially on a large number of fields that the query may use. It's simpler to add a SET unmapped_fields="nullify" (or "load") at the beginning and implicitly have the query add INSISTs for you whenever you mention a new column name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged ES|QL-ui Impacts ES|QL UI >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) test-release Trigger CI checks against release build v9.3.0 v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants