Skip to content

Disallow unmapped_fields=load with partial non-KEYWORD#144109

Merged
alex-spies merged 59 commits intoelastic:mainfrom
shmuelhanoch:143218-esql-unmapped-fields-load
Mar 31, 2026
Merged

Disallow unmapped_fields=load with partial non-KEYWORD#144109
alex-spies merged 59 commits intoelastic:mainfrom
shmuelhanoch:143218-esql-unmapped-fields-load

Conversation

@shmuelhanoch
Copy link
Copy Markdown
Contributor

@shmuelhanoch shmuelhanoch commented Mar 12, 2026

Fail the query when SET unmapped_fields="load" is used and the query uses a PUNK (partially unmapped non-KEYWORD field) anywhere except in KEEP and DROP.

Validation was already in place due to us treating PUNKs as union types: they require a cast to be allowed anywhere except KEEP/DROP.

Add unit and yaml tests to document behavior and guard against regressions.

Closes #143218

Fail the query when SET unmapped_fields="load" is used and any index
has a field that is partially mapped (present in some indices, unmapped
in others) and whose type where mapped is not KEYWORD. Add analyzer
rule DisallowLoadWithPartiallyMappedNonKeyword and unit tests.

Closes elastic#143218
@shmuelhanoch shmuelhanoch force-pushed the 143218-esql-unmapped-fields-load branch from 5f76a74 to 90769e3 Compare March 16, 2026 14:22
@shmuelhanoch shmuelhanoch marked this pull request as ready for review March 16, 2026 14:26
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 16, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@alex-spies alex-spies self-requested a review March 16, 2026 14:35
@shmuelhanoch shmuelhanoch changed the title ESQL: disallow unmapped_fields=load with partial non-KEYWORD Disallow unmapped_fields=load with partial non-KEYWORD Mar 18, 2026
@alex-spies alex-spies added v9.4.0 and removed v7.9.4 labels Mar 18, 2026
Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @shmuelhanoch ! I think the solution is generally correct, but I have some remarks around where to put it and how to test it.

In particular, I think this requires some yaml tests to confirm the behavior is correct in end-to-end tests as well, because our unit tests are a little "academic" and require us to mimic how we create mappings very well.

resolutionsInPlan.add(r);
}
} else {
IndexResolution r = context.indexResolution().get(new IndexPattern(relation.source(), relation.indexPattern()));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • When do we encounter a case with more than 1 index resolution?
  • How do we test this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testDisallowLoadCommaSeparatedIndicesWhenPartialNonKeywordUsed creates a comma-separated FROM (idx_a, idx_b) where partial_long is unmapped in idx_b.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to suggest to look at what actually happens during query execution by debugging an ES node. I think each EsRelation has only 1 relevant index resolution which uses the relation's pattern. I think having more than 1 can only happen with subqueries, e.g. FROM (FROM idx1), (FROM idx2), (FROM idx3).

But for FROM logs-* there should be only 1 index resolution. For FROM idx1,idx2 the same should hold true.

That said, this code is obsolete now anyway.

…fields-load

# Conflicts:
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
@alex-spies
Copy link
Copy Markdown
Contributor

alex-spies commented Mar 25, 2026

From discussing with @GalLalouche it occurred to me that we should particularly test date/nanos union types:

SET unmapped_fields="load";
FROM sample_data, sample_data_ts_nanos, no_mapping_sample_data METADATA _index
| WHERE @timestamp == "2021-01-01"::date_nanos

Here, @timestamp is date in sample_data, date_nanos in sample_data_ts_nanos, and outright unmapped in no_mapping_sample_data.

The peculiarity is that, without load, our analyzer will implicitly cast @timestamp to date_nanos. That could mess with the behavior we are implementing.

The expected behavior should still be that we fail the query because @timestamp is partially unmapped and used without a cast. (The case with an explicit cast should work but is out of scope here, and will be either part of #143693 or a follow-up.)

Update: Covered in unit tests now.

Fail the query when SET unmapped_fields="load" is used and any index
has a field that is partially mapped (present in some indices, unmapped
in others) and whose type where mapped is not KEYWORD. Add analyzer
rule DisallowLoadWithPartiallyMappedNonKeyword and unit tests.

Closes elastic#143218
- Move partially-unmapped non-KEYWORD load validation from the analyzer
  Resolution batch into Verifier (after existing load-mode checks).
- Fail only when those fields appear in FieldAttributes outside
  EsRelation (actually used by the query).
- Resolve indices from non-LOOKUP EsRelations only; pass AnalyzerContext
  for indexResolution() lookups.
- Remove DisallowLoadWithPartiallyMappedNonKeyword; add
  LoadPartialMappingChecks and a Verifier.verify overload that accepts
  AnalyzerContext.
- Extend AnalyzerUnmappedTests: FROM-only, WHERE, SORT, comma-separated
  indices, dotted paths, nullify.
- Add esql YAML REST tests for load vs partial non-KEYWORD (used vs
  unused) behind optional_fields_v2.
@shmuelhanoch shmuelhanoch force-pushed the 143218-esql-unmapped-fields-load branch from 921e12b to a83ca15 Compare March 25, 2026 16:36
Copy link
Copy Markdown
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I may be wrong, but the code in LoadPartialMappingChecks can be simpler than what is currently proposed. Notice, also, that I shortened the error message. Imho, it's too long and the second part of it can be omitted. Up to you and @alex-spies if that's the case or not.

static void checkPartialNonKeywordWithLoad(LogicalPlan plan, @Nullable AnalyzerContext context, Failures failures) {
        if (context == null || context.unmappedResolution() != UnmappedResolution.LOAD) {
            return;
        }

        Set<String> nonKeywordFieldNames = new HashSet<>();
        List<IndexResolution> resolutions = new ArrayList<>();

        plan.forEachDown(LogicalPlan.class, p -> {
            if (p instanceof EsRelation rel) {
                if (rel.indexMode() != IndexMode.LOOKUP) {
                    IndexResolution r = context.indexResolution().get(new IndexPattern(rel.source(), rel.indexPattern()));
                    if (r != null) {
                        resolutions.add(r);
                    }
                }
            } else {
                p.forEachExpression(FieldAttribute.class, fa -> {
                    if (fa.dataType() != KEYWORD) {
                        nonKeywordFieldNames.add(fa.name());
                    }
                });
            }
        });

        for (String fieldName : new TreeSet<>(nonKeywordFieldNames)) {
            for (IndexResolution resolution : resolutions) {
                if (resolution.isValid() && resolution.get().isPartiallyUnmappedField(fieldName)) {
                    failures.add(
                        fail(
                            plan,
                            "Loading partially mapped non-KEYWORD fields is not yet supported. "
                                + "unmapped_fields=\"load\" is not supported with partially mapped field [{}].",
                            fieldName
                        )
                    );
                    break;
                }
            }
        }
    }

@alex-spies alex-spies force-pushed the 143218-esql-unmapped-fields-load branch from 60d2e48 to 7a1a802 Compare March 31, 2026 08:43
@alex-spies alex-spies enabled auto-merge (squash) March 31, 2026 10:39
@alex-spies alex-spies merged commit 360fc23 into elastic:main Mar 31, 2026
35 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 31, 2026
…rics

* upstream/main: (21 commits)
  Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.topSnippetsFunction} elastic#145353
  Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.scoreFunction} elastic#145352
  [DiskBBQ] Fix bug in NeighborQueue#popRawAndAddRaw (elastic#145324)
  Fix dense_vector default index options when using BFLOAT16 (elastic#145202)
  Use checked exceptions in entitlement constructor rules (elastic#145234)
  ESQL: DS: datasource file plugins should not return TEXT types (elastic#145334)
  Plumb DLM error store through to DlmFrozenTransition classes (elastic#145243)
  Make Settings.Builder.remove() fluent (elastic#145294)
  Add FLS tests for METRICS_INFO and TS_INFO (elastic#145211)
  Fix flaky SecurityFeatureResetTests (elastic#145063)
  [DOCS] Fix conflict markers in ESQL processing command list (elastic#145338)
  Skip certain metric assertions on Windows (elastic#144933)
  [ES|QL] Add schema reconciliation for multi-file external sources (elastic#145220)
  Simplify DiskBBQ dynamic visit ratio to linear (elastic#142784)
  ESQL: Disallow unmapped_fields=load with partial non-KEYWORD (elastic#144109)
  [Transform] Track Linked Projects (elastic#144399)
  Fix bulk scoring to process last batch instead of falling through to scalar tail (elastic#145316)
  Clean up TickerScheduleEngineTests (elastic#145303)
  [CI] ShardBulkInferenceActionFilterIT testRestart - Ensuring that secrets-inference index is available after full restart and unmuting test (elastic#145317)
  Add CRUD doc to the DistributedArchitectureGuide (elastic#144710)
  ...
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Apr 1, 2026
…#144109)

Add tests that ensure we fail the query when SET unmapped_fields="load"
is used and any index has a field that is partially mapped (present in some
indices, unmapped in others) and whose type where mapped is not KEYWORD.

---------

Co-authored-by: Mouhcine Aitounejjar <mouhcine.aitounejjar@elastic.co>
Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
mromaios pushed a commit to mromaios/elasticsearch that referenced this pull request Apr 9, 2026
…#144109)

Add tests that ensure we fail the query when SET unmapped_fields="load"
is used and any index has a field that is partially mapped (present in some
indices, unmapped in others) and whose type where mapped is not KEYWORD.

---------

Co-authored-by: Mouhcine Aitounejjar <mouhcine.aitounejjar@elastic.co>
Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ESQL: disallow unmapped_fields="load" in case a field is partially mapped but not as KEYWORD and used without cast

5 participants