Disallow unmapped_fields=load with partial non-KEYWORD#144109
Disallow unmapped_fields=load with partial non-KEYWORD#144109alex-spies merged 59 commits intoelastic:mainfrom
Conversation
8b4069d to
fee6a30
Compare
Fail the query when SET unmapped_fields="load" is used and any index has a field that is partially mapped (present in some indices, unmapped in others) and whose type where mapped is not KEYWORD. Add analyzer rule DisallowLoadWithPartiallyMappedNonKeyword and unit tests. Closes elastic#143218
5f76a74 to
90769e3
Compare
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
alex-spies
left a comment
There was a problem hiding this comment.
Thanks @shmuelhanoch ! I think the solution is generally correct, but I have some remarks around where to put it and how to test it.
In particular, I think this requires some yaml tests to confirm the behavior is correct in end-to-end tests as well, because our unit tests are a little "academic" and require us to mimic how we create mappings very well.
...a/org/elasticsearch/xpack/esql/analysis/rules/DisallowLoadWithPartiallyMappedNonKeyword.java
Outdated
Show resolved
Hide resolved
...k/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
Outdated
Show resolved
Hide resolved
...k/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
Outdated
Show resolved
Hide resolved
...k/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/esql/analysis/rules/DisallowLoadWithPartiallyMappedNonKeyword.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/esql/analysis/rules/DisallowLoadWithPartiallyMappedNonKeyword.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/esql/analysis/rules/DisallowLoadWithPartiallyMappedNonKeyword.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/esql/analysis/rules/DisallowLoadWithPartiallyMappedNonKeyword.java
Outdated
Show resolved
Hide resolved
...a/org/elasticsearch/xpack/esql/analysis/rules/DisallowLoadWithPartiallyMappedNonKeyword.java
Outdated
Show resolved
Hide resolved
| resolutionsInPlan.add(r); | ||
| } | ||
| } else { | ||
| IndexResolution r = context.indexResolution().get(new IndexPattern(relation.source(), relation.indexPattern())); |
There was a problem hiding this comment.
- When do we encounter a case with more than 1 index resolution?
- How do we test this?
There was a problem hiding this comment.
testDisallowLoadCommaSeparatedIndicesWhenPartialNonKeywordUsed creates a comma-separated FROM (idx_a, idx_b) where partial_long is unmapped in idx_b.
There was a problem hiding this comment.
I wanted to suggest to look at what actually happens during query execution by debugging an ES node. I think each EsRelation has only 1 relevant index resolution which uses the relation's pattern. I think having more than 1 can only happen with subqueries, e.g. FROM (FROM idx1), (FROM idx2), (FROM idx3).
But for FROM logs-* there should be only 1 index resolution. For FROM idx1,idx2 the same should hold true.
That said, this code is obsolete now anyway.
…fields-load # Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
|
From discussing with @GalLalouche it occurred to me that we should particularly test date/nanos union types: Here, The peculiarity is that, without The expected behavior should still be that we fail the query because Update: Covered in unit tests now. |
Fail the query when SET unmapped_fields="load" is used and any index has a field that is partially mapped (present in some indices, unmapped in others) and whose type where mapped is not KEYWORD. Add analyzer rule DisallowLoadWithPartiallyMappedNonKeyword and unit tests. Closes elastic#143218
- Move partially-unmapped non-KEYWORD load validation from the analyzer Resolution batch into Verifier (after existing load-mode checks). - Fail only when those fields appear in FieldAttributes outside EsRelation (actually used by the query). - Resolve indices from non-LOOKUP EsRelations only; pass AnalyzerContext for indexResolution() lookups. - Remove DisallowLoadWithPartiallyMappedNonKeyword; add LoadPartialMappingChecks and a Verifier.verify overload that accepts AnalyzerContext. - Extend AnalyzerUnmappedTests: FROM-only, WHERE, SORT, comma-separated indices, dotted paths, nullify. - Add esql YAML REST tests for load vs partial non-KEYWORD (used vs unused) behind optional_fields_v2.
921e12b to
a83ca15
Compare
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Verifier.java
Outdated
Show resolved
Hide resolved
...lugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/LoadPartialMappingChecks.java
Outdated
Show resolved
Hide resolved
...src/yamlRestTest/resources/rest-api-spec/test/esql/192_unmapped_load_partial_non_keyword.yml
Outdated
Show resolved
Hide resolved
...k/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
Outdated
Show resolved
Hide resolved
...k/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
Outdated
Show resolved
Hide resolved
astefan
left a comment
There was a problem hiding this comment.
Actually, I may be wrong, but the code in LoadPartialMappingChecks can be simpler than what is currently proposed. Notice, also, that I shortened the error message. Imho, it's too long and the second part of it can be omitted. Up to you and @alex-spies if that's the case or not.
static void checkPartialNonKeywordWithLoad(LogicalPlan plan, @Nullable AnalyzerContext context, Failures failures) {
if (context == null || context.unmappedResolution() != UnmappedResolution.LOAD) {
return;
}
Set<String> nonKeywordFieldNames = new HashSet<>();
List<IndexResolution> resolutions = new ArrayList<>();
plan.forEachDown(LogicalPlan.class, p -> {
if (p instanceof EsRelation rel) {
if (rel.indexMode() != IndexMode.LOOKUP) {
IndexResolution r = context.indexResolution().get(new IndexPattern(rel.source(), rel.indexPattern()));
if (r != null) {
resolutions.add(r);
}
}
} else {
p.forEachExpression(FieldAttribute.class, fa -> {
if (fa.dataType() != KEYWORD) {
nonKeywordFieldNames.add(fa.name());
}
});
}
});
for (String fieldName : new TreeSet<>(nonKeywordFieldNames)) {
for (IndexResolution resolution : resolutions) {
if (resolution.isValid() && resolution.get().isPartiallyUnmappedField(fieldName)) {
failures.add(
fail(
plan,
"Loading partially mapped non-KEYWORD fields is not yet supported. "
+ "unmapped_fields=\"load\" is not supported with partially mapped field [{}].",
fieldName
)
);
break;
}
}
}
}
...src/yamlRestTest/resources/rest-api-spec/test/esql/192_unmapped_load_partial_non_keyword.yml
Outdated
Show resolved
Hide resolved
...src/yamlRestTest/resources/rest-api-spec/test/esql/192_unmapped_load_partial_non_keyword.yml
Outdated
Show resolved
Hide resolved
60d2e48 to
7a1a802
Compare
instead of playing whack-a-mole.
…rics
* upstream/main: (21 commits)
Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.topSnippetsFunction} elastic#145353
Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.scoreFunction} elastic#145352
[DiskBBQ] Fix bug in NeighborQueue#popRawAndAddRaw (elastic#145324)
Fix dense_vector default index options when using BFLOAT16 (elastic#145202)
Use checked exceptions in entitlement constructor rules (elastic#145234)
ESQL: DS: datasource file plugins should not return TEXT types (elastic#145334)
Plumb DLM error store through to DlmFrozenTransition classes (elastic#145243)
Make Settings.Builder.remove() fluent (elastic#145294)
Add FLS tests for METRICS_INFO and TS_INFO (elastic#145211)
Fix flaky SecurityFeatureResetTests (elastic#145063)
[DOCS] Fix conflict markers in ESQL processing command list (elastic#145338)
Skip certain metric assertions on Windows (elastic#144933)
[ES|QL] Add schema reconciliation for multi-file external sources (elastic#145220)
Simplify DiskBBQ dynamic visit ratio to linear (elastic#142784)
ESQL: Disallow unmapped_fields=load with partial non-KEYWORD (elastic#144109)
[Transform] Track Linked Projects (elastic#144399)
Fix bulk scoring to process last batch instead of falling through to scalar tail (elastic#145316)
Clean up TickerScheduleEngineTests (elastic#145303)
[CI] ShardBulkInferenceActionFilterIT testRestart - Ensuring that secrets-inference index is available after full restart and unmuting test (elastic#145317)
Add CRUD doc to the DistributedArchitectureGuide (elastic#144710)
...
…#144109) Add tests that ensure we fail the query when SET unmapped_fields="load" is used and any index has a field that is partially mapped (present in some indices, unmapped in others) and whose type where mapped is not KEYWORD. --------- Co-authored-by: Mouhcine Aitounejjar <mouhcine.aitounejjar@elastic.co> Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
…#144109) Add tests that ensure we fail the query when SET unmapped_fields="load" is used and any index has a field that is partially mapped (present in some indices, unmapped in others) and whose type where mapped is not KEYWORD. --------- Co-authored-by: Mouhcine Aitounejjar <mouhcine.aitounejjar@elastic.co> Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
Fail the query when SET unmapped_fields="load" is used and the query uses a PUNK (partially unmapped non-KEYWORD field) anywhere except in KEEP and DROP.
Validation was already in place due to us treating PUNKs as union types: they require a cast to be allowed anywhere except KEEP/DROP.
Add unit and yaml tests to document behavior and guard against regressions.
Closes #143218