Disallow unmapped_fields="load" with full-text search/MATCH#143810
Conversation
80a166b to
324bc55
Compare
324bc55 to
db073ff
Compare
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
Hi @shmuelhanoch, I've created a changelog YAML for you. |
7c13171 to
17c80e0
Compare
| * Reproducer for #141927: with unmapped_fields=load, full-text search (match operator) must fail at analysis | ||
| * instead of returning empty results. | ||
| */ | ||
| public void testUnmappedFieldsLoadWithMatchOperatorFails() { |
There was a problem hiding this comment.
Could we add few more tests here? Not necessarily more test functions, because we can verificationFailure(...) many times within the same function and pass many queries, I'd suspect there is a good number of interesting cases we want covered.
There was a problem hiding this comment.
Done, I added one for each forbidden full-text function.
| throw new VerificationException( | ||
| List.of( | ||
| Failure.fail( | ||
| fullTextFunctions.get(0), |
There was a problem hiding this comment.
Looks like only the first function gets reported. But also, is finding a full-text function anywhere in query necessary and sufficient to reject it? Or do we want the reject only if they're applied to the unmapped field?
There was a problem hiding this comment.
Only rejecting if applied to an unmapped field would certainly be better, but I wouldn't go there for now.
There's FTFs like KQL which can refer to an unmapped field inside the query string, as in WHERE KQL("author: Faulkner") (taken straight from our docs). author might be unmapped and it's not fully clear how we should behave then.
There was a problem hiding this comment.
Change to report every full-text function in the plan.
alex-spies
left a comment
There was a problem hiding this comment.
Thanks @shmuelhanoch ! This gets the job done, but I'd like to increase test coverage and would also like to place the new verification into a slightly more appropriate place before merging. I added details below :)
| BitSet partialMetrics = new BitSet(FeatureMetric.values().length); | ||
| return verify(execute(plan), gatherPreAnalysisMetrics(plan, partialMetrics)); | ||
| LogicalPlan analyzed = execute(plan); | ||
| if (context().unmappedResolution() == UnmappedResolution.LOAD) { |
There was a problem hiding this comment.
This is not the best place to add an additional verification. Analyzer#analyze is a pretty high-level method which has a simple workflow:
- analyze
- verify and gather usage data
We should keep it this way and not add low-level verifications to high-level execution logic.
We have 2 typical places for verifications that come to mind:
- inside
Verifier#verify(there's a section marked withConcrete verifications) - inside functions and commands themselves by implementing PostAnalysisPlanVerificationAware and PostAnalysisVerificationAware.
Now, neither of these places have access to the query settings. We could plug the query settings in there, and in fact it might become required anyway.
However, I think there's a simpler way.
We could add a verification step inside Verifier#verify which checks for the presence of full-text functions in case that the load setting had any effect on the query plan. More specifically, load will place FieldAttribute instances into the EsRelation which contain a PotentiallyUnmappedKeywordEsField. Such an attribute is thus marked as "will try to load from _source during execution". If we find any such field attribute in an EsRelation, we know that load was used, we do try to load some fields from _source, and we can thus fail the query if it contains disallowed functions or commands.
This also means that we should add positive tests for queries where load did not have any effect (because all fields in the query were, in fact, mapped), because full-text functions will not be forbidden in such queries.
There was a problem hiding this comment.
Update: #144115 adds a new section inside Verifier#verify where we disallow not-yet allowed commands. I think we can wait on #144115 and move this PR's new validation next to it.
This is a little simpler than my original suggestion in that it will keep this PR's logic: disallow full-text functions if the query uses SET unmapped_fields="load". No need to check EsRelations for the presence of PotentiallyUnmappedKeywordEsFields in FieldAttributes.
There was a problem hiding this comment.
OK, I moved the check to a Verifier#verify.
...k/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerUnmappedTests.java
Outdated
Show resolved
Hide resolved
| * Reproducer for #141927: with unmapped_fields=load, full-text search (match operator) must fail at analysis | ||
| * instead of returning empty results. |
There was a problem hiding this comment.
Appreciate the comment pointing to the issue!
super nit: I opened #144121 to explain that we plan to increase support for full-text functions. We could link to it as well so that future-us knows that the test will need changes in the future.
c81800e to
2990547
Compare
alex-spies
left a comment
There was a problem hiding this comment.
Almost there! I have only few final comments and, once addressed, this certainly LGTM!
| } | ||
|
|
||
| if (unmappedResolution == UnmappedResolution.LOAD) { | ||
| checkLoadModeDisallowedCommands(plan, failures); |
There was a problem hiding this comment.
Here's the existing check by Andrei.
I suggest we add another helper method checkLoadModeDisallowedFunctions and call it here.
There was a problem hiding this comment.
OK, extracted into checkLoadModeDisallowedFunctions here
| // Disallow full-text search when unmapped_fields=load. We do not restrict to "only when the FTF | ||
| // is applied to an unmapped field" because FTFs like KQL can reference unmapped fields inside the | ||
| // query string (e.g. KQL("author: Faulkner")), and the desired behavior there is unclear. |
There was a problem hiding this comment.
++, thanks for commenting with the reason for the code here. I think this will help future us (and others!) understand the choices we made.
| List<FullTextFunction> fullTextFunctions = new ArrayList<>(); | ||
| plan.forEachExpressionDown(FullTextFunction.class, fullTextFunctions::add); | ||
| if (fullTextFunctions.isEmpty() == false) { | ||
| String message = "unmapped_fields=\"load\" is not allowed with full-text search (MATCH, match operator `:`, " |
There was a problem hiding this comment.
Rather than mentioning MATCH specifically in the error message, why don't we tell the user which full-text function they used?
Ideally, our tests should also assert that the right full text function was mentioned in the error message.
When reporting user input in error messages, we typically enclose it in brackets, like so:
unmapped_fields="load" does not support full-text search function [KQL]; use "fail" or "nullify"
except that the KQL needs to be replaced by the actual function that was used.
There was a problem hiding this comment.
Sure, I fixed this to show the actual function name and check it in the tests
74504ad to
19c271f
Compare
alex-spies
left a comment
There was a problem hiding this comment.
LGTM! Thanks @shmuelhanoch !
…lastic#141927) Fail at analysis instead of returning empty results. Add unit test testUnmappedFieldsLoadWithMatchOperatorFails in AnalyzerUnmappedTests.
- Move unmapped_fields=load + full-text verification from Analyzer to Verifier (new verify(plan, partialMetrics, unmappedResolution) overload). - Report all full-text function sites in the plan, not only the first. - Test each forbidden full-text function: match op, match, match_phrase, multi_match, qstr, kql, knn (capability-guarded where needed). - Add verificationFailureWithMapping helper for KNN test; Javadoc elastic#144121.
- Extract full-text check into checkLoadModeDisallowedFunctions helper - Error message shows actual function name in brackets (e.g. [KQL], [MATCH]) - Tests assert correct function name in each failure message
19c271f to
57aaf9b
Compare
…143810) * ESQL: fail when unmapped_fields=load is used with full-text search (elastic#141927) Fail at analysis instead of returning empty results. Add unit test testUnmappedFieldsLoadWithMatchOperatorFails in AnalyzerUnmappedTests. * Review fixes: - Move unmapped_fields=load + full-text verification from Analyzer to Verifier (new verify(plan, partialMetrics, unmappedResolution) overload). - Report all full-text function sites in the plan, not only the first. - Test each forbidden full-text function: match op, match, match_phrase, multi_match, qstr, kql, knn (capability-guarded where needed). - Add verificationFailureWithMapping helper for KNN test; Javadoc elastic#144121. * Fix Verifier Javadoc and AnalyzerUnmappedTests for CI * More review fixes: - Extract full-text check into checkLoadModeDisallowedFunctions helper - Error message shows actual function name in brackets (e.g. [KQL], [MATCH]) - Tests assert correct function name in each failure message
Fail at analysis instead of returning empty results.
Add unit test testUnmappedFieldsLoadWithMatchOperatorFails in AnalyzerUnmappedTests.
Closes #141927