DRAFT: LOAD/NULLIFY golden tests#141821
Conversation
| | \_EsRelation[employees][avg_worked_seconds{f}#25, birth_date{f}#26, emp_no{f}#27, first_name{f}#28, gender{f}#29, height{f}#30, height.float{f}#31, height.half_float{f}#32, height.scaled_float{f}#33, hire_date{f}#34, is_rehired{f}#35, job_positions{f}#36, languages{f}#37, languages.byte{f}#38, languages.long{f}#39, languages.short{f}#40, last_name{f}#41, salary{f}#42, salary_change{f}#43, salary_change.int{f}#44, salary_change.keyword{f}#45, salary_change.long{f}#46, still_hired{f}#47, does_not_exist{f}#48] | ||
| \_Limit[1000[INTEGER],false,false] | ||
| \_Project[[avg_worked_seconds{f(EsField)}#50, birth_date{f(DateEsField)}#51, emp_no{f(EsField)}#52, first_name{f(KeywordEsField)}#53, gender{f(KeywordEsField)}#54, height{f(EsField)}#55, height.float{f(EsField)}#56, height.half_float{f(EsField)}#57, height.scaled_float{f(EsField)}#58, hire_date{f(DateEsField)}#59, is_rehired{f(EsField)}#60, job_positions{f(KeywordEsField)}#61, languages{f(EsField)}#62, languages.byte{f(EsField)}#63, languages.long{f(EsField)}#64, languages.short{f(EsField)}#65, last_name{f(KeywordEsField)}#66, salary{f(EsField)}#67, salary_change{f(EsField)}#68, salary_change.int{f(EsField)}#69, salary_change.keyword{f(KeywordEsField)}#70, salary_change.long{f(EsField)}#71, still_hired{f(EsField)}#72, does_not_exist{r}#73, _fork{r}#49]] | ||
| \_Eval[[null[KEYWORD] AS does_not_exist#73]] |
There was a problem hiding this comment.
This is wrong. The field is not loaded in one of the fork branches because it wasn't mentioned inside the branch itself. But it's still needed to load the values for the final output - we cannot just fill with nulls.
There was a problem hiding this comment.
The point here is that with FORK, the underlying index is the same in all branches. We mustn't assume that an unmapped field exists and try to load it in one branch but not in the other - either we assume the index has the field or we don't.
Subqueries are less obvious but have a similar problem:
SET unmapped_fields="load";
FROM (employees), (employees | KEEP *, unmapped_field)
- It's more reasonable to say that
unmapped_fieldshould only be loaded in the second subquery as the subqueries should be independent. - However, in one subquery we say "employees actually has an unmapped field that we should load!", while in the other we ignore it. That's probably fine, except that the query plans for subqueries and
FORKare related java classes, so we need to be careful during planning to distinguish between these cases.
| \_Project[[emp_no{r}#0, language_code{r}#1]] | ||
| \_UnionAll[[avg_worked_seconds{r}#2, birth_date{r}#3, emp_no{r}#0, first_name{r}#4, gender{r}#5, height{r}#6, height.float{r}#7, height.half_float{r}#8, height.scaled_float{r}#9, hire_date{r}#10, is_rehired{r}#11, job_positions{r}#12, languages{r}#13, languages.byte{r}#14, languages.long{r}#15, languages.short{r}#16, last_name{r}#17, salary{r}#18, salary_change{r}#19, salary_change.int{r}#20, salary_change.keyword{r}#21, salary_change.long{r}#22, still_hired{r}#23, language_code{r}#1, language_name{r}#24, does_not_exist{r}#25]] | ||
| |_Project[[avg_worked_seconds{f}#26, birth_date{f}#27, emp_no{f}#28, first_name{f}#29, gender{f}#30, height{f}#31, height.float{f}#32, height.half_float{f}#33, height.scaled_float{f}#34, hire_date{f}#35, is_rehired{f}#36, job_positions{f}#37, languages{f}#38, languages.byte{f}#39, languages.long{f}#40, languages.short{f}#41, last_name{f}#42, salary{f}#43, salary_change{f}#44, salary_change.int{f}#45, salary_change.keyword{f}#46, salary_change.long{f}#47, still_hired{f}#48, language_code{r}#49, language_name{r}#50, does_not_exist{r}#51]] | ||
| | \_Eval[[null[INTEGER] AS language_code#49, null[KEYWORD] AS language_name#50, null[KEYWORD] AS does_not_exist#51]] |
There was a problem hiding this comment.
2 Problems!
Eval[[null[INTEGER] AS language_code#49
IMHO this one should actually be loaded from source, but we need to verify that this the expected behavior. This is more general and pertains to the case when a field is mapped as non-KEYWORD in one index and missing in another
Even if we load from _source in this case, we still need to autocast as well ( to int in this case).
Update: This is #141912
- We should fail queries like this for now, because loading unmapped fields and the alignment logic of UnionAll that inserts null Eval for missing attributes don't play well together. There are probably bugs, and we saw cases where the the load logic implied that a field missing in one subquery is of KEYWORD type, even though it was properly mapped in another subquery, causing mayhem during type resolution downstream.
In fact, we likely need to fail subqueries altogether until we have this figured out, otherwise we might allow queries that will later require a breaking change.
Here's a reproducer where the union all resolution for subqueries and loading unmapped fields works against each other - does_not_exist is both marked for loading AND it's overridden as NULL by the union all alignment logic (see analyzed plan):
FROM employees, (FROM languages | KEEP language_code)
| KEEP emp_no, language_code, does_not_exist
Update to 2: rather than failing the whole query if a field is unmapped in one index and mapped as non-keyword in another but not explicitly cast, we probably should be consistent with union types which treat this as an unsupported field and fill it with nulls - rather than failing the whole query.
alex-spies
left a comment
There was a problem hiding this comment.
Oops, probably should've waited until this gets un-drafted. Only had a quick look and realized this is still WIP. Will continue reviewing when we're ready :)
There was a problem hiding this comment.
Unrelated change; should this be committed?
| if (query.toUpperCase().startsWith("SET") == false) { | ||
| query = "SET unmapped_fields=\"load\"; " + query; | ||
| } |
There was a problem hiding this comment.
Leftover to remove before this can be merged.
There was a problem hiding this comment.
This also looks like an ad-hoc change; this csv was used for old tests and should probably be reverted.
| /** | ||
| * Support for field aliases in mappings. | ||
| */ | ||
| FIELD_ALIAS_SUPPORT, |
There was a problem hiding this comment.
Are we adding support for this in this PR? That should be an existing ability, no?
There was a problem hiding this comment.
Not in CsvTests :)
alex-spies
left a comment
There was a problem hiding this comment.
Woops, I had uncommitted review comments. Committing now; they'll apply to whichever PR supersedes this, though.
There was a problem hiding this comment.
This file still has multiple // TODO: golden testing, where the test case is missing any actual assertions.
We should replace these test stubs by actual golden tests.
There was a problem hiding this comment.
All but one of those have subqueries/forks/joins. Do we want to cement its behavior in golden tests?
There was a problem hiding this comment.
Ah! Good catch. No, we probably don't, given that branching queries will be disabled for load for now.
But maybe we should mark them with @AwaitsFix and point to an issue? The tests run and "succeed" right now, giving a false sense of coverage, no?
There was a problem hiding this comment.
Drive-by, but this draft is mentioned in #141911 under "spec tests for -> TS", but I don't see any queries with TS. Let's add some?
|
Moved to #143522 since the branch name was extremely misleading, and it made it hard on me to keep track. |
) This PR increases the test coverage of unmapped fields and gildens pseudo-golden tests. * Moves pseudo-golden tests from AnalyzerUnmappedTests to AnalyzerUnmappedGoldenTests. * Add more golden tests. * Add more Spec/CSV tests. Continued from #141821.
…tic#143522) This PR increases the test coverage of unmapped fields and gildens pseudo-golden tests. * Moves pseudo-golden tests from AnalyzerUnmappedTests to AnalyzerUnmappedGoldenTests. * Add more golden tests. * Add more Spec/CSV tests. Continued from elastic#141821.
…tic#143522) This PR increases the test coverage of unmapped fields and gildens pseudo-golden tests. * Moves pseudo-golden tests from AnalyzerUnmappedTests to AnalyzerUnmappedGoldenTests. * Add more golden tests. * Add more Spec/CSV tests. Continued from elastic#141821.
No description provided.