Skip to content

DRAFT: LOAD/NULLIFY golden tests#141821

Closed
GalLalouche wants to merge 28 commits intoelastic:mainfrom
GalLalouche:tests/golden_name_ids
Closed

DRAFT: LOAD/NULLIFY golden tests#141821
GalLalouche wants to merge 28 commits intoelastic:mainfrom
GalLalouche:tests/golden_name_ids

Conversation

@GalLalouche
Copy link
Copy Markdown
Contributor

No description provided.

| \_EsRelation[employees][avg_worked_seconds{f}#25, birth_date{f}#26, emp_no{f}#27, first_name{f}#28, gender{f}#29, height{f}#30, height.float{f}#31, height.half_float{f}#32, height.scaled_float{f}#33, hire_date{f}#34, is_rehired{f}#35, job_positions{f}#36, languages{f}#37, languages.byte{f}#38, languages.long{f}#39, languages.short{f}#40, last_name{f}#41, salary{f}#42, salary_change{f}#43, salary_change.int{f}#44, salary_change.keyword{f}#45, salary_change.long{f}#46, still_hired{f}#47, does_not_exist{f}#48]
\_Limit[1000[INTEGER],false,false]
\_Project[[avg_worked_seconds{f(EsField)}#50, birth_date{f(DateEsField)}#51, emp_no{f(EsField)}#52, first_name{f(KeywordEsField)}#53, gender{f(KeywordEsField)}#54, height{f(EsField)}#55, height.float{f(EsField)}#56, height.half_float{f(EsField)}#57, height.scaled_float{f(EsField)}#58, hire_date{f(DateEsField)}#59, is_rehired{f(EsField)}#60, job_positions{f(KeywordEsField)}#61, languages{f(EsField)}#62, languages.byte{f(EsField)}#63, languages.long{f(EsField)}#64, languages.short{f(EsField)}#65, last_name{f(KeywordEsField)}#66, salary{f(EsField)}#67, salary_change{f(EsField)}#68, salary_change.int{f(EsField)}#69, salary_change.keyword{f(KeywordEsField)}#70, salary_change.long{f(EsField)}#71, still_hired{f(EsField)}#72, does_not_exist{r}#73, _fork{r}#49]]
\_Eval[[null[KEYWORD] AS does_not_exist#73]]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong. The field is not loaded in one of the fork branches because it wasn't mentioned inside the branch itself. But it's still needed to load the values for the final output - we cannot just fill with nulls.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point here is that with FORK, the underlying index is the same in all branches. We mustn't assume that an unmapped field exists and try to load it in one branch but not in the other - either we assume the index has the field or we don't.

Subqueries are less obvious but have a similar problem:

SET unmapped_fields="load";
FROM (employees), (employees | KEEP *, unmapped_field)
  • It's more reasonable to say that unmapped_field should only be loaded in the second subquery as the subqueries should be independent.
  • However, in one subquery we say "employees actually has an unmapped field that we should load!", while in the other we ignore it. That's probably fine, except that the query plans for subqueries and FORK are related java classes, so we need to be careful during planning to distinguish between these cases.

\_Project[[emp_no{r}#0, language_code{r}#1]]
\_UnionAll[[avg_worked_seconds{r}#2, birth_date{r}#3, emp_no{r}#0, first_name{r}#4, gender{r}#5, height{r}#6, height.float{r}#7, height.half_float{r}#8, height.scaled_float{r}#9, hire_date{r}#10, is_rehired{r}#11, job_positions{r}#12, languages{r}#13, languages.byte{r}#14, languages.long{r}#15, languages.short{r}#16, last_name{r}#17, salary{r}#18, salary_change{r}#19, salary_change.int{r}#20, salary_change.keyword{r}#21, salary_change.long{r}#22, still_hired{r}#23, language_code{r}#1, language_name{r}#24, does_not_exist{r}#25]]
|_Project[[avg_worked_seconds{f}#26, birth_date{f}#27, emp_no{f}#28, first_name{f}#29, gender{f}#30, height{f}#31, height.float{f}#32, height.half_float{f}#33, height.scaled_float{f}#34, hire_date{f}#35, is_rehired{f}#36, job_positions{f}#37, languages{f}#38, languages.byte{f}#39, languages.long{f}#40, languages.short{f}#41, last_name{f}#42, salary{f}#43, salary_change{f}#44, salary_change.int{f}#45, salary_change.keyword{f}#46, salary_change.long{f}#47, still_hired{f}#48, language_code{r}#49, language_name{r}#50, does_not_exist{r}#51]]
| \_Eval[[null[INTEGER] AS language_code#49, null[KEYWORD] AS language_name#50, null[KEYWORD] AS does_not_exist#51]]
Copy link
Copy Markdown
Contributor

@alex-spies alex-spies Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 Problems!

Eval[[null[INTEGER] AS language_code#49
  1. IMHO this one should actually be loaded from source, but we need to verify that this the expected behavior. This is more general and pertains to the case when a field is mapped as non-KEYWORD in one index and missing in another

Even if we load from _source in this case, we still need to autocast as well ( to int in this case).

Update: This is #141912

  1. We should fail queries like this for now, because loading unmapped fields and the alignment logic of UnionAll that inserts null Eval for missing attributes don't play well together. There are probably bugs, and we saw cases where the the load logic implied that a field missing in one subquery is of KEYWORD type, even though it was properly mapped in another subquery, causing mayhem during type resolution downstream.

In fact, we likely need to fail subqueries altogether until we have this figured out, otherwise we might allow queries that will later require a breaking change.

Here's a reproducer where the union all resolution for subqueries and loading unmapped fields works against each other - does_not_exist is both marked for loading AND it's overridden as NULL by the union all alignment logic (see analyzed plan):

FROM employees, (FROM languages | KEEP language_code)
            | KEEP emp_no, language_code, does_not_exist

Update to 2: rather than failing the whole query if a field is unmapped in one index and mapped as non-keyword in another but not explicitly cast, we probably should be consistent with union types which treat this as an unsupported field and fill it with nulls - rather than failing the whole query.

Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, probably should've waited until this gets un-drafted. Only had a quick look and realized this is still WIP. Will continue reviewing when we're ready :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change; should this be committed?

Comment on lines +374 to +376
if (query.toUpperCase().startsWith("SET") == false) {
query = "SET unmapped_fields=\"load\"; " + query;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover to remove before this can be merged.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also looks like an ad-hoc change; this csv was used for old tests and should probably be reverted.

/**
* Support for field aliases in mappings.
*/
FIELD_ALIAS_SUPPORT,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we adding support for this in this PR? That should be an existing ability, no?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in CsvTests :)

Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops, I had uncommitted review comments. Committing now; they'll apply to whichever PR supersedes this, though.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file still has multiple // TODO: golden testing, where the test case is missing any actual assertions.

We should replace these test stubs by actual golden tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All but one of those have subqueries/forks/joins. Do we want to cement its behavior in golden tests?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Good catch. No, we probably don't, given that branching queries will be disabled for load for now.

But maybe we should mark them with @AwaitsFix and point to an issue? The tests run and "succeed" right now, giving a false sense of coverage, no?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by, but this draft is mentioned in #141911 under "spec tests for -> TS", but I don't see any queries with TS. Let's add some?

@GalLalouche
Copy link
Copy Markdown
Contributor Author

Moved to #143522 since the branch name was extremely misleading, and it made it hard on me to keep track.

GalLalouche added a commit that referenced this pull request Mar 14, 2026
)

This PR increases the test coverage of unmapped fields and gildens pseudo-golden tests.

* Moves pseudo-golden tests from AnalyzerUnmappedTests to AnalyzerUnmappedGoldenTests.
* Add more golden tests.
* Add more Spec/CSV tests.

Continued from #141821.
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Mar 16, 2026
…tic#143522)

This PR increases the test coverage of unmapped fields and gildens pseudo-golden tests.

* Moves pseudo-golden tests from AnalyzerUnmappedTests to AnalyzerUnmappedGoldenTests.
* Add more golden tests.
* Add more Spec/CSV tests.

Continued from elastic#141821.
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026
…tic#143522)

This PR increases the test coverage of unmapped fields and gildens pseudo-golden tests.

* Moves pseudo-golden tests from AnalyzerUnmappedTests to AnalyzerUnmappedGoldenTests.
* Add more golden tests.
* Add more Spec/CSV tests.

Continued from elastic#141821.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants