Skip to content

ESQL: Fix Join references#109989

Merged
alex-spies merged 8 commits intoelastic:mainfrom
alex-spies:fix-join-references
Jun 27, 2024
Merged

ESQL: Fix Join references#109989
alex-spies merged 8 commits intoelastic:mainfrom
alex-spies:fix-join-references

Conversation

@alex-spies
Copy link
Copy Markdown
Contributor

@alex-spies alex-spies commented Jun 20, 2024

  • Make Join.references() return all the references to attributes in children used for matching the left and right hand sides.
  • Do not use leftAttribute == rightAttribute expressions to express how to match with a table in LOOKUP; these expressions get optimized away sometimes. This requires a breaking change to Join's serialization.

It is important that a call to Join.references() (resp. Join.expressions()) returns all the references (expressions) that the Join relies on; esp. the references may need to be rewritten in optimizations.

Ex.: If we wish to push down an Eval past a Join, we may need to adjust attribute names in case of name collisions:

| lookup some_table on some_field
| eval some_field = "foobar"

-> to push down, we may want to rename some_field first, like so:

| eval some_field_2 = some_field, some_field = "foobar"
| lookup some_table on some_field_2

In optimization rules, we use machinery like QueryPlan.transformExpressionsOnly, like here. This relies on a correct implementation of the info() method - which also implies correctness of references() and expressions().

Comment on lines 80 to 82
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The crux: having the join config's parts actually in the node info, rather than the join config as a whole.

This makes it so that the fields the Join relies on are actually picked up when we run expressions - and therefore also when we run references.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting rid of the List<Expression> conditions is actually necessary; in a first attempt, I had them part of Join's NodeInfo, but that made tests fail:

  • If expressions like left == right are part of the node info for Join (which they should), they are also subject to optimizations.
  • If left is a reference attribute for a literal null, its data type is NULL. If we apply FoldNull, thus left == right will be optimized to null.

Our optimizations treat our expressions as things that are to be evaluated, so replacing them by something that evaluates to the same is seen as correct. That is absolutely not okay if the expressions are meant to only symbolize a correspondence between attributes on the left and right hand sides.

Comment on lines 22 to 23
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I think having unit tests for logical plan methods is warranted; making sure that the references are computed correctly is important.

@alex-spies alex-spies mentioned this pull request Jun 25, 2024
10 tasks
@alex-spies alex-spies force-pushed the fix-join-references branch from 86d58d0 to 44196fe Compare June 27, 2024 10:13
Randomize; test Join.expressions() as well.
@alex-spies alex-spies marked this pull request as ready for review June 27, 2024 11:12
@alex-spies alex-spies requested review from astefan and nik9000 June 27, 2024 11:13
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 27, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @alex-spies, I've created a changelog YAML for you.

out.writeNamedWriteableCollection(matchFields);
out.writeCollection(conditions, (o, v) -> v.writeTo(o));
out.writeNamedWriteableCollection(leftFields);
out.writeNamedWriteableCollection(rightFields);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine because you can't use this outside of a snapshot build and we don't offer wire compatibility between snapshot builds.

@alex-spies
Copy link
Copy Markdown
Contributor Author

Thanks for the review @nik9000 !

@alex-spies alex-spies merged commit d6380ed into elastic:main Jun 27, 2024
@alex-spies alex-spies deleted the fix-join-references branch June 27, 2024 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.15.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants