Skip to content

Implicit key type conversion for join using step in pipeline#19885

Merged
vdimir merged 24 commits intomasterfrom
vdimir/join-cast-types-v2
Mar 9, 2021
Merged

Implicit key type conversion for join using step in pipeline#19885
vdimir merged 24 commits intomasterfrom
vdimir/join-cast-types-v2

Conversation

@vdimir
Copy link
Copy Markdown
Member

@vdimir vdimir commented Jan 31, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Supports implicit key type conversion for JOIN. Closes #18567.

Ref PR #18672

@robot-clickhouse robot-clickhouse added doc-alert pr-feature Pull request with new product feature labels Jan 31, 2021
@vdimir vdimir force-pushed the vdimir/join-cast-types-v2 branch from 35ac215 to 8d7a135 Compare February 3, 2021 11:40
@robot-clickhouse robot-clickhouse added the submodule changed At least one submodule changed in this PR. label Feb 3, 2021
@vdimir vdimir force-pushed the vdimir/join-cast-types-v2 branch from 8d7a135 to 4ec9636 Compare February 3, 2021 11:47
@vdimir vdimir removed the submodule changed At least one submodule changed in this PR. label Feb 3, 2021
{
assert(left_keys.size() == right_keys.size());

/// only JOIN USING supported
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that only USING is supported is clear enough from the code, the comment should probably tell "why".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (leftBecomeNullable(col.type))
col.type = makeNullable(col.type);

std::unordered_map<String, DataTypePtr> type_map;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the type information in the input block here is "wrong" (before conversion), and we have to overwrite it w/ the converted values? But why? Can the input block be made to already have the proper types?

Copy link
Copy Markdown
Member

@KochetovNicolai KochetovNicolai Feb 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, when actions for converting are in a separate step, this code won't be needed ...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved common type calculation into TreeRewriter and now TableJoin contains information about converted types on earlier stage, so it is possible to change types in addJoinedColumn and have correct types in columns_added_by_join

Copy link
Copy Markdown
Contributor

@akuzm akuzm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice that this patch is much smaller than the previous version.

I think it would be helpful to add a high-level comment somewhere, that describes what we do -- e.g. we can convert USING columns to their common supertype by inserting additional casts into ExpressionTransform before join. This only works for USING because etc etc...

Copy link
Copy Markdown
Member

@KochetovNicolai KochetovNicolai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@vdimir vdimir force-pushed the vdimir/join-cast-types-v2 branch from 4b68c68 to a6ea5ff Compare February 9, 2021 13:55
@vdimir
Copy link
Copy Markdown
Member Author

vdimir commented Feb 9, 2021

I've found bugs with aggregation due to fact that in analyzeAggregation corrected types are not calculated yet. Common type calculated moved to earlier stage to TreeRewriter and type mapping stored in TableJoin.

It fixes bug and also I suppose makes code a bit clearer

else
{
auto new_dag = ActionsDAG::merge(
std::move(*joined_block_actions->getActionsDAG().clone()),
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we split joined_block_actions into separate variables not to use merge here? Now it consist from createJoinedBlockActions and additional converting step. @KochetovNicolai

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, this actions are applied to right part of join. And we don't use separate step for this actions for now. So, I think it is ok to merge.

@vdimir vdimir force-pushed the vdimir/join-cast-types-v2 branch from 0aa2434 to 1a26310 Compare February 19, 2021 11:30
@vdimir
Copy link
Copy Markdown
Member Author

vdimir commented Feb 19, 2021

I've added implementation of type conversion for JOIN ON syntax. It don't change types in result table. For that conversion do not replace original columns, but and new ones and join performed by this new columns.

Copy link
Copy Markdown
Member

@KochetovNicolai KochetovNicolai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not read tests, but changes in code LGTM.

@akuzm
Copy link
Copy Markdown
Contributor

akuzm commented Mar 1, 2021

Integration test failure doesn't look relevant.
MSan fuzzer looks like this one #20046

Should we merge?

@vdimir
Copy link
Copy Markdown
Member Author

vdimir commented Mar 1, 2021

Should we merge?

I found one issue, query doesn't work if syntax JOIN ON used with some constant in WHERE, e.g. SELECT * FROM t1 FULL JOIN t2 ON t1.a = t2.a WHERE 1. It shows error DB::Exception: Not found column t2.a in block. There are only columns: CAST(t2.a, Int32). If type conversion for t1.a and t2.a not needed it works ok as earlier.

Likely after some optimization some required columns are dropped. I'm trying to figure it out.
Maybe we can merge this and fix this issue later or it's better to investigate it more detailed?

@akuzm
Copy link
Copy Markdown
Contributor

akuzm commented Mar 2, 2021

It shows error DB::Exception: Not found column t2.a in block. There are only columns: CAST(t2.a, Int32).

Looks kinda bad, probably better to investigate. Do I remember correctly that you keep the name of the column, but replace it with the casted one? So it should have been named just t2.a and not cast... if everything worked correctly.

@olgarev
Copy link
Copy Markdown
Contributor

olgarev commented May 31, 2021

Internal documentation ticket DOCSUP-9868

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Type conversions for JOIN keys.

5 participants