Skip to content

fix scalar subquery hash conflicts#8367

Merged
alexey-milovidov merged 1 commit intoClickHouse:masterfrom
amosbird:scalarfix
Aug 23, 2020
Merged

fix scalar subquery hash conflicts#8367
alexey-milovidov merged 1 commit intoClickHouse:masterfrom
amosbird:scalarfix

Conversation

@amosbird
Copy link
Copy Markdown
Collaborator

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Bug Fix

Changelog entry (up to few sentences, required except for Non-significant/Documentation categories):

subquery hash values are not enough to distinguish. #8333

@alexey-milovidov alexey-milovidov added the pr-bugfix Pull request with bugfix, not backported by default label Dec 26, 2019
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why tree hash is not sufficient?
Does it mean that we don't calculate it right and it should be fixed instaed?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example we don't have all the fields hashed in https://github.com/yandex/ClickHouse/blob/master/dbms/src/Parsers/ASTOrderByElement.h#L14

AST hashing currently only takes children into account and the only special implementation is ASTLiteral https://github.com/yandex/ClickHouse/blob/master/dbms/src/Parsers/ASTLiteral.cpp#L10

Some code refactoring needs to be done if we really want distinguishable ast hashing, and some reflection framework might be needed

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AST hash is intended to be different for different ASTs. It should be fixed.

Reflection framework will be overkill for that, because I don't know any convenient, well known and widely used C++ reflection frameworks.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll fix this if there is no undergoing parser refactoring ....

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no active parser refactorings right now.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we afford to use IAST::format to generate hashes from? Hashing through values is not enough as different types might have the same memory representation and it's too complex to maintain 60+ ASTs

Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov Dec 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was implemented that way and I remember that I have added separate method for hashing because it was too bad for very long queries or for queries with very long arrays / sets.

@alexey-milovidov alexey-milovidov added the st-discussion When the implementation aspects are not clear or when the PR is on hold due to questions. label Jan 25, 2020
@4ertus2 4ertus2 self-assigned this Jun 15, 2020
@alexey-milovidov
Copy link
Copy Markdown
Member

@amosbird You need to hash subqueries and you choose between hashing on AST level and on Tokens level?

Hashing in tokens level for your task is not always available, because after we have AST, we don't have initial query string or token stream. Formatting query back only to calculate hash is also not an option because it is very inefficient.

I think, we just need to provide proper getTreeHash method for all ASTs.

Hashing through values is not enough as different types might have the same memory representation

Just hash field type before value.

@amosbird amosbird force-pushed the scalarfix branch 2 times, most recently from 7c03585 to fafba06 Compare August 19, 2020 14:32
@alexey-milovidov alexey-milovidov self-assigned this Aug 19, 2020
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have FieldVisitorHash.

@amosbird amosbird force-pushed the scalarfix branch 3 times, most recently from 29a0926 to 95e7d77 Compare August 20, 2020 09:31
@alexey-milovidov alexey-milovidov removed the st-discussion When the implementation aspects are not clear or when the PR is on hold due to questions. label Aug 23, 2020
@alexey-milovidov
Copy link
Copy Markdown
Member

Build check:

Build failed without build log. It's inner CI problem. Maybe something wrong with docker or host. Just wait for build restart.

Infrastructure failure.

@alexey-milovidov alexey-milovidov mentioned this pull request Aug 23, 2020
40 tasks
@alexey-milovidov
Copy link
Copy Markdown
Member

alexey-milovidov commented Aug 23, 2020

AST fuzzer — Received signal 6

@akuzm There is no stack trace from clickhouse-server. The problem is undiagnosable.
@nikitamikhaylov The error is in cache dictionary - you should take a look #13981

@alexey-milovidov
Copy link
Copy Markdown
Member

Stress test: #13980

@alexey-milovidov
Copy link
Copy Markdown
Member

Performance test: it's unclear for me if error is real. Need to investigate.

@alexey-milovidov
Copy link
Copy Markdown
Member

I have checked manually that the queries from "linear_regression" performance test work:

ClickHouse client version 20.8.1.1.
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.8.1 revision 54438.

milovidov-desktop :) DROP TABLE IF EXISTS test_model

DROP TABLE IF EXISTS test_model

Ok.

0 rows in set. Elapsed: 0.001 sec. 

milovidov-desktop :) 
[1]+  Stopped                 clickhouse-client
milovidov@milovidov-desktop:~/work/ClickHouse$ fg
clickhouse-client
milovidov-desktop :) CREATE TABLE test_model engine = Memory as select stochasticLinearRegressionState(0.0001)(Age, Income, ParamPrice, Robotness, RefererHash) as state from test.hits

CREATE TABLE test_model
ENGINE = Memory AS
SELECT stochasticLinearRegressionState(0.0001)(Age, Income, ParamPrice, Robotness, RefererHash) AS state
FROM test.hits

Ok.

0 rows in set. Elapsed: 0.090 sec. Processed 8.87 million rows, 168.60 MB (98.11 million rows/s., 1.86 GB/s.) 

milovidov-desktop :) SELECT stochasticLinearRegressionState(0.0001, 0, 15)(Age, Income, ParamPrice, Robotness, RefererHash) FROM test.hits FORMAT Null

SELECT stochasticLinearRegressionState(0.0001, 0, 15)(Age, Income, ParamPrice, Robotness, RefererHash)
FROM test.hits
FORMAT Null

Ok.

0 rows in set. Elapsed: 0.050 sec. Processed 8.87 million rows, 168.60 MB (176.19 million rows/s., 3.35 GB/s.) 

milovidov-desktop :) SELECT stochasticLinearRegression(Age, Income, ParamPrice, Robotness, RefererHash) FROM test.hits FORMAT Null

SELECT stochasticLinearRegression(Age, Income, ParamPrice, Robotness, RefererHash)
FROM test.hits
FORMAT Null

Ok.

0 rows in set. Elapsed: 0.051 sec. Processed 8.87 million rows, 168.60 MB (173.75 million rows/s., 3.30 GB/s.) 

milovidov-desktop :) SELECT stochasticLinearRegressionState(0.0001, 0, 15, 'Momentum')(Age, Income, ParamPrice, Robotness, RefererHash) FROM hits_100m_single FORMAT Null

SELECT stochasticLinearRegressionState(0.0001, 0, 15, 'Momentum')(Age, Income, ParamPrice, Robotness, RefererHash)
FROM hits_100m_single
FORMAT Null


Received exception from server (version 20.8.1):
Code: 60. DB::Exception: Received from localhost:9000. DB::Exception: Table default.hits_100m_single doesn't exist.. 

0 rows in set. Elapsed: 0.024 sec. 

milovidov-desktop :) SELECT stochasticLinearRegressionState(0.0001, 0, 15, 'Momentum')(Age, Income, ParamPrice, Robotness, RefererHash) FROM test.hits FORMAT Null

SELECT stochasticLinearRegressionState(0.0001, 0, 15, 'Momentum')(Age, Income, ParamPrice, Robotness, RefererHash)
FROM test.hits
FORMAT Null

Ok.

0 rows in set. Elapsed: 0.050 sec. Processed 8.87 million rows, 168.60 MB (178.12 million rows/s., 3.38 GB/s.) 

milovidov-desktop :) WITH (SELECT state FROM test_model) AS model SELECT evalMLMethod(model, Income, ParamPrice, Robotness, RefererHash) FROM test.hits FORMAT Null

WITH 
    (
        SELECT state
        FROM test_model
    ) AS model
SELECT evalMLMethod(model, Income, ParamPrice, Robotness, RefererHash)
FROM test.hits
FORMAT Null

Ok.

0 rows in set. Elapsed: 0.025 sec. Processed 8.87 million rows, 159.73 MB (357.14 million rows/s., 6.43 GB/s.) 

milovidov-desktop :) DROP TABLE IF EXISTS test_model

DROP TABLE IF EXISTS test_model

Ok.

0 rows in set. Elapsed: 0.001 sec. 

milovidov-desktop :) Bye.
milovidov@milovidov-desktop:~/work/ClickHouse$ clickhouse-benchmark <<< "SELECT stochasticLinearRegression(Age, Income, ParamPrice, Robotness, RefererHash) FROM test.hits FORMAT Null"
Loaded 1 queries.

Queries executed: 21.

localhost:9000, queries 21, QPS: 20.351, RPS: 180594815.652, MiB/s: 3272.344, result RPS: 0.000, result MiB/s: 0.000.

0.000%          0.046 sec.
10.000%         0.046 sec.
20.000%         0.047 sec.
30.000%         0.048 sec.
40.000%         0.048 sec.
50.000%         0.049 sec.
60.000%         0.049 sec.
70.000%         0.050 sec.
80.000%         0.051 sec.
90.000%         0.052 sec.
95.000%         0.052 sec.
99.000%         0.057 sec.
99.900%         0.057 sec.
99.990%         0.057 sec.



Queries executed: 41.

localhost:9000, queries 20, QPS: 19.860, RPS: 176238010.688, MiB/s: 3193.400, result RPS: 0.000, result MiB/s: 0.000.

0.000%          0.047 sec.
10.000%         0.047 sec.
20.000%         0.048 sec.
30.000%         0.049 sec.
40.000%         0.049 sec.
50.000%         0.050 sec.
60.000%         0.050 sec.
70.000%         0.051 sec.
80.000%         0.052 sec.
90.000%         0.053 sec.
95.000%         0.053 sec.
99.000%         0.062 sec.
99.900%         0.062 sec.
99.990%         0.062 sec.



Queries executed: 62.

localhost:9000, queries 21, QPS: 20.959, RPS: 185988991.989, MiB/s: 3370.086, result RPS: 0.000, result MiB/s: 0.000.

0.000%          0.046 sec.
10.000%         0.046 sec.
20.000%         0.047 sec.
30.000%         0.047 sec.
40.000%         0.047 sec.
50.000%         0.047 sec.
60.000%         0.048 sec.
70.000%         0.048 sec.
80.000%         0.049 sec.
90.000%         0.050 sec.
95.000%         0.050 sec.
99.000%         0.051 sec.
99.900%         0.051 sec.
99.990%         0.051 sec.



Queries executed: 83.

localhost:9000, queries 21, QPS: 20.408, RPS: 181098473.261, MiB/s: 3281.470, result RPS: 0.000, result MiB/s: 0.000.

0.000%          0.046 sec.
10.000%         0.046 sec.
20.000%         0.047 sec.
30.000%         0.047 sec.
40.000%         0.048 sec.
50.000%         0.048 sec.
60.000%         0.050 sec.
70.000%         0.050 sec.
80.000%         0.050 sec.
90.000%         0.053 sec.
95.000%         0.053 sec.
99.000%         0.054 sec.
99.900%         0.054 sec.
99.990%         0.054 sec.



Queries executed: 104.

localhost:9000, queries 21, QPS: 20.700, RPS: 183687351.186, MiB/s: 3328.380, result RPS: 0.000, result MiB/s: 0.000.

0.000%          0.046 sec.
10.000%         0.047 sec.
20.000%         0.047 sec.
30.000%         0.048 sec.
40.000%         0.048 sec.
50.000%         0.048 sec.
60.000%         0.049 sec.
70.000%         0.049 sec.
80.000%         0.049 sec.
90.000%         0.050 sec.
95.000%         0.050 sec.
99.000%         0.051 sec.
99.900%         0.051 sec.
99.990%         0.051 sec.


^CStopping launch of queries. SIGINT received.

Queries executed: 107.

localhost:9000, queries 107, QPS: 20.442, RPS: 181397043.428, MiB/s: 3286.880, result RPS: 0.000, result MiB/s: 0.000.

0.000%          0.046 sec.
10.000%         0.046 sec.
20.000%         0.047 sec.
30.000%         0.047 sec.
40.000%         0.048 sec.
50.000%         0.049 sec.
60.000%         0.049 sec.
70.000%         0.050 sec.
80.000%         0.050 sec.
90.000%         0.052 sec.
95.000%         0.053 sec.
99.000%         0.057 sec.
99.900%         0.062 sec.
99.990%         0.062 sec.

@alexey-milovidov alexey-milovidov merged commit d0b6ba3 into ClickHouse:master Aug 23, 2020
@alexey-milovidov
Copy link
Copy Markdown
Member

It's #13348, @KochetovNicolai

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-docs-needed pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants