fix scalar subquery hash conflicts#8367
Conversation
There was a problem hiding this comment.
Why tree hash is not sufficient?
Does it mean that we don't calculate it right and it should be fixed instaed?
There was a problem hiding this comment.
for example we don't have all the fields hashed in https://github.com/yandex/ClickHouse/blob/master/dbms/src/Parsers/ASTOrderByElement.h#L14
AST hashing currently only takes children into account and the only special implementation is ASTLiteral https://github.com/yandex/ClickHouse/blob/master/dbms/src/Parsers/ASTLiteral.cpp#L10
Some code refactoring needs to be done if we really want distinguishable ast hashing, and some reflection framework might be needed
There was a problem hiding this comment.
AST hash is intended to be different for different ASTs. It should be fixed.
Reflection framework will be overkill for that, because I don't know any convenient, well known and widely used C++ reflection frameworks.
There was a problem hiding this comment.
Ok, I'll fix this if there is no undergoing parser refactoring ....
There was a problem hiding this comment.
There is no active parser refactorings right now.
There was a problem hiding this comment.
Can we afford to use IAST::format to generate hashes from? Hashing through values is not enough as different types might have the same memory representation and it's too complex to maintain 60+ ASTs
There was a problem hiding this comment.
It was implemented that way and I remember that I have added separate method for hashing because it was too bad for very long queries or for queries with very long arrays / sets.
|
@amosbird You need to hash subqueries and you choose between hashing on AST level and on Tokens level? Hashing in tokens level for your task is not always available, because after we have AST, we don't have initial query string or token stream. Formatting query back only to calculate hash is also not an option because it is very inefficient. I think, we just need to provide proper getTreeHash method for all ASTs.
Just hash field type before value. |
7c03585 to
fafba06
Compare
src/Parsers/ASTSetQuery.cpp
Outdated
There was a problem hiding this comment.
We have FieldVisitorHash.
29a0926 to
95e7d77
Compare
|
Build check:
Infrastructure failure. |
@akuzm There is no stack trace from clickhouse-server. The problem is undiagnosable. |
|
Stress test: #13980 |
|
Performance test: it's unclear for me if error is real. Need to investigate. |
|
I have checked manually that the queries from "linear_regression" performance test work: |
|
It's #13348, @KochetovNicolai |
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (up to few sentences, required except for Non-significant/Documentation categories):
subquery hash values are not enough to distinguish. #8333