-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Projection analysis slows down queries #39905
Copy link
Copy link
Closed
Labels
Description
Describe the situation
While looking at some performance degradation during an upgrade I've noticed that for fast queries the analysis done to determine if a projection can / should be used slows down queries for over 15%.
Example setup:
DROP DATABASE IF EXISTS p;
CREATE DATABASE p;
CREATE TABLE p.perf
ENGINE = MergeTree
ORDER BY c1
SETTINGS index_granularity = 1024, ttl_only_drop_parts = 1 AS
SELECT *
FROM generateRandom('c1 UInt32, c2 UInt64, s1 String, c3 UInt32, c4 UInt64, s2 String, c5 UInt32, c6 UInt64, s3 String, c7 UInt32, c8 UInt64, s4 String, c9 UInt64 , c10 UInt64 , c11 UInt64 , c12 UInt64 , c13 UInt64 , c14 UInt64 , c15 UInt64 , c16 UInt64 , c17 UInt64 , c18 UInt64 , c19 UInt64 , c20 UInt64 , c21 UInt64 , c22 UInt64 , c23 UInt64 , c24 UInt64 , c25 UInt64 , c26 UInt64 , c27 UInt64 , c28 UInt64 , c29 UInt64 , c30 UInt64 , c31 UInt64 , c32 UInt64 , c33 UInt64 , c34 UInt64 , c35 UInt64 , c36 UInt64 , c37 UInt64 , c38 UInt64 , c39 UInt64 , c40 UInt64 , c41 UInt64 , c42 UInt64 , c43 UInt64 , c44 UInt64 , c45 UInt64 , c46 UInt64 , c47 UInt64 , c48 UInt64 , c49 UInt64 , c50 UInt64 , c51 UInt64 , c52 UInt64 , c53 UInt64 , c54 UInt64 , c55 UInt64 , c56 UInt64 , c57 UInt64 , c58 UInt64 , c59 UInt64 , c60 UInt64 , c61 UInt64 , c62 UInt64 , c63 UInt64 , c64 UInt64 , c65 UInt64 , c66 UInt64 , c67 UInt64 , c68 UInt64 , c69 UInt64 , c70 UInt64 , c71 UInt64 , c72 UInt64 , c73 UInt64 , c74 UInt64 , c75 UInt64 , c76 UInt64 , c77 UInt64 , c78 UInt64 , c79 UInt64 , c80 UInt64 , c81 UInt64 , c82 UInt64 , c83 UInt64', 0, 30, 30)
LIMIT 10000000;
OPTIMIZE TABLE p.perf FINAL;Running on 22.7.2.15, default options:
clickhouse-benchmark --query "select 1 from p.perf prewhere c1=3912034155 FORMAT JSON;" --cumulative -i 10000 --max_threads 1Queries executed: 10000.
localhost:9000, queries 10000, QPS: 1376.580, RPS: 1409618.225, MiB/s: 5.377, result RPS: 4129.741, result MiB/s: 0.004.
0.000% 0.001 sec.
10.000% 0.001 sec.
20.000% 0.001 sec.
30.000% 0.001 sec.
40.000% 0.001 sec.
50.000% 0.001 sec.
60.000% 0.001 sec.
70.000% 0.001 sec.
80.000% 0.001 sec.
90.000% 0.001 sec.
95.000% 0.001 sec.
99.000% 0.001 sec.
99.900% 0.001 sec.
99.990% 0.007 sec.
Disabling projections (although the flag is deprecated it still has an effect):
clickhouse-benchmark --query "select 1 from p.perf prewhere c1=3912034155 FORMAT JSON;" --cumulative -i 10000 --max_threads 1 --allow_experimental_projection_optimization 0localhost:9000, queries 10000, QPS: 1659.054, RPS: 1698871.798, MiB/s: 6.481, result RPS: 4977.163, result MiB/s: 0.005.
0.000% 0.001 sec.
10.000% 0.001 sec.
20.000% 0.001 sec.
30.000% 0.001 sec.
40.000% 0.001 sec.
50.000% 0.001 sec.
60.000% 0.001 sec.
70.000% 0.001 sec.
80.000% 0.001 sec.
90.000% 0.001 sec.
95.000% 0.001 sec.
99.000% 0.001 sec.
99.900% 0.001 sec.
99.990% 0.004 sec.
That is, we are losing almost 20% of QPS because of projection analysis.
Perf shows this too:
- 73.17% 0.00% 0 TCPHandler lib_poco_foundation.so [.] Poco::ThreadImpl::runnableEntry ▒
Poco::ThreadImpl::runnableEntry ▒
Poco::PooledThread::run ▒
Poco::Net::TCPServerDispatcher::run ▒
Poco::Net::TCPServerConnection::start ▒
- DB::TCPHandler::run ▒
- 73.03% DB::TCPHandler::runImpl ▒
- 52.45% DB::executeQuery ▒
- 52.31% DB::executeQueryImpl ▒
- 33.35% DB::InterpreterFactory::get ▒
std::__1::make_unique<DB::InterpreterSelectWithUnionQuery, std::__1::shared_ptr<DB::IAST>&, std::__1::shared_ptr<DB::Context>&, DB::SelectQueryOptions const&> ▒
- DB::InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery ▒
- 33.33% DB::InterpreterSelectWithUnionQuery::buildCurrentChildInterpreter ▒
- 33.31% DB::InterpreterSelectQuery::InterpreterSelectQuery ▒
- 33.24% DB::InterpreterSelectQuery::InterpreterSelectQuery ▒
- 28.09% DB::InterpreterSelectQuery::InterpreterSelectQuery(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context> const&, std::__1::optional<DB::Pipe>, std::__1::shared_ptr<DB::IStorage> const&, DB::Sele▒
- 20.48% DB::InterpreterSelectQuery::getSampleBlockImpl ▒
- 16.64% DB::MergeTreeData::getQueryProcessingStage ▒
- 16.54% DB::MergeTreeData::getQueryProcessingStageWithAggregateProjection ▒
+ 14.68% DB::InterpreterSelectQuery::InterpreterSelectQuery ▒
+ 1.14% DB::InterpreterSelectQuery::~InterpreterSelectQuery ▒
+ 3.68% DB::ExpressionAnalysisResult::ExpressionAnalysisResult ▒
+ 5.47% DB::TreeRewriter::analyzeSelect ▒
+ 0.96% DB::TreeRewriterResult::TreeRewriterResult ▒
0.59% DB::TreeRewriterResult::~TreeRewriterResult ▒
+ 1.90% DB::JoinedTables::resolveTables ▒
+ 0.89% DB::Context::checkAccess ▒
+ 0.83% DB::JoinedTables::~JoinedTables ▒
+ 12.17% DB::InterpreterSelectWithUnionQuery::execute ▒
+ 2.83% DB::parseQuery ▒
+ 1.16% DB::InterpreterSelectWithUnionQuery::~InterpreterSelectWithUnionQuery ▒
0.66% DB::logQuery ▒
+ 6.78% DB::TCPHandler::processOrdinaryQueryWithProcessors ▒
+ 4.28% DB::TCPHandler::receivePacket ▒
+ 3.89% DB::QueryPipeline::reset ▒
+ 1.48% DB::QueryState::operator= ▒
1.26% std::__1::__function::__policy_invoker<void (DB::QueryPipeline&)>::__call_impl<std::__1::__function::__default_alloc_func<DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage:▒
+ 0.64% DB::InternalTextLogsQueue::setSourceRegexp ▒
0.50% DB::CurrentThread::QueryScope::QueryScope
Since query interpretation isn't fast (see #35490 for another example), the extra call to InterpreterSelectQuery has an important impact on the performance of the query. I'm not sure if there is a better alternative to analyze the query though.
Reactions are currently unavailable