Add _part_starting_offset virtual column and key condition support for offset-based querying #79417

Merged
KochetovNicolai merged 10 commits into ClickHouse:master from amosbird:projection-index-2
Apr 29, 2025

@amosbird
Collaborator

@amosbird amosbird commented Apr 22, 2025

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Support _part_starting_offset virtual column in MergeTree-family tables. This column represents the cumulative row count of all preceding parts, calculated at query time based on the current part list. The cumulative values are retained throughout query execution and remain effective even after part pruning. Related internal logic has been refactored to support this behavior.

When expressions like _part_starting_offset + _part_offset or _part_offset + _part_starting_offset are used in the WHERE clause, key condition analysis is applied to them, enabling efficient query-then-fetch patterns and cursor-based pagination. This improves the analysis process introduced in #58224: a single numeric column is now used for filtering instead of the (_part, _part_offset) pair.

This PR also improves the stability of projection indexes that rely on _part and _part_offset, which are sensitive to part merges (see #78429). However, it may still be unreliable under workloads involving inserts into partitioned tables, materialization of lightweight delete masks, or background merges in Collapsing, Replacing, or AggregatingMergeTree tables. A proper query-level snapshot is required to handle these cases, and it will be implemented in another PR.

SELECT
    sum(column_not_in_projection)
FROM events
WHERE (_part_starting_offset + _part_offset) IN (
    SELECT _part_starting_offset + _part_offset
    FROM events
    WHERE user_id = 42
)
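
To illustrate why one number can replace the (_part, _part_offset) pair: because the starting offsets grow monotonically along the part list, a global row number can be mapped back to a part and an in-part offset with a binary search. The sketch below is only an illustration under that assumption; `resolveGlobalOffset` and its parameters are hypothetical names, not ClickHouse code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical helper, not part of ClickHouse.
// starting_offsets[i] is the cumulative row count of parts 0..i-1 (so it starts at 0
// and is sorted ascending); total_rows is the row count across all parts.
// Returns {part index, offset inside that part}, or std::nullopt if out of range.
std::optional<std::pair<size_t, uint64_t>> resolveGlobalOffset(
    const std::vector<uint64_t> & starting_offsets, uint64_t total_rows, uint64_t global_offset)
{
    if (starting_offsets.empty() || global_offset >= total_rows)
        return std::nullopt;

    assert(starting_offsets.front() == 0);
    assert(std::is_sorted(starting_offsets.begin(), starting_offsets.end()));

    // Find the first part that starts after the target row;
    // the target row belongs to the part just before it.
    auto it = std::upper_bound(starting_offsets.begin(), starting_offsets.end(), global_offset);
    size_t part_index = static_cast<size_t>(it - starting_offsets.begin()) - 1;

    return std::make_pair(part_index, global_offset - starting_offsets[part_index]);
}
```

With starting offsets {0, 5, 10} and 15 rows in total, global offset 7 resolves to the second part (index 1) at in-part offset 2. The mapping is only meaningful for the part list the offsets were computed from.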

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh
Contributor

clickhouse-gh bot commented Apr 22, 2025

Workflow [PR], commit [f41142b]

@clickhouse-gh clickhouse-gh bot added the pr-feature Pull request with new product feature label Apr 22, 2025
@KochetovNicolai KochetovNicolai self-assigned this Apr 22, 2025
@EmeraldShift
Contributor

EmeraldShift commented Apr 22, 2025

Can you elaborate more on why this isn't stable across partitioned table inserts? Is it because each partition may have its own set of parts so the offsets aren't unique?

Would it be sufficient and stable to instead use something like this for joining?

(<partition_key_1>, ..., <partition_key_n>, _part_starting_offset + _part_offset)

EDIT: I guess it means my projection index now also has to store my partition columns, which is not ideal. It still sounds better than storing all primary columns to get the same index analysis effect. And since the partition key should be the same for all rows in a part, maybe there's a way to optimize storing it?

@EmeraldShift
Contributor

Also, sorry for the basic question but I have very little understanding here: what ordering is the offset based on? What's a "preceding" part? Why is that ordering stable across merges?

@amosbird
Collaborator Author

amosbird commented Apr 23, 2025

@EmeraldShift The current implementation of _part_starting_offset computes cumulative row offsets over a globally ordered list of parts, without respecting partition boundaries. The order is determined by the internal part list as seen at query time, and newly inserted parts are typically placed before existing parts if they belong to an earlier partition.

This can cause _part_starting_offset values to shift unexpectedly when parts are inserted into earlier partitions. For example:

┌──────────────┬────────────┬───────────────┐
│ Partition ID │ Part Name  │ Row Count     │
├──────────────┼────────────┼───────────────┤
│      1       │ part_1_1   │      5        │
│      2       │ part_2_1   │      5        │
└──────────────┴────────────┴───────────────┘

Initial computed _part_starting_offset (at query time):

part_1_1 → 0
part_2_1 → 5

Now a new part is inserted into Partition 1:

┌──────────────┬────────────┬───────────────┐
│ Partition ID │ Part Name  │ Row Count     │
├──────────────┼────────────┼───────────────┤
│      1       │ part_1_2   │      5        │
└──────────────┴────────────┴───────────────┘

Due to the way parts are ordered, the new global list may become:

[part_1_1, part_1_2, part_2_1]

And the updated _part_starting_offset will be:

part_1_1 → 0
part_1_2 → 5
part_2_1 → 10   ← shifted from 5 to 10

This shift breaks the assumption that _part_starting_offset remains stable across queries, making it unreliable for cursor-based or offset-based pagination when partitions are in use.
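
The arithmetic in the example above is an exclusive prefix sum of row counts over the ordered part list. A minimal sketch of that calculation follows; `PartInfo` and `computeStartingOffsets` are hypothetical names used only for illustration, and the real computation inside ClickHouse may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a data part: only the name and row count matter here.
struct PartInfo
{
    std::string name;
    uint64_t rows;
};

// Exclusive prefix sum over the ordered part list: each part's starting offset
// is the total row count of all parts that precede it in the list.
std::vector<uint64_t> computeStartingOffsets(const std::vector<PartInfo> & parts)
{
    std::vector<uint64_t> offsets;
    offsets.reserve(parts.size());

    uint64_t running_total = 0;
    for (const auto & part : parts)
    {
        offsets.push_back(running_total);
        running_total += part.rows;
    }

    assert(offsets.size() == parts.size());
    return offsets;
}
```

For the list [part_1_1, part_2_1] with 5 rows each, this gives offsets 0 and 5. After part_1_2 is inserted between them, the same function gives 0, 5 and 10, so part_2_1 moves from 5 to 10 even though its own rows did not change.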

The correctness of _part_starting_offset relies on a stable snapshot of the global part list. If the snapshot is held at the local query level, it enables single-node projection index capabilities. At the distributed query level, it supports cluster-wide consistent filtering. With an extended snapshot lifecycle, it can further support advanced patterns like query-then-fetch or scroll-style pagination, similar to Elasticsearch. This PR extends the current snapshot storage of MergeTree tables to persist _part_starting_offset during query planning, laying the foundation for such advanced use cases.

@amosbird amosbird force-pushed the projection-index-2 branch 2 times, most recently from e9b6c42 to 7011fb4 Compare April 23, 2025 13:09
@EmeraldShift
Contributor

Thank you for the detailed explanation!

@amosbird amosbird force-pushed the projection-index-2 branch 2 times, most recently from a5060cd to 2a7714c Compare April 25, 2025 12:44
@amosbird amosbird force-pushed the projection-index-2 branch from 2a7714c to 649d8e8 Compare April 25, 2025 16:26
@KochetovNicolai KochetovNicolai added this pull request to the merge queue Apr 29, 2025
Merged via the queue into ClickHouse:master with commit 11d5900 Apr 29, 2025
117 of 120 checks passed
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 29, 2025
baibaichen pushed a commit to Kyligence/gluten that referenced this pull request Apr 30, 2025
baibaichen pushed a commit to apache/gluten that referenced this pull request Apr 30, 2025
* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250430)

* Fix Build due to ClickHouse/ClickHouse#79067

* Fix build due to ClickHouse/ClickHouse#79417

---------

Co-authored-by: kyligence-git <gluten@kyligence.io>
Co-authored-by: Chang chen <chenchang@apache.com>
@fm4v
Member

fm4v commented May 21, 2025

@amosbird @KochetovNicolai Hi! Can you add an experimental setting for the new virtual columns?

@amosbird
Collaborator Author

@fm4v Hi! I'd like to understand the rationale behind introducing an experimental flag for _part_starting_offset. This column is a pure virtual column. Should the flag just hide the column? Or override its value to 0?

To my knowledge, there's no other pure virtual column that has or needs an experimental flag. The only related example is allow_experimental_block_number_column, but that's quite different: it affects whether the column is persisted, and its value semantics also change depending on the setting.

baibaichen pushed a commit to Kyligence/gluten that referenced this pull request Jul 5, 2025
baibaichen pushed a commit to apache/gluten that referenced this pull request Jul 6, 2025
* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250705)

* Fix benchmark build

* Fix Benchmark build due to ClickHouse/ClickHouse#79417

* Revert "Fix Build due to ClickHouse/ClickHouse#80931"

This reverts commit 02d12f6.

* Fix Build due to ClickHouse/ClickHouse#81886

* Fix Link issue due to ClickHouse/ClickHouse#83121

* Fix Build due to ClickHouse/ClickHouse#82604

* Fix Build due to ClickHouse/ClickHouse#82945

* Fix Build due to ClickHouse/ClickHouse#83214

---------

Co-authored-by: kyligence-git <gluten@kyligence.io>
Co-authored-by: Chang chen <chenchang@apache.com>
@canhld94
Contributor

canhld94 commented Jul 9, 2025

We see severe lock contention when getting the parts snapshot in v25.5.4 and suspect it is related to this PR.

WITH
    (
        SELECT now() - 3600
    ) AS start_time,
    (
        SELECT now()
    ) AS end_time
SELECT
    arrayStringConcat(arrayMap(x -> demangle(addressToSymbol(x)), trace), '\n') AS trace_symbols,
    count() AS sz
FROM system.trace_log
WHERE ((event_time >= start_time) AND (event_time <= end_time)) AND (trace_type = 'Real')
GROUP BY trace_symbols
ORDER BY sz DESC
LIMIT 5
SETTINGS allow_introspection_functions = 1

Query id: c0fbbcbb-9f7c-4293-8439-65d217508a4a

Row 1:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)


std::__1::mutex::lock()
DB::DataPartsLock::DataPartsLock(std::__1::mutex&)
DB::MergeTreeData::getStorageSnapshot(std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::__1::shared_ptr<DB::Context const>) const
DB::IdentifierResolver::tryResolveTableIdentifier(DB::Identifier const&, std::__1::shared_ptr<DB::Context const> const&)
DB::QueryAnalyzer::tryResolveIdentifier(DB::IdentifierLookup const&, DB::IdentifierResolveScope&, DB::IdentifierResolveContext)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolveExpressionNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool)
DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolve(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::IQueryTreeNode> const&, std::__1::shared_ptr<DB::Context const>)
DB::QueryAnalysisPass::run(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::Context const>)
DB::QueryTreePassManager::run(std::__1::shared_ptr<DB::IQueryTreeNode>, unsigned long)
DB::buildQueryTreeAndRunPasses(std::__1::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::__1::shared_ptr<DB::Context const> const&, std::__1::shared_ptr<DB::IStorage> const&)
DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&)
DB::TableFunctionView::getActualTableStructure(std::__1::shared_ptr<DB::Context const>, bool) const
DB::TableFunctionView::executeImpl(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, DB::ColumnsDescription, bool) const
DB::ITableFunction::execute(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, DB::ColumnsDescription, bool, bool) const
DB::Context::executeTableFunction(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::ITableFunction> const&, std::__1::shared_ptr<DB::Context const> const&)
DB::QueryAnalyzer::resolveTableFunction(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&, bool)
DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolve(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::IQueryTreeNode> const&, std::__1::shared_ptr<DB::Context const>)
DB::QueryAnalysisPass::run(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::Context const>)
DB::QueryTreePassManager::run(std::__1::shared_ptr<DB::IQueryTreeNode>, unsigned long)
DB::buildQueryTreeAndRunPasses(std::__1::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::__1::shared_ptr<DB::Context const> const&, std::__1::shared_ptr<DB::IStorage> const&)
DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&)
std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> std::__1::__function::__policy_invoker<std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> (DB::InterpreterFactory::Arguments const&)>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<DB::registerInterpreterSelectQueryAnalyzer(DB::InterpreterFactory&)::$_0, std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> (DB::InterpreterFactory::Arguments const&)>>(std::__1::__function::__policy_storage const*, DB::InterpreterFactory::Arguments const&) (.llvm.1496827156594342616)
DB::InterpreterFactory::get(std::__1::shared_ptr<DB::IAST>&, std::__1::shared_ptr<DB::Context>, DB::SelectQueryOptions const&)
DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*, std::__1::shared_ptr<DB::IAST>&)
DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum)
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)


sz:            241180

Row 2:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)



std::__1::mutex::lock()
DB::DataPartsLock::DataPartsLock(std::__1::mutex&)
DB::MergeTreeData::getStorageSnapshot(std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::__1::shared_ptr<DB::Context const>) const
DB::IdentifierResolver::tryResolveTableIdentifier(DB::Identifier const&, std::__1::shared_ptr<DB::Context const> const&)
DB::QueryAnalyzer::tryResolveIdentifier(DB::IdentifierLookup const&, DB::IdentifierResolveScope&, DB::IdentifierResolveContext)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolveExpressionNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool)
DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolveExpressionNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool)
DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolve(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::IQueryTreeNode> const&, std::__1::shared_ptr<DB::Context const>)
DB::QueryAnalysisPass::run(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::Context const>)
DB::QueryTreePassManager::run(std::__1::shared_ptr<DB::IQueryTreeNode>, unsigned long)
DB::buildQueryTreeAndRunPasses(std::__1::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::__1::shared_ptr<DB::Context const> const&, std::__1::shared_ptr<DB::IStorage> const&)
DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&)
DB::StorageView::read(DB::QueryPlan&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&, std::__1::shared_ptr<DB::StorageSnapshot> const&, DB::SelectQueryInfo&, std::__1::shared_ptr<DB::Context const>, DB::QueryProcessingStage::Enum, unsigned long, unsigned long)
DB::(anonymous namespace)::buildQueryPlanForTableExpression(std::__1::shared_ptr<DB::IQueryTreeNode>, std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::SelectQueryInfo const&, DB::SelectQueryOptions const&, std::__1::shared_ptr<DB::PlannerContext>&, bool, bool)
DB::buildJoinTreeQueryPlan(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::SelectQueryInfo const&, DB::SelectQueryOptions&, std::__1::unordered_set<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&, std::__1::shared_ptr<DB::PlannerContext>&)
DB::Planner::buildPlanForQueryNode()
DB::Planner::buildQueryPlanIfNeeded()
DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*, std::__1::shared_ptr<DB::IAST>&)
DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum)
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)


sz:            198878

Row 3:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)



std::__1::mutex::lock()
DB::DataPartsLock::DataPartsLock(std::__1::mutex&)
DB::MergeTreeData::getStorageSnapshot(std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::__1::shared_ptr<DB::Context const>) const
DB::IdentifierResolver::tryResolveTableIdentifier(DB::Identifier const&, std::__1::shared_ptr<DB::Context const> const&)
DB::QueryAnalyzer::tryResolveIdentifier(DB::IdentifierLookup const&, DB::IdentifierResolveScope&, DB::IdentifierResolveContext)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolve(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::IQueryTreeNode> const&, std::__1::shared_ptr<DB::Context const>)
DB::QueryAnalysisPass::run(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::Context const>)
DB::QueryTreePassManager::run(std::__1::shared_ptr<DB::IQueryTreeNode>, unsigned long)
DB::buildQueryTreeAndRunPasses(std::__1::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::__1::shared_ptr<DB::Context const> const&, std::__1::shared_ptr<DB::IStorage> const&)
DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&)
std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> std::__1::__function::__policy_invoker<std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> (DB::InterpreterFactory::Arguments const&)>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<DB::registerInterpreterSelectQueryAnalyzer(DB::InterpreterFactory&)::$_0, std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> (DB::InterpreterFactory::Arguments const&)>>(std::__1::__function::__policy_storage const*, DB::InterpreterFactory::Arguments const&) (.llvm.1496827156594342616)
DB::InterpreterFactory::get(std::__1::shared_ptr<DB::IAST>&, std::__1::shared_ptr<DB::Context>, DB::SelectQueryOptions const&)
DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*, std::__1::shared_ptr<DB::IAST>&)
DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum)
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)


sz:            171786

Row 4:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)



DB::ExecutionThreadContext::wait(std::__1::atomic<bool>&)
DB::ExecutorTasks::tryGetTask(DB::ExecutionThreadContext&)
DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*)
void std::__1::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreadsImpl(std::__1::shared_ptr<DB::IAcquiredSlot>)::$_0, void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker()
void std::__1::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'(), void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::worker()
void* std::__1::__thread_proxy[abi:ne190107]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool*>>(void*)


sz:            105504

Row 5:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)



std::__1::mutex::lock()
DB::DataPartsLock::DataPartsLock(std::__1::mutex&)
DB::MergeTreeData::getStorageSnapshot(std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::__1::shared_ptr<DB::Context const>) const
DB::IdentifierResolver::tryResolveTableIdentifier(DB::Identifier const&, std::__1::shared_ptr<DB::Context const> const&)
DB::QueryAnalyzer::tryResolveIdentifier(DB::IdentifierLookup const&, DB::IdentifierResolveScope&, DB::IdentifierResolveContext)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolveExpressionNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool)
DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolveExpressionNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool)
DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolve(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::IQueryTreeNode> const&, std::__1::shared_ptr<DB::Context const>)
DB::QueryAnalysisPass::run(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::Context const>)
DB::QueryTreePassManager::run(std::__1::shared_ptr<DB::IQueryTreeNode>, unsigned long)
DB::buildQueryTreeAndRunPasses(std::__1::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::__1::shared_ptr<DB::Context const> const&, std::__1::shared_ptr<DB::IStorage> const&)
DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&)
DB::TableFunctionView::getActualTableStructure(std::__1::shared_ptr<DB::Context const>, bool) const
DB::TableFunctionView::executeImpl(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, DB::ColumnsDescription, bool) const
DB::ITableFunction::execute(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, DB::ColumnsDescription, bool, bool) const
DB::Context::executeTableFunction(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::ITableFunction> const&, std::__1::shared_ptr<DB::Context const> const&)
DB::QueryAnalyzer::resolveTableFunction(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&, bool)
DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolve(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::IQueryTreeNode> const&, std::__1::shared_ptr<DB::Context const>)
DB::QueryAnalysisPass::run(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::Context const>)
DB::QueryTreePassManager::run(std::__1::shared_ptr<DB::IQueryTreeNode>, unsigned long)
DB::buildQueryTreeAndRunPasses(std::__1::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::__1::shared_ptr<DB::Context const> const&, std::__1::shared_ptr<DB::IStorage> const&)
DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&)
std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> std::__1::__function::__policy_invoker<std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> (DB::InterpreterFactory::Arguments const&)>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<DB::registerInterpreterSelectQueryAnalyzer(DB::InterpreterFactory&)::$_0, std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> (DB::InterpreterFactory::Arguments const&)>>(std::__1::__function::__policy_storage const*, DB::InterpreterFactory::Arguments const&) (.llvm.1496827156594342616)
DB::InterpreterFactory::get(std::__1::shared_ptr<DB::IAST>&, std::__1::shared_ptr<DB::Context>, DB::SelectQueryOptions const&)
DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*, std::__1::shared_ptr<DB::IAST>&)
DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum)
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)


sz:            65665

auto lock = lockParts();
snapshot_data->parts = getVisibleDataPartsVectorUnlocked(query_context, lock);
parts = getVisibleDataPartsVectorUnlocked(query_context, lock);
snapshot_data->parts = RangesInDataParts(parts);
Contributor

We don't need to hold part lock when constructing RangesInDataParts

Collaborator Author

Nice spot! Do you mind submitting a PR to optimize this?

Collaborator Author

BTW, how did you notice this? It’s a bit surprising, since the RangesInDataParts c'tor seems simpler than getVisibleDataPartsVectorUnlocked.

Contributor

We see some lock contention in getStorageSnapshot when upgrading from v25.3 to v25.5. I'm not 100% sure that constructing RangesInDataParts is the problem here, though. We are testing in our prod env whether moving snapshot_data->parts = RangesInDataParts(parts); out of the lock scope helps.
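
The change being tested is a general lock-narrowing pattern: copy the shared state under the mutex, release it, and build any derived structure afterwards. The sketch below shows that pattern only; `Table`, `Part` and `PartWithOffset` are hypothetical stand-ins rather than the real ClickHouse types, and the thread does not establish whether this resolves the reported contention.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these are the real ClickHouse types.
struct Part
{
    uint64_t rows;
};
using PartPtr = std::shared_ptr<const Part>;

struct PartWithOffset
{
    PartPtr part;
    uint64_t starting_offset;
};

class Table
{
public:
    void addPart(uint64_t rows)
    {
        std::lock_guard<std::mutex> lock(parts_mutex);
        parts.push_back(std::make_shared<Part>(Part{rows}));
    }

    std::vector<PartWithOffset> getSnapshot() const
    {
        std::vector<PartPtr> visible;
        {
            // Hold the mutex only long enough to copy the shared pointers.
            std::lock_guard<std::mutex> lock(parts_mutex);
            visible = parts;
        }

        // The derived structure is built after the lock is released,
        // so other threads are not blocked while it is constructed.
        std::vector<PartWithOffset> snapshot;
        snapshot.reserve(visible.size());
        uint64_t offset = 0;
        for (auto & part : visible)
        {
            const uint64_t rows = part->rows;
            snapshot.push_back({std::move(part), offset});
            offset += rows;
        }

        assert(snapshot.size() == visible.size());
        return snapshot;
    }

private:
    mutable std::mutex parts_mutex;
    std::vector<PartPtr> parts;
};
```

The copy made under the lock is what keeps the snapshot consistent; work done on that private copy afterwards no longer needs the mutex.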

Collaborator Author

We see some lock contention in getStorageSnapshot when upgrading from v25.3 to v25.5.

Yeah, I saw the trace info you posted. I'm curious — how did you first notice the contention? Was there a slowdown in queries or a drop in QPS that led you to check the trace_log? Or do you have some tooling in place that specifically monitors lock contention? Also, do you happen to have a comparison of the trace_log before and after the upgrade? It would be helpful to see how the contention pattern changed between v25.3 and v25.5.

Contributor

Was there a slowdown in queries or a drop in QPS

Yes, we have a quota on max concurrent queries and almost never hit it unless something bad happens.

Or do you have some tooling in place that specifically monitors lock contention

No, it's just my habit: when I see a performance regression, the first thing I check is where the query is spending its time, i.e. I look at trace_log.

> Also, do you happen to have a comparison of the trace_log before and after the upgrade? It would be helpful to see how the contention pattern changed between v25.3 and v25.5.

Unfortunately, I don't have it now. But I've always used system.trace_log to investigate performance regressions, and off the top of my head, when things run properly, the Real trace is similar to the CPU trace and may contain other general traces (network, query pipeline execution...), but not lock waits.

Contributor

canhld94 commented Jul 9, 2025

We see severe lock contention when getting the parts snapshot in v25.5.4, and suspect that it is related to this PR.

WITH
    (
        SELECT now() - 3600
    ) AS start_time,
    (
        SELECT now()
    ) AS end_time
SELECT
    arrayStringConcat(arrayMap(x -> demangle(addressToSymbol(x)), trace), '\n') AS trace_symbols,
    count() AS sz
FROM system.trace_log
WHERE ((event_time >= start_time) AND (event_time <= end_time)) AND (trace_type = 'Real')
GROUP BY trace_symbols
ORDER BY sz DESC
LIMIT 5
SETTINGS allow_introspection_functions = 1

Query id: c0fbbcbb-9f7c-4293-8439-65d217508a4a

Row 1:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)


std::__1::mutex::lock()
DB::DataPartsLock::DataPartsLock(std::__1::mutex&)
DB::MergeTreeData::getStorageSnapshot(std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::__1::shared_ptr<DB::Context const>) const
DB::IdentifierResolver::tryResolveTableIdentifier(DB::Identifier const&, std::__1::shared_ptr<DB::Context const> const&)
DB::QueryAnalyzer::tryResolveIdentifier(DB::IdentifierLookup const&, DB::IdentifierResolveScope&, DB::IdentifierResolveContext)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolveExpressionNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool)
DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolve(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::IQueryTreeNode> const&, std::__1::shared_ptr<DB::Context const>)
DB::QueryAnalysisPass::run(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::Context const>)
DB::QueryTreePassManager::run(std::__1::shared_ptr<DB::IQueryTreeNode>, unsigned long)
DB::buildQueryTreeAndRunPasses(std::__1::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::__1::shared_ptr<DB::Context const> const&, std::__1::shared_ptr<DB::IStorage> const&)
DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&)
DB::TableFunctionView::getActualTableStructure(std::__1::shared_ptr<DB::Context const>, bool) const
DB::TableFunctionView::executeImpl(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, DB::ColumnsDescription, bool) const
DB::ITableFunction::execute(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, DB::ColumnsDescription, bool, bool) const
DB::Context::executeTableFunction(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::ITableFunction> const&, std::__1::shared_ptr<DB::Context const> const&)
DB::QueryAnalyzer::resolveTableFunction(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&, bool)
DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolve(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::IQueryTreeNode> const&, std::__1::shared_ptr<DB::Context const>)
DB::QueryAnalysisPass::run(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::Context const>)
DB::QueryTreePassManager::run(std::__1::shared_ptr<DB::IQueryTreeNode>, unsigned long)
DB::buildQueryTreeAndRunPasses(std::__1::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::__1::shared_ptr<DB::Context const> const&, std::__1::shared_ptr<DB::IStorage> const&)
DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&)
std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> std::__1::__function::__policy_invoker<std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> (DB::InterpreterFactory::Arguments const&)>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<DB::registerInterpreterSelectQueryAnalyzer(DB::InterpreterFactory&)::$_0, std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> (DB::InterpreterFactory::Arguments const&)>>(std::__1::__function::__policy_storage const*, DB::InterpreterFactory::Arguments const&) (.llvm.1496827156594342616)
DB::InterpreterFactory::get(std::__1::shared_ptr<DB::IAST>&, std::__1::shared_ptr<DB::Context>, DB::SelectQueryOptions const&)
DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*, std::__1::shared_ptr<DB::IAST>&)
DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum)
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)


sz:            241180

Row 2:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)



std::__1::mutex::lock()
DB::DataPartsLock::DataPartsLock(std::__1::mutex&)
DB::MergeTreeData::getStorageSnapshot(std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::__1::shared_ptr<DB::Context const>) const
DB::IdentifierResolver::tryResolveTableIdentifier(DB::Identifier const&, std::__1::shared_ptr<DB::Context const> const&)
DB::QueryAnalyzer::tryResolveIdentifier(DB::IdentifierLookup const&, DB::IdentifierResolveScope&, DB::IdentifierResolveContext)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolveExpressionNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool)
DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolveExpressionNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool)
DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolve(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::IQueryTreeNode> const&, std::__1::shared_ptr<DB::Context const>)
DB::QueryAnalysisPass::run(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::Context const>)
DB::QueryTreePassManager::run(std::__1::shared_ptr<DB::IQueryTreeNode>, unsigned long)
DB::buildQueryTreeAndRunPasses(std::__1::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::__1::shared_ptr<DB::Context const> const&, std::__1::shared_ptr<DB::IStorage> const&)
DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&)
DB::StorageView::read(DB::QueryPlan&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&, std::__1::shared_ptr<DB::StorageSnapshot> const&, DB::SelectQueryInfo&, std::__1::shared_ptr<DB::Context const>, DB::QueryProcessingStage::Enum, unsigned long, unsigned long)
DB::(anonymous namespace)::buildQueryPlanForTableExpression(std::__1::shared_ptr<DB::IQueryTreeNode>, std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::SelectQueryInfo const&, DB::SelectQueryOptions const&, std::__1::shared_ptr<DB::PlannerContext>&, bool, bool)
DB::buildJoinTreeQueryPlan(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::SelectQueryInfo const&, DB::SelectQueryOptions&, std::__1::unordered_set<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&, std::__1::shared_ptr<DB::PlannerContext>&)
DB::Planner::buildPlanForQueryNode()
DB::Planner::buildQueryPlanIfNeeded()
DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*, std::__1::shared_ptr<DB::IAST>&)
DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum)
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)


sz:            198878

Row 3:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)



std::__1::mutex::lock()
DB::DataPartsLock::DataPartsLock(std::__1::mutex&)
DB::MergeTreeData::getStorageSnapshot(std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::__1::shared_ptr<DB::Context const>) const
DB::IdentifierResolver::tryResolveTableIdentifier(DB::Identifier const&, std::__1::shared_ptr<DB::Context const> const&)
DB::QueryAnalyzer::tryResolveIdentifier(DB::IdentifierLookup const&, DB::IdentifierResolveScope&, DB::IdentifierResolveContext)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolve(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::IQueryTreeNode> const&, std::__1::shared_ptr<DB::Context const>)
DB::QueryAnalysisPass::run(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::Context const>)
DB::QueryTreePassManager::run(std::__1::shared_ptr<DB::IQueryTreeNode>, unsigned long)
DB::buildQueryTreeAndRunPasses(std::__1::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::__1::shared_ptr<DB::Context const> const&, std::__1::shared_ptr<DB::IStorage> const&)
DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&)
std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> std::__1::__function::__policy_invoker<std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> (DB::InterpreterFactory::Arguments const&)>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<DB::registerInterpreterSelectQueryAnalyzer(DB::InterpreterFactory&)::$_0, std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> (DB::InterpreterFactory::Arguments const&)>>(std::__1::__function::__policy_storage const*, DB::InterpreterFactory::Arguments const&) (.llvm.1496827156594342616)
DB::InterpreterFactory::get(std::__1::shared_ptr<DB::IAST>&, std::__1::shared_ptr<DB::Context>, DB::SelectQueryOptions const&)
DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*, std::__1::shared_ptr<DB::IAST>&)
DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum)
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)


sz:            171786

Row 4:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)



DB::ExecutionThreadContext::wait(std::__1::atomic<bool>&)
DB::ExecutorTasks::tryGetTask(DB::ExecutionThreadContext&)
DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*)
void std::__1::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreadsImpl(std::__1::shared_ptr<DB::IAcquiredSlot>)::$_0, void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker()
void std::__1::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'(), void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::worker()
void* std::__1::__thread_proxy[abi:ne190107]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool*>>(void*)


sz:            105504

Row 5:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)



std::__1::mutex::lock()
DB::DataPartsLock::DataPartsLock(std::__1::mutex&)
DB::MergeTreeData::getStorageSnapshot(std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::__1::shared_ptr<DB::Context const>) const
DB::IdentifierResolver::tryResolveTableIdentifier(DB::Identifier const&, std::__1::shared_ptr<DB::Context const> const&)
DB::QueryAnalyzer::tryResolveIdentifier(DB::IdentifierLookup const&, DB::IdentifierResolveScope&, DB::IdentifierResolveContext)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolveExpressionNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool)
DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolveExpressionNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool)
DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolve(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::IQueryTreeNode> const&, std::__1::shared_ptr<DB::Context const>)
DB::QueryAnalysisPass::run(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::Context const>)
DB::QueryTreePassManager::run(std::__1::shared_ptr<DB::IQueryTreeNode>, unsigned long)
DB::buildQueryTreeAndRunPasses(std::__1::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::__1::shared_ptr<DB::Context const> const&, std::__1::shared_ptr<DB::IStorage> const&)
DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&)
DB::TableFunctionView::getActualTableStructure(std::__1::shared_ptr<DB::Context const>, bool) const
DB::TableFunctionView::executeImpl(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, DB::ColumnsDescription, bool) const
DB::ITableFunction::execute(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, DB::ColumnsDescription, bool, bool) const
DB::Context::executeTableFunction(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::ITableFunction> const&, std::__1::shared_ptr<DB::Context const> const&)
DB::QueryAnalyzer::resolveTableFunction(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&, bool)
DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::__1::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&)
DB::QueryAnalyzer::resolveQuery(std::__1::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&)
DB::QueryAnalyzer::resolve(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::IQueryTreeNode> const&, std::__1::shared_ptr<DB::Context const>)
DB::QueryAnalysisPass::run(std::__1::shared_ptr<DB::IQueryTreeNode>&, std::__1::shared_ptr<DB::Context const>)
DB::QueryTreePassManager::run(std::__1::shared_ptr<DB::IQueryTreeNode>, unsigned long)
DB::buildQueryTreeAndRunPasses(std::__1::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::__1::shared_ptr<DB::Context const> const&, std::__1::shared_ptr<DB::IStorage> const&)
DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&)
std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> std::__1::__function::__policy_invoker<std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> (DB::InterpreterFactory::Arguments const&)>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<DB::registerInterpreterSelectQueryAnalyzer(DB::InterpreterFactory&)::$_0, std::__1::unique_ptr<DB::IInterpreter, std::__1::default_delete<DB::IInterpreter>> (DB::InterpreterFactory::Arguments const&)>>(std::__1::__function::__policy_storage const*, DB::InterpreterFactory::Arguments const&) (.llvm.1496827156594342616)
DB::InterpreterFactory::get(std::__1::shared_ptr<DB::IAST>&, std::__1::shared_ptr<DB::Context>, DB::SelectQueryOptions const&)
DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*, std::__1::shared_ptr<DB::IAST>&)
DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum)
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)


sz:            65665

@amosbird

Contributor

canhld94 commented Jul 9, 2025

I moved the construction of RangesInDataParts outside the scope of the parts lock, and getStorageSnapshot no longer shows up in the traces:

WITH
    (
        SELECT now() - 60
    ) AS start_time,
    (
        SELECT now()
    ) AS end_time
SELECT
    arrayStringConcat(arrayMap(x -> demangle(addressToSymbol(x)), trace), '\n') AS trace_symbols,
    count() AS sz
FROM system.trace_log
WHERE ((event_time >= start_time) AND (event_time <= end_time)) AND (trace_type = 'Real')
GROUP BY trace_symbols
ORDER BY sz DESC
LIMIT 20
SETTINGS allow_introspection_functions = 1

Query id: 0607b20f-ad01-40c7-8dc1-1550d903c4ac

Row 1:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)



DB::ExecutionThreadContext::wait(std::__1::atomic<bool>&)
DB::ExecutorTasks::tryGetTask(DB::ExecutionThreadContext&)
DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*)
void std::__1::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreadsImpl(std::__1::shared_ptr<DB::IAcquiredSlot>)::$_0, void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker()
void std::__1::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'(), void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::worker()
void* std::__1::__thread_proxy[abi:ne190107]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool*>>(void*)


sz:            14865

Row 2:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)



bool ConcurrentBoundedQueue<DB::Chunk>::popImpl<true>(DB::Chunk&, std::__1::optional<unsigned long>)
DB::LazyOutputFormat::getChunk(unsigned long)
DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)
DB::PullingAsyncPipelineExecutor::pull(DB::Block&, unsigned long)
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)


sz:            2358

Row 3:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)



std::__1::future_status std::__1::__assoc_sub_state::wait_until<std::__1::chrono::steady_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>(std::__1::chrono::time_point<std::__1::chrono::steady_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>> const&) const
DB::WaitForAsyncInsertSource::generate()
DB::ISource::tryGenerate()
DB::ISource::work()
DB::ExecutionThreadContext::executeTask()
DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*)
DB::PipelineExecutor::execute(unsigned long, bool)
DB::CompletedPipelineExecutor::execute()
DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, std::__1::shared_ptr<DB::Context>, std::__1::function<void (DB::QueryResultDetails const&)>, DB::QueryFlags, std::__1::optional<DB::FormatSettings> const&, std::__1::function<void (DB::IOutputFormat&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context const> const&, std::__1::optional<DB::FormatSettings> const&)>, std::__1::function<void ()>)
DB::HTTPHandler::processQuery(DB::HTTPServerRequest&, DB::HTMLForm&, DB::HTTPServerResponse&, DB::HTTPHandler::Output&, std::__1::optional<DB::CurrentThread::QueryScope>&, StrongTypedef<unsigned long, ProfileEvents::EventTag> const&)
DB::HTTPHandler::handleRequest(DB::HTTPServerRequest&, DB::HTTPServerResponse&, StrongTypedef<unsigned long, ProfileEvents::EventTag> const&)
DB::HTTPServerConnection::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)


sz:            2167

Row 4:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)


DB::Epoll::getManyReady(int, epoll_event*, int) const
DB::PollingQueue::getTask(std::__1::unique_lock<std::__1::mutex>&, int)
DB::ExecutorTasks::processAsyncTasks()
DB::PipelineExecutor::execute(unsigned long, bool)
void std::__1::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::'lambda'(), void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::worker()
void* std::__1::__thread_proxy[abi:ne190107]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool*>>(void*)


sz:            1200

Row 5:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)



ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::wait()
DB::MergeTreeDataSelectExecutor::filterPartsByPrimaryKeyAndSkipIndexes(DB::RangesInDataParts, std::__1::shared_ptr<DB::StorageInMemoryMetadata const>, std::__1::shared_ptr<DB::Context const> const&, DB::KeyCondition const&, std::__1::optional<DB::KeyCondition> const&, std::__1::optional<DB::KeyCondition> const&, DB::UsefulSkipIndexes const&, DB::MergeTreeReaderSettings const&, std::__1::shared_ptr<Poco::Logger>, unsigned long, std::__1::vector<DB::ReadFromMergeTree::IndexStat, std::__1::allocator<DB::ReadFromMergeTree::IndexStat>>&, bool, bool, bool)
DB::ReadFromMergeTree::selectRangesToRead(DB::RangesInDataParts, std::__1::shared_ptr<DB::MergeTreeData::IMutationsSnapshot const>, std::__1::optional<DB::VectorSearchParameters> const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&, DB::SelectQueryInfo const&, std::__1::shared_ptr<DB::Context const>, unsigned long, std::__1::shared_ptr<std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, long, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, long>>> const>, DB::MergeTreeData const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&, std::__1::shared_ptr<Poco::Logger>, std::__1::optional<DB::ReadFromMergeTree::Indexes>&, bool)
DB::ReadFromMergeTree::selectRangesToRead(DB::RangesInDataParts, bool) const
DB::ReadFromMergeTree::selectRangesToRead(bool) const
DB::ReadFromMergeTree::initializePipeline(DB::QueryPipelineBuilder&, DB::BuildQueryPipelineSettings const&)
DB::ISourceStep::updatePipeline(std::__1::vector<std::__1::unique_ptr<DB::QueryPipelineBuilder, std::__1::default_delete<DB::QueryPipelineBuilder>>, std::__1::allocator<std::__1::unique_ptr<DB::QueryPipelineBuilder, std::__1::default_delete<DB::QueryPipelineBuilder>>>>, DB::BuildQueryPipelineSettings const&)
DB::QueryPlan::buildQueryPipeline(DB::QueryPlanOptimizationSettings const&, DB::BuildQueryPipelineSettings const&, bool)
DB::InterpreterSelectQueryAnalyzer::buildQueryPipeline()
DB::InterpreterSelectQueryAnalyzer::execute()
DB::executeQueryImpl(char const*, char const*, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*, std::__1::shared_ptr<DB::IAST>&)
DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum)
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)


sz:            1160

Row 6:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)



Poco::EventImpl::waitImpl()
DB::PullingAsyncPipelineExecutor::cancel()
DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)
DB::PullingAsyncPipelineExecutor::pull(DB::Block&, unsigned long)
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)


sz:            964

Row 7:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)


DB::(anonymous namespace)::query(int, unsigned short, unsigned int, char8_t, unsigned short, void const*, int) (.llvm.14952529482316836871)
DB::NetlinkMetricsProvider::getStat(taskstats&, int) const
taskstats std::__1::__function::__policy_invoker<taskstats ()>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<DB::TasksStatsCounters::TasksStatsCounters(unsigned long, DB::TasksStatsCounters::MetricsProvider)::$_0, taskstats ()>>(std::__1::__function::__policy_storage const*)
DB::ThreadStatus::updatePerformanceCounters()
DB::ReadProgressCallback::onProgress(unsigned long, unsigned long, std::__1::list<DB::StorageLimits, std::__1::allocator<DB::StorageLimits>> const&)
DB::ExecutionThreadContext::executeTask()
DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*)
void std::__1::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreadsImpl(std::__1::shared_ptr<DB::IAcquiredSlot>)::$_0, void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker()
void std::__1::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'(), void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::worker()
void* std::__1::__thread_proxy[abi:ne190107]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool*>>(void*)


sz:            787

Row 8:
──────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)

unsigned long DB::HashJoinMethods<(DB::JoinKind)1, (DB::JoinStrictness)2, DB::HashJoin::MapsTemplate<DB::RowRef>>::switchJoinRightColumns<DB::AddedColumns<false>>(std::__1::vector<DB::HashJoin::MapsTemplate<DB::RowRef> const*, std::__1::allocator<DB::HashJoin::MapsTemplate<DB::RowRef> const*>> const&, DB::AddedColumns<false>&, DB::HashJoin::Type, DB::JoinStuff::JoinUsedFlags&)
DB::HashJoinMethods<(DB::JoinKind)1, (DB::JoinStrictness)2, DB::HashJoin::MapsTemplate<DB::RowRef>>::joinBlockImpl(DB::HashJoin const&, DB::ScatteredBlock&, DB::Block const&, std::__1::vector<DB::HashJoin::MapsTemplate<DB::RowRef> const*, std::__1::allocator<DB::HashJoin::MapsTemplate<DB::RowRef> const*>> const&, bool)
DB::HashJoinMethods<(DB::JoinKind)1, (DB::JoinStrictness)2, DB::HashJoin::MapsTemplate<DB::RowRef>>::joinBlockImpl(DB::HashJoin const&, DB::Block&, DB::Block const&, std::__1::vector<DB::HashJoin::MapsTemplate<DB::RowRef> const*, std::__1::allocator<DB::HashJoin::MapsTemplate<DB::RowRef> const*>> const&, bool)
DB::HashJoin::joinGet(DB::Block const&, DB::Block const&) const
DB::StorageJoin::joinGet(DB::Block const&, DB::Block const&, std::__1::shared_ptr<DB::Context const>) const
DB::(anonymous namespace)::ExecutableFunctionJoinGet<true>::executeImpl(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName>> const&, std::__1::shared_ptr<DB::IDataType const> const&, unsigned long) const (.09ea87334475e1c7b7fe23889f8e2de0)
DB::IExecutableFunction::executeWithoutLowCardinalityColumns(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName>> const&, std::__1::shared_ptr<DB::IDataType const> const&, unsigned long, bool) const
DB::IExecutableFunction::executeWithoutSparseColumns(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName>> const&, std::__1::shared_ptr<DB::IDataType const> const&, unsigned long, bool) const
DB::IExecutableFunction::execute(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName>> const&, std::__1::shared_ptr<DB::IDataType const> const&, unsigned long, bool) const
DB::ExpressionActions::execute(DB::Block&, unsigned long&, bool, bool) const
DB::ExpressionTransform::transform(DB::Chunk&)
DB::ISimpleTransform::transform(DB::Chunk&, DB::Chunk&)
DB::ISimpleTransform::work()
DB::ExecutionThreadContext::executeTask()
DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*)
void std::__1::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreadsImpl(std::__1::shared_ptr<DB::IAcquiredSlot>)::$_0, void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker()
void std::__1::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'(), void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::worker()
void* std::__1::__thread_proxy[abi:ne190107]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool*>>(void*)


sz:            683

Row 9:
───────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)

ZSTD_decompressSequences.llvm.9384621155403057804
ZSTD_decompressBlock_internal
ZSTD_decompressMultiFrame.llvm.13857306737008492306
DB::CompressionCodecZSTD::doDecompressData(char const*, unsigned int, char*, unsigned int) const
DB::ICompressionCodec::decompress(char const*, unsigned int, char*) const
DB::CompressedReadBufferFromFile::nextImpl()
void DB::deserializeBinarySSE2OrAVX2<4>(DB::PODArray<char8_t, 4096ul, Allocator<false, false>, 63ul, 64ul>&, DB::PODArray<unsigned long, 4096ul, Allocator<false, false>, 63ul, 64ul>&, DB::ReadBuffer&, unsigned long)
DB::ISerialization::deserializeBinaryBulkWithMultipleStreams(COW<DB::IColumn>::immutable_ptr<DB::IColumn>&, unsigned long, unsigned long, DB::ISerialization::DeserializeBinaryBulkSettings&, std::__1::shared_ptr<DB::ISerialization::DeserializeBinaryBulkState>&, std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>>*) const
DB::SerializationNullable::deserializeBinaryBulkWithMultipleStreams(COW<DB::IColumn>::immutable_ptr<DB::IColumn>&, unsigned long, unsigned long, DB::ISerialization::DeserializeBinaryBulkSettings&, std::__1::shared_ptr<DB::ISerialization::DeserializeBinaryBulkState>&, std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>>*) const
DB::MergeTreeReaderWide::readRows(unsigned long, unsigned long, bool, unsigned long, unsigned long, std::__1::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::__1::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>&)
DB::MergeTreeRangeReader::DelayedStream::finalize(std::__1::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::__1::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>&)
DB::MergeTreeRangeReader::continueReadingChain(DB::MergeTreeRangeReader::ReadResult&, unsigned long&)
DB::MergeTreeReadersChain::read(unsigned long, DB::MarkRanges&)
DB::MergeTreeReadTask::read()
DB::MergeTreeInOrderSelectAlgorithm::readFromTask(DB::MergeTreeReadTask&)
DB::MergeTreeSelectProcessor::read()
DB::MergeTreeSource::tryGenerate()
DB::ISource::work()
DB::ExecutionThreadContext::executeTask()
DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*)
DB::PipelineExecutor::execute(unsigned long, bool)
void std::__1::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::'lambda'(), void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::worker()
void* std::__1::__thread_proxy[abi:ne190107]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool*>>(void*)


sz:            645

Row 10:
───────
trace_symbols: DB::(anonymous namespace)::writeTraceInfo(DB::TraceType, int, siginfo_t*, void*)

DB::ColumnVector<char8_t>::insertFrom(DB::IColumn const&, unsigned long)
DB::ColumnNullable::insertFrom(DB::IColumn const&, unsigned long)
DB::AddedColumns<false>::appendFromBlock(DB::RowRef const*, bool)
unsigned long DB::HashJoinMethods<(DB::JoinKind)1, (DB::JoinStrictness)2, DB::HashJoin::MapsTemplate<DB::RowRef>>::switchJoinRightColumns<DB::AddedColumns<false>>(std::__1::vector<DB::HashJoin::MapsTemplate<DB::RowRef> const*, std::__1::allocator<DB::HashJoin::MapsTemplate<DB::RowRef> const*>> const&, DB::AddedColumns<false>&, DB::HashJoin::Type, DB::JoinStuff::JoinUsedFlags&)
DB::HashJoinMethods<(DB::JoinKind)1, (DB::JoinStrictness)2, DB::HashJoin::MapsTemplate<DB::RowRef>>::joinBlockImpl(DB::HashJoin const&, DB::ScatteredBlock&, DB::Block const&, std::__1::vector<DB::HashJoin::MapsTemplate<DB::RowRef> const*, std::__1::allocator<DB::HashJoin::MapsTemplate<DB::RowRef> const*>> const&, bool)
DB::HashJoinMethods<(DB::JoinKind)1, (DB::JoinStrictness)2, DB::HashJoin::MapsTemplate<DB::RowRef>>::joinBlockImpl(DB::HashJoin const&, DB::Block&, DB::Block const&, std::__1::vector<DB::HashJoin::MapsTemplate<DB::RowRef> const*, std::__1::allocator<DB::HashJoin::MapsTemplate<DB::RowRef> const*>> const&, bool)
DB::HashJoin::joinGet(DB::Block const&, DB::Block const&) const
DB::StorageJoin::joinGet(DB::Block const&, DB::Block const&, std::__1::shared_ptr<DB::Context const>) const
DB::(anonymous namespace)::ExecutableFunctionJoinGet<true>::executeImpl(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName>> const&, std::__1::shared_ptr<DB::IDataType const> const&, unsigned long) const (.09ea87334475e1c7b7fe23889f8e2de0)
DB::IExecutableFunction::executeWithoutLowCardinalityColumns(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName>> const&, std::__1::shared_ptr<DB::IDataType const> const&, unsigned long, bool) const
DB::IExecutableFunction::executeWithoutSparseColumns(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName>> const&, std::__1::shared_ptr<DB::IDataType const> const&, unsigned long, bool) const
DB::IExecutableFunction::execute(std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName>> const&, std::__1::shared_ptr<DB::IDataType const> const&, unsigned long, bool) const
DB::ExpressionActions::execute(DB::Block&, unsigned long&, bool, bool) const
DB::ExpressionTransform::transform(DB::Chunk&)
DB::ISimpleTransform::transform(DB::Chunk&, DB::Chunk&)
DB::ISimpleTransform::work()
DB::ExecutionThreadContext::executeTask()
DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*)
void std::__1::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreadsImpl(std::__1::shared_ptr<DB::IAcquiredSlot>)::$_0, void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker()
void std::__1::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'(), void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::worker()
void* std::__1::__thread_proxy[abi:ne190107]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool*>>(void*)


sz:            642

@canhld94
Contributor

canhld94 commented Jul 9, 2025

PartsLockWaitMicroseconds before:

SELECT
    initial_query_id,
    ProfileEvents['PartsLockWaitMicroseconds']
FROM system.query_log
WHERE (event_time >= ((now() - uptime()) - 600)) AND (event_time <= (now() - uptime()))
ORDER BY ProfileEvents['PartsLockWaitMicroseconds'] DESC
LIMIT 10

Query id: abb92f1f-c2e5-4534-b760-66fe7eb60f9c

    ┌─initial_query_id──────────────────────────────────────────────────────────────────┬─arrayElement⋯roseconds')─┐
 1. │ 723377:aa-prod-public-product@frontend-sg,152394560b3784d713cf55cc1ec63fe0,96836  │                254805942 │
 2. │ 723381:aa-prod-public-product@frontend-sg,e90887a9dbde3bb3b4cc67024c60cc23,84406  │                226634398 │
 3. │ 3321029:aa-prod-public-product@frontend-jp,129414528efad3dd1f738185475cde7e,54717 │                203957629 │
 4. │ 723367:aa-prod-public-product@frontend-sg,dd250f9a86191033acc34a2ec626dfca,85101  │                194926035 │
 5. │ 3321031:aa-prod-public-product@frontend-jp,d97a2d5ac27502838c83eb9ecc2dfc0b,57954 │                184798374 │
 6. │ 3321024:aa-prod-public-product@frontend-jp,6d2fabc2b52e90440207767e2301eba1,53160 │                182084248 │
 7. │ 765957:aa-prod-public-product@frontend-au,5015c5e548b408642feb61530f9d1f8c,11307  │                172873235 │
 8. │ 723369:aa-prod-public-product@frontend-sg,e8e3f55e43a3e4c73beeb9d3f04f18e2,86144  │                172493345 │
 9. │ 723379:aa-prod-public-product@frontend-sg,be0da2d6366a0f29be2c9259b6b4e155,100034 │                171100890 │
10. │ 3321032:aa-prod-public-product@frontend-jp,f23284e260dae97b53e45ca3560ee3f5,55177 │                166905824 │
    └───────────────────────────────────────────────────────────────────────────────────┴──────────────────────────┘

and after:

    ┌─initial_query_id────────────────────────────────────────────────────────────────────┬─arrayElement⋯roseconds')─┐
 1. │ 723378:aa-prod-public-product@frontend-sg,fd56e2acf011370b3cbdba844dbec9e0,102029   │                  1772046 │
 2. │ 3321039:aa-prod-public-product@frontend-jp,c60488741d46ea0e9f4e7293173168f6,62750   │                  1603718 │
 3. │ 3321026:aa-prod-public-product@frontend-jp,7a39629340b6d664e948299fbdfc4951,65155   │                  1447563 │
 4. │ 723382:aa-prod-public-product@frontend-sg,e0490b3e928488977e2261ffa38cef2e,105373   │                  1194062 │
 5. │ 723376:aa-prod-public-product@frontend-sg,458f24e909f2bc9ce9aa82046e3a2a79,95584    │                  1126935 │
 6. │ 723375:aa-prod-public-product@frontend-sg,1f68c959001d61763a90e142cb88041a,96417    │                  1091916 │
 7. │ 3484830:aa-prod-public-product@frontend-eu3,bcc09c051b231e167abb1969fabee084,108199 │                  1084431 │
 8. │ 3321022:aa-prod-public-product@frontend-jp,7fb78839ad22b2fd3dc0f18f994189d2,66594   │                  1080182 │
 9. │ 3321021:aa-prod-public-product@frontend-jp,47d9e4a3693e3c4fe33b9350c93391dc,66044   │                  1047172 │
10. │ 723378:aa-prod-public-product@frontend-sg,e095feb5d3b584acb4a696cd9fba3123,102027   │                  1007311 │
    └─────────────────────────────────────────────────────────────────────────────────────┴──────────────────────────┘
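To put the two tables side by side, here is a minimal sketch that converts the reported values from microseconds to seconds and compares the worst case before and after. The numbers are copied from the tables above. The variable and function names are illustrative only and are not part of ClickHouse.

```python
# Top-10 PartsLockWaitMicroseconds values copied from the "before" and "after" tables.
before_us = [
    254805942, 226634398, 203957629, 194926035, 184798374,
    182084248, 172873235, 172493345, 171100890, 166905824,
]
after_us = [
    1772046, 1603718, 1447563, 1194062, 1126935,
    1091916, 1084431, 1080182, 1047172, 1007311,
]


def to_seconds(microseconds: int) -> float:
    """Convert a microsecond counter value to seconds."""
    return microseconds / 1_000_000


# Worst (largest) lock wait observed in each window.
worst_before_s = to_seconds(max(before_us))
worst_after_s = to_seconds(max(after_us))

# How many times smaller the worst case became.
reduction_factor = max(before_us) / max(after_us)

print(f"worst lock wait before: {worst_before_s:.1f} s")
print(f"worst lock wait after:  {worst_after_s:.2f} s")
print(f"reduction factor:       {reduction_factor:.0f}x")
```

This compares only the top-10 values in each window, so it describes the tail of the distribution rather than typical query behavior.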

Labels

pr-feature Pull request with new product feature pr-synced-to-cloud The PR is synced to the cloud repo
