[CI crash] Incorrect destruction of MergeTreeDataPartCompact caches #94661

@robot-ch-test-poll4

Stack trace details

The sipHash64(st.trace_full) is 17987466783627542056
The trace is from the master or release branch: True

The query for CIDB to compare the trace with the known one:

WITH
    (
        SELECT groupArrayDistinct(cleanStackTrace(trace_full) AS trace) FROM default.stack_traces
        WHERE sipHash64(trace) IN (17987466783627542056, {ANOTHER_TRACE_HASH}) -- FIXME: replace with the known hash
    ) AS traces,
    1.97 AS alpha,
    stack_frame_weights AS (
        WITH
            (
                SELECT count()
                FROM default.stack_traces
                FINAL
            ) AS total,
            2.0 AS beta,
            3.7 AS gamma
        SELECT
            arrayJoin(cleanStackTrace(trace_full)) AS frame,
            countDistinct(trace_full) AS count,
            log(total / count) AS IDF,
            sigmoid(beta * (IDF - gamma)) AS weight
        FROM default.stack_traces
        FINAL
        GROUP BY frame
    ),
    (SELECT groupArray(weight) AS w, groupArray(frame) AS f FROM stack_frame_weights) AS weights,
    (trace -> arrayMap((_frame, pos) -> (pow(pos, -alpha) * arrayFirst(w, f -> (f = _frame), weights.w, weights.f)), trace, arrayEnumerate(trace))) AS get_trace_weights,
    (arr -> arrayStringConcat(arr, '\n')) AS joinArr

SELECT arraySimilarity(traces[1], traces[2], get_trace_weights(traces[1]) AS weights1, get_trace_weights(traces[2]) AS weights2) AS similarity,
    arrayLevenshteinDistanceWeighted(traces[1], traces[2], weights1, weights2),
    joinArr(traces[1]), joinArr(traces[2]), joinArr(weights1), joinArr(weights2)
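
The weighting used by the query above can be easier to follow outside SQL. The following is a minimal C++ sketch of the same arithmetic, assuming the constants from the query (alpha = 1.97, beta = 2.0, gamma = 3.7), a natural logarithm for the IDF, and 1-based frame positions as produced by arrayEnumerate. The function names frameWeight and traceWeights are hypothetical and are not part of CIDB or ClickHouse.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Constants copied from the query above.
constexpr double kAlpha = 1.97;
constexpr double kBeta = 2.0;
constexpr double kGamma = 3.7;

// Logistic function, 1 / (1 + e^-x).
double sigmoid(double x)
{
    return 1.0 / (1.0 + std::exp(-x));
}

// Weight of a frame seen in `count` distinct traces out of `total` traces.
// Rare frames get a high IDF and a weight close to 1, common frames a weight close to 0.
double frameWeight(double total, double count)
{
    assert(count > 0 && total >= count);
    const double idf = std::log(total / count);
    return sigmoid(kBeta * (idf - kGamma));
}

// Position-decayed weights for one trace: frames nearer the top of the stack
// (smaller 1-based position) keep more of their frame weight.
std::vector<double> traceWeights(const std::vector<double> & frame_weights)
{
    std::vector<double> result;
    result.reserve(frame_weights.size());
    for (std::size_t i = 0; i < frame_weights.size(); ++i)
    {
        const double pos = static_cast<double>(i + 1);
        result.push_back(std::pow(pos, -kAlpha) * frame_weights[i]);
    }
    return result;
}
```

With these constants, a frame whose IDF equals gamma gets a weight of exactly one half, and the second frame of a trace keeps roughly a quarter of its frame weight.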

The following new stack trace was found in the CI Logs table system.crash_log:

   DB::MergeTreeData::getPrimaryIndexCache() const
   DB::IMergeTreeDataPart::clearCaches()
   DB::IMergeTreeDataPart::removeIfNeeded()
   DB::MergeTreeDataPartCompact::~MergeTreeDataPartCompact()
   DB::RangesInDataPart::~RangesInDataPart()
   DB::MergeTreeReadPoolBase::~MergeTreeReadPoolBase()
   DB::MergeTreeReadPool::~MergeTreeReadPool()
   DB::MergeTreeSelectProcessor::~MergeTreeSelectProcessor()
   DB::MergeTreeSource::~MergeTreeSource()
   std::__shared_ptr_emplace<std::vector<std::shared_ptr<DB::IProcessor>, std::allocator<std::shared_ptr<DB::IProcessor>>>, std::allocator<std::vector<std::shared_ptr<DB::IProcessor>, std::allocator<std::shared_ptr<DB::IProcessor>>>>>::__on_zero_shared()
   DB::QueryPipeline::~QueryPipeline()
   DB::QueryPipeline::reset()
   DB::TCPHandler::runImpl()
   DB::TCPHandler::run()
   Poco::Net::TCPServerConnection::start()
   Poco::Net::TCPServerDispatcher::run()
   Poco::PooledThread::run()
   Poco::ThreadImpl::runnableEntry(void*)

Possible causes:

  • Improper cleanup of cache objects during destruction
  • Double deletion or invalid reference to cache data
  • Race condition during cache destruction in a multi-threaded environment
  • Cache not properly invalidated before destruction
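
Several of the causes above come down to a destructor reaching for a cache that may already be gone. The toy model below is not ClickHouse code and makes no claim about the actual root cause. ToyCache, ToyStorage and ToyPart are hypothetical stand-ins, and the sketch only illustrates one defensive pattern: a part holds a weak reference to the cache and skips cleanup when the cache no longer exists.

```cpp
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in for a cache owned by a storage object.
struct ToyCache
{
    std::unordered_set<std::string> keys;
    void remove(const std::string & key) { keys.erase(key); }
};

// Hypothetical stand-in for the storage that owns the cache.
struct ToyStorage
{
    std::shared_ptr<ToyCache> cache = std::make_shared<ToyCache>();
};

// Hypothetical stand-in for a data part that drops its cache entry when destroyed.
class ToyPart
{
public:
    ToyPart(const ToyStorage & storage, std::string name_)
        : cache(storage.cache), name(std::move(name_))
    {
        storage.cache->keys.insert(name);
    }

    ~ToyPart()
    {
        // lock() returns an empty pointer if the cache was already destroyed,
        // so the destructor never dereferences a dangling cache.
        if (auto locked = cache.lock())
            locked->remove(name);
    }

private:
    std::weak_ptr<ToyCache> cache;
    std::string name;
};
```

In this model a part that outlives its storage is destroyed without touching the cache, while a part destroyed first removes its own entry as usual.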

The stack trace appeared in the following checks:
