[CI crash] Failed to cancel removed parts check in replicated merge tree #94755

@robot-clickhouse

Stack trace details

The sipHash64(st.trace_full) is 2047207756349845257
The trace is from the master or release branch: False

The query for CIDB to compare the trace with the known one:

WITH
    (
        SELECT groupArrayDistinct(cleanStackTrace(trace_full) AS trace) FROM default.stack_traces
        WHERE sipHash64(trace) IN (2047207756349845257, {ANOTHER_TRACE_HASH}) -- FIXME: replace with the known hash
    ) AS traces,
    1.97 AS alpha,
    stack_frame_weights AS (
        WITH
            (
                SELECT count()
                FROM default.stack_traces
                FINAL
            ) AS total,
            2.0 AS beta,
            3.7 AS gamma
        SELECT
            arrayJoin(cleanStackTrace(trace_full)) AS frame,
            countDistinct(trace_full) AS count,
            log(total / count) AS IDF,
            sigmoid(beta * (IDF - gamma)) AS weight
        FROM default.stack_traces
        FINAL
        GROUP BY frame
    ),
    (SELECT groupArray(weight) AS w, groupArray(frame) AS f FROM stack_frame_weights) AS weights,
    (trace -> arrayMap((_frame, pos) -> (pow(pos, -alpha) * arrayFirst(w, f -> (f = _frame), weights.w, weights.f)), trace, arrayEnumerate(trace))) AS get_trace_weights,
    (arr -> arrayStringConcat(arr, '\n')) AS joinArr

SELECT arraySimilarity(traces[1], traces[2], get_trace_weights(traces[1]) AS weights1, get_trace_weights(traces[2]) AS weights2) AS similarity,
    arrayLevenshteinDistanceWeighted(traces[1], traces[2], weights1, weights2),
    joinArr(traces[1]), joinArr(traces[2]), joinArr(weights1), joinArr(weights2)
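
The per-frame weighting used by the query above can be sketched outside of SQL to make the arithmetic easier to follow. This is a minimal illustration in Python, not the CIDB implementation: the constants `alpha`, `beta` and `gamma` are copied from the query, while the function names (`frame_weight`, `trace_weights`) and the `frame_counts` dictionary are hypothetical stand-ins for the `stack_frame_weights` subquery. The `arraySimilarity` and `arrayLevenshteinDistanceWeighted` steps are not reproduced here.

```python
import math

# Constants copied from the query above.
ALPHA = 1.97  # exponent of the positional decay pow(pos, -alpha)
BETA = 2.0    # steepness of the sigmoid
GAMMA = 3.7   # IDF value at which the sigmoid weight equals 0.5


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def frame_weight(total: int, count: int) -> float:
    """Weight of one frame: sigmoid(beta * (IDF - gamma)), IDF = log(total / count).

    `total` is the number of rows in the traces table and `count` is the number
    of distinct traces containing the frame (hypothetical inputs here).
    """
    idf = math.log(total / count)
    return sigmoid(BETA * (idf - GAMMA))


def trace_weights(trace: list[str], frame_counts: dict[str, int], total: int) -> list[float]:
    """Per-position weights: pow(pos, -alpha) * frame weight, with pos starting at 1."""
    return [
        pos ** -ALPHA * frame_weight(total, frame_counts[frame])
        for pos, frame in enumerate(trace, start=1)
    ]


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    counts = {"rare_frame": 1, "common_frame": 900}
    print(trace_weights(["rare_frame", "common_frame"], counts, total=1000))
```

Under this scheme, frames that occur in almost every trace (such as thread pool workers) get a weight close to zero, and frames deeper in the stack are discounted by their position, so the topmost rare frames dominate the comparison.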

The following new stack trace was found in the CI Logs table system.crash_log:

   DB::handle_error_code(String const&, std::basic_string_view<char, std::char_traits<char>>, int, bool, std::vector<void*, std::allocator<void*>> const&)
   DB::ReplicatedMergeTreePartCheckThread::cancelRemovedPartsCheck(DB::MergeTreePartInfo const&)
   DB::StorageReplicatedMergeTree::executeDropRange(DB::ReplicatedMergeTreeLogEntry const&)
   DB::StorageReplicatedMergeTree::executeLogEntry(DB::ReplicatedMergeTreeLogEntry&)
   DB::ReplicatedMergeTreeQueue::processEntry(std::function<std::shared_ptr<zkutil::ZooKeeper> ()>, std::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&, std::function<bool (std::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&)>)
   DB::StorageReplicatedMergeTree::processQueueEntry(std::shared_ptr<DB::ReplicatedMergeTreeQueue::SelectedEntry>)
   DB::ExecutableLambdaAdapter::executeStep()
   DB::TaskRuntimeData::executeStep() const
   DB::MergeTreeBackgroundExecutor<DB::RoundRobinRuntimeQueue>::routine(std::shared_ptr<DB::TaskRuntimeData>)
   DB::MergeTreeBackgroundExecutor<DB::RoundRobinRuntimeQueue>::threadFunction()
   ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker()
   ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'()::operator()()
   ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker()
   void* std::__thread_proxy[$ABI]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*)

Possible causes:

  • Invalid part information during cancellation
  • Race condition in part check cancellation
  • Incorrect log entry processing in replicated merge tree
  • Inconsistent state of merge tree parts during operation

The stack trace appeared in the following checks:
