Bug #73053
MDS crashed with coredump while running smbtorture tests
Status: Resolved
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Development
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In: v20.3.0-3108-gf7bf6bb97b
Released In:
Upkeep Timestamp: 2025-09-18T08:52:36+00:00
Description
The following backtrace was seen while running smbtorture tests against a share backed by CephFS:
Core was generated by `/usr/bin/ceph-mds -n mds.sit_fs.storage0.xwiokv -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false'.
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007f371d9170a3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007f371d8c9b86 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
#3 0x000055bce5dc9e76 in reraise_fatal (signum=6) at /usr/src/debug/ceph-20.3.0-3006.g0d5b95e8.el9.x86_64/src/global/signal_handler.cc:91
#4 handle_oneshot_fatal_signal (signum=6) at /usr/src/debug/ceph-20.3.0-3006.g0d5b95e8.el9.x86_64/src/global/signal_handler.cc:370
#5 <signal handler called>
#6 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#7 0x00007f371d9170a3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#8 0x00007f371d8c9b86 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#9 0x00007f371d8b3873 in __GI_abort () at abort.c:79
#10 0x00007f371e52feb0 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=722, func=<optimized out>)
at /usr/src/debug/ceph-20.3.0-3006.g0d5b95e8.el9.x86_64/src/common/assert.cc:101
#11 0x00007f371e53d79c in operator() (eval=<optimized out>, __closure=<optimized out>)
at /usr/src/debug/ceph-20.3.0-3006.g0d5b95e8.el9.x86_64/src/common/perf_counters.cc:722
#12 ceph::common::PerfCountersBuilder::create_perf_counters (this=this@entry=0x7f3718bde4e0)
at /usr/src/debug/ceph-20.3.0-3006.g0d5b95e8.el9.x86_64/src/common/perf_counters.cc:722
#13 0x000055bce5d6f8e5 in MetricAggregator::refresh_subvolume_metrics_for_rank (this=this@entry=0x55bce85bf000, rank=rank@entry=0,
metrics=std::vector of length 1, capacity 1 = {...}) at /usr/src/debug/ceph-20.3.0-3006.g0d5b95e8.el9.x86_64/src/mds/MetricAggregator.cc:192
#14 0x000055bce5d7487f in MetricAggregator::handle_mds_metrics (this=0x55bce85bf000, m=...)
at /usr/src/debug/ceph-20.3.0-3006.g0d5b95e8.el9.x86_64/src/mds/MetricAggregator.cc:728
#15 0x000055bce5d74c01 in MetricAggregator::ms_dispatch2 (this=0x55bce85bf000, m=...)
at /usr/src/debug/ceph-20.3.0-3006.g0d5b95e8.el9.x86_64/src/mds/MetricAggregator.cc:160
#16 0x00007f371e7b3a98 in Messenger::ms_deliver_dispatch (m=..., this=<optimized out>) at /usr/src/debug/ceph-20.3.0-3006.g0d5b95e8.el9.x86_64/src/msg/Messenger.h:747
#17 DispatchQueue::entry (this=0x55bce8655310) at /usr/src/debug/ceph-20.3.0-3006.g0d5b95e8.el9.x86_64/src/msg/DispatchQueue.cc:202
#18 0x00007f371e846f51 in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-20.3.0-3006.g0d5b95e8.el9.x86_64/src/msg/DispatchQueue.h:101
#19 0x00007f371d9152fa in start_thread (arg=<optimized out>) at pthread_create.c:443
#20 0x00007f371d99a540 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) f 12
#12 ceph::common::PerfCountersBuilder::create_perf_counters (this=this@entry=0x7f3718bde4e0)
at /usr/src/debug/ceph-20.3.0-3006.g0d5b95e8.el9.x86_64/src/common/perf_counters.cc:722
722 ceph_assert(d->type != PERFCOUNTER_NONE);
(gdb) p d
$1 = {name = 0x0, description = 0x0, nick = 0x0, prio = 0 '\000', type = PERFCOUNTER_NONE, unit = UNIT_NONE, u64 = std::atomic<unsigned long> = { 0 }, max_u64_inc = std::atomic<unsigned long> = { 0 }, avgcount = std::atomic<unsigned long> = { 0 }, avgcount2 = std::atomic<unsigned long> = { 0 }, histogram = std::unique_ptr<PerfHistogram<2>> = {get() = 0x0}}
Updated by Anoop C S 6 months ago
- Severity changed from 2 - major to 1 - critical
Today I figured out that this same crash is the root cause of the go-ceph CI failures reported via https://github.com/ceph/go-ceph/issues/1172.
Updated by Igor Golikov 6 months ago
- Status changed from New to Fix Under Review
- Pull request ID set to 65565
Updated by Venky Shankar 6 months ago
- Status changed from Fix Under Review to Resolved
Updated by Upkeep Bot 6 months ago
- Merge Commit set to f7bf6bb97bf028c50100b8a4463ed0bcdfe182f1
- Fixed In set to v20.3.0-3108-gf7bf6bb97b
- Upkeep Timestamp set to 2025-09-18T08:52:36+00:00