Bug #62537
cephfs scrub command will crash the standby-replay MDSs
Status:
Resolved
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
% Done:
0%
Source:
Development
Backport:
quincy,reef,squid
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Tags (freeform):
backport_processed
Merge Commit:
Fixed In:
v19.3.0-3269-g69704e91bf
Released In:
v20.2.0~2535
Upkeep Timestamp:
2025-11-01T01:34:26+00:00
Description
2023-08-21T23:49:14.169647955Z debug -10> 2023-08-21T23:49:14.092+0000 7ff7626d2700 1 mds.ocs-storagecluster-cephfilesystem-b asok_command: scrub start {path=/,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
2023-08-21T23:49:14.169647955Z debug -9> 2023-08-21T23:49:14.092+0000 7ff75a6c2700 0 log_channel(cluster) log [INF] : scrub queued for path: /
2023-08-21T23:49:14.169647955Z debug -8> 2023-08-21T23:49:14.092+0000 7ff75a6c2700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [/]
2023-08-21T23:49:14.169656471Z debug -7> 2023-08-21T23:49:14.092+0000 7ff75a6c2700 0 log_channel(cluster) log [INF] : scrub summary: active paths [/]
2023-08-21T23:49:14.169656471Z debug -6> 2023-08-21T23:49:14.093+0000 7ff7606ce700 5 mds.ocs-storagecluster-cephfilesystem-b ms_handle_reset on 10.128.29.132:0/145467019
2023-08-21T23:49:14.169656471Z debug -5> 2023-08-21T23:49:14.093+0000 7ff7606ce700 3 mds.ocs-storagecluster-cephfilesystem-b ms_handle_reset closing connection for session client.51223961 10.128.29.132:0/145467019
2023-08-21T23:49:14.169672331Z debug -4> 2023-08-21T23:49:14.093+0000 7ff75bec5700 -1 mds.0.cache.den(0x1 volumes) scrub: first > next_seq (1) [dentry #0x1/volumes [2,head] auth (dversion lock) v=122792435 ino=0x10000000000 state=1610612736 | inodepin=1 dirty=1 0x55685f63d400]
2023-08-21T23:49:14.169672331Z debug -3> 2023-08-21T23:49:14.093+0000 7ff75bec5700 0 log_channel(cluster) log [WRN] : Scrub error on dentry [dentry #0x1/volumes [2,head] auth (dversion lock) v=122792435 ino=0x10000000000 state=1610612736 | inodepin=1 dirty=1 0x55685f63d400] see mds.ocs-storagecluster-cephfilesystem-b log and `damage ls` output for details
2023-08-21T23:49:14.169672331Z debug -2> 2023-08-21T23:49:14.096+0000 7ff75a6c2700 -1 /builddir/build/BUILD/ceph-16.2.10/src/mds/MDLog.cc: In function 'void MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)' thread 7ff75a6c2700 time 2023-08-21T23:49:14.095409+0000
2023-08-21T23:49:14.169672331Z /builddir/build/BUILD/ceph-16.2.10/src/mds/MDLog.cc: 281: FAILED ceph_assert(!mds->is_any_replay())
2023-08-21T23:49:14.169672331Z
2023-08-21T23:49:14.169672331Z ceph version 16.2.10-172.el8cp (00a157ecd158911ece116ae43095de793ed9f389) pacific (stable)
2023-08-21T23:49:14.169672331Z 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7ff7693187b8]
2023-08-21T23:49:14.169672331Z 2: /usr/lib64/ceph/libceph-common.so.2(+0x2799d2) [0x7ff7693189d2]
2023-08-21T23:49:14.169672331Z 3: (MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)+0x3f) [0x55685e18aaff]
2023-08-21T23:49:14.169672331Z 4: (Locker::scatter_writebehind(ScatterLock*)+0x6e8) [0x55685e03d0d8]
2023-08-21T23:49:14.169672331Z 5: (Locker::simple_lock(SimpleLock*, bool*)+0x428) [0x55685e03e938]
2023-08-21T23:49:14.169672331Z 6: (Locker::scatter_eval(ScatterLock*, bool*)+0x609) [0x55685e0420c9]
2023-08-21T23:49:14.169672331Z 7: (Locker::try_eval(SimpleLock*, bool*)+0x5f6) [0x55685e049316]
2023-08-21T23:49:14.169672331Z 8: (Locker::rdlock_finish(std::_Rb_tree_const_iterator<MutationImpl::LockOp> const&, MutationImpl*, bool*)+0x10a) [0x55685e04fd0a]
2023-08-21T23:49:14.169672331Z 9: (Locker::_drop_locks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*, bool)+0x63a) [0x55685e0512aa]
2023-08-21T23:49:14.169672331Z 10: (Locker::drop_locks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x7e) [0x55685e0513ae]
2023-08-21T23:49:14.169672331Z 11: (MDCache::request_cleanup(boost::intrusive_ptr<MDRequestImpl>&)+0xfa) [0x55685df8375a]
2023-08-21T23:49:14.169672331Z 12: (MDCache::request_finish(boost::intrusive_ptr<MDRequestImpl>&)+0x17b) [0x55685df83b5b]
2023-08-21T23:49:14.169672331Z 13: (Server::respond_to_request(boost::intrusive_ptr<MDRequestImpl>&, int)+0x206) [0x55685dec0436]
2023-08-21T23:49:14.169672331Z 14: (MDCache::rdlock_dirfrags_stats_work(boost::intrusive_ptr<MDRequestImpl>&)+0x217) [0x55685dfae6b7]
2023-08-21T23:49:14.169672331Z 15: (MDCache::rdlock_dirfrags_stats(CInode*, MDSInternalContext*)+0x63) [0x55685dfae8e3]
2023-08-21T23:49:14.169672331Z 16: ceph-mds(+0x451c87) [0x55685e0edc87]
2023-08-21T23:49:14.169672331Z 17: (Continuation::Callback::finish(int)+0x164) [0x55685e121dc4]
2023-08-21T23:49:14.169672331Z 18: (Context::complete(int)+0xd) [0x55685de5f86d]
2023-08-21T23:49:14.169672331Z 19: (MDSContext::complete(int)+0x203) [0x55685e177383]
2023-08-21T23:49:14.169672331Z 20: (CInode::_fetched(ceph::buffer::v15_2_0::list&, ceph::buffer::v15_2_0::list&, Context*)+0x379) [0x55685e10de99]
2023-08-21T23:49:14.169672331Z 21: (MDSContext::complete(int)+0x203) [0x55685e177383]
2023-08-21T23:49:14.169672331Z 22: (MDSIOContextBase::complete(int)+0x6ac) [0x55685e177b2c]
2023-08-21T23:49:14.169672331Z 23: (Finisher::finisher_thread_entry()+0x1a5) [0x7ff7693ba735]
2023-08-21T23:49:14.169672331Z 24: /lib64/libpthread.so.0(+0x81ca) [0x7ff7682f71ca]
2023-08-21T23:49:14.169672331Z 25: clone()
2023-08-21T23:49:14.169691227Z
IMO, if scrub is not supported on standby-replay MDSs, we should reject the scrub command there instead of letting it crash the daemon like this.
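To illustrate the kind of guard being suggested, here is a minimal, self-contained sketch. It is not the actual fix and does not use the real MDS classes: `MdsState`, `CommandResult`, and `handle_scrub_start` are hypothetical names, and the choice of `-EINVAL` is an assumption. Only the idea comes from the backtrace above: `MDLog::_submit_entry` asserts `!mds->is_any_replay()`, so a command that can end up writing to the journal should be refused up front on a daemon in a replay state.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical stand-in for the daemon states that matter here.
enum class MdsState { Active, Replay, StandbyReplay };

// Hypothetical result type: return code plus a message for the caller.
struct CommandResult {
  int rc;
  std::string message;
};

// Same condition the failed assertion checks, expressed on the toy state.
bool is_any_replay(MdsState s) {
  return s == MdsState::Replay || s == MdsState::StandbyReplay;
}

// Refuse "scrub start" early when the daemon is replaying, so the request
// never reaches code paths that submit journal entries.
CommandResult handle_scrub_start(MdsState state, const std::string& path) {
  if (is_any_replay(state)) {
    // -EINVAL is an arbitrary choice for this sketch.
    return {-EINVAL, "scrub is not supported on a replaying MDS"};
  }
  return {0, "scrub queued for path: " + path};
}

// Usage: a standby-replay daemon rejects the command, an active one accepts it.
void example_usage() {
  assert(handle_scrub_start(MdsState::StandbyReplay, "/").rc == -EINVAL);
  assert(handle_scrub_start(MdsState::Active, "/").rc == 0);
}
```

With a check like this the admin socket caller would get an error back and the daemon would keep running, rather than hitting the `ceph_assert` in `MDLog.cc`.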
Updated by Venky Shankar over 2 years ago
- Category set to Correctness/Safety
- Assignee set to Neeraj Pratap Singh
- Target version set to v19.0.0
- Backport set to reef,quincy,pacific
- Component(FS) MDS added
Neeraj, please take this one.
Updated by Neeraj Pratap Singh over 2 years ago
- Status changed from New to Fix Under Review
Updated by Venky Shankar over 1 year ago
- Status changed from Fix Under Review to Pending Backport
- Target version changed from v19.0.0 to v20.0.0
- Source set to Development
- Backport changed from reef,quincy,pacific to quincy,reef,squid
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #66868: reef: cephfs scrub command will crash the standby-replay MDSs added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #66869: squid: cephfs scrub command will crash the standby-replay MDSs added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #66870: quincy: cephfs scrub command will crash the standby-replay MDSs added
Updated by Neeraj Pratap Singh over 1 year ago
- Status changed from Pending Backport to Resolved
Updated by Upkeep Bot 8 months ago
- Merge Commit set to 69704e91bfdfe6790aeacf6e4eb8bb9dd3f9f834
- Fixed In set to v19.3.0-3269-g69704e91bfd
- Upkeep Timestamp set to 2025-07-12T02:45:16+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-3269-g69704e91bfd to v19.3.0-3269-g69704e91bf
- Upkeep Timestamp changed from 2025-07-12T02:45:16+00:00 to 2025-07-14T23:40:03+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~2535
- Upkeep Timestamp changed from 2025-07-14T23:40:03+00:00 to 2025-11-01T01:34:26+00:00