Bug #62537

closed

cephfs scrub command will crash the standby-replay MDSs

Added by Xiubo Li over 2 years ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Development
Backport:
quincy,reef,squid
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Tags (freeform):
backport_processed
Fixed In:
v19.3.0-3269-g69704e91bf
Released In:
v20.2.0~2535
Upkeep Timestamp:
2025-11-01T01:34:26+00:00

Description

2023-08-21T23:49:14.169647955Z debug    -10> 2023-08-21T23:49:14.092+0000 7ff7626d2700  1 mds.ocs-storagecluster-cephfilesystem-b asok_command: scrub start {path=/,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
2023-08-21T23:49:14.169647955Z debug     -9> 2023-08-21T23:49:14.092+0000 7ff75a6c2700  0 log_channel(cluster) log [INF] : scrub queued for path: /
2023-08-21T23:49:14.169647955Z debug     -8> 2023-08-21T23:49:14.092+0000 7ff75a6c2700  0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [/]
2023-08-21T23:49:14.169656471Z debug     -7> 2023-08-21T23:49:14.092+0000 7ff75a6c2700  0 log_channel(cluster) log [INF] : scrub summary: active paths [/]
2023-08-21T23:49:14.169656471Z debug     -6> 2023-08-21T23:49:14.093+0000 7ff7606ce700  5 mds.ocs-storagecluster-cephfilesystem-b ms_handle_reset on 10.128.29.132:0/145467019
2023-08-21T23:49:14.169656471Z debug     -5> 2023-08-21T23:49:14.093+0000 7ff7606ce700  3 mds.ocs-storagecluster-cephfilesystem-b ms_handle_reset closing connection for session client.51223961 10.128.29.132:0/145467019
2023-08-21T23:49:14.169672331Z debug     -4> 2023-08-21T23:49:14.093+0000 7ff75bec5700 -1 mds.0.cache.den(0x1 volumes) scrub: first > next_seq (1) [dentry #0x1/volumes [2,head] auth (dversion lock) v=122792435 ino=0x10000000000 state=1610612736 | inodepin=1 dirty=1 0x55685f63d400]
2023-08-21T23:49:14.169672331Z debug     -3> 2023-08-21T23:49:14.093+0000 7ff75bec5700  0 log_channel(cluster) log [WRN] : Scrub error on dentry [dentry #0x1/volumes [2,head] auth (dversion lock) v=122792435 ino=0x10000000000 state=1610612736 | inodepin=1 dirty=1 0x55685f63d400] see mds.ocs-storagecluster-cephfilesystem-b log and `damage ls` output for details
2023-08-21T23:49:14.169672331Z debug     -2> 2023-08-21T23:49:14.096+0000 7ff75a6c2700 -1 /builddir/build/BUILD/ceph-16.2.10/src/mds/MDLog.cc: In function 'void MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)' thread 7ff75a6c2700 time 2023-08-21T23:49:14.095409+0000
2023-08-21T23:49:14.169672331Z /builddir/build/BUILD/ceph-16.2.10/src/mds/MDLog.cc: 281: FAILED ceph_assert(!mds->is_any_replay())
2023-08-21T23:49:14.169672331Z
2023-08-21T23:49:14.169672331Z  ceph version 16.2.10-172.el8cp (00a157ecd158911ece116ae43095de793ed9f389) pacific (stable)
2023-08-21T23:49:14.169672331Z  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7ff7693187b8]
2023-08-21T23:49:14.169672331Z  2: /usr/lib64/ceph/libceph-common.so.2(+0x2799d2) [0x7ff7693189d2]
2023-08-21T23:49:14.169672331Z  3: (MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)+0x3f) [0x55685e18aaff]
2023-08-21T23:49:14.169672331Z  4: (Locker::scatter_writebehind(ScatterLock*)+0x6e8) [0x55685e03d0d8]
2023-08-21T23:49:14.169672331Z  5: (Locker::simple_lock(SimpleLock*, bool*)+0x428) [0x55685e03e938]
2023-08-21T23:49:14.169672331Z  6: (Locker::scatter_eval(ScatterLock*, bool*)+0x609) [0x55685e0420c9]
2023-08-21T23:49:14.169672331Z  7: (Locker::try_eval(SimpleLock*, bool*)+0x5f6) [0x55685e049316]
2023-08-21T23:49:14.169672331Z  8: (Locker::rdlock_finish(std::_Rb_tree_const_iterator<MutationImpl::LockOp> const&, MutationImpl*, bool*)+0x10a) [0x55685e04fd0a]
2023-08-21T23:49:14.169672331Z  9: (Locker::_drop_locks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*, bool)+0x63a) [0x55685e0512aa]
2023-08-21T23:49:14.169672331Z  10: (Locker::drop_locks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x7e) [0x55685e0513ae]
2023-08-21T23:49:14.169672331Z  11: (MDCache::request_cleanup(boost::intrusive_ptr<MDRequestImpl>&)+0xfa) [0x55685df8375a]
2023-08-21T23:49:14.169672331Z  12: (MDCache::request_finish(boost::intrusive_ptr<MDRequestImpl>&)+0x17b) [0x55685df83b5b]
2023-08-21T23:49:14.169672331Z  13: (Server::respond_to_request(boost::intrusive_ptr<MDRequestImpl>&, int)+0x206) [0x55685dec0436]
2023-08-21T23:49:14.169672331Z  14: (MDCache::rdlock_dirfrags_stats_work(boost::intrusive_ptr<MDRequestImpl>&)+0x217) [0x55685dfae6b7]
2023-08-21T23:49:14.169672331Z  15: (MDCache::rdlock_dirfrags_stats(CInode*, MDSInternalContext*)+0x63) [0x55685dfae8e3]
2023-08-21T23:49:14.169672331Z  16: ceph-mds(+0x451c87) [0x55685e0edc87]
2023-08-21T23:49:14.169672331Z  17: (Continuation::Callback::finish(int)+0x164) [0x55685e121dc4]
2023-08-21T23:49:14.169672331Z  18: (Context::complete(int)+0xd) [0x55685de5f86d]
2023-08-21T23:49:14.169672331Z  19: (MDSContext::complete(int)+0x203) [0x55685e177383]
2023-08-21T23:49:14.169672331Z  20: (CInode::_fetched(ceph::buffer::v15_2_0::list&, ceph::buffer::v15_2_0::list&, Context*)+0x379) [0x55685e10de99]
2023-08-21T23:49:14.169672331Z  21: (MDSContext::complete(int)+0x203) [0x55685e177383]
2023-08-21T23:49:14.169672331Z  22: (MDSIOContextBase::complete(int)+0x6ac) [0x55685e177b2c]
2023-08-21T23:49:14.169672331Z  23: (Finisher::finisher_thread_entry()+0x1a5) [0x7ff7693ba735]
2023-08-21T23:49:14.169672331Z  24: /lib64/libpthread.so.0(+0x81ca) [0x7ff7682f71ca]
2023-08-21T23:49:14.169672331Z  25: clone()
2023-08-21T23:49:14.169691227Z
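The trace above fails at `ceph_assert(!mds->is_any_replay())` in `MDLog::_submit_entry()`: the scrub's lock evaluation (`Locker::scatter_writebehind`) tries to submit a journal entry, which is never legal on a replaying daemon. The invariant can be sketched with a minimal, self-contained model; `MdsState`, `MdsRank`, and `submit_entry` here are simplified stand-ins for the real Ceph classes, not the actual MDS API, and the model throws where the real code aborts:

```cpp
#include <stdexcept>
#include <string>

enum class MdsState { Active, StandbyReplay, Replay };

struct MdsRank {
    MdsState state;
    // Mirrors the idea behind MDSRank::is_any_replay(): true for any
    // replay-flavoured state, including standby-replay.
    bool is_any_replay() const {
        return state == MdsState::StandbyReplay || state == MdsState::Replay;
    }
};

struct MdLog {
    // Models MDLog::_submit_entry(): journal writes are only legal on an
    // active MDS. The real code asserts and aborts; here we throw so the
    // violated invariant is observable without killing the process.
    void submit_entry(const MdsRank& mds, const std::string& /*event*/) {
        if (mds.is_any_replay())
            throw std::logic_error("FAILED ceph_assert(!mds->is_any_replay())");
        // ... append the event to the journal ...
    }
};
```

In this model, queuing a scrub on a standby-replay rank inevitably reaches `submit_entry` and trips the invariant, which is exactly the crash seen in the log.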

IMO we should reject the scrub command when it is not allowed to run on standby-replay MDSs, instead of crashing like this.
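The suggestion above amounts to refusing the asok command up front, before the scrub path can reach the journal. Here is a minimal sketch of that idea, assuming a simplified dispatcher; `MdsDaemon`, `handle_scrub_start`, and the state enum are hypothetical stand-ins, not the actual Ceph MDS interfaces (the real fix is tracked in PR 53301):

```cpp
#include <string>
#include <utility>

enum class MdsState { Active, StandbyReplay };

// Hypothetical, simplified stand-in for the MDS admin-socket dispatcher:
// rather than letting "scrub start" proceed to lock evaluation and a
// journal write (where the ceph_assert fires), refuse it early when the
// daemon is in standby-replay.
struct MdsDaemon {
    MdsState state;

    // Returns {error code, message}, loosely following the asok convention
    // of a negative errno on failure.
    std::pair<int, std::string> handle_scrub_start(const std::string& path) {
        if (state == MdsState::StandbyReplay)
            return {-22 /* -EINVAL */,
                    "scrub commands are not allowed on a standby-replay MDS"};
        // ... queue the scrub for `path` on the active MDS ...
        return {0, "scrub queued for path: " + path};
    }
};
```

With this guard the standby-replay daemon returns a clean error to the caller instead of asserting and restarting.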


Related issues 3 (0 open, 3 closed)

Copied to CephFS - Backport #66868: reef: cephfs scrub command will crash the standby-replay MDSs (Resolved, Jos Collin)
Copied to CephFS - Backport #66869: squid: cephfs scrub command will crash the standby-replay MDSs (Resolved, Jos Collin)
Copied to CephFS - Backport #66870: quincy: cephfs scrub command will crash the standby-replay MDSs (Resolved, Neeraj Pratap Singh)
Actions #1

Updated by Venky Shankar over 2 years ago

  • Category set to Correctness/Safety
  • Assignee set to Neeraj Pratap Singh
  • Target version set to v19.0.0
  • Backport set to reef,quincy,pacific
  • Component(FS) MDS added

Neeraj, please take this one.

Actions #2

Updated by Neeraj Pratap Singh over 2 years ago

  • Pull request ID set to 53301
Actions #3

Updated by Neeraj Pratap Singh over 2 years ago

  • Status changed from New to Fix Under Review
Actions #4

Updated by Venky Shankar over 1 year ago

  • Status changed from Fix Under Review to Pending Backport
  • Target version changed from v19.0.0 to v20.0.0
  • Source set to Development
  • Backport changed from reef,quincy,pacific to quincy,reef,squid
Actions #5

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #66868: reef: cephfs scrub command will crash the standby-replay MDSs added
Actions #6

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #66869: squid: cephfs scrub command will crash the standby-replay MDSs added
Actions #7

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #66870: quincy: cephfs scrub command will crash the standby-replay MDSs added
Actions #8

Updated by Upkeep Bot over 1 year ago

  • Tags (freeform) set to backport_processed
Actions #9

Updated by Neeraj Pratap Singh over 1 year ago

  • Status changed from Pending Backport to Resolved
Actions #10

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to 69704e91bfdfe6790aeacf6e4eb8bb9dd3f9f834
  • Fixed In set to v19.3.0-3269-g69704e91bfd
  • Upkeep Timestamp set to 2025-07-12T02:45:16+00:00
Actions #11

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-3269-g69704e91bfd to v19.3.0-3269-g69704e91bf
  • Upkeep Timestamp changed from 2025-07-12T02:45:16+00:00 to 2025-07-14T23:40:03+00:00
Actions #12

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~2535
  • Upkeep Timestamp changed from 2025-07-14T23:40:03+00:00 to 2025-11-01T01:34:26+00:00