Bug #65658

closed

mds: MetricAggregator::ms_can_fast_dispatch2 acquires locks

Added by Patrick Donnelly almost 2 years ago. Updated 5 months ago.

Status:
Resolved
Priority:
High
Category:
Performance/Resource Usage
Target version:
% Done:
100%

Source:
Development
Backport:
squid,reef,quincy
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Tags (freeform):
Fixed In:
v19.3.0-2346-g52d39c3398
Released In:
v20.2.0~2847
Upkeep Timestamp:
2025-11-01T01:11:39+00:00

Description

There was a lot of discussion surrounding this in

https://github.com/ceph/ceph/pull/26004/

but, circling back, we have since seen evidence that this is causing significant problems: after a long up:replay recovery, clients can flood the MDS with metrics messages, and the resulting lock contention in fast_dispatch prevents the MDS from sending beacons to the monitors. This in turn leads to undesirable MDS failovers.

We should convert this to using regular dispatch (and optimize later if needed).
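As an illustration only, here is a toy C++ model of the proposed change. It is not the MDS code and not the patch from the linked pull request. It assumes a messenger thread that calls a hypothetical `deliver()` for every incoming message: messages that pass a lock-free `can_fast_dispatch()` check (standing in for the idea behind `ms_can_fast_dispatch2`) are handled inline, while metrics messages are only enqueued and are aggregated later, under the aggregator lock, by a separate dispatch thread calling `drain()`. All type, method, and lock names are hypothetical, and beacon sending is not modeled.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical message kinds for this toy model; not Ceph's message IDs.
enum class MsgType { Ping, ClientMetrics };

struct Msg {
  MsgType type;
};

// Toy dispatcher with two paths (all names are hypothetical):
//  - fast path: runs inline on the messenger thread and must never block
//  - regular path: the message is queued and processed later by a
//    separate dispatch thread, which is free to take heavier locks
class ToyMetricDispatcher {
 public:
  // Lock-free check, analogous in spirit to ms_can_fast_dispatch2.
  // Metrics messages are refused so they take the regular (queued) path.
  bool can_fast_dispatch(const Msg& m) const {
    return m.type != MsgType::ClientMetrics;
  }

  // Called by the (hypothetical) messenger thread for every message.
  void deliver(const Msg& m) {
    if (can_fast_dispatch(m)) {
      fast_handled_.fetch_add(1);  // cheap, lock-free work only
      return;
    }
    std::lock_guard<std::mutex> l(queue_lock_);  // held only to enqueue
    queue_.push_back(m);
  }

  // Called by a separate dispatch thread; the aggregator lock is taken
  // here, never on the messenger thread.
  std::size_t drain() {
    std::deque<Msg> batch;
    {
      std::lock_guard<std::mutex> l(queue_lock_);
      batch.swap(queue_);
    }
    std::lock_guard<std::mutex> l(aggregator_lock_);
    metrics_processed_ += batch.size();  // placeholder for real aggregation
    return batch.size();
  }

  std::size_t fast_handled() const { return fast_handled_.load(); }

  std::size_t pending() const {
    std::lock_guard<std::mutex> l(queue_lock_);
    return queue_.size();
  }

  std::size_t metrics_processed() const {
    std::lock_guard<std::mutex> l(aggregator_lock_);
    return metrics_processed_;
  }

 private:
  std::atomic<std::size_t> fast_handled_{0};
  mutable std::mutex queue_lock_;
  std::deque<Msg> queue_;
  mutable std::mutex aggregator_lock_;
  std::size_t metrics_processed_ = 0;
};
```

The design point of the sketch is that `deliver()` never touches `aggregator_lock_` and holds `queue_lock_` only for a single `push_back`, so a burst of metrics messages cannot leave the messenger thread waiting behind aggregation work.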


Related issues: 5 (1 open, 4 closed)

Related to CephFS - Feature #65637: mds: continue sending heartbeats during recovery when MDS journal is large (In Progress, Venky Shankar)
Related to CephFS - Bug #68865: mds: do not fast dispatch metrics messages from clients (Resolved, Venky Shankar)
Copied to CephFS - Backport #66188: squid: mds: MetricAggregator::ms_can_fast_dispatch2 acquires locks (Resolved, Patrick Donnelly)
Copied to CephFS - Backport #66189: quincy: mds: MetricAggregator::ms_can_fast_dispatch2 acquires locks (Rejected, Patrick Donnelly)
Copied to CephFS - Backport #66190: reef: mds: MetricAggregator::ms_can_fast_dispatch2 acquires locks (Resolved, Patrick Donnelly)
#1

Updated by Patrick Donnelly almost 2 years ago

  • Description updated (diff)
#2

Updated by Patrick Donnelly almost 2 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 57081
#3

Updated by Patrick Donnelly almost 2 years ago

  • Related to Feature #65637: mds: continue sending heartbeats during recovery when MDS journal is large added
#4

Updated by Patrick Donnelly almost 2 years ago

  • Status changed from Fix Under Review to Pending Backport
#5

Updated by Upkeep Bot almost 2 years ago

  • Copied to Backport #66188: squid: mds: MetricAggregator::ms_can_fast_dispatch2 acquires locks added
#6

Updated by Upkeep Bot almost 2 years ago

  • Copied to Backport #66189: quincy: mds: MetricAggregator::ms_can_fast_dispatch2 acquires locks added
#7

Updated by Upkeep Bot almost 2 years ago

  • Copied to Backport #66190: reef: mds: MetricAggregator::ms_can_fast_dispatch2 acquires locks added
#9

Updated by Venky Shankar over 1 year ago

  • Related to Bug #68865: mds: do not fast dispatch metrics messages from clients added
#10

Updated by Konstantin Shalygin about 1 year ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100
#11

Updated by Upkeep Bot 9 months ago

  • Merge Commit set to 52d39c33986643ea678d5fc5579ac6fd19ea420e
  • Fixed In set to v19.3.0-2346-g52d39c33986
  • Upkeep Timestamp set to 2025-06-30T19:36:03+00:00
#12

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-2346-g52d39c33986 to v19.3.0-2346-g52d39c3398
  • Upkeep Timestamp changed from 2025-06-30T19:36:03+00:00 to 2025-07-14T16:45:06+00:00
#13

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~2847
  • Upkeep Timestamp changed from 2025-07-14T16:45:06+00:00 to 2025-11-01T01:11:39+00:00