Bug #66948
mon.a (mon.0) 326 : cluster [WRN] Health check failed: 1 MDSs behind on trimming (MDS_TRIM)" in cluster log
Status: Closed
Description
/a/vshankar-2024-07-08_07:21:13-fs-wip-vshankar-testing-20240705.150505-debug-testing-default-smithi/7791866
The PRs under test were unrelated, so this trim warning needs to be investigated.
Updated by Venky Shankar over 1 year ago
- Assignee set to Kotresh Hiremath Ravishankar
Updated by Venky Shankar over 1 year ago
- Priority changed from Normal to High
@Kotresh Hiremath Ravishankar - there is a BZ0 report downstream which has similar warning with squid. Were you able to dig anything interesting here?
Updated by Venky Shankar over 1 year ago
I think there is a bug in MDLog::trim() where the MDS can fail to trim segments for an extended period of time. Basically, this part of the code:
unsigned num_remaining_segments = (segments.size() - expired_segments.size() - expiring_segments.size());
dout(10) << __func__ << ": new_expiring_segments=" << new_expiring_segments
         << ", num_remaining_segments=" << num_remaining_segments
         << ", max_segments=" << max_segments << dendl;
if ((num_remaining_segments <= max_segments) &&
    (max_ev < 0 || (num_events - expiring_events - expired_events) <= (uint64_t)max_ev)) {
  dout(10) << __func__ << ": breaking out of trim loop - segments/events fell below ceiling"
           << " max_segments/max_ev" << dendl;
  break;
}
num_remaining_segments is being calculated as:
unsigned num_remaining_segments = (segments.size() - expired_segments.size() - expiring_segments.size());
and then compared against max_segments, which IMO could cause the MDS to skip trimming log segments.
Updated by Venky Shankar over 1 year ago
- Assignee changed from Kotresh Hiremath Ravishankar to Venky Shankar
Kotresh, I'm taking this one.
Updated by Venky Shankar over 1 year ago
Venky Shankar wrote in #note-4:
I think there is a bug in MDLog::trim() where the MDS can fail to trim segments for an extended period of time. Basically, this part of the code:
[...]
num_remaining_segments is being calculated as: [...]
and then compared against max_segments, which IMO could cause the MDS to skip trimming log segments.
Note that when expiring a log segment takes a considerable amount of time, num_remaining_segments could stay just below max_segments for that duration unless more segments get added (obviously) or segments expire and are removed from the segments map (in _trim_expired_segments()).
Updated by Venky Shankar over 1 year ago · Edited
Okay, so this is what's happening: MDLog has code that breaks out of the trim loop when num_remaining_segments falls below (or is equal to) max_segments, as mentioned in note-4. So it's possible that the number of segments is more than twice the value of max_segments, yet num_remaining_segments is still below max_segments due to the adjustment for expiring/expired segments. In Beacon.cc, the following check
// Detect MDS_HEALTH_TRIM condition
// Indicates MDS is not trimming promptly
{
  const auto log_max_segments = mds->mdlog->get_max_segments();
  const auto log_warn_factor = g_conf().get_val<double>("mds_log_warn_factor");
  if (mds->mdlog->get_num_segments() > (size_t)(log_max_segments * log_warn_factor)) {
    CachedStackStringStream css;
    *css << "Behind on trimming (" << mds->mdlog->get_num_segments()
         << "/" << log_max_segments << ")";
    MDSHealthMetric m(MDS_HEALTH_TRIM, HEALTH_WARN, css->strv());
    m.metadata["num_segments"] = stringify(mds->mdlog->get_num_segments());
    m.metadata["max_segments"] = stringify(log_max_segments);
    health.metrics.push_back(m);
  }
}
only checks whether the number of segments is way above (actually twice, by default) max_segments, which could cause a trim warning to be generated even though MDLog considers num_remaining_segments and breaks out of the trim loop.
Updated by Venky Shankar over 1 year ago
There is a time window where the MDS had a lot of log segments and the expired counters weren't increasing. num_remaining_segments was just below (or around) max_segments. The MDS noticed the increased number of log segments (basically, exceeding log_max_segments * log_warn_factor) and raised the trim warning. It would be interesting to see why the MDS took such a long time to expire a set of log segments.
Updated by Venky Shankar over 1 year ago
Okay, back to this after getting sidelined elsewhere. So, the mdlog can reach a state where trim would not continue trimming log segments, even though the number of segments exceeds mds_log_max_segments * mds_log_warn_factor. E.g.:
./remote/smithi089/log/ceph-mds.f.log.1.gz:2024-07-09T23:43:46.942+0000 7f2ed139b640 10 mds.2.log trim 310 / 128 segments, 44803 / -1 events, 0 (0) expiring, 182 (9191) expired
./remote/smithi089/log/ceph-mds.f.log.1.gz:2024-07-09T23:43:47.946+0000 7f2ed139b640 10 mds.2.log trim 310 / 128 segments, 44861 / -1 events, 0 (0) expiring, 182 (9191) expired
./remote/smithi089/log/ceph-mds.f.log.1.gz:2024-07-09T23:43:48.946+0000 7f2ed139b640 10 mds.2.log trim 310 / 128 segments, 44922 / -1 events, 0 (0) expiring, 182 (9191) expired
./remote/smithi089/log/ceph-mds.f.log.1.gz:2024-07-09T23:43:49.946+0000 7f2ed139b640 10 mds.2.log trim 310 / 128 segments, 44997 / -1 events, 0 (0) expiring, 182 (9191) expired
Here:
- number of segments: 310
- expiring segments: 0
- expired segments: 182
Therefore, as per note-4, num_remaining_segments = 128 (310 - 0 - 182), which satisfies the condition to break out of the segment trimming loop. Once this state is reached, no more log segments get trimmed (unless more segments get added, obviously); however, the beacon notices that the number of segments (310) exceeds the threshold (128 * 2) and therefore generates the MDS_TRIM health warning.
I think it would suffice to trim log segments if no segments were trimmed for some interval and the number of log segments exceeds the threshold (max_seg * factor).
Updated by Greg Farnum over 1 year ago
This is bizarre. Why is the mdlog leaving expired_segments just sitting in the log? Is that formula just wrong, or is it being used for two purposes that don't quite mesh, or something?
Updated by Venky Shankar over 1 year ago
- Status changed from Triaged to Fix Under Review
- Pull request ID set to 60381
Updated by Venky Shankar over 1 year ago
I pushed out a change to trim mdlog under certain conditions -- we can move the discussion over to the PR.
Updated by Venky Shankar over 1 year ago
- Related to QA Run #68744: wip-vshankar-testing-20241029.045257-debug added
Updated by Venky Shankar over 1 year ago
- Related to deleted (QA Run #68744: wip-vshankar-testing-20241029.045257-debug)
Updated by Patrick Donnelly over 1 year ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to squid,reef
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #68919: squid: mon.a (mon.0) 326 : cluster [WRN] Health check failed: 1 MDSs behind on trimming (MDS_TRIM)" in cluster log added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #68920: reef: mon.a (mon.0) 326 : cluster [WRN] Health check failed: 1 MDSs behind on trimming (MDS_TRIM)" in cluster log added
Updated by Upkeep Bot over 1 year ago
- Tags (freeform) set to backport_processed
Updated by Venky Shankar over 1 year ago
- Related to Bug #68913: qa: test_export_pin_many: AssertionError: 0 not greater than 0 added
Updated by Matthew Hassel 10 months ago
Hi! Any idea when this may be slated for squid backport? Cluster is version 19.2.1 and hitting an ever incrementing "Behind in MDS" stat.
Updated by Venky Shankar 10 months ago
Matthew Hassel wrote in #note-20:
Hi! Any idea when this may be slated for squid backport? Cluster is version 19.2.1 and hitting an ever incrementing "Behind in MDS" stat.
The next squid point release is under planning. Will get this for 19.2.3.
Updated by Venky Shankar 10 months ago
Venky Shankar wrote in #note-21:
Matthew Hassel wrote in #note-20:
Hi! Any idea when this may be slated for squid backport? Cluster is version 19.2.1 and hitting an ever incrementing "Behind in MDS" stat.
The next squid point release is under planning. Will get this for 19.2.3.
Oh, I just checked. The squid backport https://github.com/ceph/ceph/pull/60838 is already in squid branch. So, when 19.2.3 is cut, the fix will be available.
Updated by Upkeep Bot 9 months ago
- Status changed from Pending Backport to Resolved
- Upkeep Timestamp set to 2025-07-08T18:24:24+00:00
Updated by Darrell Enns 8 months ago
For those of us on 19.2.2 with a constantly increasing "behind on trim" count: Is there any workaround? Is it safe to just ignore the warning for now?
Updated by Upkeep Bot 8 months ago
- Merge Commit set to 9d2b3aaa96fdbb7f8e1b231df918ef4aa8aab52c
- Fixed In set to v19.3.0-6036-g9d2b3aaa96
- Upkeep Timestamp changed from 2025-07-08T18:24:24+00:00 to 2025-08-02T04:50:32+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~1630
- Upkeep Timestamp changed from 2025-08-02T04:50:32+00:00 to 2025-11-01T01:35:55+00:00