mds: use regular dispatch for processing metrics#57081

Merged
batrick merged 2 commits into ceph:main from batrick:i65658 on May 23, 2024

Conversation

@batrick
Member

@batrick batrick commented Apr 24, 2024

There have been cases where the MDS does an undesirable failover because it misses heartbeat resets after a long recovery in up:replay. It was observed that the MDS was processing a flood of metrics messages from all reconnecting clients. This likely caused undesirable MetricAggregator::lock contention in the messenger threads while fast dispatching client metrics.

Instead, use the normal dispatch where acquiring locks is okay to do.

See-also: linux.git/f7c2f4f6ce16fb58f7d024f3e1b40023c4b43ff9
Fixes: https://tracker.ceph.com/issues/65658
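For context, the change can be pictured as a routing decision in the dispatcher. The sketch below is a simplified, self-contained model rather than Ceph's actual Messenger API (though `ms_can_fast_dispatch` is the name of the real hook): metric messages are declined for fast dispatch, so they land on the regular dispatch queue, where taking MetricAggregator::lock is safe.

```cpp
#include <deque>

// Simplified model (not Ceph's actual code): a dispatcher that, like this PR,
// declines fast dispatch for metric messages so they are queued for the
// ordinary dispatch thread, where acquiring locks is allowed.
enum class MsgType { ClientMetrics, ClientRequest };

struct Dispatcher {
    std::deque<MsgType> regular_queue;  // drained later by the dispatch thread
    int fast_dispatched = 0;

    // Before this PR metrics were fast-dispatched; after it, they are not.
    bool ms_can_fast_dispatch(MsgType t) const {
        return t != MsgType::ClientMetrics;  // metrics -> regular dispatch
    }

    void deliver(MsgType t) {
        if (ms_can_fast_dispatch(t)) {
            ++fast_dispatched;           // handled inline in a messenger thread
        } else {
            regular_queue.push_back(t);  // processed under the MDS's own locks
        }
    }
};
```

The key property is that the messenger thread never blocks on a contended lock for a metrics message; it only pushes the message onto a queue.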

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@batrick batrick added cephfs Ceph File System needs-review labels Apr 24, 2024
@batrick batrick requested review from joscollin and vshankar April 24, 2024 19:41
@batrick
Member Author

batrick commented May 1, 2024

jenkins test api

@batrick
Member Author

batrick commented May 1, 2024

jenkins test make check arm64

Contributor

@vshankar vshankar left a comment

This change is fine - I guess we underestimated the issues with using fast dispatch in the original PR, especially when using locks - the whole discussion of spinlocks vs. mutexes. I recall that we switched to a mutex since the code was doing memory allocation under a spinlock (which is a big no-no). Maybe it's time to revisit that.

The MDS uses fast dispatch for beacon (reply) messages, which are not received too frequently and never arrive in a flood. Metrics messages, by contrast, can be huge in number and very frequent.
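To illustrate the spinlock-vs-mutex point: if the critical section may allocate (and the allocator may block), a mutex is the safer guard, and the allocation should be hoisted out of the critical section regardless. A hypothetical sketch, where `agg_lock`, `table`, and `update_metric` are illustrative stand-ins rather than Ceph code:

```cpp
#include <map>
#include <memory>
#include <mutex>

// Hypothetical illustration: keep the critical section short by doing the
// heavy allocation before taking the lock. Only a pointer move (plus a small
// map-node insert) happens while the lock is held.
std::mutex agg_lock;  // stand-in for MetricAggregator::lock
std::map<int, std::unique_ptr<int>> table;

void update_metric(int client, int value) {
    auto entry = std::make_unique<int>(value);  // allocate outside the lock
    std::lock_guard<std::mutex> l(agg_lock);
    table[client] = std::move(entry);           // cheap work under the lock
}
```

With the allocation outside the lock, the window in which other threads can contend is much smaller, whichever lock type is chosen.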

@batrick
Member Author

batrick commented May 3, 2024

This PR is under test in https://tracker.ceph.com/issues/65771.

batrick added a commit to batrick/ceph that referenced this pull request May 7, 2024
* refs/pull/57081/head:
	mds: use regular dispatch for processing metrics
@batrick
Member Author

batrick commented May 7, 2024

@batrick
Member Author

batrick commented May 7, 2024

I'm not going to merge this just yet: I think we should also increase the priority of metrics messages so they get processed ahead of regular requests.

@vshankar
Contributor

vshankar commented May 8, 2024

I'm not going to merge this just yet: I think we should also increase the priority of metrics messages so they get processed ahead of regular requests.

Does it matter if the processing is delayed a bit? Did you see huge lags in metrics updates in tests?

@batrick
Member Author

batrick commented May 8, 2024

I'm not going to merge this just yet: I think we should also increase the priority of metrics messages so they get processed ahead of regular requests.

Does it matter if the processing is delayed a bit? Did you see huge lags in metrics updates in tests?

No. But I'm trying to maintain the current behavior of metrics messages being processed promptly by the MDS. In any case, it makes sense to me that these messages should not be queued behind potentially thousands of client requests.
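The second commit implements that by raising the message priority. As a sketch, the constants below mirror Ceph's messenger priority values (CEPH_MSG_PRIO_DEFAULT=127, CEPH_MSG_PRIO_HIGH=196), but the queue itself is a simplified stand-in for the dispatch queue, not the real implementation:

```cpp
#include <queue>
#include <string>
#include <vector>

// Simplified model: a priority queue drains higher-priority messages first,
// so bumping metrics above the default priority lets them jump ahead of
// queued client requests.
constexpr int PRIO_DEFAULT = 127;  // mirrors CEPH_MSG_PRIO_DEFAULT
constexpr int PRIO_HIGH    = 196;  // mirrors CEPH_MSG_PRIO_HIGH

struct Msg { int prio; std::string name; };

struct ByPrio {
    bool operator()(const Msg &a, const Msg &b) const { return a.prio < b.prio; }
};

// Returns the name of the message the dispatcher would process next
// (pending must be non-empty).
std::string next_dispatched(std::vector<Msg> pending) {
    std::priority_queue<Msg, std::vector<Msg>, ByPrio> q(ByPrio{}, std::move(pending));
    return q.top().name;
}
```

Even with thousands of client requests queued, a high-priority metrics message would be dispatched first under this scheme.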

batrick added 2 commits May 8, 2024 09:07
There have been cases where the MDS does an undesirable failover because it
misses heartbeat resets after a long recovery in up:replay.  It was observed
that the MDS was processing a flood of metrics messages from all reconnecting
clients. This likely caused undesirable MetricAggregator::lock contention in
the messenger threads while fast dispatching client metrics.

Instead, use the normal dispatch where acquiring locks is okay to do.

See-also: linux.git/f7c2f4f6ce16fb58f7d024f3e1b40023c4b43ff9
Fixes: https://tracker.ceph.com/issues/65658
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Since these are no longer fast dispatched, we need to ensure they are processed
in a timely fashion and ahead of any incoming requests.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
@batrick
Member Author

batrick commented May 8, 2024

jenkins test make check

@batrick
Member Author

batrick commented May 8, 2024

jenkins test windows

@batrick
Member Author

batrick commented May 8, 2024

This PR is under test in https://tracker.ceph.com/issues/65867.

Member

@gregsfortytwo gregsfortytwo left a comment

LGTM

@gregsfortytwo
Member

@vshankar

This change is fine - I guess we underestimated the issues with using fast dispatch in the original PR, especially when using locks - the whole discussion of spinlocks vs. mutexes. I recall that we switched to a mutex since the code was doing memory allocation under a spinlock (which is a big no-no). Maybe it's time to revisit that.

The MDS uses fast dispatch for beacon (reply) messages, which are not received too frequently and never arrive in a flood. Metrics messages, by contrast, can be huge in number and very frequent.

You are just not allowed to block inside of an ms_fast_dispatch function. Like, at all. It's stealing one of the AsyncMessenger threads (default: 8?) to do the fast dispatch, and if you go to sleep holding it then messaging work can stop. So holding a mutex for the entire duration of the work is really, really bad. The client reconnect storms we were getting straightforwardly translate into blocked messaging: way more than 8 clients can simultaneously send in looooots of MClientMetrics messages, which have to be processed serially, and since they're holding messenger locks while being processed, they will starve any other attempts to use the messenger and back up other MDS worker threads when outgoing throttle limits are hit.

If you look at the OSD, I believe you'll find it grabs one mutex for the duration of an "enqueue" call. But everything else is lock-free.
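The enqueue pattern described above can be sketched as a generic producer/consumer model (not the OSD's actual code): fast dispatch holds a mutex only long enough to push the message, and a worker thread that is allowed to block does the real processing.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Sketch of the "one mutex for enqueue" pattern: messenger threads never
// sleep on a long-held lock; they push and return immediately.
struct WorkQueue {
    std::mutex m;
    std::condition_variable cv;
    std::deque<int> q;
    int processed = 0;
    bool done = false;

    // Called from a messenger thread: the lock is held only for the push.
    void fast_enqueue(int item) {
        { std::lock_guard<std::mutex> l(m); q.push_back(item); }
        cv.notify_one();
    }

    // Runs in a dedicated worker thread; may block freely.
    void worker() {
        std::unique_lock<std::mutex> l(m);
        while (!done || !q.empty()) {
            cv.wait(l, [&] { return done || !q.empty(); });
            while (!q.empty()) { q.pop_front(); ++processed; }
        }
    }

    void shutdown() {
        { std::lock_guard<std::mutex> l(m); done = true; }
        cv.notify_one();
    }
};
```

Because the messenger-side critical section is just a `push_back`, contention stays bounded no matter how slow the downstream processing is.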

@gregsfortytwo
Member

Actually @vshankar, is there some way we could update the fast_dispatch documentation to be clearer about these requirements? I thought it was already pretty explicit, but it's clearly not good enough.

@gregsfortytwo
Member

@batrick we have similar problems with Beacon https://tracker.ceph.com/issues/66014 so please look at that next. We'll need to backport both of these everywhere.

@batrick
Member Author

batrick commented May 15, 2024

@batrick we have similar problems with Beacon https://tracker.ceph.com/issues/66014 so please look at that next. We'll need to backport both of these everywhere.

#57469

@batrick
Member Author

batrick commented May 17, 2024

This PR is under test in https://tracker.ceph.com/issues/66086.

@batrick batrick merged commit 52d39c3 into ceph:main May 23, 2024