mds: use regular dispatch for processing metrics#57081

Merged
batrick merged 2 commits into ceph:main from batrick:i65658 on May 23, 2024

Conversation

@batrick
Member

@batrick batrick commented Apr 24, 2024

There have been cases where the MDS does an undesirable failover because it misses heartbeat resets after a long recovery in up:replay. It was observed that the MDS was processing a flood of metrics messages from all reconnecting clients. This likely caused undesirable MetricAggregator::lock contention in the messenger threads while fast dispatching client metrics.

Instead, use the normal dispatch where acquiring locks is okay to do.

See-also: linux.git/f7c2f4f6ce16fb58f7d024f3e1b40023c4b43ff9
Fixes: https://tracker.ceph.com/issues/65658
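For context, the change can be pictured as a routing decision in the dispatcher. The sketch below is a simplified, self-contained model rather than Ceph's actual Messenger API (though `ms_can_fast_dispatch` is the name of the real hook): metric messages are declined for fast dispatch, so they land on the regular dispatch queue, where taking MetricAggregator::lock is safe.

```cpp
#include <deque>

// Simplified model (not Ceph's actual code): a dispatcher that, like this PR,
// declines fast dispatch for metric messages so they are queued for the
// ordinary dispatch thread, where acquiring locks is allowed.
enum class MsgType { ClientMetrics, ClientRequest };

struct Dispatcher {
    std::deque<MsgType> regular_queue;  // drained later by the dispatch thread
    int fast_dispatched = 0;

    // Before this PR metrics were fast-dispatched; after it, they are not.
    bool ms_can_fast_dispatch(MsgType t) const {
        return t != MsgType::ClientMetrics;  // metrics -> regular dispatch
    }

    void deliver(MsgType t) {
        if (ms_can_fast_dispatch(t)) {
            ++fast_dispatched;           // handled inline in a messenger thread
        } else {
            regular_queue.push_back(t);  // processed under the MDS's own locks
        }
    }
};
```

The key property is that the messenger thread never blocks on a contended lock for a metrics message; it only pushes the message onto a queue.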

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@batrick batrick added cephfs Ceph File System needs-review labels Apr 24, 2024
@batrick batrick requested review from joscollin and vshankar April 24, 2024 19:41
@batrick
Member Author

batrick commented May 1, 2024

jenkins test api

@batrick
Member Author

batrick commented May 1, 2024

jenkins test make check arm64

Contributor

@vshankar vshankar left a comment

This change is fine - I guess we underestimated the issues with using fast dispatch in the original PR, especially when using locks - the whole discussion of spinlocks vs. mutexes. I recall that we switched to a mutex since the code was doing memory allocation under a spinlock (which is a big no-no). Maybe it's time to revisit that.

The MDS uses fast dispatch for beacon (reply) messages, which are not received too frequently and never arrive in a flood. Metrics messages, by contrast, can be huge in number and very frequent.
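To illustrate the spinlock-vs-mutex point: if the critical section may allocate (and the allocator may block), a mutex is the safer guard, and the allocation should be hoisted out of the critical section regardless. A hypothetical sketch, where `agg_lock`, `table`, and `update_metric` are illustrative stand-ins rather than Ceph code:

```cpp
#include <map>
#include <memory>
#include <mutex>

// Hypothetical illustration: keep the critical section short by doing the
// heavy allocation before taking the lock. Only a pointer move (plus a small
// map-node insert) happens while the lock is held.
std::mutex agg_lock;  // stand-in for MetricAggregator::lock
std::map<int, std::unique_ptr<int>> table;

void update_metric(int client, int value) {
    auto entry = std::make_unique<int>(value);  // allocate outside the lock
    std::lock_guard<std::mutex> l(agg_lock);
    table[client] = std::move(entry);           // cheap work under the lock
}
```

With the allocation outside the lock, the window in which other threads can contend is much smaller, whichever lock type is chosen.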

@batrick
Member Author

batrick commented May 3, 2024

This PR is under test in https://tracker.ceph.com/issues/65771.

batrick added a commit to batrick/ceph that referenced this pull request May 7, 2024
* refs/pull/57081/head:
	mds: use regular dispatch for processing metrics
@batrick
Member Author

batrick commented May 7, 2024

@batrick
Member Author

batrick commented May 7, 2024

I'm not going to merge this just yet: I think we should also increase the priority of metrics messages so they get processed ahead of regular requests.

@vshankar
Contributor

vshankar commented May 8, 2024

I'm not going to merge this just yet: I think we should also increase the priority of metrics messages so they get processed ahead of regular requests.

Does it matter if the processing is delayed a bit? Did you see huge lags in metrics updates in tests?

@batrick
Member Author

batrick commented May 8, 2024

I'm not going to merge this just yet: I think we should also increase the priority of metrics messages so they get processed ahead of regular requests.

Does it matter if the processing is delayed a bit? Did you see huge lags in metrics updates in tests?

No. But I'm trying to maintain the current behavior of metrics messages being processed promptly by the MDS. In any case, it makes sense to me that these messages should not be queued behind potentially thousands of client requests.
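The second commit implements that by raising the message priority. As a sketch, the constants below mirror Ceph's messenger priority values (CEPH_MSG_PRIO_DEFAULT=127, CEPH_MSG_PRIO_HIGH=196), but the queue itself is a simplified stand-in for the dispatch queue, not the real implementation:

```cpp
#include <queue>
#include <string>
#include <vector>

// Simplified model: a priority queue drains higher-priority messages first,
// so bumping metrics above the default priority lets them jump ahead of
// queued client requests.
constexpr int PRIO_DEFAULT = 127;  // mirrors CEPH_MSG_PRIO_DEFAULT
constexpr int PRIO_HIGH    = 196;  // mirrors CEPH_MSG_PRIO_HIGH

struct Msg { int prio; std::string name; };

struct ByPrio {
    bool operator()(const Msg &a, const Msg &b) const { return a.prio < b.prio; }
};

// Returns the name of the message the dispatcher would process next
// (pending must be non-empty).
std::string next_dispatched(std::vector<Msg> pending) {
    std::priority_queue<Msg, std::vector<Msg>, ByPrio> q(ByPrio{}, std::move(pending));
    return q.top().name;
}
```

Even with thousands of client requests queued, a high-priority metrics message would be dispatched first under this scheme.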

batrick added 2 commits May 8, 2024 09:07
There have been cases where the MDS does an undesirable failover because it
misses heartbeat resets after a long recovery in up:replay.  It was observed
that the MDS was processing a flood of metrics messages from all reconnecting
clients. This likely caused undesirable MetricAggregator::lock contention in
the messenger threads while fast dispatching client metrics.

Instead, use the normal dispatch where acquiring locks is okay to do.

See-also: linux.git/f7c2f4f6ce16fb58f7d024f3e1b40023c4b43ff9
Fixes: https://tracker.ceph.com/issues/65658
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Since these are no longer fast dispatched, we need to ensure they are processed
in a timely fashion and ahead of any incoming requests.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
@batrick
Member Author

batrick commented May 8, 2024

jenkins test make check

@batrick
Member Author

batrick commented May 8, 2024

jenkins test windows

@batrick
Member Author

batrick commented May 8, 2024

This PR is under test in https://tracker.ceph.com/issues/65867.

Member

@gregsfortytwo gregsfortytwo left a comment

LGTM

@gregsfortytwo
Member

@vshankar

This change is fine - I guess we underestimated the issues with using fast dispatch in the original PR, especially when using locks - the whole discussion of spinlocks vs. mutexes. I recall that we switched to a mutex since the code was doing memory allocation under a spinlock (which is a big no-no). Maybe it's time to revisit that.

The MDS uses fast dispatch for beacon (reply) messages, which are not received too frequently and never arrive in a flood. Metrics messages, by contrast, can be huge in number and very frequent.

You are just not allowed to block inside of an ms_fast_dispatch function. Like, at all. It's stealing one of the AsyncMessenger threads (default: 8?) to do the fast dispatch, and if you go to sleep holding it then messaging work can stop. So holding a mutex for the entire duration of the work is really, really bad. The client reconnect storms we were getting straightforwardly translate into blocked messaging: way more than 8 clients can simultaneously send in looooots of MClientMetrics messages, which have to be processed serially, and since they're holding messenger locks while being processed, they will starve any other attempts to use the messenger and back up other MDS worker threads when outgoing throttle limits are hit.

If you look at the OSD, I believe you'll find it grabs one mutex for the duration of an "enqueue" call. But everything else is lock-free.
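The enqueue pattern described above can be sketched as a generic producer/consumer model (not the OSD's actual code): fast dispatch holds a mutex only long enough to push the message, and a worker thread that is allowed to block does the real processing.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Sketch of the "one mutex for enqueue" pattern: messenger threads never
// sleep on a long-held lock; they push and return immediately.
struct WorkQueue {
    std::mutex m;
    std::condition_variable cv;
    std::deque<int> q;
    int processed = 0;
    bool done = false;

    // Called from a messenger thread: the lock is held only for the push.
    void fast_enqueue(int item) {
        { std::lock_guard<std::mutex> l(m); q.push_back(item); }
        cv.notify_one();
    }

    // Runs in a dedicated worker thread; may block freely.
    void worker() {
        std::unique_lock<std::mutex> l(m);
        while (!done || !q.empty()) {
            cv.wait(l, [&] { return done || !q.empty(); });
            while (!q.empty()) { q.pop_front(); ++processed; }
        }
    }

    void shutdown() {
        { std::lock_guard<std::mutex> l(m); done = true; }
        cv.notify_one();
    }
};
```

Because the messenger-side critical section is just a `push_back`, contention stays bounded no matter how slow the downstream processing is.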

@gregsfortytwo
Member

Actually @vshankar, is there some way we could update the fast_dispatch documentation to be clearer about these requirements? I thought it was already pretty explicit, but it's clearly not good enough.

@gregsfortytwo
Member

@batrick we have similar problems with Beacon https://tracker.ceph.com/issues/66014 so please look at that next. We'll need to backport both of these everywhere.

@batrick
Member Author

batrick commented May 15, 2024

@batrick we have similar problems with Beacon https://tracker.ceph.com/issues/66014 so please look at that next. We'll need to backport both of these everywhere.

#57469

@batrick
Member Author

batrick commented May 17, 2024

This PR is under test in https://tracker.ceph.com/issues/66086.

@batrick batrick merged commit 52d39c3 into ceph:main May 23, 2024