Skip to content

mgr: register OSDs in ms_handle_accept#52292

Merged
batrick merged 2 commits intoceph:mainfrom
batrick:i61874
Aug 28, 2023
Merged

mgr: register OSDs in ms_handle_accept#52292
batrick merged 2 commits intoceph:mainfrom
batrick:i61874

Conversation

@batrick
Copy link
Member

@batrick batrick commented Jul 3, 2023

It's a no-no to acquire locks in these "fast" messenger methods. This can lead to messenger slow downs in the best case as it's blocking reads on the wire. In the worse case, the messenger may deadlock with other threads, preventing any further message reads off the wire.

It's not obvious this method is "fast" so I've added a comment regarding this.

Fixes: https://tracker.ceph.com/issues/61874

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@gregsfortytwo
Copy link
Member

Hmm, we should audit other users of ms_handle_accept -- I haven't looked at this in a while but im pretty sure we put "fast" in the name of everything which needed to be non-blocking when we set that interface up, so there may be a conflict here.

@batrick batrick force-pushed the i61874 branch 2 times, most recently from 106a3a3 to c0bcf30 Compare July 3, 2023 01:45
batrick added 2 commits July 10, 2023 17:12
Like other fast Dispatcher methods, it must not acquire locks or do
anything that might take a long time.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
It's a no-no to acquire locks in these "fast" messenger methods. This
can lead to messenger slow downs in the best case as it's blocking reads
on the wire. In the worse case, the messenger may deadlock with other
threads, preventing any further message reads off the wire.

It's not obvious this method is "fast" so I've added a comment regarding
this.

Fixes: https://tracker.ceph.com/issues/61874
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
@batrick
Copy link
Member Author

batrick commented Jul 10, 2023

Hmm, we should audit other users of ms_handle_accept -- I haven't looked at this in a while but im pretty sure we put "fast" in the name of everything which needed to be non-blocking when we set that interface up, so there may be a conflict here.

The issue is ms_handle_authentication not ms_handle_accept. The former is run as a "fast" method and the latter in the regular dispatcher threads.

I've already gone through other users of ms_handle_authentication but have now added a commit that makes it more clear in the method name, as with the others. PTAL.

@batrick batrick requested review from rzarzynski and vshankar July 11, 2023 12:34
@batrick
Copy link
Member Author

batrick commented Aug 1, 2023

ping @rzarzynski

Copy link
Contributor

@vshankar vshankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @batrick - mgr/DaemonServer.cc source changes are really meaningful!

Needs @rzarzynski approval though.

Copy link
Contributor

@rzarzynski rzarzynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks sane.

I very like how the refactor and fix have been split apart. It should make backporting straightforward. Thanks!

@batrick batrick merged commit a9f08d4 into ceph:main Aug 28, 2023
@batrick batrick deleted the i61874 branch August 28, 2023 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants