
osd: add watch ping timeout count in osd #56976

Merged
yuriw merged 1 commit into ceph:main from YiteGu:add-watcher-timeout-discover
Jul 31, 2024

Conversation


@YiteGu YiteGu commented Apr 18, 2024

For example, rbd sends a watch ping to the header object every 5 seconds to keep the watch alive. If the primary OSD stops receiving watch pings for the header object because of a network interruption on the rbd side, rbd's I/O is already hung. Counting these watch ping timeouts on the OSD lets us quickly detect disconnected rbd clients and reflect them in metrics.

Signed-off-by: Yite Gu <yitegu0@gmail.com>
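
The idea in the description can be illustrated with a toy model. This is only a sketch and not the code in this PR: the class `WatchTracker`, the counter name `watch_timeout_count`, the watcher ids, and the 30 second timeout are all assumptions made for illustration. Only the 5 second ping interval comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class WatchTracker:
    """Toy model of an OSD tracking watches (hypothetical, not Ceph code)."""

    timeout: float = 30.0  # assumed watch timeout in seconds
    last_ping: dict = field(default_factory=dict)  # watcher id -> time of last ping
    watch_timeout_count: int = 0  # hypothetical metric name

    def ping(self, watcher: str, now: float) -> None:
        """Record a watch ping received from a client."""
        self.last_ping[watcher] = now

    def expire(self, now: float) -> list:
        """Drop watchers whose last ping is older than the timeout and count them."""
        expired = [w for w, t in self.last_ping.items() if now - t > self.timeout]
        for w in expired:
            del self.last_ping[w]
            self.watch_timeout_count += 1
        return expired


tracker = WatchTracker()
# Both clients register at t=0; only the healthy one keeps pinging every 5 seconds.
tracker.ping("rbd-healthy", 0.0)
tracker.ping("rbd-disconnected", 0.0)
for t in range(5, 40, 5):
    tracker.ping("rbd-healthy", float(t))

expired = tracker.expire(now=35.0)
print(expired, tracker.watch_timeout_count)
```

In this model the counter only increases when a watcher misses its pings for longer than the timeout, so a rising value points at clients that have lost connectivity.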



@YiteGu YiteGu requested a review from a team as a code owner April 18, 2024 08:48
@github-actions github-actions bot added the core label Apr 18, 2024
@YiteGu YiteGu force-pushed the add-watcher-timeout-discover branch from 5677c7e to 78c4397 Compare April 18, 2024 09:49
@YiteGu YiteGu force-pushed the add-watcher-timeout-discover branch from 78c4397 to aed21fa Compare April 19, 2024 03:22

YiteGu commented Apr 19, 2024

jenkins test api

@YiteGu YiteGu requested a review from idryomov April 19, 2024 09:21

@idryomov idryomov left a comment


LGTM, but it should be reviewed by @ceph/core as well.

For example, rbd sends a watch ping to the header object
every 5 seconds to keep the watch alive. If the primary OSD
stops receiving watch pings for the header object because of
a network interruption on the rbd side, rbd's I/O is already
hung. Counting these timeouts on the OSD lets us quickly detect
disconnected rbd clients and reflect them in metrics.

Signed-off-by: Yite Gu <yitegu0@gmail.com>
@YiteGu YiteGu force-pushed the add-watcher-timeout-discover branch from aed21fa to 34b086e Compare April 19, 2024 09:33

YiteGu commented Apr 22, 2024

jenkins test make check

@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Jun 21, 2024
@idryomov
Contributor

@rzarzynski Ping -- this should be trivial to review.

@idryomov
Contributor

jenkins test windows

@idryomov idryomov added feature and removed stale labels Jun 21, 2024
@ljflores
Member

@yuriw yuriw merged commit 3c8ccf6 into ceph:main Jul 31, 2024

5 participants