Skip to content

qa: mds Enable multimds killpoint tests#41969

Closed
lxbsz wants to merge 3 commits intoceph:mainfrom
lxbsz:import_export_killpoint
Closed

qa: mds Enable multimds killpoint tests#41969
lxbsz wants to merge 3 commits intoceph:mainfrom
lxbsz:import_export_killpoint

Conversation

@lxbsz
Copy link
Member

@lxbsz lxbsz commented Jun 22, 2021

This version has been impoved a lot, including hadfailover_rank(),
to make it have the same logic with hadfailover().

And also keeps retry by sleeping 5 seconds every time instead of
hard code waiting 150 seconds to speed up the test. And also some
others small fixings.

Fixes: http://tracker.ceph.com/issues/17835
Signed-off-by: Sidharth Anupkrishnan sanupkri@redhat.com
Signed-off-by: Xiubo Li xiubli@redhat.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@github-actions github-actions bot added cephfs Ceph File System tests labels Jun 22, 2021
@lxbsz lxbsz requested review from a team and batrick June 22, 2021 09:10
@lxbsz
Copy link
Member Author

lxbsz commented Jun 22, 2021

This is based @sidharthanup's previous PR #28004 and have some improvements about it.

Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lxbsz please run through QA to verify it works when you're done adjusting the code.

Thanks a lot for picking this up.

#all matching
return False

def hadfailover_rank(self, fscid, status, rank):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does an implicit ceph fs dump in Filesystem.get_rank.

I think a better place for this is in FSStatus with this signature:

def had_failover_rank(self, fscid, rank, status2):

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Will update it.


try:
# This should kill either or both MDS process
self.mount_a.setfattr("abc", "ceph.dir.pin", "1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's preventing the balancer from doing part of this export before reaching this point? (I think we need to pin the directory to rank 0 before populating it.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Yeah, actually we cannot be sure the abc dir is already pinned or auth in rank 0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lxbsz please run through QA to verify it works when you're done adjusting the code.

Sure, will do that.

@lxbsz lxbsz force-pushed the import_export_killpoint branch from ed2ed9f to 3273fec Compare June 28, 2021 06:33
@lxbsz
Copy link
Member Author

lxbsz commented Jun 28, 2021

jenkins test make check arm64

@lxbsz
Copy link
Member Author

lxbsz commented Jun 30, 2021

Today I test it more and found several bugs in MDS migrate code, I need more time to debug and fix it in late future.

@lxbsz lxbsz force-pushed the import_export_killpoint branch from 3273fec to 17618a9 Compare July 28, 2021 12:02
@lxbsz
Copy link
Member Author

lxbsz commented Jul 29, 2021

jenkins test make check arm64

@lxbsz lxbsz force-pushed the import_export_killpoint branch 3 times, most recently from 462cbe0 to 64e7eb2 Compare August 5, 2021 08:52
@lxbsz lxbsz requested a review from a team as a code owner August 5, 2021 08:52
@lxbsz lxbsz force-pushed the import_export_killpoint branch 2 times, most recently from 1a6fafc to fe92d5c Compare August 5, 2021 11:07
Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure mds: set session state to _CLOSED when replaying the EImportStart is correct. The MDSMap contains an export_targets set to get clients to connect to potential importers of subtrees.

@lxbsz lxbsz force-pushed the import_export_killpoint branch from fe92d5c to b5cff8b Compare August 6, 2021 05:00
@lxbsz
Copy link
Member Author

lxbsz commented Aug 6, 2021

I'm not sure mds: set session state to _CLOSED when replaying the EImportStart is correct. The MDSMap contains an export_targets set to get clients to connect to potential importers of subtrees.

Since when writing journal the session was in _CLOSED state, after that in some killpoints the session maybe still in _CLOSED state, in some later it will be _OPENED, so just set it to _OPENED is not correct IMO. When replaying it just set it to the initial state, which is the same with when the journal is flushed.

I tried that weeks ago, it didn't work for me. If the above won't work for your I will try it again next week.

@lxbsz lxbsz force-pushed the import_export_killpoint branch 3 times, most recently from 24c9e52 to 97152c3 Compare August 9, 2021 01:22
@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@lxbsz lxbsz force-pushed the import_export_killpoint branch from 580e292 to d274277 Compare October 25, 2021 02:22
@lxbsz lxbsz requested a review from vshankar October 25, 2021 02:23
@lxbsz
Copy link
Member Author

lxbsz commented Oct 25, 2021

Rebased to the upstream and remove the first 3 commits, which have been merged in another PR.

@lxbsz
Copy link
Member Author

lxbsz commented Nov 17, 2021

jenkins retest this please

When the importer mds crashes just after the EImportStart journal
was flushed, the standby mds will replay it later, and when replaying
the EImportStart the standby mds will wait the client to reconnect,
but actually the client may not open the session yet.

So we need to make sure the export_targets to mdsmap is updated
just before the EImportStart log is flushed, then in the Client
side we can use this info to reconnect the export target mds.

And when the exporter mds crashes and is replaced by a standby mds
the export_targets in the mdsmap will be cleaned, so we need to
record it by adding EExportStart logevent.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
This version has been impoved a lot, including hadfailover_rank(),
to make it have the same logic with hadfailover().

And also keeps retry by sleeping 5 seconds every time instead of
hard code waiting 150 seconds to speed up the test. And also some
others small fixings.

Fixes: http://tracker.ceph.com/issues/17835
Signed-off-by: Sidharth Anupkrishnan <sanupkri@redhat.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
@lxbsz lxbsz force-pushed the import_export_killpoint branch from d274277 to 7f1aa5f Compare November 18, 2021 05:11
@tchaikov tchaikov removed the crimson label Dec 22, 2021
@tchaikov tchaikov removed the request for review from a team December 22, 2021 23:59
@djgalloway djgalloway changed the base branch from master to main July 3, 2022 00:00
@github-actions
Copy link

github-actions bot commented Sep 1, 2022

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Sep 1, 2022
@github-actions
Copy link

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Nov 28, 2022
@lxbsz lxbsz removed the stale label Jan 17, 2023
@lxbsz lxbsz reopened this Jan 17, 2023
@lxbsz lxbsz requested a review from a team January 17, 2023 02:43
@lxbsz
Copy link
Member Author

lxbsz commented Jan 17, 2023

@vshankar I think we still need this to test the exporting.

@vshankar
Copy link
Contributor

@vshankar I think we still need this to test the exporting.

I was not tracking this and I haven't gone through the changes. Any work that is pending?

@lxbsz
Copy link
Member Author

lxbsz commented Jan 17, 2023

@vshankar I think we still need this to test the exporting.

I was not tracking this and I haven't gone through the changes. Any work that is pending?

No work is pending.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Mar 26, 2023
@github-actions
Copy link

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Apr 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

build/ops cephfs Ceph File System common rbd rgw stale tests wip-vshankar-backlog Backlog of CephFS PRs to pick for testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants