This is based on @sidharthanup's previous PR #28004 and makes some improvements to it.
qa/tasks/cephfs/filesystem.py
Outdated
        #all matching
        return False

    def hadfailover_rank(self, fscid, status, rank):
This does an implicit ceph fs dump in Filesystem.get_rank.
I think a better place for this is in FSStatus with this signature:
def had_failover_rank(self, fscid, rank, status2):
Looks good to me. Will update it.
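A minimal sketch of the suggested shape, comparing two already-captured status snapshots so that no extra `ceph fs dump` is triggered during the check. `StatusSnapshot` is a hypothetical stand-in, not the real `FSStatus`, and the `gid` and `incarnation` fields are assumptions about what identifies the daemon holding a rank.

```python
class StatusSnapshot:
    """Hypothetical stand-in for one captured status (not the real FSStatus)."""

    def __init__(self, ranks):
        # ranks maps (fscid, rank) -> {"gid": int, "incarnation": int}
        self.ranks = dict(ranks)

    def get_rank(self, fscid, rank):
        # Pure lookup in the stored snapshot; no new dump is taken here.
        return self.ranks.get((fscid, rank))

    def had_failover_rank(self, fscid, rank, status2):
        """Return True if the daemon holding `rank` differs between self and status2."""
        old = self.get_rank(fscid, rank)
        new = status2.get_rank(fscid, rank)
        if old is None or new is None:
            # Rank missing from either snapshot: treat it as a failover.
            return True
        return (old["gid"], old["incarnation"]) != (new["gid"], new["incarnation"])
```

A caller would capture one snapshot before the fault injection and one after, then ask about the single rank it cares about rather than the whole filesystem.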
qa/tasks/cephfs/test_exports.py
Outdated
        try:
            # This should kill either or both MDS process
            self.mount_a.setfattr("abc", "ceph.dir.pin", "1")
What's preventing the balancer from doing part of this export before reaching this point? (I think we need to pin the directory to rank 0 before populating it.)
Good point. Yeah, actually we cannot be sure that the abc dir is already pinned to, or authoritative on, rank 0.
@lxbsz please run through QA to verify it works when you're done adjusting the code.
Sure, will do that.
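A minimal sketch of the ordering discussed above: pin the directory to rank 0 while it is still empty, populate it, and only then request the export to rank 1. `RecordingMount` and `populate_then_export` are hypothetical names used for illustration; the sketch assumes the mount exposes `setfattr` as in the quoted diff plus a `run_shell` helper, and it only records calls instead of touching a real cluster.

```python
class RecordingMount:
    """Hypothetical stand-in for a test mount; it only records the calls made."""

    def __init__(self):
        self.calls = []

    def run_shell(self, args):
        self.calls.append(("run_shell", tuple(args)))

    def setfattr(self, path, key, value):
        self.calls.append(("setfattr", path, key, value))


def populate_then_export(mount, path="abc", nfiles=3):
    # Create the directory and pin it to rank 0 while it is still empty,
    # so the balancer should have no reason to move it during population.
    mount.run_shell(["mkdir", "-p", path])
    mount.setfattr(path, "ceph.dir.pin", "0")
    for i in range(nfiles):
        mount.run_shell(["touch", f"{path}/file{i}"])
    # Only now request the export to rank 1, as in the quoted test line.
    mount.setfattr(path, "ceph.dir.pin", "1")
```

With this ordering, the export triggered by the final `setfattr` starts from a known state, which is what the killpoint test relies on.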
ed2ed9f to 3273fec
jenkins test make check arm64
Today I tested it more and found several bugs in the MDS migration code. I need more time to debug and fix them later.
3273fec to 17618a9
jenkins test make check arm64
462cbe0 to 64e7eb2
1a6fafc to fe92d5c
batrick left a comment
I'm not sure the commit "mds: set session state to _CLOSED when replaying the EImportStart" is correct. The MDSMap contains an export_targets set to get clients to connect to potential importers of subtrees.
fe92d5c to b5cff8b
The session was in the _CLOSED state when the journal was written. At some later killpoints the session may still be _CLOSED, while at others it will be _OPENED, so just setting it to _OPENED is not correct IMO. When replaying, it just sets the session to the initial state, which is the same as the state when the journal was flushed. I tried that weeks ago and it didn't work for me. If the above won't work for you, I will try it again next week.
24c9e52 to 97152c3
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved
580e292 to d274277
Rebased onto upstream and removed the first 3 commits, which have been merged in another PR.
jenkins retest this please
When the importer MDS crashes just after the EImportStart journal entry is flushed, a standby MDS will replay it later. While replaying EImportStart, the standby MDS waits for the client to reconnect, but the client may not have opened the session yet. So we need to make sure that export_targets in the mdsmap is updated just before the EImportStart log event is flushed; the client can then use this information to reconnect to the export target MDS. And when the exporter MDS crashes and is replaced by a standby MDS, the export_targets in the mdsmap are cleared, so we need to record them by adding an EExportStart log event.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
This version has been improved a lot, including hadfailover_rank(), which now follows the same logic as hadfailover(). It also retries by sleeping 5 seconds each time instead of hard-coding a 150-second wait, which speeds up the test. It also includes some other small fixes.
Fixes: http://tracker.ceph.com/issues/17835
Signed-off-by: Sidharth Anupkrishnan <sanupkri@redhat.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
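The retry change described in this commit message can be sketched as a small polling helper: check a condition every 5 seconds and stop as soon as it holds, with 150 seconds kept only as an upper bound. `wait_until` and its parameters are hypothetical names for illustration, not the helper used in the test suite; the `sleep` and `clock` arguments exist only so the loop can be exercised without real waiting.

```python
import time


def wait_until(predicate, timeout=150, interval=5,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate every `interval` seconds.

    Returns True as soon as predicate() holds, or False once `timeout`
    seconds have passed without it holding.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In the common case the test returns after the first successful check instead of always paying the full 150 seconds.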
d274277 to 7f1aa5f
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!
@vshankar I think we still need this to test the exporting.
I was not tracking this and I haven't gone through the changes. Is any work pending?
No work is pending.
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!
This version has been improved a lot, including hadfailover_rank(),
which now follows the same logic as hadfailover().
It also retries by sleeping 5 seconds each time instead of
hard-coding a 150-second wait, which speeds up the test. It also
includes some other small fixes.
Fixes: http://tracker.ceph.com/issues/17835
Signed-off-by: Sidharth Anupkrishnan <sanupkri@redhat.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>