qa: enable MDS export killpoint tests #28004

Closed
sidharthanup wants to merge 1 commit into ceph:master from
sidharthanup:wip-multimdss-killpoint-test

@sidharthanup
Contributor

Export Path Killpoint test for multimds recovery
Fixes: http://tracker.ceph.com/issues/17835
Signed-off-by: Sidharth Anupkrishnan <sanupkri@redhat.com>

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

@sidharthanup sidharthanup added the cephfs Ceph File System label May 7, 2019
@sidharthanup sidharthanup force-pushed the wip-multimdss-killpoint-test branch 2 times, most recently from 6893b36 to bcc8b22 Compare May 7, 2019 13:28
@batrick batrick self-assigned this Jun 14, 2019
@batrick batrick self-requested a review June 14, 2019 19:38
@stale

stale bot commented Sep 7, 2019

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Sep 7, 2019
@batrick batrick removed the stale label Sep 14, 2019
@batrick batrick assigned sidharthanup and unassigned batrick Sep 14, 2019
@batrick
Member

batrick commented Sep 14, 2019

ping

@sidharthanup
Contributor Author

sidharthanup commented Sep 26, 2019

@batrick Sorry for the delay. I was caught up in the export ephemeral pin work. Will update next week.

@stale

stale bot commented Nov 25, 2019

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Nov 25, 2019
@batrick
Member

batrick commented Dec 5, 2019

ping :)

@stale stale bot removed the stale label Dec 5, 2019
@sidharthanup sidharthanup force-pushed the wip-multimdss-killpoint-test branch 2 times, most recently from 0f91783 to db57b4b Compare December 16, 2019 18:29
@batrick
Member

batrick commented Dec 17, 2019

@sidharthanup sidharthanup force-pushed the wip-multimdss-killpoint-test branch 2 times, most recently from a0045aa to bd66a09 Compare December 17, 2019 11:44
@sidharthanup
Contributor Author

sidharthanup commented Dec 17, 2019

@sidharthanup sidharthanup force-pushed the wip-multimdss-killpoint-test branch 2 times, most recently from 2ff25a8 to 2d0d381 Compare December 17, 2019 14:01
@batrick batrick changed the title qa: mds Enable multimds killpoint tests qa: enable MDS export killpoint tests Dec 17, 2019
@sidharthanup sidharthanup force-pushed the wip-multimdss-killpoint-test branch from 2d0d381 to ef0229a Compare December 18, 2019 21:32
@sidharthanup sidharthanup force-pushed the wip-multimdss-killpoint-test branch from ef0229a to c3e055a Compare December 27, 2019 21:22
@batrick batrick added this to the octopus milestone Jan 24, 2020
@sidharthanup sidharthanup force-pushed the wip-multimdss-killpoint-test branch from 8a62f5e to 4ee87ab Compare April 29, 2020 11:31
@sidharthanup sidharthanup force-pushed the wip-multimdss-killpoint-test branch 5 times, most recently from 980888f to 890feb1 Compare May 14, 2020 08:57
@batrick
Member

batrick commented Jul 8, 2020

Please rebase and run through teuthology (--filter test_exports --suite multimds).

@sidharthanup sidharthanup force-pushed the wip-multimdss-killpoint-test branch from 890feb1 to 422e3f5 Compare July 8, 2020 11:12
Member

@batrick batrick left a comment

flake8 run-test: commands[0] | flake8 --select=F,E9 --exclude=venv,.tox
./tasks/cephfs/filesystem.py:858:72: F821 undefined name 'cmp'
./tasks/cephfs/test_exports.py:526:9: F841 local variable 'all_daemons' is assigned to but never used
./tasks/cephfs/test_exports.py:551:56: F821 undefined name 'org_files'
./tasks/cephfs/test_exports.py:552:56: F821 undefined name 'out'
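
The first F821 above is the usual Python 3 porting problem: the `cmp` builtin no longer exists in Python 3. This thread does not show how line 858 of filesystem.py uses `cmp`, so the following is only a generic sketch of a drop-in replacement, not the fix that was applied to the PR. The helper name `cmp3` is hypothetical.

```python
from functools import cmp_to_key


def cmp3(a, b):
    """Three-way comparison in the style of the removed Python 2 builtin.

    Returns -1 if a < b, 0 if a == b, and 1 if a > b.
    The name cmp3 is made up for this sketch.
    """
    return (a > b) - (a < b)


def sort_with_comparator(items):
    """Sort using a comparison function, as old cmp-based code often did."""
    return sorted(items, key=cmp_to_key(cmp3))
```

When the comparison is only used for sorting, passing a plain `key=` function is normally simpler than wrapping a comparator with `functools.cmp_to_key`.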

Signed-off-by: Sidharth Anupkrishnan <sanupkri@redhat.com>
@sidharthanup sidharthanup force-pushed the wip-multimdss-killpoint-test branch from 422e3f5 to 0c751c2 Compare July 9, 2020 12:12
@sidharthanup
Contributor Author

https://pulpito.ceph.com/sidharthanup-2020-07-09_12:17:09-multimds-octopus-distro-basic-smithi/ - seems like it's failing for (import, export) killpoints (7, 10) and (9, 13). It's stuck during verify_data() waiting on ls here: https://github.com/ceph/ceph/pull/28004/files#diff-d5f17ebd745250b57be2b89d4ba48efbR545 when called here: https://github.com/ceph/ceph/pull/28004/files#diff-d5f17ebd745250b57be2b89d4ba48efbR609 . It's passing for most other pairs of killpoints. I've scheduled a run just for (7, 10) - https://pulpito.ceph.com/sidharthanup-2020-07-09_20:05:33-multimds-octopus-distro-basic-smithi/ . Let me confirm whether it's the same behaviour.

@sidharthanup
Contributor Author

@batrick Import killpoint = 7 will cause test failure. The reason is that before this killpoint (https://github.com/sidharthanup/ceph/blob/wip-multimdss-killpoint-test/src/mds/Migrator.cc#L3026) is hit, prepare_force_open_sessions() is called (https://github.com/sidharthanup/ceph/blob/wip-multimdss-killpoint-test/src/mds/Migrator.cc#L2699) in handle_export_dir(), and this call marks an open session as dirty, which later gets persisted as part of the journal event EImportStart. During journal replay on the new MDS, this information is relayed to the new MDS, which then thinks that there is an open session with the client, whereas in reality that session was closed. During up:reconnect, it tries to reconnect with the client, gets no response, and ends up blacklisting the client. Don't you think this is a bug?
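
The sequence described above can be illustrated with a toy model. This is a hedged sketch only: it is not Ceph code, and `ToyJournal`, `import_until_killpoint`, `replay`, and `reconnect` are hypothetical names that mirror just the steps in this comment (session force-opened and journaled, MDS killed, replacement MDS replays the journal, client never reconnects).

```python
from dataclasses import dataclass, field


@dataclass
class ToyJournal:
    """Stand-in for the MDS journal: an ordered list of (event, payload)."""
    events: list = field(default_factory=list)


def import_until_killpoint(journal, client_id):
    """Model the importer up to the killpoint.

    The session is force-opened and recorded in an EImportStart event,
    then the MDS dies before the client has ever connected to it.
    """
    journal.events.append(
        ("EImportStart", {"force_opened_sessions": [client_id]})
    )


def replay(journal):
    """Model journal replay on the replacement MDS.

    Every session recorded in EImportStart is treated as open.
    """
    sessions = set()
    for name, payload in journal.events:
        if name == "EImportStart":
            sessions.update(payload["force_opened_sessions"])
    return sessions


def reconnect(sessions, clients_that_reconnect):
    """Model up:reconnect: sessions whose client stays silent are blacklisted."""
    return sorted(sessions - set(clients_that_reconnect))
```

In this model, a client that never opened a session with the importer does not reconnect to its replacement, so `reconnect(replay(journal), [])` returns that client as blacklisted, which matches the failure seen with import killpoint 7.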

@batrick
Member

batrick commented Jul 13, 2020

jenkins test make check

@batrick
Member

batrick commented Jul 13, 2020

Yes, this is a genuine bug. Open a tracker ticket. It's great that your tests found a new bug!

@sidharthanup
Contributor Author

Ack. Yeah, last week I was wondering whether something was wrong with my tests. It's nice to know they caught undesirable behavior!

@batrick
Member

batrick commented Jul 17, 2020

@batrick
Member

batrick commented Jul 17, 2020

jenkins test dashboard backend

@sidharthanup
Contributor Author

@batrick It's the same issue with killpoint 9. The client hasn't started the connection with the MDS yet, so when the MDS goes down, the client gets blacklisted on replay on the new MDS. This should be fixed by the patch that I'm working on.

@stale

stale bot commented Sep 17, 2020

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Sep 17, 2020
@batrick batrick removed the stale label Sep 17, 2020
@stale

stale bot commented Nov 22, 2020

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@batrick
Member

batrick commented Apr 26, 2021

Blocked on #36227

@batrick
Member

batrick commented Jun 25, 2021

Superseded by #41969

@batrick batrick closed this Jun 25, 2021
Labels

cephfs Ceph File System

2 participants