cephfs_mirror: avoid latest changes on the source fs to enable mirroring#61539
cephfs_mirror: avoid latest changes on the source fs to enable mirroring#61539
Conversation
|
jenkins test api |
2 similar comments
|
jenkins test api |
|
jenkins test api |
61492b3 to
f0b52c3
Compare
|
QA test case in progress. |
f0b52c3 to
fbd1e7b
Compare
fbd1e7b to
767b9f0
Compare
|
jenkins test make check |
|
jenkins test windows |
|
jenkins test make check |
767b9f0 to
11be04e
Compare
|
Putting DNM, as this needs more fix for purged_snapshot failure. |
11be04e to
09ef1b4
Compare
|
Passes the entire fs:mirror and fs:valgrind tests: |
|
@vshankar |
09ef1b4 to
f36485f
Compare
|
The new qa test code updated and @vshankar This PR is ready for review. |
Thanks. I will have a look and put it to test. |
| CEPH_STATX_MODE | CEPH_STATX_UID | CEPH_STATX_GID | | ||
| CEPH_STATX_SIZE | CEPH_STATX_ATIME | CEPH_STATX_MTIME, | ||
| AT_STATX_DONT_SYNC | AT_SYMLINK_NOFOLLOW); | ||
| npath = entry_path(epath, nname); |
There was a problem hiding this comment.
@joscollin Could you share a sample log snippet involving these paths without and with the fix?
There was a problem hiding this comment.
I don't have it handy. I need to reproduce it with old code then.
There was a problem hiding this comment.
@joscollin - if possible please share this, otherwise I will check it when I pull in this change to test the file block diff integration with the mirror daemon.
There was a problem hiding this comment.
@vshankar Let me see, if I could reproduce it. I don't have a build now. So please don't wait for me. If you are testing block diff, please pull this PR.
| break | ||
| self.disable_mirroring(self.primary_fs_name, self.primary_fs_id) | ||
|
|
||
| def test_cephfs_mirror_sync_already_existing_snapshots(self): |
There was a problem hiding this comment.
Isn't this a redundant test? I believe there is at least one test that checks if snapshots are synchronized as expected.
There was a problem hiding this comment.
No, this should be the only test that does add_directory after snapshot-ting. Is there any function does that already?
There was a problem hiding this comment.
But, how is this test relevant to this fix?
There was a problem hiding this comment.
@vshankar
This test checks if the changes are taken from the previous snapshot or not, while adding a directory.
There are multiple things gets fixed by this PR.
- Incremental sync after adding a directory for mirroring -> We have test for this.
- Incremental sync of the already existing snapshots while adding a directory for mirroring -> We don't have test for this.
- switch to remote dir_root when local prev snapshot is unavailable -> We have test for this.
Hope that clarifies.
There was a problem hiding this comment.
Isn't (2) pretty much covered with (1)? Do we really need a separate test for it?
There was a problem hiding this comment.
@vshankar I think it's better to have a separate test for it (add_directory after snapshot-ting), as the same thing is reported here: #54633 (comment).
| } | ||
|
|
||
| // Determines if the source is the local (previous) snapshot or the remote dir_root | ||
| int PeerReplayer::validate_source(const std::string &dir_root, |
There was a problem hiding this comment.
Sorry - I don't understand why is this required. PeerReplayer would use local snapshot comparison (and hence local mount handles) when it's sure that the data on the remote file system (live) exactly matches the last snapshot transferred. This is determined using an xattr (from the remote file system directory).
There was a problem hiding this comment.
Right. But this change is made to resolve the test failure https://tracker.ceph.com/issues/69671. This happens only when the source snapshot is deleted during an incremental sync. Then the incremental sync has to check and make the switch to remote dir_root as per test_cephfs_mirror_sync_with_purged_snapshot.
Previously, before snapdiff, this was taken care of as everything was a full sync. But with incremental sync, as per the test (test_cephfs_mirror_sync_with_purged_snapshot), the source has to be validated each time. Hope this clarifies.
There was a problem hiding this comment.
So, the bug was that the mirror daemon was (incorrectly) using incremental sync with remote directory snapshot comparison?
There was a problem hiding this comment.
@vshankar
There's no issue with incremental sync with remote directory snapshot comparison. But the snapdiff code (the new do_synchronize function) wasn't making the switch to the remote dir_root. There was no such mechanism in the new do_synchronize. I think I've missed it back then. That's the reason for this failure https://tracker.ceph.com/issues/69671. With this PR, the switch happens, as per the original design.
There was a problem hiding this comment.
OK. So, if a snapshot is deleted when it was used for incremental sync, the mirror daemon code was still incorrectly using incremental sync in the next iteration, yes? If that's the case, then can you put this up in the commit message please?
There was a problem hiding this comment.
Yes it was.
Updated the commit message @vshankar.
There was a problem hiding this comment.
@vshankar There's no issue with
incremental sync with remote directory snapshot comparison. But the snapdiff code (the new do_synchronize function) wasn't making the switch to the remotedir_root. There was no such mechanism in the new do_synchronize. I think I've missed it back then. That's the reason for this failure https://tracker.ceph.com/issues/69671. With this PR, the switch happens, as per the original design.
Sorry for the wrong explanation. I didn't miss to implement the switching. This is a regression caused by 7a747bc. That's the reason test_cephfs_mirror_sync_with_purged_snapshot fails now. It was passing earlier, but now it fails.
There was a problem hiding this comment.
Updated the commit message with the details @vshankar
|
@vshankar ping |
|
@vshankar ping :) |
|
@joscollin a suggestion - please do not mark a comment as resolved until the participants in a comment have agreed to either incorporate the fix or leave the change as status-quo. |
@vshankar I was following what you've suggested earlier. You wanted it to mark the comment resolved after replying. Anyway, I'll stop clicking resolve button. |
That's not what I meant - If there is agreement on the comment then its fine to mark the comment as resolved. |
f36485f to
e7aff00
Compare
This avoids considering latest changes from the source filesystem for the mirroring of already existing snapshots. Thus the destination filesystem and snapshots would be created based only on the source snapshots. The destination fs would be a replica of the last snapshot taken. Fixes: https://tracker.ceph.com/issues/68567 Signed-off-by: Jos Collin <jcollin@redhat.com>
…napshot is unavailable Fixes the incremental sync not switching the source to remote dir_root, when the local previous snapshot is deleted. This is a regression caused by ceph@7a747bc. Fixes: https://tracker.ceph.com/issues/69671 Signed-off-by: Jos Collin <jcollin@redhat.com>
Fixes: https://tracker.ceph.com/issues/68567 Signed-off-by: Jos Collin <jcollin@redhat.com>
e7aff00 to
956642d
Compare
|
jenkins test make check arm64 |
|
Relevant commits pulled into PR #62250 |
This avoids considering latest changes from the source filesystem for
the mirroring of already existing snapshots. Thus the destination filesystem
and snapshots would be created based only on the source snapshots.
The destination fs would be a replica of the last snapshot taken.
To check this, do:
Make some changes and take a snapshot:
Make some more changes & take another snapshot:
Make a change and don't take snapshot:
$ rm /mnt/cephfs1/dir/d1/file2Add
dirfor mirroring:$
./bin/ceph fs snapshot mirror add a /dirResults:
Fixes: https://tracker.ceph.com/issues/68567
Fixes: https://tracker.ceph.com/issues/69671
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e