squid: mgr/vol: show progress and stats for the subvolume snapshot clones#61415

Draft
rishabh-d-dave wants to merge 11 commits intoceph:squidfrom
rishabh-d-dave:wip-69561-squid

Conversation

@rishabh-d-dave
Contributor

@rishabh-d-dave rishabh-d-dave commented Jan 16, 2025

backport tracker: https://tracker.ceph.com/issues/69561, https://tracker.ceph.com/issues/69562

backport of #54620, #59712
parent tracker: https://tracker.ceph.com/issues/61904, https://tracker.ceph.com/issues/67988

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh


updated using ceph-backport.sh version 16.0.0.6848

@kotreshhr
Contributor

This PR is under test in https://tracker.ceph.com/issues/71055.

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

List volumes returns a list of 1-member dictionaries where each
member is "{'name': <volname>}". This format is useful for printing the
output of the command "ceph fs volume ls", but for internal use it is a
hindrance, since the list must first be converted from a list of
dictionaries to a list of strings.

Return a list of strings and move the code for converting it to a list
of dictionaries to the method "list_fs_volumes()" (which is the method
run by the "ceph fs volume ls" command).

This triggers a change in async_job.py, where we need volnames. The code
there converts the list of dicts to a list of strings. This needs to be
removed.

Second, don't create the unnecessary temporary variable "fs_map" in
"list_volumes()". To get the fs names we can iterate directly.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 9f355b6)
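The split described in the commit message above can be sketched roughly as follows. This is a minimal illustration and not the mgr/volumes code: the shape of the `fs_map` argument and both function signatures are assumptions made for the example.

```python
from typing import Dict, List


def list_volumes(fs_map: dict) -> List[str]:
    """Return volume names as plain strings, for internal callers.

    Assumption: fs_map looks like
    {'filesystems': [{'mdsmap': {'fs_name': <volname>}}, ...]}.
    """
    return [fs['mdsmap']['fs_name'] for fs in fs_map['filesystems']]


def list_fs_volumes(fs_map: dict) -> List[Dict[str, str]]:
    """Wrap each name in a 1-member dict, the shape used for printing."""
    return [{'name': volname} for volname in list_volumes(fs_map)]
```

With this arrangement, internal callers such as the job scheduler can consume the names directly, and only the command handler pays for the dictionary wrapping.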
The output of the command "ceph fs clone status" will now show the
progress made by that specific clone: the percentage of cloning that has
completed, the number of bytes that have been cloned and the number of
files that have been cloned.

Fixes: https://tracker.ceph.com/issues/61904
Signed-off-by: Rishabh Dave <ridave@redhat.com>

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit d7bc828)

Conflicts:
src/pybind/mgr/volumes/fs/volume.py
- minor conflict: line where open_subvol_in_group is imported is
  slightly different in squid compared to main.
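As a rough sketch of the kind of report described above, the percentage can be derived from the byte counts. The function name, the dictionary keys and the handling of an empty source are all hypothetical choices for this example; the actual field names in the "ceph fs clone status" output may differ.

```python
def clone_progress_report(bytes_cloned: int, bytes_total: int,
                          files_cloned: int, files_total: int) -> dict:
    """Build a progress summary for a single clone (illustrative only)."""
    # Assumption: an empty source subvolume counts as fully cloned.
    if bytes_total == 0:
        percent = 100.0
    else:
        percent = bytes_cloned / bytes_total * 100
    return {
        'percentage_cloned': round(percent, 2),
        'bytes_cloned': f'{bytes_cloned}/{bytes_total}',
        'files_cloned': f'{files_cloned}/{files_total}',
    }
```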
Print a progress bar for ongoing clone jobs in the output of "ceph status".

When multiple clones are ongoing, show one progress bar in the output of
"ceph status" that shows the average of the progress made by each clone.

When the number of clone jobs is more than the number of clone threads,
print two progress bars in the output of the "ceph status" command: one
for ongoing clone jobs and the other for ongoing+pending clone jobs.

Fixes: https://tracker.ceph.com/issues/61904
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 65b789e)
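The averaging rule above could look something like the following sketch. The function name and the treatment of pending clones as contributing zero progress are assumptions for illustration, not taken from the patch.

```python
from typing import List, Optional, Tuple


def clone_progress_bars(ongoing: List[float],
                        num_pending: int) -> Tuple[Optional[float], Optional[float]]:
    """Return (ongoing_avg, ongoing_plus_pending_avg) as fractions in [0, 1].

    ongoing: progress fraction of each clone currently being copied.
    num_pending: clones queued because all clone threads are busy.
    The second value is None when nothing is pending (one bar only).
    """
    if not ongoing:
        return None, None
    ongoing_avg = sum(ongoing) / len(ongoing)
    if num_pending == 0:
        return ongoing_avg, None
    # Assumption: a pending clone has made no progress yet.
    total_avg = sum(ongoing) / (len(ongoing) + num_pending)
    return ongoing_avg, total_avg
```

For example, two ongoing clones at 50% each with two more pending would give 0.5 for the first bar and 0.25 for the second.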
1. Let the caller check for multiple states. A clone might finish while
   it is being cancelled; in such cases the user might want to check
   for both.
2. Add a helper method to check if a clone is in pending state and add a
   separate method to check if a clone is in cancelled state.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 10949bf)
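A minimal sketch of checking a clone against several acceptable states is shown below. The helper names and the state strings are assumptions for the example and may not match the test helpers touched by this commit.

```python
def clone_in_state(current_state: str, *expected_states: str) -> bool:
    """True if the clone's current state is any of the expected ones."""
    return current_state in expected_states


def is_clone_pending(current_state: str) -> bool:
    return clone_in_state(current_state, 'pending')


def is_clone_cancelled(current_state: str) -> bool:
    # Assumption: the state string is spelled 'canceled'.
    return clone_in_state(current_state, 'canceled')
```

A test that cancels a clone which may already have finished could then accept either outcome with `clone_in_state(state, 'canceled', 'complete')`.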
Add a helper method that accepts command arguments (along with the rest
of the parameters accepted by the method run_shell()) and returns the
stdout of the command.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 9f60848)
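The idea of such a helper can be illustrated with a local stand-in built on `subprocess`. This is only a sketch: the real helper wraps run_shell() in the test framework, and the name `get_shell_stdout` and its signature here are assumptions.

```python
import subprocess
import sys


def get_shell_stdout(args, **kwargs) -> str:
    """Run a command and return its stdout as text.

    Extra keyword arguments are passed through to subprocess.run(),
    mirroring how the helper forwards the remaining parameters.
    """
    proc = subprocess.run(args, check=True, capture_output=True,
                          text=True, **kwargs)
    return proc.stdout


if __name__ == '__main__':
    # Run a trivial command with the current Python interpreter.
    print(get_shell_stdout([sys.executable, '-c', 'print("hello")']))
```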
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit db0e736)
TestVolumesHelper._do_subvolume_io() is a helper method that allows
users to generate data for testing. The mgr/vol code that reports the
progress made by clone jobs depends on the value set for the xattr
rbytes. It takes some time for rbytes to be set.

Therefore, all tests in TestCloneProgressReporter need to wait for the
subvolume's rbytes xattr to be set to the actual amount of data present
on the subvolume before proceeding to the actual testing.

To make this possible, make _do_subvolume_io() return the size of the
data it has generated.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 92aecab)
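The waiting step described above amounts to polling until rbytes catches up with the size returned by the I/O helper. The sketch below uses a hypothetical `wait_for_rbytes()` function and takes the xattr reader as a callable; neither is taken from the test suite.

```python
import time
from typing import Callable


def wait_for_rbytes(get_rbytes: Callable[[], int], expected: int,
                    timeout: float = 60.0, interval: float = 1.0) -> int:
    """Poll get_rbytes() until it reports at least `expected` bytes.

    get_rbytes: callable returning the current rbytes value (assumed to
    read the xattr on the subvolume path).
    Raises TimeoutError if the value is not reached within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        rbytes = get_rbytes()
        # Choice made for this sketch: accept >= rather than strict equality.
        if rbytes >= expected:
            return rbytes
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f'rbytes is {rbytes}, expected {expected} after {timeout}s')
        time.sleep(interval)
```

A test would pass the size returned by the data-generating helper as `expected` and only start the clone once this call returns.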
Clone progress is shown to user through "ceph fs clone status" output
and through "ceph status" output. Test both these features.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit e0c85b8)
Update docs and add release notes about the progress report that is
printed in the output of the "ceph fs clone status" command and the
progress bar(s) printed in the output of the "ceph status" command.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 645cc6e)
Test name is test_subvolume_snapshot_info_if_clone_pending_for_no_group,
located in class TestSubvolumeSnapshotClones in test_volumes.py

5 seconds can (sometimes) be insufficient as the value of the config
option "snapshot_clone_delay" in this test. Increase it to avoid
unnecessary race conditions which lead to irrelevant failures.

Following is an example where 5 seconds was insufficient as the waiting
period, since it took 8 seconds instead:

2024-07-28T18:16:10.088 DEBUG:teuthology.orchestra.run.smithi064:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph config set mgr mgr/volumes/snapshot_clone_no_wait False
...
2024-07-28T18:16:18.694 DEBUG:teuthology.orchestra.run.smithi064:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs subvolume snapshot info cephfs subvol79370 subvol_snap40980

This issue was seen during testing of the PR to which this commit belongs.

This commit has been separated from the commit that adds tests for clone
progress reporting so that it's easy to document the need for this patch
and also to track it.

This commit has been kept on the same PR instead of being moved to a
different one, since the issue can't be reproduced otherwise. This also
ensures that the commit is backported to older releases along with the
code that caused this issue, so that no one needs to search for this
commit during the backporting effort.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit a6b95a5)
snapshot is retained despite deletion (using the --retain-snapshots
option of the "subvolume rm" command).

Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 9e34499)
@rishabh-d-dave
Contributor Author

jenkins test api

Member

@batrick batrick left a comment

please rebase

@batrick batrick marked this pull request as draft March 18, 2026 14:26