squid: qa/cephfs: backport mgr/vol improvements for smooth backports in future by rishabh-d-dave · Pull Request #61416 · ceph/ceph

rishabh-d-dave · 2025-01-16T14:10:25Z

Fixes: https://tracker.ceph.com/issues/69560

Depends on #61415

Contribution Guidelines

To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

Tracker (select at least one)
- References tracker ticket
- Very recent bug; references commit where it was introduced
- New feature (ticket optional)
- Doc update (no ticket needed)
- Code cleanup (no ticket needed)
Component impact
- Affects Dashboard, opened tracker ticket
- Affects Orchestrator, opened tracker ticket
- No impact that needs to be tracked
Documentation (select at least one)
- Updates relevant documentation
- No doc update is appropriate
Tests (select at least one)
- Includes unit test(s)
- Includes integration test(s)
- Includes bug reproducer
- No tests

Show available Jenkins commands

jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows
jenkins test rook e2e

Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 8c536f7)

List volumes returns a list of of 1-member dictionaries where each member is "{'name': <volname>}". This format is useful printing output of the command "ceph fs volume ls" but for internal use this format is hinderance since before using it, this list needs to be resolved from list of dictionaries to list of strings. Return a list of strings and move the code for converting it to list of dictionaries to method "list_fs_volumes()" (which is method run by "ceph fs volume ls" command). This triggers change in async_job.py where we need volnames. The code here converts list of dicts to list of string. This needs to removed. Second, don't create unnecessary temporary variable "fs_map" in "list_volumes()" since it is not required. To get fs names we can directly iterate. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 9f355b6)

Output of command "ceph fs clone status" will now show the progress made by that specific clone. The output will show cloning completed in terms of percentage and amount of bytes that have been cloned and number of files that have been cloned. Fixes: https://tracker.ceph.com/issues/61904 Signed-off-by: Rishabh Dave <ridave@redhat.com> Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit d7bc828)

Print a progress bar for ongoing clone job in output of "ceph status". When multiple clones are ongoing, show 1 progress bar in output of "ceph status" shows average of progress made by each clone. When number of clone job is more than number of clone threads, print 2 progress bars in output of "ceph status" command; one for ongoing clone jobs and other for ongoing+pending clone jobs. Fixes: https://tracker.ceph.com/issues/61904 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 65b789e)

1. Let caller check for multiple states. It might happen that clone finishes while it is being cancelled, in such cases user might want to check for both. 2. Add a helper method to check if clone is in pending state and add a separate method to check if clone is in cancelled state. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 10949bf)

Add a helper method that accepts command arguments (along with rest of paramters accepted by the method run_shell()) and return the stdout of the command. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 9f60848)

Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit db0e736)

TestVolumesHelper._do_subvolume_io() is a helper method that allows users to generate data for testing. mgr/vol code that reports progress made by clone jobs depends on the value set for xattr rbytes. It takes a bit of a time for rbytes to be set. And, therefore, all tests in TestCloneProgressReporter needs to wait for subvolume's rbytes xattr's value to be set to the actual amount of data present on the subvolume before proceeding to actually testing. So that this can be achieved make _do_subvolume_io() return size of the data it has generated. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 92aecab)

Clone progress is shown to user through "ceph fs clone status" output and through "ceph status" output. Test both these features. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit e0c85b8)

Update docs and add release notes about the progress report that is printed in output of "ceph fs clone status" command and progress bars that is/are printed in output of "ceph status" command. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 645cc6e)

Test name is test_subvolume_snapshot_info_if_clone_pending_for_no_group, located in class TestSubvolumeSnapshotClones in test_volumes.py 5 seconds can (sometimes) be insufficient as value of the config option "snapshot_clone_delay" in this. Increase it to avoid unnecessary race conditions which leads to irrelevant failures. Following is an example where 5 seconds was insufficient as waiting period since instead it took 8 seconds - 2024-07-28T18:16:10.088 DEBUG:teuthology.orchestra.run.smithi064:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph config set mgr mgr/volumes/snapshot_clone_no_wait False ... 2024-07-28T18:16:18.694 DEBUG:teuthology.orchestra.run.smithi064:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs subvolume snapshot info cephfs subvol79370 subvol_snap40980 This issue was seen during testing of PR to which this commit belongs. This commit has been separated from the commit that adds tests for clone progress reporting so that it's easy to document need for this code patch and also track it. This commit is not being moved to a different PR and been kept on the same PR since it can't be reproduced otherwise. This also ensures that commit is backported to older release along with code that caused this issue, causing no one to need to find this commit while backporting effort. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit a6b95a5)

snapshot is retained despite of deletion (using --retain-snapshots option of "subvolume rm" command). Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 9e34499)

Currently timeout is set to 5. But hardcoding this is unnecessary since the class already defines a attribute for this purpose. Use that instead. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit ccd5878)

Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 7239588)

Also add log messages for in these helper methods to allow tracking when and why more threads were spawned. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit a982495)

In class RTimer in mgr_util.py, "self.finished.set()" is run even though the event self.finished was set just now. If it wasn't set, the while loop the precedes it would've never finished running. Therefore, remove this redundant line of code. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 21cf769)

When an exception occurs in class RTimer of mgr_util.py, only the exception message is logged but not only this is insufficient for debugging but also it is hard to spot in logs. This should not be the case, especially for an occurring exception. Therefore, add code to log traceback and exception name as well along with exception's message. Log entry before this patch - 2024-09-27T00:22:38.656+0530 7f05c7e006c0 0 [volumes ERROR mgr_util] task exception: dummy exception for testing Log entry with this patch - 2024-09-27T00:40:26.509+0530 7f61d64006c0 0 [volumes ERROR mgr_util] exception encountered in RTimer instance "<RTimer(Thread-7, started daemon 140058183075520)>": Traceback (most recent call last): File "/home/rishabh/repos/ceph/minor3/src/pybind/mgr/mgr_util.py", line 91, in run self.function(*self.args, **self.kwargs) File "/home/rishabh/repos/ceph/minor3/src/pybind/mgr/volumes/fs/stats_util.py", line 232, in _update_progress_bars raise RuntimeError('dummy exception for testing') RuntimeError: dummy exception for testing Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit b96d714)

Orignally, when the feature was in development, IDs for clone progress bars were set to randomly generated UUID strings. But, eventually, it was decided to assign fixed strings to them. Unlike UUIDs, these strings stay the same even when progress bars are destroyed and re-created. Therefore, instead of re-assigning the same strings every time initiate_reporting() is called, move them to __init__() so that both the IDs are defined only once. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 3482ebc)

It is a good practice and a common convention to call base class's __init__() method at the beginning of derived class's __init__(). It ensures proper initialization of base class's attributes and methods. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit ad9ff38)

Add some comment to explains the purpose and details of data structures that are used for queuing of async jobs since right now it's not obvious from initialization. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 8f7c5f0)

Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit ce5d62d)

rishabh-d-dave · 2025-04-23T05:30:54Z

jenkins test pull_request_target

kotreshhr · 2025-04-24T12:11:10Z

This PR is under test in https://tracker.ceph.com/issues/71055.

github-actions · 2025-09-23T13:18:22Z

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

joscollin · 2025-11-25T06:00:28Z

@rishabh-d-dave Ping for rebase.

rishabh-d-dave · 2025-11-25T11:06:52Z

@rishabh-d-dave Ping for rebase.

This backport depends on #61415, #61415 needs to merge first. I've rebased that PR.

batrick

please rebase

rishabh-d-dave added 21 commits January 16, 2025 19:28

mgr/vol: move reusable parts from async_cloner

7a55727

Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 8c536f7)

qa/cephfs: use safe_while in test_volumes.py

5b26463

Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit db0e736)

qa/cephfs: add tests for clone progress

bcb4c18

Clone progress is shown to user through "ceph fs clone status" output and through "ceph status" output. Test both these features. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit e0c85b8)

qa/cephfs: test clone progress reporter after subvol is deleted but...

f25002e

snapshot is retained despite of deletion (using --retain-snapshots option of "subvolume rm" command). Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 9e34499)

mgr/vol: reuse code for spawning threads

2356bd5

Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 7239588)

mgr/vol: add helpers to spawn all threads and more threads

f8a8a87

Also add log messages for in these helper methods to allow tracking when and why more threads were spawned. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit a982495)

mgr/vol: remove unused variable

846edd3

Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit ce5d62d)

rishabh-d-dave added cephfs Ceph File System needs-review labels Jan 16, 2025

rishabh-d-dave requested a review from a team January 16, 2025 14:10

github-actions bot added documentation mgr pybind tests labels Jan 16, 2025

github-actions bot added this to the squid milestone Jan 16, 2025

kotreshhr added the wip-khiremat-squid-testing-2 label Apr 24, 2025

github-actions bot added the needs-rebase label Sep 23, 2025

batrick requested changes Mar 18, 2026

View reviewed changes

batrick marked this pull request as draft March 18, 2026 14:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

squid: qa/cephfs: backport mgr/vol improvements for smooth backports in future#61416

squid: qa/cephfs: backport mgr/vol improvements for smooth backports in future#61416
rishabh-d-dave wants to merge 21 commits intoceph:squidfrom
rishabh-d-dave:wip-69561-squid-improvements

rishabh-d-dave commented Jan 16, 2025 •

edited

Loading

Uh oh!

rishabh-d-dave commented Apr 23, 2025

Uh oh!

kotreshhr commented Apr 24, 2025

Uh oh!

github-actions bot commented Sep 23, 2025

Uh oh!

joscollin commented Nov 25, 2025

Uh oh!

rishabh-d-dave commented Nov 25, 2025

Uh oh!

batrick left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

rishabh-d-dave commented Jan 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Contribution Guidelines

Checklist

Uh oh!

rishabh-d-dave commented Apr 23, 2025

Uh oh!

kotreshhr commented Apr 24, 2025

Uh oh!

github-actions bot commented Sep 23, 2025

Uh oh!

joscollin commented Nov 25, 2025

Uh oh!

rishabh-d-dave commented Nov 25, 2025

Uh oh!

batrick left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

rishabh-d-dave commented Jan 16, 2025 •

edited

Loading