squid: qa/cephfs: backport mgr/vol improvements for smooth backports in future#61416
Draft
rishabh-d-dave wants to merge 21 commits intoceph:squidfrom
Draft
squid: qa/cephfs: backport mgr/vol improvements for smooth backports in future#61416rishabh-d-dave wants to merge 21 commits intoceph:squidfrom
rishabh-d-dave wants to merge 21 commits intoceph:squidfrom
Conversation
Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 8c536f7)
List volumes returns a list of of 1-member dictionaries where each
member is "{'name': <volname>}". This format is useful printing output
of the command "ceph fs volume ls" but for internal use this format is
hinderance since before using it, this list needs to be resolved from
list of dictionaries to list of strings.
Return a list of strings and move the code for converting it to list of
dictionaries to method "list_fs_volumes()" (which is method run by "ceph
fs volume ls" command).
This triggers change in async_job.py where we need volnames. The code
here converts list of dicts to list of string. This needs to removed.
Second, don't create unnecessary temporary variable "fs_map" in
"list_volumes()" since it is not required. To get fs names we can
directly iterate.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 9f355b6)
Output of command "ceph fs clone status" will now show the progress made by that specific clone. The output will show cloning completed in terms of percentage and amount of bytes that have been cloned and number of files that have been cloned. Fixes: https://tracker.ceph.com/issues/61904 Signed-off-by: Rishabh Dave <ridave@redhat.com> Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit d7bc828)
Print a progress bar for ongoing clone job in output of "ceph status". When multiple clones are ongoing, show 1 progress bar in output of "ceph status" shows average of progress made by each clone. When number of clone job is more than number of clone threads, print 2 progress bars in output of "ceph status" command; one for ongoing clone jobs and other for ongoing+pending clone jobs. Fixes: https://tracker.ceph.com/issues/61904 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 65b789e)
1. Let caller check for multiple states. It might happen that clone
finishes while it is being cancelled, in such cases user might want
to check for both.
2. Add a helper method to check if clone is in pending state and add a
separate method to check if clone is in cancelled state.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 10949bf)
Add a helper method that accepts command arguments (along with rest of paramters accepted by the method run_shell()) and return the stdout of the command. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 9f60848)
Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit db0e736)
TestVolumesHelper._do_subvolume_io() is a helper method that allows users to generate data for testing. mgr/vol code that reports progress made by clone jobs depends on the value set for xattr rbytes. It takes a bit of a time for rbytes to be set. And, therefore, all tests in TestCloneProgressReporter needs to wait for subvolume's rbytes xattr's value to be set to the actual amount of data present on the subvolume before proceeding to actually testing. So that this can be achieved make _do_subvolume_io() return size of the data it has generated. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 92aecab)
Clone progress is shown to user through "ceph fs clone status" output and through "ceph status" output. Test both these features. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit e0c85b8)
Update docs and add release notes about the progress report that is printed in output of "ceph fs clone status" command and progress bars that is/are printed in output of "ceph status" command. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 645cc6e)
Test name is test_subvolume_snapshot_info_if_clone_pending_for_no_group, located in class TestSubvolumeSnapshotClones in test_volumes.py 5 seconds can (sometimes) be insufficient as value of the config option "snapshot_clone_delay" in this. Increase it to avoid unnecessary race conditions which leads to irrelevant failures. Following is an example where 5 seconds was insufficient as waiting period since instead it took 8 seconds - 2024-07-28T18:16:10.088 DEBUG:teuthology.orchestra.run.smithi064:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph config set mgr mgr/volumes/snapshot_clone_no_wait False ... 2024-07-28T18:16:18.694 DEBUG:teuthology.orchestra.run.smithi064:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs subvolume snapshot info cephfs subvol79370 subvol_snap40980 This issue was seen during testing of PR to which this commit belongs. This commit has been separated from the commit that adds tests for clone progress reporting so that it's easy to document need for this code patch and also track it. This commit is not being moved to a different PR and been kept on the same PR since it can't be reproduced otherwise. This also ensures that commit is backported to older release along with code that caused this issue, causing no one to need to find this commit while backporting effort. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit a6b95a5)
snapshot is retained despite of deletion (using --retain-snapshots option of "subvolume rm" command). Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 9e34499)
Currently timeout is set to 5. But hardcoding this is unnecessary since the class already defines a attribute for this purpose. Use that instead. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit ccd5878)
Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 7239588)
Also add log messages for in these helper methods to allow tracking when and why more threads were spawned. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit a982495)
In class RTimer in mgr_util.py, "self.finished.set()" is run even though the event self.finished was set just now. If it wasn't set, the while loop the precedes it would've never finished running. Therefore, remove this redundant line of code. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 21cf769)
When an exception occurs in class RTimer of mgr_util.py, only the
exception message is logged but not only this is insufficient for
debugging but also it is hard to spot in logs. This should not be the
case, especially for an occurring exception. Therefore, add code to log
traceback and exception name as well along with exception's message.
Log entry before this patch -
2024-09-27T00:22:38.656+0530 7f05c7e006c0 0 [volumes ERROR mgr_util] task exception: dummy exception for testing
Log entry with this patch -
2024-09-27T00:40:26.509+0530 7f61d64006c0 0 [volumes ERROR mgr_util] exception encountered in RTimer instance "<RTimer(Thread-7, started daemon 140058183075520)>":
Traceback (most recent call last):
File "/home/rishabh/repos/ceph/minor3/src/pybind/mgr/mgr_util.py", line 91, in run
self.function(*self.args, **self.kwargs)
File "/home/rishabh/repos/ceph/minor3/src/pybind/mgr/volumes/fs/stats_util.py", line 232, in _update_progress_bars
raise RuntimeError('dummy exception for testing')
RuntimeError: dummy exception for testing
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit b96d714)
Orignally, when the feature was in development, IDs for clone progress bars were set to randomly generated UUID strings. But, eventually, it was decided to assign fixed strings to them. Unlike UUIDs, these strings stay the same even when progress bars are destroyed and re-created. Therefore, instead of re-assigning the same strings every time initiate_reporting() is called, move them to __init__() so that both the IDs are defined only once. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 3482ebc)
It is a good practice and a common convention to call base class's __init__() method at the beginning of derived class's __init__(). It ensures proper initialization of base class's attributes and methods. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit ad9ff38)
Add some comment to explains the purpose and details of data structures that are used for queuing of async jobs since right now it's not obvious from initialization. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 8f7c5f0)
Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit ce5d62d)
Contributor
Author
|
jenkins test pull_request_target |
Contributor
|
This PR is under test in https://tracker.ceph.com/issues/71055. |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
Member
|
@rishabh-d-dave Ping for rebase. |
Contributor
Author
This backport depends on #61415, #61415 needs to merge first. I've rebased that PR. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes: https://tracker.ceph.com/issues/69560
Depends on #61415
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e