mds: add ceph.dir.bal.mask vxattr for MDS Balancer #52373
yongseokoh wants to merge 3 commits into ceph:main
batrick left a comment
Few things:
- Please add the PR discussion to your commit message.
- Because the bitset is 256 bits, it is not easy for the caller to compute the xattr value. I think it would be helpful to have the MDS compute the value by allowing something like
  setfattr -n ceph.dir.bal.mask -v 0,1,3,15 dir/
  such that the MDS does the bitwise OR of those bits.
- I'd like to see some tests in qa/tasks/cephfs/test_exports.py. You can use vstart_runner.py to test.
- There should be some docs added to explain this and the MDSMap bal rank mask. Which should users prefer, and when? Is it valid to set the rank mask on the root directory? Are values inherited or overridable?
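A minimal sketch of the comma-separated form suggested above, assuming a 256-bit mask (per the remark on the bitset size) and plain decimal rank numbers. The helper name parse_rank_list and its error text are hypothetical, not the PR's code; the message names both the offending token and the full input, which is the kind of detail asked for later in the review.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

constexpr std::size_t kMaxMds = 256;  // assumed to match MAX_MDS
using RankMask = std::bitset<kMaxMds>;

// Hypothetical helper: "0,1,3,15" -> mask with bits 0, 1, 3 and 15 set.
// Each listed rank is OR-ed into the mask. On failure, returns std::nullopt
// and stores a message naming the bad token and the full input in *err.
std::optional<RankMask> parse_rank_list(const std::string& value,
                                        std::string* err) {
  if (value.empty()) {
    *err = "empty rank list";
    return std::nullopt;
  }
  RankMask mask;
  std::istringstream in(value);
  std::string token;
  while (std::getline(in, token, ',')) {
    std::size_t used = 0;
    unsigned long rank = 0;
    try {
      rank = std::stoul(token, &used, 10);
    } catch (const std::exception&) {
      used = 0;  // not a number, or too large for unsigned long
    }
    if (token.empty() || used != token.size() || rank >= kMaxMds) {
      *err = "invalid rank '" + token + "' in '" + value + "'";
      return std::nullopt;
    }
    mask.set(rank);  // OR this rank's bit into the mask
  }
  return mask;
}
```

Ranges such as 0-3 are not handled; they would fit into the same loop if the interface wanted them.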
src/mon/FSCommands.cc (outdated)
  if (r != 0) {
    return r;
  }
  std::bitset<MAX_MDS> rank_mask = std::bitset<MAX_MDS>(bin_string);
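The hunk above builds the mask from bin_string, whose origin is not shown. Assuming it is produced from a hex mask such as 0xC, a hypothetical conversion (hex_to_bin is an illustrative name) could look like the following. Note that std::bitset's string constructor treats the rightmost character as bit 0, so "1100" selects ranks 2 and 3.

```cpp
#include <bitset>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

constexpr std::size_t kMaxMds = 256;  // assumed to match MAX_MDS

// Hypothetical helper: expands a hex mask such as "0xC" into '0'/'1'
// characters ("1100"), most significant bit first, which is the order
// std::bitset's string constructor expects. Returns false on a non-hex
// digit or if the digits would exceed kMaxMds bits.
bool hex_to_bin(std::string hex, std::string* bin) {
  if (hex.size() > 2 && hex[0] == '0' && (hex[1] == 'x' || hex[1] == 'X'))
    hex.erase(0, 2);
  if (hex.empty() || hex.size() * 4 > kMaxMds)
    return false;
  bin->clear();
  for (char c : hex) {
    unsigned char uc = static_cast<unsigned char>(c);
    if (!std::isxdigit(uc))
      return false;
    int nibble = std::isdigit(uc) ? uc - '0' : std::tolower(uc) - 'a' + 10;
    bin->append(std::bitset<4>(nibble).to_string());  // e.g. 12 -> "1100"
  }
  return true;
}
```

The result can then be passed to std::bitset<MAX_MDS>(bin_string) as in the hunk.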
Please carve this out into a separate commit, since it goes through the fs set interface (for the MDSMap) rather than the proposed vxattr interface.
I will split the PR once the discussion about the restriction on rank 0 is resolved.
#52373 (comment)
ab04b9d to 5499c23
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved.
5499c23 to bfef169
bfef169 to 108e4ad
@batrick @vshankar Please review my changes.
- Done.
- I agree. It is not easy to compute a value spanning many bits and configure the bitfield by hand.
- Test cases for ceph.dir.bal.mask were implemented in test_exports.py.
- How to use ceph.dir.bal.mask is explained in the documentation.
jenkins retest this please
4aca9f4 to 53a786c
jenkins test make check
jenkins test make check arm64
jenkins test make check
I'll try again, thanks.
src/mds/CInode.cc (outdated)
CInode *CInode::get_rank_mask_inode(bool inherit)
{
  if (!g_conf().get_val<bool>("mds_bal_export_pin"))
This will be too expensive to run so frequently. Please cache the config variable in MDCache (as we do elsewhere).
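A sketch of the reviewer's suggestion, under stated assumptions: Inode and Cache are hypothetical stand-ins for CInode and MDCache, the flag is cached and refreshed from a config-change callback instead of being looked up by name on every call, and a mask is assumed to be inherited from the nearest ancestor that sets one. That inheritance rule is only one possible answer to the earlier question about inheritance, not necessarily the PR's semantics.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>

constexpr std::size_t kMaxMds = 256;  // assumed to match MAX_MDS

// Hypothetical stand-in for CInode, reduced to what the sketch needs.
struct Inode {
  Inode* parent = nullptr;
  std::optional<std::bitset<kMaxMds>> bal_mask;  // set via ceph.dir.bal.mask
};

// Hypothetical stand-in for MDCache.
struct Cache {
  // Cached copy of mds_bal_export_pin; the hot path reads this member
  // instead of querying the config system on every call.
  bool export_pin_enabled = true;

  // Called when config options change; refreshes the cached copy only if
  // the option of interest is among the changed keys.
  void on_config_change(const std::set<std::string>& changed, bool new_value) {
    if (changed.count("mds_bal_export_pin"))
      export_pin_enabled = new_value;
  }

  // Returns the inode whose mask applies to `in`: `in` itself or, when
  // `inherit` is true, the nearest ancestor with a mask. nullptr if none.
  const Inode* get_rank_mask_inode(const Inode* in, bool inherit) const {
    if (!export_pin_enabled)
      return nullptr;
    for (const Inode* cur = in; cur; cur = cur->parent) {
      if (cur->bal_mask)
        return cur;
      if (!inherit)
        break;
    }
    return nullptr;
  }
};
```

A closer directory's mask overrides an ancestor's under this rule, since the walk stops at the first inode that has one.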
jenkins test make check
@batrick Could you please review this updated PR when you get a moment?
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved.
anthonyeleven left a comment
Various nitpicky docs suggestions.
Does "nicely" have a specific meaning here? Since it's in a function name, I suspect so.
@anthonyeleven Could you clarify which part of the code you're referring to?
Odd that this doesn't appear to be anchored to the line in the file.
dout(7) << "try to export nicely " << cd->get_path() << " auth " << cd->authority().first << " to " << target << " mask " << bitmask_to_str(rank_mask_bitset) << dendl;
mds->mdcache->migrator->export_dir_nicely(cd, target);
What does it mean to export "nicely"?
@anthonyeleven
“Nicely” means performing a graceful export: the directory is transferred to the target MDS without forcing or interrupting ongoing operations.
I didn’t modify this function in this change.
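For the log line quoted above, the output format of bitmask_to_str is not shown. As a hypothetical illustration, a mask could be printed as the list of ranks it selects, matching the comma-separated input form proposed earlier, together with a check that an export target lies inside the mask. Both helper names, mask_to_rank_list and target_in_mask, are assumptions.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>

constexpr std::size_t kMaxMds = 256;  // assumed to match MAX_MDS

// Hypothetical formatter: lists the selected ranks, e.g. 0xB -> "0,1,3".
std::string mask_to_rank_list(const std::bitset<kMaxMds>& mask) {
  std::string out;
  for (std::size_t r = 0; r < mask.size(); ++r) {
    if (!mask.test(r))
      continue;
    if (!out.empty())
      out += ',';
    out += std::to_string(r);
  }
  return out;
}

// Hypothetical guard before exporting: the target rank must be in the mask.
bool target_in_mask(const std::bitset<kMaxMds>& mask, int target) {
  return target >= 0 && static_cast<std::size_t>(target) < mask.size() &&
         mask.test(static_cast<std::size_t>(target));
}
```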
Should the error string here and below include more detail, like perhaps an encoded representation of the string?
@anthonyeleven Could you clarify which part you’re referring to?
I think you may have subsequently updated the commit. I saw something like a parsing error being reported, without saying what the error was or what the actual string value was.
@anthonyeleven Could you please point me to the specific line of code you’re referring to?
Could you please confirm if this is the line you were referring to:
https://github.com/ceph/ceph/pull/52373/files#diff-729d5135082091929c032d8d6a0552bd2a5c658006aa7cb409764d85e64f9431R631
I no longer see the line to which I was referring, so never mind.
@anthonyeleven Please feel free to add any further comments on the MDS code section or the code block formatting; I’ll make the updates accordingly.
@anthonyeleven Changes applied as suggested. Let me know if I can rebase now.
Docs look good to me; others need to approve the code, and I see conflicts reported.
This introduces the ceph.dir.bal.mask vxattr, an option that restricts rebalancing of a subtree to a specific set of active MDS ranks. Similar to a CPU mask, this feature enables load balancing of specific directories across a chosen set of MDS ranks, which is useful for fine-tuning performance. Previously, the bal_rank_mask introduced in ceph#43284 supported isolating unpinned subtrees under the root directory ('/') to specific MDS ranks. With this new vxattr, specific subdirectories can also be isolated to designated MDS ranks, giving Ceph administrators finer control when tuning their deployments.
tracker: https://tracker.ceph.com/issues/61777
Signed-off-by: Yongseok Oh <yongseok.oh@linecorp.com>
Signed-off-by: Yongseok Oh <yongseok.oh@linecorp.com>
Signed-off-by: Yongseok Oh <yongseok.oh@linecorp.com>
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
Can one of the admins verify this patch?
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
tracker: https://tracker.ceph.com/issues/61777
Introduction
This PR introduces the ceph.dir.bal.mask vxattr, which restricts rebalancing of a subtree to a specific set of active MDS ranks. Similar to a CPU affinity mask, it lets specific directories be load-balanced across a chosen subset of MDS ranks, which is useful for fine-tuning performance in the scenarios below. Previously, the bal_rank_mask added in #43284 supported isolating unpinned subtrees under the root directory ('/') to specific MDS ranks, but only for the file system as a whole. With this new vxattr, individual subdirectories can be isolated to designated MDS ranks, giving Ceph administrators finer control when tuning their deployments.
Use Cases
The first case is a subdirectory that is too large to pin to one MDS rank. Suppose the /home/images directory contains the subdirectories /0 to /99, each storing 10 million image files. Pinning the entire images directory to one MDS rank is impractical, and pinning the 100 large subdirectories manually or with ephemeral pinning is not easy either. With ceph.dir.bal.mask, /home/images can instead be balanced across a chosen set of ranks, which makes resource management more efficient.
Second, when there are several large directories such as /home/images, performance can be optimized by assigning them to different groups of MDS ranks with ceph.dir.bal.mask. Because the existing MDSMap bal_rank_mask applies to the entire '/' tree, large directories balanced over the same ranks can hurt each other's performance through migration overhead. For example, suppose bal_rank_mask is set to 0xf and the large directories /home/images and /home/backups exist. If the load on /home/images suddenly increases, its metadata is redistributed across ranks 0 to 3, so users of /home/backups may be affected by a noisy neighbor. If the two directories are instead assigned to MDS ranks 0-1 (ceph.dir.bal.mask 0x3) and 2-3 (ceph.dir.bal.mask 0xC) respectively, their effect on each other is minimized. The same approach applies to other sets of directories.
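A small worked check of the masks in this example, assuming bit n of a mask selects MDS rank n: 0x3 selects ranks 0 and 1, 0xC selects ranks 2 and 3, and the two share no rank, while each overlaps the file-system-wide 0xf. The helpers ranks_of and isolated are hypothetical illustrations, not part of the PR.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxMds = 256;  // assumed to match MAX_MDS
using RankMask = std::bitset<kMaxMds>;

// Expands a mask into the ranks it selects, lowest first.
std::vector<std::size_t> ranks_of(const RankMask& mask) {
  std::vector<std::size_t> ranks;
  for (std::size_t r = 0; r < mask.size(); ++r)
    if (mask.test(r))
      ranks.push_back(r);
  return ranks;
}

// True if two directory masks share no rank, so (under the semantics
// described above) their rebalancing happens on disjoint sets of ranks.
bool isolated(const RankMask& a, const RankMask& b) {
  return (a & b).none();
}
```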
How to use
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.