mds: add mds_bal_rank_mask config option #43284
|
@varshar16 Ping? |
|
I think this needs an explanation of the benefit it offers. When would I set this option, and how? Is this for testing, or some other purpose? It's not clear to me how you expect this to be used. |
|
@jtlayton The main benefit of this feature is that it allows administrators to manage dynamic subtree partitioning and static pinning schemes on separate active MDS ranks. Specifically, in our production cluster, a lot of subvolumes are allotted to multiple VMs. Most subvolumes are accessed infrequently, so for management convenience we rely on dynamic balancing for them, while the few subvolumes that require high performance are managed by static pinning. However, the existing dynamic balancer spreads metadata workloads evenly across all active MDS ranks, so the performance of statically pinned subvolumes may inevitably degrade. With this approach, static and dynamic partitioning can be applied separately to different MDS ranks. For instance, assume that we have four active MDS ranks: ranks 0 and 1 are used only for dynamic balancing, and the remaining ranks 2 and 3 are used for static pinning. Ranks 0 and 1 are assigned to dynamic balancing with the command line below. If there are no pinned subdirs, metadata requests are submitted only to ranks 0 and 1. You can see this with ceph fs status. |
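The four-rank example above can be expressed as a bitmask. The following is a minimal sketch, assuming the mask is a hexadecimal value in which bit i selects rank i (in the spirit of a CPU mask); the exact syntax accepted by the option is not stated in this thread, and the helper names `ranks_to_mask`, `mask_to_ranks`, and `split_ranks` are hypothetical, not part of Ceph.

```python
def ranks_to_mask(ranks):
    """Build a hex bitmask string with bit i set for every rank i given.

    Assumption: bit i of the mask corresponds to MDS rank i.
    """
    mask = 0
    for rank in ranks:
        if rank < 0:
            raise ValueError("rank must be non-negative")
        mask |= 1 << rank
    return hex(mask)


def mask_to_ranks(mask):
    """Return the sorted list of ranks whose bit is set in a hex mask string."""
    value = int(mask, 16)
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


def split_ranks(num_active, mask):
    """Split active ranks 0..num_active-1 into (dynamic, static) lists.

    Ranks selected by the mask take part in dynamic balancing; the
    remaining active ranks are left for statically pinned subtrees.
    """
    dynamic = [r for r in mask_to_ranks(mask) if r < num_active]
    static = [r for r in range(num_active) if r not in dynamic]
    return dynamic, static


if __name__ == "__main__":
    mask = ranks_to_mask([0, 1])
    print(mask)                  # mask selecting ranks 0 and 1
    print(split_ranks(4, mask))  # dynamic ranks first, then static ranks
```

With four active ranks and ranks 0 and 1 selected, the sketch yields the split described in the comment: ranks 0 and 1 for the balancer, ranks 2 and 3 for pinning.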
|
Ping? |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
This functionality looks useful for certain use-cases as you mention. Thanks for explaining. Could you rebase this PR please? |
cdd5cae to
f608fad
Compare
|
jenkins test make check |
|
This PR has been rebased. Could you please take a look at it? |
|
thanks @yongseokoh - I'll review and put it to test next week. |
|
@vshankar I am looking forward to your feedback. |
vshankar
left a comment
I'm still playing with this change. Providing my initial set of comments.
@yongseokoh It would be good to add some tests too. Also, FWIW, have you looked at or tried using mantle to replicate this functionality?
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
f608fad to
caeb592
Compare
caeb592 to
a141541
Compare
|
@yongseokoh - I'll take a look. Thanks! BTW, have you looked at mantle (https://docs.ceph.com/en/latest/cephfs/mantle/) as an alternate way to achieve this functionality? |
|
@vshankar Thanks for your valuable recommendation. I will go over its feasibility. |
|
@vshankar Do you have any comments? |
Hey @yongseokoh - I was on year end PTO, sorry for the delay. Were you able to look into my comment - #43284 (comment) and check its feasibility? |
This new configuration option allows defining the maximum size of a filesystem xattrs blob. This is a filesystem-wide knob that will replace the per-MDS mds_max_xattr_pairs_size option.
Note: The kernel client patch to handle this new configuration was merged before the corresponding ceph-side pull request. This was unfortunate because, in the meantime, PR ceph#43284 was merged and the encoding/decoding of 'bal_rank_mask' got in between. Hence 'max_xattr_size' is encoded/decoded before 'bal_rank_mask'.
URL: https://tracker.ceph.com/issues/55725
Signed-off-by: Luís Henriques <lhenriques@suse.de>

This new configuration option allows defining the maximum size of a filesystem xattrs blob. This is a filesystem-wide knob that will replace the per-MDS mds_max_xattr_pairs_size option.
Note: The kernel client patch to handle this new configuration was merged before the corresponding ceph-side pull request. This was unfortunate because, in the meantime, PR ceph#43284 was merged and the encoding/decoding of 'bal_rank_mask' got in between. Hence 'max_xattr_size' is encoded/decoded before 'bal_rank_mask'.
URL: https://tracker.ceph.com/issues/55725
Signed-off-by: Luís Henriques <lhenriques@suse.de>
(cherry picked from commit 7b8def5)
Conflicts:
src/mds/MDSMap.cc: The change has been backported as is, but the lines surrounding the change are different in Quincy compared to the main branch.
src/mon/MonCommands.h: Same as above; the change has been backported as is, but the lines surrounding the change are different in Quincy compared to the main branch.

This new configuration option allows defining the maximum size of a filesystem xattrs blob. This is a filesystem-wide knob that will replace the per-MDS mds_max_xattr_pairs_size option.
Note: The kernel client patch to handle this new configuration was merged before the corresponding ceph-side pull request. This was unfortunate because, in the meantime, PR ceph#43284 was merged and the encoding/decoding of 'bal_rank_mask' got in between. Hence 'max_xattr_size' is encoded/decoded before 'bal_rank_mask'.
URL: https://tracker.ceph.com/issues/55725
Signed-off-by: Luís Henriques <lhenriques@suse.de>
(cherry picked from commit 7b8def5)

This introduces the ceph.dir.bal.mask vxattr, an option to rebalance a subtree within specific active MDSs. Similar to a CPU mask, this feature enables load balancing of specific directories across multiple MDS ranks. It is especially useful for fine-tuning and improving performance in various scenarios. Previously, the bal_rank_mask in ceph#43284 supported isolating unpinned subtrees under the root directory ('/') to a specific MDS rank. With this new vxattr, it becomes possible to isolate specific subdirectories to designated MDS ranks, giving Ceph administrators finer control for optimizing performance and tuning their deployments.
Tracker: https://tracker.ceph.com/issues/61777
Signed-off-by: Yongseok Oh <yongseok.oh@linecorp.com>
This new configuration option allows defining the maximum size of a filesystem xattrs blob. This is a filesystem-wide knob that will replace the per-MDS mds_max_xattr_pairs_size option.
Note: The kernel client patch to handle this new configuration was merged before the corresponding ceph-side pull request. This was unfortunate because, in the meantime, PR ceph#43284 was merged and the encoding/decoding of 'bal_rank_mask' got in between. Hence 'max_xattr_size' is encoded/decoded before 'bal_rank_mask'.
URL: https://tracker.ceph.com/issues/55725
Signed-off-by: Luís Henriques <lhenriques@suse.de>
(cherry picked from commit 7b8def5)
src/mds/MDSMap.cc - The change has been backported as is, but the lines surrounding the change are different in Quincy compared to the main branch.
src/mon/MonCommands.h - The change has been backported as is, but the lines surrounding the change are different in Quincy compared to the main branch.
This is required for a reef client to work with a higher-revision
MDS, since essentially, this happens:

reef (client):

    if (version >= 17) {
      decode(bal_rank_mask, p);
    }

and higher-revision MDS (say, upcoming squid):

    version = 17
    encode(version, bl);
    ...
    ...
    encode(max_xattr_size, bl);
    encode(bal_rank_mask, bl);

The client incorrectly decodes max_xattr_size (type: uint64_t) into
bal_rank_mask (type: string).

This situation arose for a couple of reasons:

* The kclient patchset handling `max_xattr_size` was merged early on,
  and another MDS-side change that bumped the MDSMap encoding version
  to 17 got merged in the midst (PR ceph#43284). Details in comment:
  ceph#46357 (comment)
* The reef backport for PR ceph#46357 got delayed (and reef branched out).

This means reef (18.2.0) user-space clients are broken with higher-version
MDSs.

Fixes: https://tracker.ceph.com/issues/63713
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 36ee8e7)
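The field-order mismatch described in this commit message can be reproduced in a few lines. This is only an illustrative sketch, not Ceph's encoder: it assumes little-endian fixed-width integers and strings stored as a 32-bit length followed by the bytes, and every function name here (`encode_new`, `decode_new`, `decode_old`, and the helpers) is hypothetical.

```python
import struct


def encode_u64(value):
    # Assumed wire format: 8-byte little-endian unsigned integer.
    return struct.pack("<Q", value)


def encode_str(text):
    # Assumed wire format: 4-byte little-endian length, then the bytes.
    data = text.encode("utf-8")
    return struct.pack("<I", len(data)) + data


def decode_u64(buf, offset):
    (value,) = struct.unpack_from("<Q", buf, offset)
    return value, offset + 8


def decode_str(buf, offset):
    (length,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    if offset + length > len(buf):
        raise ValueError("string length runs past end of buffer")
    return buf[offset:offset + length].decode("utf-8"), offset + length


def encode_new(max_xattr_size, bal_rank_mask):
    # Newer encoder: max_xattr_size first, then bal_rank_mask.
    return encode_u64(max_xattr_size) + encode_str(bal_rank_mask)


def decode_new(buf):
    # Decoder that matches encode_new's field order.
    size, offset = decode_u64(buf, 0)
    mask, _ = decode_str(buf, offset)
    return size, mask


def decode_old(buf):
    # Older decoder: expects bal_rank_mask where max_xattr_size now sits,
    # so the first 4 bytes of the integer are misread as a string length.
    mask, _ = decode_str(buf, 0)
    return mask


if __name__ == "__main__":
    payload = encode_new(65536, "0x3")
    print(decode_new(payload))
    try:
        decode_old(payload)
    except ValueError as err:
        print("old decoder failed:", err)
```

In this sketch the old decoder reads the low four bytes of the integer as a string length and either fails or returns garbage, which is the class of breakage the commit describes for reef clients.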
This PR introduces the mds_bal_rank_mask option to rebalance subtrees within certain active MDSs. With this option, some active ranks are used for statically pinned subdirs, whereas the remaining ranks are used for subtrees that are rebalanced dynamically by MDBalancer.
Fixes: https://tracker.ceph.com/issues/52720
Signed-off-by: Yongseok Oh <yongseok.oh@linecorp.com>