mds: add option mds_bal_overload_epochs #53332

Merged
vshankar merged 1 commit into ceph:main from zhsgao:mds_overload_epochs on Oct 4, 2023

@zhsgao zhsgao commented Sep 8, 2023

Add an option to configure the number of epochs the overload lasts before migrating.
Setting it to a higher value can avoid frequent migrations caused by load fluctuations.

Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>

@zhsgao zhsgao requested a review from a team as a code owner September 8, 2023 10:11
@vshankar

> Add an option to configure the number of epochs the overload lasts before migrating.

Could you elaborate on the intended use of this config option, @zhsgao?

zhsgao commented Sep 15, 2023

> > Add an option to configure the number of epochs the overload lasts before migrating.
>
> Could you elaborate on the intended use of this config option, @zhsgao?

Use this option to adjust how long an MDS must stay overloaded before migration starts. Setting it to a higher value can reduce frequent migrations caused by load fluctuations.

@vshankar vshankar left a comment

@zhsgao I'm curious if you ran into an issue that prompted this hardcoded value to be made configurable?

default: 2
services:
- mds
fmt_desc: The number of epochs the overload lasts before Ceph migrates.

Please add a long description for the config.

@zhsgao zhsgao force-pushed the mds_overload_epochs branch from 6c0af22 to fb274ae Compare September 19, 2023 08:36

zhsgao commented Sep 19, 2023

> @zhsgao I'm curious if you ran into an issue that prompted this hardcoded value to be made configurable?

In my Ceph cluster, the loads of the MDS daemons are unbalanced and fluctuate greatly over short periods, so migration is very frequent with the default configuration. To reduce migration, I set mds_bal_min_rebalance to a larger value (a large enough value disables migration entirely), but that option decides whether to migrate based on the degree of overload. I think there should also be an option that decides whether to migrate based on the duration of overload.
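
To make the distinction between degree and duration of overload concrete, here is a minimal Python sketch of the two checks. It is a toy model written for this discussion, not the balancer code from the Ceph tree: the class `OverloadGate`, the helper `count_migration_epochs`, the load values, and the `min_rebalance` value of 0.1 are all hypothetical, and the model ignores the fact that a real migration would change the load in later epochs.

```python
from dataclasses import dataclass


@dataclass
class OverloadGate:
    """Toy model of the two migration checks discussed above (not Ceph code)."""

    # Degree of overload: stands in for mds_bal_min_rebalance (illustrative value).
    min_rebalance: float = 0.1
    # Duration of overload: stands in for mds_bal_overload_epochs.
    overload_epochs: int = 2
    # Last epoch in which this rank was not (meaningfully) overloaded.
    last_epoch_under: int = 0

    def should_migrate(self, epoch: int, my_load: float, target_load: float) -> bool:
        # Degree check: below the tolerated margin, reset the overload streak.
        if my_load < target_load * (1.0 + self.min_rebalance):
            self.last_epoch_under = epoch
            return False
        # Duration check: migrate only after enough consecutive overloaded epochs.
        return epoch - self.last_epoch_under >= self.overload_epochs


def count_migration_epochs(loads, target_load, gate):
    """Count the epochs (numbered from 1) in which the gate allows migration."""
    return sum(
        gate.should_migrate(epoch, load, target_load)
        for epoch, load in enumerate(loads, start=1)
    )


# A load that repeatedly spikes above and drops below the target of 100.
fluctuating = [150, 90, 150, 150, 90, 150, 150, 150]

print(count_migration_epochs(fluctuating, 100.0, OverloadGate(overload_epochs=2)))  # 3
print(count_migration_epochs(fluctuating, 100.0, OverloadGate(overload_epochs=4)))  # 0
```

With the same fluctuating load, requiring more consecutive overloaded epochs suppresses the migrations triggered by short spikes, while a sustained overload would still pass the gate eventually.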

@vshankar

> > @zhsgao I'm curious if you ran into an issue that prompted this hardcoded value to be made configurable?
>
> In my Ceph cluster, the loads of the MDS daemons are unbalanced and fluctuate greatly over short periods, so migration is very frequent with the default configuration. To reduce migration, I set mds_bal_min_rebalance to a larger value (a large enough value disables migration entirely), but that option decides whether to migrate based on the degree of overload. I think there should also be an option that decides whether to migrate based on the duration of overload.

Fair enough. Please add a long desc to the option. Otherwise LGTM.

Add an option to configure the number of epochs the overload lasts before migrating.
Setting it to a higher value can avoid frequent migrations caused by load fluctuations.

Signed-off-by: Zhansong Gao <zhsgao@hotmail.com>
@zhsgao zhsgao force-pushed the mds_overload_epochs branch from fb274ae to 17ae57d Compare September 20, 2023 05:55

zhsgao commented Sep 20, 2023

> > > @zhsgao I'm curious if you ran into an issue that prompted this hardcoded value to be made configurable?
>
> > In my Ceph cluster, the loads of the MDS daemons are unbalanced and fluctuate greatly over short periods, so migration is very frequent with the default configuration. To reduce migration, I set mds_bal_min_rebalance to a larger value (a large enough value disables migration entirely), but that option decides whether to migrate based on the degree of overload. I think there should also be an option that decides whether to migrate based on the duration of overload.
>
> Fair enough. Please add a long desc to the option. Otherwise LGTM.

I have modified fmt_desc and the commit message.

@vshankar

jenkins retest this please

@vshankar vshankar left a comment

@vshankar vshankar merged commit acd7f82 into ceph:main Oct 4, 2023
vshankar added a commit to vshankar/ceph that referenced this pull request Oct 7, 2023
* refs/pull/53332/head:
	mds: add option mds_bal_overload_epochs