Mon: Add health warning for blaum-roth EC profile when w+1 is not prime#65940
Conversation
There was a problem hiding this comment.
Left several suggestions. Please also change the commit/PR titles to be a present-tense imperative, i.e.
"mon: Add health warning for blaum-roth EC profile when w+1 is not prime"
Also, might it be possible to enhance the ceph osd erasure-code-profile set command to not create such a profile unless --yes-i-really-mean-it and -force are passed?
We would want to keep the health warning still to cover existing profiles/pools and cases where someone modifies existing defs via other means.
bill-scales
left a comment
There was a problem hiding this comment.
Changes are looking good. This review is just very minor issues - mainly not following the coding standard that I've highlighted to get you into good habits :-)
src/mon/HealthMonitor.cc
Outdated
| dout(20) << erasure_code_profile << dendl; | ||
|
|
||
| //This will look at the erasure code profiles technique is blaum_roth and will check that the w key exists | ||
| if (erasure_code_profile.second.at("technique") == "blaum_roth" && erasure_code_profile.second.count("w") == 1){ |
There was a problem hiding this comment.
nit: coding standard has space before {
nit: coding standard is 80 character lines (a few exceptions are ok if its hard to break the line) - you could break the line after && here. There are several places in this PR where you need to fix line lengths
|
The commit message signed-off-by lines are missing |
a071968 to
228a244
Compare
|
Fixed all the commit messages so they now are all in the present tense and the email tom.sollers@ibm.com is now inside <> |
228a244 to
36dfa83
Compare
|
Amended a commit with a change that should have been a part of it |
|
jenkins test make check |
|
jenkins test make check arm64 |
36dfa83 to
11f1bc3
Compare
|
Updated all the commit messages again to remove src from the title and to remove the use of I. |
11f1bc3 to
c88965d
Compare
…th a w+1 that is not prime This commit adds a new health warning for when a user has an erasure code profile using the blaum-roth technique which has a w+1 value that is not prime. Fixes: http://tracker.ceph.com/issues/64419 Signed-off-by: Tom Sollers <tom.sollers@ibm.com>
…alth warn This commit adds a new test to test-erasure-code-plugins.sh that tests for the health warning caused by having a erasure-code-profile with the blaum-roth technique and a w+1 value that is not prime. Fixes: http://tracker.ceph.com/issues/64419 Signed-off-by: Tom Sollers <tom.sollers@ibm.com>
c88965d to
256e277
Compare
|
jenkins test make check arm64 |
1 similar comment
|
jenkins test make check arm64 |
|
jenkins test make check arm64 |
|
jenkins test make check arm64 |
1 similar comment
|
jenkins test make check arm64 |
|
This is an automated message by src/script/redmine-upkeep.py. I have resolved the following tracker ticket due to the merge of this PR: No backports are pending for the ticket. If this is incorrect, please update the tracker Update Log: https://github.com/ceph/ceph/actions/runs/18777419898 |
|
@Tom-Sollers that PR was merged before it went to QA, there are some issue that causing ceph-mon to segfault: please check: /a/nmordech-2025-10-27_18:05:22-rados-wip-sjust-rocky10-mgr-serialize-distro-default-smithi/8570732 |
|
|
||
| //This is a loop that will go through all the erasure code profiles | ||
| for (auto& erasure_code_profile : mon.osdmon()->osdmap.get_erasure_code_profiles()) { | ||
| dout(20) << "check_erasure_code_profiles" << "checking" << erasure_code_profile << dendl; |
There was a problem hiding this comment.
nit: add space so the string won't looks like one word:
2025-10-27T21:03:56.953+0000 7fa2013b5640 20 mon.a@0(leader).health check_erasure_code_profilescheckingdefault,{crush-failure-domain=osd,k=2,m=1,plugin=isa,technique=reed_sol_van}
| dout(20) << "check_erasure_code_profiles" << "checking" << erasure_code_profile << dendl; | ||
|
|
||
| //This will look at the erasure code profiles technique is blaum_roth and will check that the w key exists | ||
| if (erasure_code_profile.second.at("technique") == "blaum_roth" && |
There was a problem hiding this comment.
We didn't have a technique in lrcprofile, that caused the fail:
2025-10-27T21:03:56.953+0000 7fa2013b5640 20 mon.a@0(leader).health check_erasure_code_profilescheckinglrcprofile,{crush-device-class=,crush-failure-domain=osd,crush-num-failure-domains=0,crush-osds-per-failure-domain=0,crush-root=default,k=4,l=3,m=2,name=lrcprofile,plugin=lrc}
…files with a w+1 that is not prime" This reverts commit b9aca15. See: ceph/ceph#65940 (comment)
…r new health warn" This reverts commit 256e277. See: ceph/ceph#65940 (comment) DISTROS: rocky10 ARCHS: x86_64 FLAVORS: default CEPH-BUILD-BRANCH: r10-container
…r new health warn" This reverts commit 256e277. See: ceph/ceph#65940 (comment)
|
I am reverting this PR in #66114. This should have been QA'd by the core team and reviewed by a Monitor SME. Please do not merge your own PRs without consulting with a PTL. |
This PR adds a health warning for when a Erasure code profile uses the blaum-roth technique and has a w value such that w+1 is not prime, which can cause data corruption. It also improves the validation for when the user attempts to create a new erasure code profile by throwing a more descriptive warning when an invalid w value is used, and allows the user to override this validation using the --yes-i-really-mean-it flag. In addition to the previous changes this PR also adds a new test to /qa/standalone/erasure-code/test-erasure-code-plugins.sh to check that the health warning is functioning correctly.
Fixes: http://tracker.ceph.com/issues/64419
Signed-off-by: Tom Sollers tom.sollers@ibm.com
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job DefinitionYou must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.