Mon: Add new health warning for non prime w+1 in blaum-roth EC profiles #66129
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved.
b96a598 to 9299700
An EC pool is using the ``blaum_roth`` technique and ``w + 1`` is not a prime number.
This can result in data corruption.

To check the list of Erasure Code Profiles use the command:

ceph osd erasure-code-profile ls

Then to check the ``w`` value for a particular profile use the command:
a command of the following form:
batrick left a comment
Did you locate the cause of the QA error in
Yeah, so in HealthMonitor.cc at line 1332 I have added a check for whether the "technique" attribute exists in the metadata for the EC profile. The error was caused by line 1333 checking the value of "technique" when there wasn't one; this is now fixed.
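The guard described in this reply can be illustrated with a minimal Python sketch. It is not the C++ code in HealthMonitor.cc: it assumes a profile is a plain dictionary of string keys and values, the names `is_prime` and `blaum_roth_w_is_unsafe` are hypothetical, and treating a missing `w` as "nothing to report" is an assumption made only for this example.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; sufficient for small w values."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def blaum_roth_w_is_unsafe(profile: dict[str, str]) -> bool:
    """Return True when the profile should trigger the warning.

    The lookup uses .get() so that a profile with no "technique"
    key is skipped instead of causing an error.
    """
    technique = profile.get("technique")
    if technique != "blaum_roth":
        return False
    w = profile.get("w")
    if w is None:
        # Assumption for this sketch: no explicit w means nothing to check.
        return False
    return not is_prime(int(w) + 1)
```

With this shape, a profile that carries only `plugin` and no `technique` is ignored, while `{"technique": "blaum_roth", "w": "7"}` is flagged because 8 is not prime.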
15ba199 to d91344b
jenkins test make check
jenkins test make check arm64
jenkins test
e7609ec to a10026c
To avoid repeating the mistake from the last PR, I have run a teuthology test covering a subset of the RADOS test suite to make sure the change causes no new regressions.
a10026c to d71d61f
d71d61f to 74a8ab3
…th a w+1 that is not prime
This commit adds a new health warning for when a user has an erasure code profile using the blaum-roth technique which has a w+1 value that is not prime.
Fixes: http://tracker.ceph.com/issues/64419
Signed-off-by: Tom Sollers <tom.sollers@ibm.com>
…alth warn
This commit adds a new test to test-erasure-code-plugins.sh that tests for the health warning caused by having an erasure-code-profile with the blaum-roth technique and a w+1 value that is not prime.
Fixes: http://tracker.ceph.com/issues/64419
Signed-off-by: Tom Sollers <tom.sollers@ibm.com>
74a8ab3 to e61450c
Pushed what should hopefully be the final version of the PR |
bill-scales left a comment
LGTM. Once Patrick approves we can set the needs-qa flag and get a full run of the rados suite to double check that there are no regressions.
RADOS Approved: https://tracker.ceph.com/issues/73968#note-3
This PR is the second attempt at the original PR, which was merged too early and reverted in #66114.
This PR adds a health warning for when an erasure code profile uses the blaum-roth technique and has a w value such that w+1 is not prime, which can cause data corruption. It also improves validation when a user attempts to create a new erasure code profile by giving a more descriptive warning when an invalid w value is used, and it allows the user to override this validation with the --yes-i-really-mean-it flag. In addition to the previous changes, this PR adds a new test to /qa/standalone/erasure-code/test-erasure-code-plugins.sh to check that the health warning functions correctly.
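The create-time validation and its override can be sketched as follows. This is a rough Python illustration under stated assumptions, not the monitor's actual code: `check_new_profile` and `w_plus_one_is_prime` are hypothetical names, the profile is modelled as a dictionary of strings, the message text is invented for the example, and the `force` argument stands in for the user having passed `--yes-i-really-mean-it`.

```python
import math


def w_plus_one_is_prime(w: int) -> bool:
    """True when w + 1 is prime (trial division up to the integer square root)."""
    n = w + 1
    return n >= 2 and all(n % d != 0 for d in range(2, math.isqrt(n) + 1))


def check_new_profile(profile: dict[str, str], force: bool = False) -> tuple[bool, str]:
    """Return (accepted, message) for a profile about to be created.

    Only blaum_roth profiles with an explicit w are checked here;
    everything else is accepted without comment (an assumption of
    this sketch).
    """
    if profile.get("technique") != "blaum_roth" or "w" not in profile:
        return True, ""
    w = int(profile["w"])
    if w_plus_one_is_prime(w):
        return True, ""
    # Hypothetical wording; the real message is defined in the PR itself.
    msg = f"w={w} is invalid for blaum_roth: w+1={w + 1} is not prime"
    if force:
        return True, msg + " (overridden)"
    return False, msg + "; pass --yes-i-really-mean-it to override"
```

In this sketch a profile with `w=7` is rejected by default and accepted, with the warning text still returned, when `force=True`.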
Fixes: http://tracker.ceph.com/issues/64419
Signed-off-by: Tom Sollers <tom.sollers@ibm.com>