Bug #64419
closedMake the blaum_roth technique experimental for ec profiles
0%
Description
As this technique is shown to cause data corruption in some scenarios (see https://tracker.ceph.com/issues/63876), it makes sense to make this an experimental feature until the bug is addressed.
Updated by Laura Flores about 2 years ago
- Related to Bug #63876: rados/cephfs: apparent file corruption on cephfs on an EC pool added
Updated by - aschmitz 6 months ago
I'm the one who was bit by this, and it appears to have been caused by the fact that I apparently didn't investigate Blaum-Roth parameters sufficiently, and used the default value of w=7.
Page 29 of the Jerasure manual (https://github.com/ceph/jerasure/blob/de1739cc8483696506829b52e7fda4f6bb195e6a/Manual.pdf) indicates that w+1 must be prime, but the default value therefore results in an invalid configuration.
Back in #6754 (12 years ago!), an attempt to validate this was made, but the practical effect was quickly reverted in #6572 (which, despite the numbering, came after the first change).
I understand the desire for backwards compatibility (and certainly it is better to have some data accessible than none, if Ceph refuses to start), but the default value for w here causes data corruption (with low probability on any given EC object, but high probability overall for large enough datasets).
If at all possible, I would suggest changing the default value of w to 6, and, ideally, producing a noticeable warning about w=7 (perhaps as a health issue in ceph status): w=7 almost certainly means that data corruption is a real possibility if a disk dies, or rebalancing occurs.
I suspect that changing the default is as simple as adding back in the default value that was removed in https://github.com/ceph/ceph/pull/2556 , but I'd like someone more experienced in Ceph's EC code to verify that doing so won't change the expected values for existing pools that were using the defaults. (Adding a status warning is likely more complicated, and I don't know how to get from the check_w function to a Ceph status entry.)
As a note for anyone reading this issue who finds themselves in a similar situation, I wrote some rough code to recover data from my blaum_roth w=7 pool by exhaustively combining all EC shards and taking the correct entries. It is at https://github.com/aschmitz/ceph-ec-recovery , but may or may not work in all situations, so I'm mostly providing it as a pointer rather than a complete solution.
Updated by - aschmitz 6 months ago
Oops, I typo'd that second issue number, which explains why I thought they were out of order. The issue reverting the validation and fix to blaum_roth defaults was actually #9572.
Updated by Radoslaw Zarzynski 6 months ago
- Assignee changed from Laura Flores to Bill Scales
This is clearly EC (but not FastEC :-) related.
@Bill Scales, what do you think on improving the validator?
Updated by Jamie Pryde 6 months ago
Thanks for the updates @aschmitz - really interesting read on github and I'm glad you managed to recover the data.
We should change the default value for w when creating a new blaum_roth profile to be 6 instead of 7.
I think the CLI already returns an error message when creating a new blaum_roth profile if you specify a w value where w + 1 is not a prime number. We should check this is true and add this policing if it doesn't already exist.
We should also add a new warning to the health monitor to warn users who have EC pools using a blaum_roth profile if w + 1 is not a prime number. Something like "An EC pool is using the blaum_roth technique and w + 1 is not a prime number. This can result in data corruption."
Updated by Radoslaw Zarzynski 5 months ago
Let's start from just changing the default.
Updated by Bill Scales 5 months ago
- Status changed from New to In Progress
- Target version set to v21.0.0
- Pull request ID set to 65940
Tom's been working on fixing this, a PR with the change is nearly ready for final review - just need to squash the commits:
- Default value has been changed to 6
- Creating an erasure code profile with a value of w+1 that is not prime fails, this can be overridden with --force + --yes-i-really-mean-it with appropriate warnings about data corruption
- Existing erasure code profiles with w=7 (w+1 is not prime) is tolerated - so upgrades are not affected
- Health check will warn if there is a badly configured erasure code profile
- Teuthology test case added to cover all of this
- We already have some tests that validate data integrity with blaum roth with w=6
Updated by Upkeep Bot 5 months ago
- Status changed from In Progress to Resolved
- Merge Commit set to a22278724154bfcf225c6fe420bd305424559e00
- Fixed In set to v20.3.0-3786-ga222787241
- Upkeep Timestamp set to 2025-10-24T10:46:55+00:00
Updated by Yaarit Hatuka 5 months ago
- Related to Bug #73661: mon: check_erasure_code_profiles causing crash added
Updated by Patrick Donnelly 5 months ago
- Status changed from Resolved to Need More Info
- Pull request ID deleted (
65940)
Reverted by https://github.com/ceph/ceph/commit/022e05265435466a542c92322f8ffafe4c66ac7d
Please remake the PR so it may be QA'd and reviewed.
Updated by Radoslaw Zarzynski 5 months ago
- Status changed from Need More Info to In Progress
Updated by Upkeep Bot 5 months ago
- Merge Commit deleted (
a22278724154bfcf225c6fe420bd305424559e00) - Fixed In deleted (
v20.3.0-3786-ga222787241) - Upkeep Timestamp changed from 2025-10-24T10:46:55+00:00 to 2025-11-05T00:53:34+00:00
Updated by Patrick Donnelly 5 months ago
- Assignee changed from Bill Scales to Tom Sollers
- Pull request ID set to 66129
Updated by Patrick Donnelly 5 months ago
- Status changed from In Progress to Fix Under Review
Updated by Laura Flores 4 months ago
PR in draft state; PR under review (2nd round of review needed)
Updated by Laura Flores 3 months ago
- Status changed from Fix Under Review to Resolved
Updated by Upkeep Bot 3 months ago
- Merge Commit set to 551ac2ba463943520b5fb7443c9a68fbfe3e9f75
- Fixed In set to v20.3.0-4612-g551ac2ba46
- Upkeep Timestamp changed from 2025-11-05T00:53:34+00:00 to 2025-12-15T19:40:57+00:00