erasure-code: restore jerasure BlaumRoth default w#2556

Merged
1 commit merged into giant from
unknown repository
Sep 29, 2014

Conversation

@ghost

@ghost ghost commented Sep 23, 2014

Changing from W=7 to W=6 by default for the BlaumRoth technique is
correct but introduces a regression. The content that was encoded with
the previous version cannot be read again. Although the prime(w+1)
constraint was not obeyed by W=7, the encoded content was useable and
should keep being readable.

The W=7 remains the default for backward compatibility and an exception
to the prime(w+1) check.

http://tracker.ceph.com/issues/9572 Fixes: #9572

Signed-off-by: Loic Dachary <loic-201408@dachary.org>
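
The rule described above (w+1 must be prime, with w=7 tolerated for backward compatibility) can be sketched as follows. This is a minimal illustration in shell, not the actual C++ change in the jerasure plugin; the function names `is_prime` and `blaum_roth_w_ok` are hypothetical.

```shell
#!/usr/bin/env bash

# is_prime N: succeeds (exit status 0) when N is prime.
is_prime() {
  local n=$1 d
  (( n >= 2 )) || return 1
  for (( d = 2; d * d <= n; d++ )); do
    (( n % d == 0 )) && return 1
  done
  return 0
}

# blaum_roth_w_ok W: hypothetical helper mirroring the rule in the description.
# w=7 is accepted as a backward-compatibility exception; any other w must
# satisfy prime(w+1).
blaum_roth_w_ok() {
  local w=$1
  (( w == 7 )) && return 0
  is_prime $(( w + 1 ))
}

for w in 6 7 8 10 12; do
  if blaum_roth_w_ok "$w"; then
    echo "w=$w accepted"
  else
    echo "w=$w rejected"
  fi
done
```

With these inputs, only w=8 is rejected (9 is not prime); w=7 passes solely because of the exception.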

@ghost
Author

ghost commented Sep 23, 2014

@apeters1971 I overlooked backward compatibility in the previous fix enforcing BlaumRoth constraints. Does this backward compatibility fix look OK to you?

@apeters1971
Contributor

Hi Loic, so you are sure that encoding and decoding work with w=7? What was the Jerasure library doing in this case? Using another matrix?

@ghost
Author

ghost commented Sep 23, 2014

@apeters1971 I'm sure it works: content is encoded and decoded as one would expect. I'm not sure what the prime(w+1) constraint enforces, but enforcing it does cause this problem. I'm also tempted not to overthink this one beyond the default, because it is extremely unlikely that anyone is using it for real. And I would be amazed if anyone uses it with a custom w. Do you think I'm too lazy? :-)

@ghost
Author

ghost commented Sep 24, 2014

@apeters1971 Sorry to insist: this is blocking http://tracker.ceph.com/issues/9420 but I would not want to commit something that you don't feel comfortable with. Thanks for your understanding :-)

@apeters1971
Contributor

Hi Loic, I agree that probably nobody ever used that, and you can certainly keep it like that for backward compatibility. But if there is a restriction that w+1 must be prime, the reason is probably that it does not work when w+1 is not prime. So you shouldn't have this as a default!

@ghost
Author

ghost commented Sep 24, 2014

@apeters1971 apparently it works for w=7 despite the fact that w+1=8 is not prime. I encoded / decoded random content, including recovering from erasure, successfully with w=7. But maybe I was just lucky and other content would have failed. In which case we should declare this a bug and add something in the release notes to say that all blaumroth encoded content is potentially corrupted. It's worth checking if that is a possibility though.

@ghost
Author

ghost commented Sep 24, 2014

@apeters1971

for w in 7 11 13 17 19 ; do for k in $(seq 2 $w) ; do for m in $(seq 1 $k) ; do for erasures in $(seq 1 $m) ; do ./ceph_erasure_code_benchmark --plugin jerasure --workload decoded --iterations 1 --size 4096 --erasures $erasures --parameter w=$w --parameter k=$k --parameter m=2 --parameter technique=blaum_roth ; done ; done ; done ; done

all check out. Whatever the consequence of the unmatched constraint, it does not cause an encoding/decoding problem.

@apeters1971
Contributor

Hi Loic,
how can you assume that with one iteration you test all failure scenarios? You have to run something like 1000 or, better, more iterations to be sure to catch all possible failure modes with your random failure selection. Or did the benchmark tool change and I didn't notice yet?

Cheers Andreas.



@ghost
Author

ghost commented Sep 25, 2014

@apeters1971 you are correct, my mistake. I'll change the benchmark tool to allow non-random exploration, which will be convenient in the future.
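
To illustrate why a single random iteration cannot cover every failure scenario, the sketch below counts the distinct erasure patterns for one configuration. It assumes that erasures are drawn from all k+m chunks; the `binomial` helper and the values k=6, m=2 are illustrative choices, not taken from the benchmark tool.

```shell
#!/usr/bin/env bash

# binomial N E: number of distinct ways to erase E chunks out of N.
# The running product stays an integer at every step.
binomial() {
  local n=$1 e=$2 result=1 i
  for (( i = 1; i <= e; i++ )); do
    result=$(( result * (n - e + i) / i ))
  done
  echo "$result"
}

k=6
m=2
for erasures in 1 2; do
  echo "k=$k m=$m erasures=$erasures: $(binomial $(( k + m )) "$erasures") patterns"
done
```

For k=6 and m=2 this reports 8 patterns for one erasure and 28 for two, so one random draw per configuration exercises only a small fraction of them.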

@ghost
Author

ghost commented Sep 25, 2014

for w in 7 11 13 17 19 ; do for k in $(seq 2 $w) ; do for m in $(seq 1 2) ; do for erasures in $(seq 1 $m) ; do ./ceph_erasure_code_benchmark --plugin jerasure --workload decoded --iterations 1 --size 4096 --erasures $erasures --erasures-generation exhaustive --parameter w=$w --parameter k=$k --parameter m=$m --parameter technique=blaum_roth ; done ; done ; done ; done

claims all is well, using the --erasures-generation exhaustive implemented at https://github.com/dachary/ceph/commit/648b7bccc2cab91e7b12889ad60133263f118a82
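
As a rough picture of what an exhaustive exploration involves, the following sketch recursively enumerates every combination of erased chunks. It is a standalone illustration, not the code from the commit linked above; the `combinations` function and the assumption that chunks are numbered 0 to k+m-1 are hypothetical.

```shell
#!/usr/bin/env bash

# combinations N E [START] [PREFIX]: print every way to choose E erased
# chunks out of chunks 0..N-1, one combination per line.
# START and PREFIX are only used by the recursive calls.
combinations() {
  local n=$1 e=$2 start=${3:-0} prefix=${4:-}
  local i
  if (( e == 0 )); then
    echo "${prefix# }"   # strip the leading separator
    return
  fi
  # Leave enough higher-numbered chunks to complete the combination.
  for (( i = start; i <= n - e; i++ )); do
    combinations "$n" $(( e - 1 )) $(( i + 1 )) "$prefix $i"
  done
}

# Example: k=2, m=2 gives 4 chunks; list every way to lose 2 of them.
combinations 4 2
```

Each printed line is one erasure scenario that a decode test would need to recover from.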

@liewegas
Member

It seems dangerous to leave the default as something that is not "supposed" to work. I would rather change the default, break compatibility, and put in an upgrade note about it. If we find that someone is using 7, can we simply ask them to put w=7 in their ec profile or something to make their pool continue to function?
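
For reference, pinning w in an erasure code profile might look like the sketch below. The profile name `blaumroth-w7` and the values k=6, m=2 are hypothetical, and the command is only printed, not executed, so it can be reviewed against the documentation of the installed Ceph version first.

```shell
#!/usr/bin/env bash

# Hypothetical profile name and k/m values, chosen for illustration only.
profile=blaumroth-w7
cmd="ceph osd erasure-code-profile set $profile plugin=jerasure technique=blaum_roth k=6 m=2 w=7"

# Dry run: show the command instead of running it against a cluster.
echo "$cmd"
```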

@ghost
Author

ghost commented Sep 25, 2014

OK. If Kevin Greenan finds out that using w=7 is harmless, we can leave it. If we're not sure, it's probably better to add a note. The probability that someone is using this technique is extremely low anyway.

@ghost
Author

ghost commented Sep 29, 2014

@apeters1971 I've added a verbose output that shows that the recursive implementation retrieving all combinations of erasures actually works, as shown in http://tracker.ceph.com/issues/9572#note-4

@liewegas it turns out that w=7 is valid for all combinations of k. This is not proven in theory, but a brute force exploration of all erasure scenarios shows that it actually works. My understanding is that it is proven to work in theory for all w where w+1 is prime. That does not mean that other values of w do not work, only that showing they work requires brute force exploration instead of a mathematical proof.

@ghost
Author

ghost commented Sep 29, 2014

@liewegas I think it is safe to leave w=7 as an exception to the rule.

@liewegas
Member

Sounds okay to me!


ghost pushed a commit that referenced this pull request Sep 29, 2014
erasure-code: restore jerasure BlaumRoth default w

Reviewed-by: Sage Weil <sage@redhat.com>
@ghost ghost merged commit 9c4616d into ceph:giant Sep 29, 2014
This pull request was closed.