osd: full-object read CRC mismatch due to 'truncate' modifying oi.size w/o clearing 'data_digest'#55008
Conversation
|
@NitzanMordhai - commit test/pybind: check crc fail after append zero |
e12c5c1 to
6c1aec3
Compare
src/osd/PrimaryLogPG.cc
Outdated
| op.extent.truncate_size); | ||
| } | ||
| //truncate modify oi.size, need clear old data_digest and DIGEST flag | ||
| oi.clear_data_digest(); |
There was a problem hiding this comment.
Seems like this should be in the immediately above if block? If so, perhaps we should just move it into truncate_update_size_and_usage?
There was a problem hiding this comment.
@athanatos Jia, who wrote the original fix added that note: https://tracker.ceph.com/issues/53240#note-1
this is case 2, write before truncating
There was a problem hiding this comment.
Hmm, having read that it's still not clear to me why we clear_data_digest() here even if truncate_size == oi.size.
There was a problem hiding this comment.
@athanatos agree, going to move it into the if
381433a to
01d1f5b
Compare
| } | ||
|
|
||
| TEST_F(LibRadosIoPP, CrcZeroWrite) { | ||
| char buf[128]; |
There was a problem hiding this comment.
crimson lacks the fix, so I expect this test to fail there. Might we worth adding at a comment at least.
There was a problem hiding this comment.
It's a simple fix -- can we just go ahead and fix it in crimson rather than letting the test fail?
There was a problem hiding this comment.
actually, i tested it on crimson and it pass
There was a problem hiding this comment.
This test fails sporadically with ENOTSUP:
$ ./ceph_test_rados_api_io_pp
# ...
/mnt/data/workspace/ceph_linux/src/test/librados/io_cxx.cc:888: Failure
Expected equality of these values:
0
ioctx.write("foo", bl, 0, sizeof(buf))
Which is: -95
[ FAILED ] LibRadosIoECPP.CrcZeroWrite (767 ms)
There was a problem hiding this comment.
I'm thinking about skipping it on Windows for the time being: ceph/ceph-win32-tests#19
| bufferlist bl; | ||
|
|
||
| ASSERT_EQ(0, ioctx.write("foo", bl, 0, 0)); | ||
| ASSERT_EQ(0, ioctx.write("foo", bl, 0, sizeof(buf))); |
There was a problem hiding this comment.
Hmm, the only usage of the 0xcc-filled buf seems to be the sizeof(). Is that truly the intention?
There was a problem hiding this comment.
@rzarzynski yes, i had in mind to check that we didn't had any override above zero, but i didn't added it. i can remove the memset.
01d1f5b to
e83ce0c
Compare
| truncate_update_size_and_usage(ctx->delta_stats, | ||
| oi, | ||
| op.extent.truncate_size); | ||
| //truncate modify oi.size, need clear old data_digest and DIGEST flag |
There was a problem hiding this comment.
In this case the logic is a little bit different than in CEPH_OSD_OP_TRUNCATE:
if (op.extent.truncate_seq) {
ceph_assert(op.extent.offset == op.extent.truncate_size); // <------------------
if (op.extent.truncate_seq <= oi.truncate_seq) {
dout(10) << " truncate seq " << op.extent.truncate_seq << " <= current " << oi.truncate_seq
<< ", no-op" << dendl;
break; // old
}
dout(10) << " truncate seq " << op.extent.truncate_seq << " > current " << oi.truncate_seq
<< ", truncating" << dendl;
oi.truncate_seq = op.extent.truncate_seq;
oi.truncate_size = op.extent.truncate_size;
}
// ...
if (op.extent.offset != oi.size) { // <----------
truncate_update_size_and_usage(ctx->delta_stats,
oi,
op.extent.offset);
}
ctx->delta_stats.num_wr++;
// do no set exists, or we will break above DELETE -> TRUNCATE munging.
oi.clear_data_digest(); // <-------The difference is what happens in the op.extent.truncate_size == oi.size case: CEPH_OSD_OP_TRUNCATE clears the digest; CEPH_OSD_OP_WRITE – does not.
I think the reason is that the trunc-like write differs from regular trunc. The former puts truncate into the transaction when expanding the object; for shrinking (or keeping same) only nop + update of object_info happens – the data is untouched. Therefore preserving the digest for op.extent.truncate_size == oi.size should be fine.
if (op.extent.length == 0) {
if (op.extent.offset > oi.size) {
t->truncate(
soid, op.extent.offset);
truncate_update_size_and_usage(ctx->delta_stats, oi,
op.extent.offset);
} else {
t->nop(soid);
}
} else {
t->write(
soid, op.extent.offset, op.extent.length, osd_op.indata, op.flags);
}
src/osd/PrimaryLogPG.cc
Outdated
| soid, op.extent.offset); | ||
| truncate_update_size_and_usage(ctx->delta_stats, oi, | ||
| op.extent.offset); | ||
| oi.clear_data_digest(); |
|
This PR had some unit test problems. Ref: https://trello.com/c/SFxA3U2X/1976-wip-yuri3-testing-2024-03-25-0744-old-wip-yuri3-testing-2024-03-24-1519-old-wip-yuri3-testing-2024-03-12-0851 |
e83ce0c to
a6e427f
Compare
the unit-test with EC needed allow_ec_overwrites, but since those test unit doesn't refresh the pool (delete\create) for each test, we can't remove the overwrites setting which causes the other unittest to fail. |
|
jenkins test windows |
|
@yuriw: may we retake this one into QA? |
|
jenkins test windows |
|
QA in progress: https://tracker.ceph.com/issues/65349. |
|
jenkins test windows |
|
@NitzanMordhai pls merge when all checks passed |
|
jenkins test windows |
1 similar comment
|
jenkins test windows |
|
jenkins retest this please |
|
jenkins test api |
|
jenkins test windows |
|
jenkins test api |
1 similar comment
|
jenkins test api |
…e and forget to clear data_digest when write before truncate, need trim length, if truncate is to 0, write is [0~128k], write change to [0~0], do nothing, oi.size is 0, x1 = set_data_digest(crc32(-1)). write is [128k~128k], write change to [128k~0], truncate oi.size to offset 128k, x2 = set_data_digest(crc32(x1)). write is [256k~128k], write change to [256k~0], truncate oi.size to offset 256k, x3 = set_data_digest(crc32(x2)). ... write is [4063232~128k], write change to [4063232~0], truncate oi.size to offset 4063232, xn = set_data_digest(crs32(xn-1)) Now, we can see oi.size is 4063232, and data_digest is 0xffffffff, because thelength of in_data of crc is 0 every time. when read verify crc will reply EIO. (EC pool). so, when truncate in write, need clear data_digest and DIGEST flag, when write before truncate, need to trim length, when offset over than oi.size, don't truncate oi.size to offset. Fixes: https://tracker.ceph.com/issues/53240 Signed-off-by: jiawd <jiawendong@xtaotech.com>
Fixes: https://tracker.ceph.com/issues/53240 Signed-off-by: jiawd <jiawendong@xtaotech.com>
Fixes: https://tracker.ceph.com/issues/53240 Signed-off-by: jiawd <jiawendong@xtaotech.com>
Add test for zero crc check failed. Fixes: https://tracker.ceph.com/issues/53240 Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
1. adding allow_ec_overwrite option for cxx test 2. adding new test for crc failuer check with append zero length Fixes: https://tracker.ceph.com/issues/53240 Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
415a1b8 to
06e4c6f
Compare
|
I think the Windows failure was actually legit, one of the tests fails randomly with The |
This is continue of PR:
#43902
the commits cherry-picked and i added some tests that recreate the issue without the fix
when write before truncate, need trim length, if truncate is to 0,
write is [0128k], write change to [00], do nothing, oi.size is 0, x1 = set_data_digest(crc32(-1)).
write is [128k128k], write change to [128k0], truncate oi.size to offset 128k, x2 = set_data_digest(crc32(x1)).
write is [256k128k], write change to [256k0], truncate oi.size to offset 256k, x3 = set_data_digest(crc32(x2)).
...
write is [4063232128k], write change to [40632320], truncate oi.size to offset 4063232, xn = set_data_digest(crs32(xn-1))
Now, we can see oi.size is 4063232, and data_digest is 0xffffffff, because thelength of in_data of crc is 0 every time.
when read verify crc will reply EIO. (EC pool).
so, when truncate in write, need clear data_digest and DIGEST flag,
when write before truncate, need to trim length, when offset over than oi.size, don't truncate oi.size to offset.
Fixes: https://tracker.ceph.com/issues/53240
Signed-off-by: jiawd jiawendong@xtaotech.com
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e