Skip to content

osd: full-object read CRC mismatch due to 'truncate' modifying oi.size w/o clearing 'data_digest'#55008

Merged
NitzanMordhai merged 5 commits intoceph:mainfrom
NitzanMordhai:wip-nitzan-clear-data-digest-for-crc-check
May 19, 2024
Merged

osd: full-object read CRC mismatch due to 'truncate' modifying oi.size w/o clearing 'data_digest'#55008
NitzanMordhai merged 5 commits intoceph:mainfrom
NitzanMordhai:wip-nitzan-clear-data-digest-for-crc-check

Conversation

@NitzanMordhai
Copy link
Contributor

@NitzanMordhai NitzanMordhai commented Dec 26, 2023

This is continue of PR:
#43902
the commits cherry-picked and i added some tests that recreate the issue without the fix

when write before truncate, need trim length, if truncate is to 0,
write is [0128k], write change to [00], do nothing, oi.size is 0, x1 = set_data_digest(crc32(-1)).
write is [128k128k], write change to [128k0], truncate oi.size to offset 128k, x2 = set_data_digest(crc32(x1)).
write is [256k128k], write change to [256k0], truncate oi.size to offset 256k, x3 = set_data_digest(crc32(x2)).
...
write is [4063232128k], write change to [40632320], truncate oi.size to offset 4063232, xn = set_data_digest(crs32(xn-1))
Now, we can see oi.size is 4063232, and data_digest is 0xffffffff, because thelength of in_data of crc is 0 every time.
when read verify crc will reply EIO. (EC pool).

so, when truncate in write, need clear data_digest and DIGEST flag,
when write before truncate, need to trim length, when offset over than oi.size, don't truncate oi.size to offset.

Fixes: https://tracker.ceph.com/issues/53240
Signed-off-by: jiawd jiawendong@xtaotech.com

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@NitzanMordhai NitzanMordhai requested a review from a team as a code owner December 26, 2023 12:54
@NitzanMordhai NitzanMordhai requested review from idryomov and removed request for a team December 26, 2023 12:54
@NitzanMordhai NitzanMordhai requested review from a team and rzarzynski December 26, 2023 12:55
@ronen-fr
Copy link
Contributor

@NitzanMordhai - commit test/pybind: check crc fail after append zero
isn't signed

@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-clear-data-digest-for-crc-check branch 2 times, most recently from e12c5c1 to 6c1aec3 Compare December 27, 2023 05:59
op.extent.truncate_size);
}
//truncate modify oi.size, need clear old data_digest and DIGEST flag
oi.clear_data_digest();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this should be in the immediately above if block? If so, perhaps we should just move it into truncate_update_size_and_usage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@athanatos Jia, who wrote the original fix added that note: https://tracker.ceph.com/issues/53240#note-1
this is case 2, write before truncating

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, having read that it's still not clear to me why we clear_data_digest() here even if truncate_size == oi.size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@athanatos agree, going to move it into the if

@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-clear-data-digest-for-crc-check branch 2 times, most recently from 381433a to 01d1f5b Compare January 4, 2024 14:00
}

TEST_F(LibRadosIoPP, CrcZeroWrite) {
char buf[128];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crimson lacks the fix, so I expect this test to fail there. Might we worth adding at a comment at least.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a simple fix -- can we just go ahead and fix it in crimson rather than letting the test fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, i tested it on crimson and it pass

Copy link
Contributor

@petrutlucian94 petrutlucian94 May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails sporadically with ENOTSUP:

$ ./ceph_test_rados_api_io_pp
# ...
/mnt/data/workspace/ceph_linux/src/test/librados/io_cxx.cc:888: Failure
Expected equality of these values:
  0
  ioctx.write("foo", bl, 0, sizeof(buf))
    Which is: -95
[  FAILED  ] LibRadosIoECPP.CrcZeroWrite (767 ms)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about skipping it on Windows for the time being: ceph/ceph-win32-tests#19

bufferlist bl;

ASSERT_EQ(0, ioctx.write("foo", bl, 0, 0));
ASSERT_EQ(0, ioctx.write("foo", bl, 0, sizeof(buf)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the only usage of the 0xcc-filled buf seems to be the sizeof(). Is that truly the intention?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rzarzynski yes, i had in mind to check that we didn't had any override above zero, but i didn't added it. i can remove the memset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rzarzynski please review

@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-clear-data-digest-for-crc-check branch from 01d1f5b to e83ce0c Compare February 7, 2024 06:22
truncate_update_size_and_usage(ctx->delta_stats,
oi,
op.extent.truncate_size);
//truncate modify oi.size, need clear old data_digest and DIGEST flag
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the logic is a little bit different than in CEPH_OSD_OP_TRUNCATE:

          if (op.extent.truncate_seq) {
           ceph_assert(op.extent.offset == op.extent.truncate_size);  // <------------------
            if (op.extent.truncate_seq <= oi.truncate_seq) {
              dout(10) << " truncate seq " << op.extent.truncate_seq << " <= current " << oi.truncate_seq
                       << ", no-op" << dendl;
              break; // old
            }
            dout(10) << " truncate seq " << op.extent.truncate_seq << " > current " << oi.truncate_seq
                     << ", truncating" << dendl;
            oi.truncate_seq = op.extent.truncate_seq;
            oi.truncate_size = op.extent.truncate_size;
          }
          // ...
         if (op.extent.offset != oi.size) { // <----------
            truncate_update_size_and_usage(ctx->delta_stats,
                                           oi,
                                           op.extent.offset);
          }
          ctx->delta_stats.num_wr++;
          // do no set exists, or we will break above DELETE -> TRUNCATE munging.
  
          oi.clear_data_digest(); // <-------

The difference is what happens in the op.extent.truncate_size == oi.size case: CEPH_OSD_OP_TRUNCATE clears the digest; CEPH_OSD_OP_WRITE – does not.

I think the reason is that the trunc-like write differs from regular trunc. The former puts truncate into the transaction when expanding the object; for shrinking (or keeping same) only nop + update of object_info happens – the data is untouched. Therefore preserving the digest for op.extent.truncate_size == oi.size should be fine.

          if (op.extent.length == 0) {
            if (op.extent.offset > oi.size) {
              t->truncate(
                soid, op.extent.offset);
              truncate_update_size_and_usage(ctx->delta_stats, oi,
                                             op.extent.offset);
            } else {
              t->nop(soid);
            }
          } else {
            t->write(
              soid, op.extent.offset, op.extent.length, osd_op.indata, op.flags);
          }

soid, op.extent.offset);
truncate_update_size_and_usage(ctx->delta_stats, oi,
op.extent.offset);
oi.clear_data_digest();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK.

@ljflores
Copy link
Member

@ljflores ljflores added TESTED and removed needs-qa labels Mar 25, 2024
@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-clear-data-digest-for-crc-check branch from e83ce0c to a6e427f Compare April 1, 2024 04:53
@NitzanMordhai
Copy link
Contributor Author

This PR had some unit test problems. Ref: https://trello.com/c/SFxA3U2X/1976-wip-yuri3-testing-2024-03-25-0744-old-wip-yuri3-testing-2024-03-24-1519-old-wip-yuri3-testing-2024-03-12-0851

the unit-test with EC needed allow_ec_overwrites, but since those test unit doesn't refresh the pool (delete\create) for each test, we can't remove the overwrites setting which causes the other unittest to fail.

@ronen-fr ronen-fr changed the title osd: full-object read crc is mismatch, because truncate modify oi.size and forget to clear data_digest osd: full-object read CRC mismatch due to 'truncate' modifying oi.size w/o clearing 'data_digest' Apr 1, 2024
@rzarzynski
Copy link
Contributor

jenkins test windows

@rzarzynski
Copy link
Contributor

@yuriw: may we retake this one into QA?

@rzarzynski
Copy link
Contributor

jenkins test windows

@rzarzynski
Copy link
Contributor

QA in progress: https://tracker.ceph.com/issues/65349.

@yuriw
Copy link
Contributor

yuriw commented May 6, 2024

jenkins test windows

@yuriw
Copy link
Contributor

yuriw commented May 6, 2024

@yuriw
Copy link
Contributor

yuriw commented May 6, 2024

@NitzanMordhai pls merge when all checks passed

ref: https://tracker.ceph.com/issues/65349

@NitzanMordhai
Copy link
Contributor Author

jenkins test windows

1 similar comment
@NitzanMordhai
Copy link
Contributor Author

jenkins test windows

@NitzanMordhai
Copy link
Contributor Author

jenkins retest this please

@ljflores
Copy link
Member

jenkins test api

@ljflores
Copy link
Member

jenkins test windows

@NitzanMordhai
Copy link
Contributor Author

jenkins test api

1 similar comment
@NitzanMordhai
Copy link
Contributor Author

jenkins test api

jiawd and others added 5 commits May 19, 2024 05:03
…e and forget to clear data_digest

when write before truncate, need trim length, if truncate is to 0,
write is [0~128k], write change to [0~0], do nothing, oi.size is 0, x1 = set_data_digest(crc32(-1)).
write is [128k~128k], write change to [128k~0], truncate oi.size to offset 128k, x2 = set_data_digest(crc32(x1)).
write is [256k~128k], write change to [256k~0], truncate oi.size to offset 256k, x3 = set_data_digest(crc32(x2)).
...
write is [4063232~128k], write change to [4063232~0], truncate oi.size to offset 4063232, xn = set_data_digest(crs32(xn-1))
Now, we can see oi.size is 4063232, and data_digest is 0xffffffff, because thelength of in_data of crc is 0 every time.
when read verify crc will reply EIO. (EC pool).

so, when truncate in write, need clear data_digest and DIGEST flag,
when write before truncate, need to trim length, when offset over than oi.size, don't truncate oi.size to offset.

Fixes: https://tracker.ceph.com/issues/53240
Signed-off-by: jiawd <jiawendong@xtaotech.com>
Fixes: https://tracker.ceph.com/issues/53240
Signed-off-by: jiawd <jiawendong@xtaotech.com>
Add test for zero crc check failed.

Fixes: https://tracker.ceph.com/issues/53240
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
1. adding allow_ec_overwrite option for cxx test
2. adding new test for crc failuer check with append zero length

Fixes: https://tracker.ceph.com/issues/53240
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
@petrutlucian94
Copy link
Contributor

I think the Windows failure was actually legit, one of the tests fails randomly with ENOTSUP.

The make check job doesn't actually run this test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants