Bug #66226

closed

rados: ceph_test_rados_api_io_pp 4 ec tests fail ( CrcZeroWrite, CmpExtPP, CmpExtDNEPP, CmpExtMismatchPP)

Added by Lucian Petrut almost 2 years ago. Updated 5 months ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version: -
% Done: 100%
Source:
Backport: quincy,reef,squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Tags (freeform): backport_processed
Fixed In: v19.3.0-2724-g119096189b
Released In: v20.2.0~2712
Upkeep Timestamp: 2025-11-01T01:33:35+00:00

Description

One of the recently introduced librados tests fails sporadically with ENOTSUP:

$ ./ceph_test_rados_api_io_pp
# ...
/mnt/data/workspace/ceph_linux/src/test/librados/io_cxx.cc:888: Failure
Expected equality of these values:
  0
  ioctx.write("foo", bl, 0, sizeof(buf))
    Which is: -95
[  FAILED  ] LibRadosIoECPP.CrcZeroWrite (767 ms)

The test was introduced here: https://github.com/ceph/ceph/pull/55008


Related issues 3 (0 open, 3 closed)

Copied to RADOS - Backport #66531: reef: rados: ceph_test_rados_api_io_pp 4 ec tests fail ( CrcZeroWrite, CmpExtPP, CmpExtDNEPP, CmpExtMismatchPP) (Resolved, Nitzan Mordechai)
Copied to RADOS - Backport #66532: quincy: rados: ceph_test_rados_api_io_pp 4 ec tests fail ( CrcZeroWrite, CmpExtPP, CmpExtDNEPP, CmpExtMismatchPP) (Rejected, Samuel Just)
Copied to RADOS - Backport #66533: squid: rados: ceph_test_rados_api_io_pp 4 ec tests fail ( CrcZeroWrite, CmpExtPP, CmpExtDNEPP, CmpExtMismatchPP) (Resolved, Nitzan Mordechai)
Actions #1

Updated by Ilya Dryomov almost 2 years ago

  • Project changed from rgw to RADOS
  • Subject changed from flaky librados test to flaky LibRadosIoECPP.CrcZeroWrite test
Actions #2

Updated by Samuel Just almost 2 years ago

2024-05-31T02:31:56.450 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-4022-gd1d833e5/rpm/el9/BUILD/ceph-19.0.0-4022-gd1d833e5/src/test/librados/io_cxx.cc:888: Failure
2024-05-31T02:31:56.450 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: Expected equality of these values:
2024-05-31T02:31:56.450 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp:   0
2024-05-31T02:31:56.450 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp:   ioctx.write("foo", bl, 0, sizeof(buf))
2024-05-31T02:31:56.450 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp:     Which is: -95
2024-05-31T02:31:56.450 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [  FAILED  ] LibRadosIoECPP.CrcZeroWrite (804 ms)
2024-05-31T02:31:56.450 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [ RUN      ] LibRadosIoECPP.XattrListPP
2024-05-31T02:31:56.450 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [       OK ] LibRadosIoECPP.XattrListPP (71 ms)
2024-05-31T02:31:56.450 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [ RUN      ] LibRadosIoECPP.CmpExtPP
2024-05-31T02:31:56.451 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-4022-gd1d833e5/rpm/el9/BUILD/ceph-19.0.0-4022-gd1d833e5/src/test/librados/testcase_cxx.cc:392: Failure
2024-05-31T02:31:56.451 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: Value of: req
2024-05-31T02:31:56.451 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp:   Actual: false
2024-05-31T02:31:56.451 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: Expected: true
2024-05-31T02:31:56.451 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [  FAILED  ] LibRadosIoECPP.CmpExtPP (0 ms)
2024-05-31T02:31:56.451 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [ RUN      ] LibRadosIoECPP.CmpExtDNEPP
2024-05-31T02:31:56.451 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-4022-gd1d833e5/rpm/el9/BUILD/ceph-19.0.0-4022-gd1d833e5/src/test/librados/testcase_cxx.cc:392: Failure
2024-05-31T02:31:56.451 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: Value of: req
2024-05-31T02:31:56.451 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp:   Actual: false
2024-05-31T02:31:56.451 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: Expected: true
2024-05-31T02:31:56.451 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [  FAILED  ] LibRadosIoECPP.CmpExtDNEPP (0 ms)
2024-05-31T02:31:56.451 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [ RUN      ] LibRadosIoECPP.CmpExtMismatchPP
2024-05-31T02:31:56.451 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.0.0-4022-gd1d833e5/rpm/el9/BUILD/ceph-19.0.0-4022-gd1d833e5/src/test/librados/testcase_cxx.cc:392: Failure
2024-05-31T02:31:56.451 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: Value of: req
2024-05-31T02:31:56.452 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp:   Actual: false
2024-05-31T02:31:56.452 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: Expected: true
2024-05-31T02:31:56.452 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [  FAILED  ] LibRadosIoECPP.CmpExtMismatchPP (0 ms)
2024-05-31T02:31:56.452 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [----------] 18 tests from LibRadosIoECPP (2106 ms total)
2024-05-31T02:31:56.452 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp:
2024-05-31T02:31:56.455 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [----------] Global test environment tear-down
2024-05-31T02:31:56.455 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [==========] 40 tests from 2 test suites ran. (10207 ms total)
2024-05-31T02:31:56.455 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [  PASSED  ] 36 tests.
2024-05-31T02:31:56.455 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [  FAILED  ] 4 tests, listed below:
2024-05-31T02:31:56.455 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [  FAILED  ] LibRadosIoECPP.CrcZeroWrite
2024-05-31T02:31:56.455 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [  FAILED  ] LibRadosIoECPP.CmpExtPP
2024-05-31T02:31:56.455 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [  FAILED  ] LibRadosIoECPP.CmpExtDNEPP
2024-05-31T02:31:56.455 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp: [  FAILED  ] LibRadosIoECPP.CmpExtMismatchPP
2024-05-31T02:31:56.455 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp:
2024-05-31T02:31:56.455 INFO:tasks.workunit.client.0.smithi049.stdout:                api_io_pp:  4 FAILED TESTS

It's not just that test; three others fail as well. Moreover, they fail consistently, even with vstart:

[ RUN      ] LibRadosIoECPP.CrcZeroWrite
/home/sam/git-checkouts/ceph-workspace/main/src/test/librados/io_cxx.cc:888: Failure
Expected equality of these values:
  0
  ioctx.write("foo", bl, 0, sizeof(buf))
    Which is: -95
[  FAILED  ] LibRadosIoECPP.CrcZeroWrite (70 ms)
[ RUN      ] LibRadosIoECPP.XattrListPP
[       OK ] LibRadosIoECPP.XattrListPP (114 ms)
[ RUN      ] LibRadosIoECPP.CmpExtPP
/home/sam/git-checkouts/ceph-workspace/main/src/test/librados/testcase_cxx.cc:392: Failure
Value of: req
  Actual: false
Expected: true
[  FAILED  ] LibRadosIoECPP.CmpExtPP (0 ms)
[ RUN      ] LibRadosIoECPP.CmpExtDNEPP
/home/sam/git-checkouts/ceph-workspace/main/src/test/librados/testcase_cxx.cc:392: Failure
Value of: req
  Actual: false
Expected: true
[  FAILED  ] LibRadosIoECPP.CmpExtDNEPP (0 ms)
[ RUN      ] LibRadosIoECPP.CmpExtMismatchPP
/home/sam/git-checkouts/ceph-workspace/main/src/test/librados/testcase_cxx.cc:392: Failure
Value of: req
  Actual: false
Expected: true
[  FAILED  ] LibRadosIoECPP.CmpExtMismatchPP (0 ms)
[----------] 18 tests from LibRadosIoECPP (2243 ms total)
Actions #3

Updated by Samuel Just almost 2 years ago

  • Subject changed from flaky LibRadosIoECPP.CrcZeroWrite test to rados: ceph_test_rados_api_io_pp 4 ec tests fail ( CrcZeroWrite, CmpExtPP, CmpExtDNEPP, CmpExtMismatchPP)
  • Assignee set to Samuel Just
  • Priority changed from Normal to Immediate
Actions #4

Updated by Samuel Just almost 2 years ago · Edited

The three tests not added by https://github.com/ceph/ceph/pull/55008 (CmpExtPP, CmpExtDNEPP, CmpExtMismatchPP) start working again if we comment out CrcZeroWrite. The fix is to recreate the pool in TearDown, not at the end of the test. Still working out why CrcZeroWrite is failing.
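
To illustrate why cleanup placement matters here: a GoogleTest ASSERT_* failure returns from the test body immediately, so cleanup written at the end of the body is skipped, while TearDown still runs. The sketch below models that with plain C++ and no Ceph or GoogleTest dependency. FakePool, Fixture, run_test and the test bodies are all hypothetical names invented for this illustration, and the assumption that a "dirty" pool makes the next SetUp fail is a simplification, not a description of the real fixture code.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the EC pool shared by the tests in one suite.
struct FakePool {
  bool overwrites_enabled = false;  // state a test may leave behind
  int recreate_count = 0;
  void recreate() { overwrites_enabled = false; ++recreate_count; }
};

// Minimal fixture model. Assumption: SetUp needs a pristine pool.
struct Fixture {
  FakePool& pool;
  explicit Fixture(FakePool& p) : pool(p) {}
  bool SetUp() { return !pool.overwrites_enabled; }
  void TearDown() { pool.recreate(); }
};

// Runs one test body; TearDown is called regardless of the body's result,
// but only when cleanup_in_teardown is set.
bool run_test(FakePool& pool, const std::function<bool(FakePool&)>& body,
              bool cleanup_in_teardown) {
  Fixture f(pool);
  bool ok = f.SetUp() && body(pool);
  if (cleanup_in_teardown) f.TearDown();
  return ok;
}

// Models the write that returned -95 in the log above.
int fake_write() { return -95; }

// A body that changes the pool, fails, and returns early (ASSERT-style),
// so the cleanup on its last lines is never reached.
bool failing_body_with_trailing_cleanup(FakePool& pool) {
  pool.overwrites_enabled = true;
  if (fake_write() != 0) return false;
  pool.recreate();
  return true;
}

// A body that would pass on a pristine pool.
bool trivial_body(FakePool&) { return true; }
```

With cleanup_in_teardown set to false, a run of failing_body_with_trailing_cleanup leaves the pool modified and the following trivial_body fails in SetUp. With it set to true, the second test passes.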

Actions #5

Updated by Samuel Just almost 2 years ago

It seems to succeed if run alone.

Actions #6

Updated by Samuel Just almost 2 years ago

It's the OSDMap propagation after set_allow_ec_overwrites. If the primary gets the map with the pool info enabling ec overwrites before the IO arrives, it succeeds. The pool info update isn't an interval change, so the primary doesn't try to catch up to the epoch on the client's submitted IO.
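
The race described above can be pictured with a toy model. This is only a sketch: FakePrimary, its fields and the epoch numbers are hypothetical, and the real OSD logic is far more involved. The one behavior taken from the comment is that the primary decides from its own copy of the pool flags and does not wait for the client's epoch.

```cpp
#include <cassert>

constexpr int kNotSupported = -95;  // the value seen in the failure log

// Hypothetical, heavily simplified view of what a primary OSD knows.
struct FakePrimary {
  unsigned map_epoch = 10;                 // illustrative starting epoch
  bool pool_allows_ec_overwrites = false;  // pool flag as of map_epoch

  // Apply a newer map that carries the updated pool flag.
  void receive_map(unsigned epoch, bool allows_overwrites) {
    if (epoch > map_epoch) {
      map_epoch = epoch;
      pool_allows_ec_overwrites = allows_overwrites;
    }
  }

  // Handle an overwrite from a client that already holds client_epoch.
  // Per the comment above, the flag change is not an interval change, so the
  // model ignores client_epoch and answers from the primary's own map.
  int handle_overwrite(unsigned client_epoch) const {
    (void)client_epoch;
    return pool_allows_ec_overwrites ? 0 : kNotSupported;
  }
};
```

In this model, an overwrite sent at client epoch 11 fails with -95 while the primary is still at epoch 10, and the same request succeeds once receive_map(11, true) has been applied.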

Actions #7

Updated by Samuel Just almost 2 years ago · Edited

Updated set_allow_ec_overwrites to retry an overwrite until it succeeds; testing (wip-sjust-balanced-read-testing-2024-05-31)
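
A retry loop of the kind described might look like the sketch below. It is not the code from the pull request: wait_for_overwrites, its parameters and the bounded attempt count are assumptions made for illustration, and the probe is passed in as a callable so the example does not depend on librados. In a real test helper the callable would issue an overwrite against the pool.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

constexpr int kNotSupported = -95;  // the value seen in the failure log

// Calls try_overwrite until it stops returning kNotSupported or until
// max_attempts is exhausted. Returns the last result, so any other error
// (or success, 0) is reported to the caller unchanged.
int wait_for_overwrites(const std::function<int()>& try_overwrite,
                        int max_attempts,
                        std::chrono::milliseconds delay) {
  int r = kNotSupported;
  for (int i = 0; i < max_attempts; ++i) {
    r = try_overwrite();
    if (r != kNotSupported) return r;
    // Give the new map time to reach the primary before probing again.
    std::this_thread::sleep_for(delay);
  }
  return r;
}
```

Bounding the number of attempts is a design choice in this sketch so that a pool that never accepts overwrites produces a clear failure instead of a hang.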

Actions #8

Updated by Radoslaw Zarzynski almost 2 years ago

Note from bug scrub: any progress here?

Actions #9

Updated by Samuel Just almost 2 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 57856

Yep, waiting for review at https://github.com/ceph/ceph/pull/57856 -- forgot to update the tracker.

Actions #10

Updated by Laura Flores almost 2 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to quincy,reef,squid

@Samuel Just based on https://tracker.ceph.com/issues/53240, Radek and I discussed in bug scrub and it looks like this should be backported too.

Actions #11

Updated by Upkeep Bot almost 2 years ago

  • Copied to Backport #66531: reef: rados: ceph_test_rados_api_io_pp 4 ec tests fail ( CrcZeroWrite, CmpExtPP, CmpExtDNEPP, CmpExtMismatchPP) added
Actions #12

Updated by Upkeep Bot almost 2 years ago

  • Copied to Backport #66532: quincy: rados: ceph_test_rados_api_io_pp 4 ec tests fail ( CrcZeroWrite, CmpExtPP, CmpExtDNEPP, CmpExtMismatchPP) added
Actions #13

Updated by Upkeep Bot almost 2 years ago

  • Copied to Backport #66533: squid: rados: ceph_test_rados_api_io_pp 4 ec tests fail ( CrcZeroWrite, CmpExtPP, CmpExtDNEPP, CmpExtMismatchPP) added
Actions #15

Updated by Sridhar Seshasayee over 1 year ago

Found in Reef.
/a/yuriw-2024-11-21_15:35:28-rados-wip-yuri5-testing-2024-11-20-0816-reef-distro-default-smithi/8003602

Actions #16

Updated by Upkeep Bot over 1 year ago

  • Tags (freeform) set to backport_processed
Actions #17

Updated by Laura Flores about 1 year ago

/a/yuriw-2025-01-14_16:14:11-rados-wip-yuri6-testing-2025-01-13-1111-reef-distro-default-smithi/8076283

Actions #18

Updated by Konstantin Shalygin about 1 year ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100
Actions #19

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to 119096189b9aa0f2cae728e642c65ff513425701
  • Fixed In set to v19.3.0-2724-g119096189b9
  • Upkeep Timestamp set to 2025-07-11T08:43:27+00:00
Actions #20

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-2724-g119096189b9 to v19.3.0-2724-g119096189b
  • Upkeep Timestamp changed from 2025-07-11T08:43:27+00:00 to 2025-07-14T22:43:39+00:00
Actions #21

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~2712
  • Upkeep Timestamp changed from 2025-07-14T22:43:39+00:00 to 2025-11-01T01:33:35+00:00