Project

General

Profile

Actions

Bug #68628

open

./src/test/osd/RadosModel.h: 1077: ceph_abort_msg("racing read got wrong version")

Added by Adam Kupczyk over 1 year ago. Updated 5 months ago.

Status:
Pending Backport
Priority:
Normal
Category:
Tiering
Target version:
-
% Done:

0%

Source:
Backport:
squid,reef
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Tags (freeform):
backport_processed
Fixed In:
v20.0.0-188-g5c9094d9f6
Released In:
v20.2.0~915
Upkeep Timestamp:
2025-11-01T01:01:51+00:00

Description

Observed in:
/yuriw-2024-10-16_17:46:23-rados-aclamk-testing-ganymede-2024-10-15-1410-squid-distro-default-smithi/7951928/

2024-10-16T23:29:14.406 INFO:tasks.rados.rados.0.smithi186.stdout:4612:  oid 4612 version 3988 is already newer than 3987
2024-10-16T23:29:14.406 INFO:tasks.rados.rados.0.smithi186.stdout:4612:  oid 4612 version 3988 is already newer than 3986
2024-10-16T23:29:14.406 INFO:tasks.rados.rados.0.smithi186.stdout:update_object_version oid 4612 v 3988 (ObjNum 4611 snap 0 seq_num 4611) dirty exists
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr:Error: racing read on 4612 returned version 3985 rather than version 3988
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr:./src/test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7ff843fff640 time 2024-10-16T23:29:14.402658+0000
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr:./src/test/osd/RadosModel.h: 1077: ceph_abort_msg("racing read got wrong version")
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr: ceph version 19.2.0-496-gec70565b (ec70565bc50dbb1dc4327c403b4c8b5e721db8a9) squid (stable)
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xd3) [0x7ff854e2e33e]
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr: 2: ceph_test_rados(+0x22b01) [0x561ef52c6b01]
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr: 3: (write_callback(void*, void*)+0x1e) [0x561ef52dfd2e]
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr: 4: /lib/librados.so.2(+0xb1029) [0x7ff8556ec029]
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr: 5: /lib/librados.so.2(+0xb20be) [0x7ff8556ed0be]
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr: 6: /lib/librados.so.2(+0xb2313) [0x7ff8556ed313]
2024-10-16T23:29:14.408 INFO:tasks.rados.rados.0.smithi186.stderr: 7: /lib/librados.so.2(+0x11dbce) [0x7ff855758bce]
2024-10-16T23:29:14.408 INFO:tasks.rados.rados.0.smithi186.stderr: 8: /lib/librados.so.2(+0xc7b8f) [0x7ff855702b8f]
2024-10-16T23:29:14.408 INFO:tasks.rados.rados.0.smithi186.stderr: 9: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc2b3) [0x7ff85471a2b3]
2024-10-16T23:29:14.408 INFO:tasks.rados.rados.0.smithi186.stderr: 10: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7ff854488b43]
2024-10-16T23:29:14.408 INFO:tasks.rados.rados.0.smithi186.stderr: 11: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7ff85451aa00]


Related issues 3 (3 open0 closed)

Related to RADOS - Bug #56028: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) in src/test/osd/RadosModel.hFix Under ReviewLaura Flores

Actions
Copied to RADOS - Backport #70382: reef: ./src/test/osd/RadosModel.h: 1077: ceph_abort_msg("racing read got wrong version")In ProgressNitzan MordechaiActions
Copied to RADOS - Backport #70383: squid: ./src/test/osd/RadosModel.h: 1077: ceph_abort_msg("racing read got wrong version")In ProgressNitzan MordechaiActions
Actions #1

Updated by Laura Flores over 1 year ago

  • Related to Bug #56028: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) in src/test/osd/RadosModel.h added
Actions #2

Updated by Laura Flores over 1 year ago

Might be under cache tiering; possibly a candidate for removal. Will take a look.

Actions #3

Updated by Laura Flores over 1 year ago

  • Assignee set to Laura Flores
Actions #4

Updated by Laura Flores over 1 year ago

Bump up

Actions #5

Updated by Laura Flores over 1 year ago

Bump up

Actions #6

Updated by Laura Flores over 1 year ago

Bump up

Actions #7

Updated by Laura Flores about 1 year ago

  • Status changed from New to In Progress
Actions #8

Updated by Radoslaw Zarzynski about 1 year ago

scrub note: bump up.

Actions #9

Updated by Radoslaw Zarzynski about 1 year ago

  • Assignee changed from Laura Flores to Nitzan Mordechai

Hi Nitzan! Would you mind taking a look?

Actions #10

Updated by Nitzan Mordechai about 1 year ago

the error: Error: racing read on 4612 returned version 3985 rather than version 3988

greping only 4612:

nmordech@folio02  /a/yuriw-2024-10-16_17:46:23-rados-aclamk-testing-ganymede-2024-10-15-1410-squid-distro-default-smithi/7951928  zgrep "4612:" teuthology.log.gz
2024-10-16T23:29:14.049 INFO:tasks.rados.rados.0.smithi186.stdout:4612: write initial oid 4612
2024-10-16T23:29:14.049 INFO:tasks.rados.rados.0.smithi186.stdout:4612:  seq_num 4611 ranges {697077=683057,1840589=777280,3057067=440010}
2024-10-16T23:29:14.056 INFO:tasks.rados.rados.0.smithi186.stdout:4612:  writing smithi18628385-4612 from 697077 to 1380134 tid 1
2024-10-16T23:29:14.063 INFO:tasks.rados.rados.0.smithi186.stdout:4612:  writing smithi18628385-4612 from 1840589 to 2617869 tid 2
2024-10-16T23:29:14.068 INFO:tasks.rados.rados.0.smithi186.stdout:4612:  writing smithi18628385-4612 from 3057067 to 3497077 tid 3
2024-10-16T23:29:14.189 INFO:tasks.rados.rados.0.smithi186.stdout:4612:  finishing write tid 1 to smithi18628385-4612
2024-10-16T23:29:14.296 INFO:tasks.rados.rados.0.smithi186.stdout:4612:  finishing write tid 2 to smithi18628385-4612
2024-10-16T23:29:14.296 INFO:tasks.rados.rados.0.smithi186.stdout:4612:  finishing write tid 3 to smithi18628385-4612
2024-10-16T23:29:14.296 INFO:tasks.rados.rados.0.smithi186.stdout:4612:  finishing write tid 4 to smithi18628385-4612
2024-10-16T23:29:14.406 INFO:tasks.rados.rados.0.smithi186.stdout:4612:  finishing write tid 5 to smithi18628385-4612
2024-10-16T23:29:14.406 INFO:tasks.rados.rados.0.smithi186.stdout:4612:  oid 4612 updating version 0 to 3988
2024-10-16T23:29:14.406 INFO:tasks.rados.rados.0.smithi186.stdout:4612:  oid 4612 version 3988 is already newer than 3985
2024-10-16T23:29:14.406 INFO:tasks.rados.rados.0.smithi186.stdout:4612:  oid 4612 version 3988 is already newer than 3987
2024-10-16T23:29:14.406 INFO:tasks.rados.rados.0.smithi186.stdout:4612:  oid 4612 version 3988 is already newer than 3986

looks like the version did updated. but the check against completion failed.

if (rcompletion->get_version64() != version) {
    std::cerr << "Error: racing read on " << oid << " returned version " 
          << rcompletion->get_version64() << " rather than version " 
          << version << std::endl;
    ceph_abort_msg("racing read got wrong version");
      }

it looks like the issue is that osd_op_reply return latest version and then return with earlier version which cause the tire code use old version.

Actions #11

Updated by Nitzan Mordechai about 1 year ago

i was able to recreate the race on tier pools only, tagging it as tier issue

Actions #12

Updated by Nitzan Mordechai about 1 year ago

  • Category set to Tiering
Actions #13

Updated by Laura Flores about 1 year ago

Can we consider removing/disabling the tiering aspects of this test?

Actions #14

Updated by Nitzan Mordechai about 1 year ago

@Laura Flores @Radoslaw Zarzynski
if the tier is no longer supported, I don't see why we need to keep it..
it's part of:
rados/singleton/all/thrash_cache_writeback_proxy_none.yaml

maybe its also good idea to get rid of the others yaml files?
/home/nmordech/ceph/qa/suites/rados/singleton/all/thrash_cache_writeback_proxy_none.yaml
31,9: - sudo ceph osd tier add base cache

/home/nmordech/ceph/qa/suites/rados/thrash/workloads/cache-agent-big.yaml
19,9: - sudo ceph osd tier add base cache

/home/nmordech/ceph/qa/suites/rados/thrash/workloads/cache-agent-small.yaml
17,9: - sudo ceph osd tier add base cache

/home/nmordech/ceph/qa/suites/rados/thrash/workloads/cache-pool-snaps-readproxy.yaml
17,9: - sudo ceph osd tier add base cache

/home/nmordech/ceph/qa/suites/rados/thrash/workloads/cache-pool-snaps.yaml
17,9: - sudo ceph osd tier add base cache

/home/nmordech/ceph/qa/suites/rados/thrash/workloads/cache-snaps.yaml
17,9: - sudo ceph osd tier add base cache

/home/nmordech/ceph/qa/suites/rados/thrash/workloads/cache.yaml
17,9: - sudo ceph osd tier add base cache

/home/nmordech/ceph/qa/suites/rados/thrash-old-clients/workloads/cache-snaps.yaml
11,9: - sudo ceph osd tier add base cache

/home/nmordech/ceph/qa/suites/smoke/basic/tasks/test/rados_cache_snaps.yaml
25,7: - sudo ceph osd tier add base cache

Actions #15

Updated by Radoslaw Zarzynski about 1 year ago

Yes, the cache tiering is long (since reef) deprecated. I agree no need for those tests.

Actions #16

Updated by Nitzan Mordechai about 1 year ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 61934
Actions #17

Updated by Shraddha Agrawal about 1 year ago

/a/skanta-2025-02-27_06:37:29-rados-wip-bharath5-testing-2025-02-26-1218-reef-distro-default-smithi/8157431

Actions #18

Updated by Radoslaw Zarzynski about 1 year ago

scrub note: fix approved and picked up by QA.

Actions #19

Updated by Radoslaw Zarzynski about 1 year ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to squid,reef
Actions #20

Updated by Upkeep Bot about 1 year ago

  • Copied to Backport #70382: reef: ./src/test/osd/RadosModel.h: 1077: ceph_abort_msg("racing read got wrong version") added
Actions #21

Updated by Upkeep Bot about 1 year ago

  • Copied to Backport #70383: squid: ./src/test/osd/RadosModel.h: 1077: ceph_abort_msg("racing read got wrong version") added
Actions #22

Updated by Upkeep Bot about 1 year ago

  • Tags (freeform) set to backport_processed
Actions #23

Updated by Upkeep Bot 9 months ago

  • Merge Commit set to 5c9094d9f65cb94e9efe13b1a41666f602ad0688
  • Fixed In set to v20.0.0-188-g5c9094d9f65
  • Upkeep Timestamp set to 2025-07-08T18:34:35+00:00
Actions #24

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v20.0.0-188-g5c9094d9f65 to v20.0.0-188-g5c9094d9f65c
  • Upkeep Timestamp changed from 2025-07-08T18:34:35+00:00 to 2025-07-14T15:44:39+00:00
Actions #25

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v20.0.0-188-g5c9094d9f65c to v20.0.0-188-g5c9094d9f6
  • Upkeep Timestamp changed from 2025-07-14T15:44:39+00:00 to 2025-07-14T21:09:21+00:00
Actions #26

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~915
  • Upkeep Timestamp changed from 2025-07-14T21:09:21+00:00 to 2025-11-01T01:01:51+00:00
Actions

Also available in: Atom PDF