Bug #68628
open./src/test/osd/RadosModel.h: 1077: ceph_abort_msg("racing read got wrong version")
0%
Description
Observed in:
/yuriw-2024-10-16_17:46:23-rados-aclamk-testing-ganymede-2024-10-15-1410-squid-distro-default-smithi/7951928/
2024-10-16T23:29:14.406 INFO:tasks.rados.rados.0.smithi186.stdout:4612: oid 4612 version 3988 is already newer than 3987
2024-10-16T23:29:14.406 INFO:tasks.rados.rados.0.smithi186.stdout:4612: oid 4612 version 3988 is already newer than 3986
2024-10-16T23:29:14.406 INFO:tasks.rados.rados.0.smithi186.stdout:update_object_version oid 4612 v 3988 (ObjNum 4611 snap 0 seq_num 4611) dirty exists
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr:Error: racing read on 4612 returned version 3985 rather than version 3988
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr:./src/test/osd/RadosModel.h: In function 'virtual void WriteOp::_finish(TestOp::CallbackInfo*)' thread 7ff843fff640 time 2024-10-16T23:29:14.402658+0000
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr:./src/test/osd/RadosModel.h: 1077: ceph_abort_msg("racing read got wrong version")
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr: ceph version 19.2.0-496-gec70565b (ec70565bc50dbb1dc4327c403b4c8b5e721db8a9) squid (stable)
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr: 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xd3) [0x7ff854e2e33e]
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr: 2: ceph_test_rados(+0x22b01) [0x561ef52c6b01]
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr: 3: (write_callback(void*, void*)+0x1e) [0x561ef52dfd2e]
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr: 4: /lib/librados.so.2(+0xb1029) [0x7ff8556ec029]
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr: 5: /lib/librados.so.2(+0xb20be) [0x7ff8556ed0be]
2024-10-16T23:29:14.407 INFO:tasks.rados.rados.0.smithi186.stderr: 6: /lib/librados.so.2(+0xb2313) [0x7ff8556ed313]
2024-10-16T23:29:14.408 INFO:tasks.rados.rados.0.smithi186.stderr: 7: /lib/librados.so.2(+0x11dbce) [0x7ff855758bce]
2024-10-16T23:29:14.408 INFO:tasks.rados.rados.0.smithi186.stderr: 8: /lib/librados.so.2(+0xc7b8f) [0x7ff855702b8f]
2024-10-16T23:29:14.408 INFO:tasks.rados.rados.0.smithi186.stderr: 9: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc2b3) [0x7ff85471a2b3]
2024-10-16T23:29:14.408 INFO:tasks.rados.rados.0.smithi186.stderr: 10: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7ff854488b43]
2024-10-16T23:29:14.408 INFO:tasks.rados.rados.0.smithi186.stderr: 11: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7ff85451aa00]
Updated by Laura Flores over 1 year ago
- Related to Bug #56028: thrash_cache_writeback_proxy_none: FAILED ceph_assert(version == old_value.version) in src/test/osd/RadosModel.h added
Updated by Laura Flores over 1 year ago
Might be under cache tiering; possibly a candidate for removal. Will take a look.
Updated by Radoslaw Zarzynski about 1 year ago
- Assignee changed from Laura Flores to Nitzan Mordechai
Hi Nitzan! Would you mind taking a look?
Updated by Nitzan Mordechai about 1 year ago
the error: Error: racing read on 4612 returned version 3985 rather than version 3988
greping only 4612:
nmordech@folio02 /a/yuriw-2024-10-16_17:46:23-rados-aclamk-testing-ganymede-2024-10-15-1410-squid-distro-default-smithi/7951928 zgrep "4612:" teuthology.log.gz
2024-10-16T23:29:14.049 INFO:tasks.rados.rados.0.smithi186.stdout:4612: write initial oid 4612
2024-10-16T23:29:14.049 INFO:tasks.rados.rados.0.smithi186.stdout:4612: seq_num 4611 ranges {697077=683057,1840589=777280,3057067=440010}
2024-10-16T23:29:14.056 INFO:tasks.rados.rados.0.smithi186.stdout:4612: writing smithi18628385-4612 from 697077 to 1380134 tid 1
2024-10-16T23:29:14.063 INFO:tasks.rados.rados.0.smithi186.stdout:4612: writing smithi18628385-4612 from 1840589 to 2617869 tid 2
2024-10-16T23:29:14.068 INFO:tasks.rados.rados.0.smithi186.stdout:4612: writing smithi18628385-4612 from 3057067 to 3497077 tid 3
2024-10-16T23:29:14.189 INFO:tasks.rados.rados.0.smithi186.stdout:4612: finishing write tid 1 to smithi18628385-4612
2024-10-16T23:29:14.296 INFO:tasks.rados.rados.0.smithi186.stdout:4612: finishing write tid 2 to smithi18628385-4612
2024-10-16T23:29:14.296 INFO:tasks.rados.rados.0.smithi186.stdout:4612: finishing write tid 3 to smithi18628385-4612
2024-10-16T23:29:14.296 INFO:tasks.rados.rados.0.smithi186.stdout:4612: finishing write tid 4 to smithi18628385-4612
2024-10-16T23:29:14.406 INFO:tasks.rados.rados.0.smithi186.stdout:4612: finishing write tid 5 to smithi18628385-4612
2024-10-16T23:29:14.406 INFO:tasks.rados.rados.0.smithi186.stdout:4612: oid 4612 updating version 0 to 3988
2024-10-16T23:29:14.406 INFO:tasks.rados.rados.0.smithi186.stdout:4612: oid 4612 version 3988 is already newer than 3985
2024-10-16T23:29:14.406 INFO:tasks.rados.rados.0.smithi186.stdout:4612: oid 4612 version 3988 is already newer than 3987
2024-10-16T23:29:14.406 INFO:tasks.rados.rados.0.smithi186.stdout:4612: oid 4612 version 3988 is already newer than 3986
looks like the version did updated. but the check against completion failed.
if (rcompletion->get_version64() != version) {
std::cerr << "Error: racing read on " << oid << " returned version "
<< rcompletion->get_version64() << " rather than version "
<< version << std::endl;
ceph_abort_msg("racing read got wrong version");
}
it looks like the issue is that osd_op_reply return latest version and then return with earlier version which cause the tire code use old version.
Updated by Nitzan Mordechai about 1 year ago
i was able to recreate the race on tier pools only, tagging it as tier issue
Updated by Laura Flores about 1 year ago
Can we consider removing/disabling the tiering aspects of this test?
Updated by Nitzan Mordechai about 1 year ago
@Laura Flores @Radoslaw Zarzynski
if the tier is no longer supported, I don't see why we need to keep it..
it's part of:
rados/singleton/all/thrash_cache_writeback_proxy_none.yaml
maybe its also good idea to get rid of the others yaml files?
/home/nmordech/ceph/qa/suites/rados/singleton/all/thrash_cache_writeback_proxy_none.yaml
31,9: - sudo ceph osd tier add base cache
/home/nmordech/ceph/qa/suites/rados/thrash/workloads/cache-agent-big.yaml
19,9: - sudo ceph osd tier add base cache
/home/nmordech/ceph/qa/suites/rados/thrash/workloads/cache-agent-small.yaml
17,9: - sudo ceph osd tier add base cache
/home/nmordech/ceph/qa/suites/rados/thrash/workloads/cache-pool-snaps-readproxy.yaml
17,9: - sudo ceph osd tier add base cache
/home/nmordech/ceph/qa/suites/rados/thrash/workloads/cache-pool-snaps.yaml
17,9: - sudo ceph osd tier add base cache
/home/nmordech/ceph/qa/suites/rados/thrash/workloads/cache-snaps.yaml
17,9: - sudo ceph osd tier add base cache
/home/nmordech/ceph/qa/suites/rados/thrash/workloads/cache.yaml
17,9: - sudo ceph osd tier add base cache
/home/nmordech/ceph/qa/suites/rados/thrash-old-clients/workloads/cache-snaps.yaml
11,9: - sudo ceph osd tier add base cache
/home/nmordech/ceph/qa/suites/smoke/basic/tasks/test/rados_cache_snaps.yaml
25,7: - sudo ceph osd tier add base cache
Updated by Radoslaw Zarzynski about 1 year ago
Yes, the cache tiering is long (since reef) deprecated. I agree no need for those tests.
Updated by Nitzan Mordechai about 1 year ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 61934
Updated by Shraddha Agrawal about 1 year ago
/a/skanta-2025-02-27_06:37:29-rados-wip-bharath5-testing-2025-02-26-1218-reef-distro-default-smithi/8157431
Updated by Radoslaw Zarzynski about 1 year ago
scrub note: fix approved and picked up by QA.
Updated by Radoslaw Zarzynski about 1 year ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to squid,reef
Updated by Upkeep Bot about 1 year ago
- Copied to Backport #70382: reef: ./src/test/osd/RadosModel.h: 1077: ceph_abort_msg("racing read got wrong version") added
Updated by Upkeep Bot about 1 year ago
- Copied to Backport #70383: squid: ./src/test/osd/RadosModel.h: 1077: ceph_abort_msg("racing read got wrong version") added
Updated by Upkeep Bot about 1 year ago
- Tags (freeform) set to backport_processed
Updated by Upkeep Bot 9 months ago
- Merge Commit set to 5c9094d9f65cb94e9efe13b1a41666f602ad0688
- Fixed In set to v20.0.0-188-g5c9094d9f65
- Upkeep Timestamp set to 2025-07-08T18:34:35+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v20.0.0-188-g5c9094d9f65 to v20.0.0-188-g5c9094d9f65c
- Upkeep Timestamp changed from 2025-07-08T18:34:35+00:00 to 2025-07-14T15:44:39+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v20.0.0-188-g5c9094d9f65c to v20.0.0-188-g5c9094d9f6
- Upkeep Timestamp changed from 2025-07-14T15:44:39+00:00 to 2025-07-14T21:09:21+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~915
- Upkeep Timestamp changed from 2025-07-14T21:09:21+00:00 to 2025-11-01T01:01:51+00:00