Bug #72600

closed

test_cls_2pc_queue.sh exits randomly?

Added by Casey Bodley 7 months ago. Updated 6 months ago.

Status: Can't reproduce
Priority: Urgent
Assignee: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

Two examples from this week:

From https://qa-proxy.ceph.com/teuthology/cbodley-2025-08-14_20:48:33-rgw-wip-72315-distro-default-smithi/8443481/teuthology.log:

2025-08-14T21:51:39.064 DEBUG:teuthology.orchestra.run.smithi073:workunit test cls/test_cls_2pc_queue.sh> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=57e2b3f03c239abdbf836bae81fa4b2e203370e6 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_2pc_queue.sh
2025-08-14T21:51:39.149 INFO:tasks.workunit.client.0.smithi073.stdout:Running main() from gmock_main.cc
2025-08-14T21:51:39.149 INFO:tasks.workunit.client.0.smithi073.stdout:[==========] Running 24 tests from 1 test suite.
2025-08-14T21:51:39.149 INFO:tasks.workunit.client.0.smithi073.stdout:[----------] Global test environment set-up.
2025-08-14T21:51:39.150 INFO:tasks.workunit.client.0.smithi073.stdout:[----------] 24 tests from TestCls2PCQueue
2025-08-14T21:51:39.150 INFO:tasks.workunit.client.0.smithi073.stdout:[ RUN      ] TestCls2PCQueue.GetCapacity
2025-08-14T21:51:42.042 INFO:tasks.workunit.client.0.smithi073.stdout:[       OK ] TestCls2PCQueue.GetCapacity (2892 ms)
2025-08-14T21:51:42.042 INFO:tasks.workunit.client.0.smithi073.stdout:[ RUN      ] TestCls2PCQueue.AsyncGetCapacity
2025-08-14T21:51:45.061 INFO:tasks.workunit.client.0.smithi073.stdout:[       OK ] TestCls2PCQueue.AsyncGetCapacity (3018 ms)
2025-08-14T21:51:45.061 INFO:tasks.workunit.client.0.smithi073.stdout:[ RUN      ] TestCls2PCQueue.Reserve
2025-08-14T21:51:48.100 INFO:tasks.workunit.client.0.smithi073.stdout:[       OK ] TestCls2PCQueue.Reserve (3039 ms)
2025-08-14T21:51:48.100 INFO:tasks.workunit.client.0.smithi073.stdout:[ RUN      ] TestCls2PCQueue.AsyncReserve
2025-08-15T00:51:39.140 DEBUG:teuthology.orchestra.run:got remote process result: 124
2025-08-15T00:51:39.172 INFO:tasks.workunit:Stopping ['cls/test_cls_lock.sh', 'cls/test_cls_log.sh', 'cls/test_cls_refcount.sh', 'cls/test_cls_rgw.sh', 'cls/test_cls_rgw_gc.sh', 'cls/test_cls_rgw_stats.sh', 'cls/test_cls_cmpomap.sh', 'cls/test_cls_2pc_queue.sh', 'cls/test_cls_user.sh', 'cls/test_cls_sem_set.sh', 'rgw/test_rgw_gc_log.sh', 'rgw/test_rgw_obj.sh', 'rgw/test_rgw_datalog.sh', 'rgw/test_librgw_file.sh', 'rgw/test_awssdkv4_sig.sh', 'rgw/test_gosdk2.sh'] on client.0...

And from https://qa-proxy.ceph.com/teuthology/ivancich-2025-08-13_04:35:40-rgw-wip-alter-bucket-instance-ids-distro-default-smithi/8440157/teuthology.log:

2025-08-13T06:05:59.176 DEBUG:teuthology.orchestra.run.smithi106:workunit test cls/test_cls_2pc_queue.sh> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=0894eb6392982ed14283c1b3e44fc313407454dc TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_2pc_queue.sh
2025-08-13T06:05:59.240 INFO:tasks.workunit.client.0.smithi106.stdout:Running main() from gmock_main.cc
2025-08-13T06:05:59.240 INFO:tasks.workunit.client.0.smithi106.stdout:[==========] Running 24 tests from 1 test suite.
2025-08-13T06:05:59.240 INFO:tasks.workunit.client.0.smithi106.stdout:[----------] Global test environment set-up.
2025-08-13T06:05:59.240 INFO:tasks.workunit.client.0.smithi106.stdout:[----------] 24 tests from TestCls2PCQueue
2025-08-13T06:05:59.240 INFO:tasks.workunit.client.0.smithi106.stdout:[ RUN      ] TestCls2PCQueue.GetCapacity
2025-08-13T06:06:02.168 INFO:tasks.workunit.client.0.smithi106.stdout:[       OK ] TestCls2PCQueue.GetCapacity (2923 ms)
2025-08-13T06:06:02.168 INFO:tasks.workunit.client.0.smithi106.stdout:[ RUN      ] TestCls2PCQueue.AsyncGetCapacity
2025-08-13T06:06:05.193 INFO:tasks.workunit.client.0.smithi106.stdout:[       OK ] TestCls2PCQueue.AsyncGetCapacity (3029 ms)
2025-08-13T06:06:05.193 INFO:tasks.workunit.client.0.smithi106.stdout:[ RUN      ] TestCls2PCQueue.Reserve
2025-08-13T06:06:08.245 INFO:tasks.workunit.client.0.smithi106.stdout:[       OK ] TestCls2PCQueue.Reserve (3050 ms)
2025-08-13T06:06:08.246 INFO:tasks.workunit.client.0.smithi106.stdout:[ RUN      ] TestCls2PCQueue.AsyncReserve
2025-08-13T06:06:11.269 INFO:tasks.workunit.client.0.smithi106.stdout:[       OK ] TestCls2PCQueue.AsyncReserve (3025 ms)
2025-08-13T06:06:11.269 INFO:tasks.workunit.client.0.smithi106.stdout:[ RUN      ] TestCls2PCQueue.Commit
2025-08-13T09:05:59.231 DEBUG:teuthology.orchestra.run:got remote process result: 124
2025-08-13T09:05:59.275 INFO:tasks.workunit:Stopping ['cls/test_cls_lock.sh', 'cls/test_cls_log.sh', 'cls/test_cls_refcount.sh', 'cls/test_cls_rgw.sh', 'cls/test_cls_rgw_gc.sh', 'cls/test_cls_rgw_stats.sh', 'cls/test_cls_cmpomap.sh', 'cls/test_cls_2pc_queue.sh', 'cls/test_cls_user.sh', 'cls/test_cls_sem_set.sh', 'rgw/test_rgw_gc_log.sh', 'rgw/test_rgw_obj.sh', 'rgw/test_rgw_datalog.sh', 'rgw/test_librgw_file.sh', 'rgw/test_awssdkv4_sig.sh', 'rgw/test_gosdk2.sh'] on client.0...

The test doesn't fail or print a crash dump; it just exits? https://tracker.ceph.com/issues/72144 tracks something similar for test_cls_rgw_gc.sh.


Related issues: 1 (1 open, 0 closed)

Related to rgw - Bug #72144: rgw: failure in cls_rgw_gc.init (Need More Info)

#1

Updated by J. Eric Ivancich 7 months ago

  • Related to Bug #72144: rgw: failure in cls_rgw_gc.init added
#2

Updated by Casey Bodley 7 months ago

Saw this from ceph_test_cls_log too, in https://qa-proxy.ceph.com/teuthology/cbodley-2025-08-15_13:13:30-rgw-wip-21128-distro-default-smithi/8444513/teuthology.log:

2025-08-15T13:46:18.880 DEBUG:teuthology.orchestra.run.smithi145:workunit test cls/test_cls_log.sh> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=3ca86b31102423cb7e571d4ef1d6c11e1e20dae8 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_log.sh
2025-08-15T13:46:18.928 INFO:tasks.workunit.client.0.smithi145.stdout:Running main() from gmock_main.cc
2025-08-15T13:46:18.928 INFO:tasks.workunit.client.0.smithi145.stdout:[==========] Running 4 tests from 1 test suite.
2025-08-15T13:46:18.928 INFO:tasks.workunit.client.0.smithi145.stdout:[----------] Global test environment set-up.
2025-08-15T13:46:18.928 INFO:tasks.workunit.client.0.smithi145.stdout:[----------] 4 tests from cls_log
2025-08-15T13:46:18.928 INFO:tasks.workunit.client.0.smithi145.stdout:[ RUN      ] cls_log.test_log_add_same_time
2025-08-15T13:46:21.859 INFO:tasks.workunit.client.0.smithi145.stdout:[       OK ] cls_log.test_log_add_same_time (2930 ms)
2025-08-15T13:46:21.859 INFO:tasks.workunit.client.0.smithi145.stdout:[ RUN      ] cls_log.test_log_add_different_time
2025-08-15T13:46:24.883 INFO:tasks.workunit.client.0.smithi145.stdout:[       OK ] cls_log.test_log_add_different_time (3024 ms)
2025-08-15T13:46:24.883 INFO:tasks.workunit.client.0.smithi145.stdout:[ RUN      ] cls_log.trim_by_time
2025-08-15T16:46:18.921 DEBUG:teuthology.orchestra.run:got remote process result: 124
2025-08-15T16:46:18.953 INFO:tasks.workunit:Stopping ['cls/test_cls_lock.sh', 'cls/test_cls_log.sh', 'cls/test_cls_refcount.sh', 'cls/test_cls_rgw.sh', 'cls/test_cls_rgw_gc.sh', 'cls/test_cls_rgw_stats.sh', 'cls/test_cls_cmpomap.sh', 'cls/test_cls_2pc_queue.sh', 'cls/test_cls_user.sh', 'cls/test_cls_sem_set.sh', 'rgw/test_rgw_gc_log.sh', 'rgw/test_rgw_obj.sh', 'rgw/test_rgw_datalog.sh', 'rgw/test_librgw_file.sh', 'rgw/test_awssdkv4_sig.sh', 'rgw/test_gosdk2.sh'] on client.0...

#3

Updated by Casey Bodley 7 months ago

Looking more closely at the timestamps in these logs:

2025-08-14T21:51:48.100 INFO:tasks.workunit.client.0.smithi073.stdout:[ RUN      ] TestCls2PCQueue.AsyncReserve
2025-08-15T00:51:39.140 DEBUG:teuthology.orchestra.run:got remote process result: 124

The process doesn't exit until ~3 hours after the last line of test output. These tests are hanging and being killed when the 3-hour timeout expires.
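The timing can be checked with a short shell sketch. It assumes GNU coreutils (GNU `date -d` and `timeout`; BSD/macOS `date` parses dates differently), truncates the timestamps from the first log to whole seconds and treats them as UTC, and uses `sleep 5` under a 1-second limit as a hypothetical stand-in for a hung test. GNU `timeout` exits with status 124 when it has to kill the command, which matches the "got remote process result: 124" lines in the logs.

```shell
#!/usr/bin/env bash
# Assumes GNU coreutils. Timestamps come from the first teuthology log,
# truncated to whole seconds and interpreted as UTC.
start=$(date -u -d '2025-08-14 21:51:39' +%s)     # workunit command launched
last_out=$(date -u -d '2025-08-14 21:51:48' +%s)  # last test output ([ RUN ] AsyncReserve)
result=$(date -u -d '2025-08-15 00:51:39' +%s)    # "got remote process result: 124"

echo "launch -> result:      $((result - start)) s"     # 10800 s, i.e. exactly 3h
echo "last output -> result: $((result - last_out)) s"  # 10791 s

# GNU timeout returns 124 when the time limit expires and it kills the command.
# `sleep 5` is a stand-in for a hung test; the 1s limit replaces the real 3h.
timeout 1s sleep 5
status=$?
echo "timeout exit status: $status"
```

The launch-to-result gap of exactly 10800 s lines up with the `timeout 3h` in the workunit command line, which points to the wrapper killing a hung test rather than the test exiting on its own.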

#4

Updated by J. Eric Ivancich 6 months ago

  • Status changed from New to Can't reproduce