Bug #72600
test_cls_2pc_queue.sh exits randomly?
Status: Can't reproduce
Priority: Urgent
Assignee: -
Target version: -
% Done: 0%
Source:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:
Description
two examples from this week
2025-08-14T21:51:39.064 DEBUG:teuthology.orchestra.run.smithi073:workunit test cls/test_cls_2pc_queue.sh> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=57e2b3f03c239abdbf836bae81fa4b2e203370e6 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_2pc_queue.sh
2025-08-14T21:51:39.149 INFO:tasks.workunit.client.0.smithi073.stdout:Running main() from gmock_main.cc
2025-08-14T21:51:39.149 INFO:tasks.workunit.client.0.smithi073.stdout:[==========] Running 24 tests from 1 test suite.
2025-08-14T21:51:39.149 INFO:tasks.workunit.client.0.smithi073.stdout:[----------] Global test environment set-up.
2025-08-14T21:51:39.150 INFO:tasks.workunit.client.0.smithi073.stdout:[----------] 24 tests from TestCls2PCQueue
2025-08-14T21:51:39.150 INFO:tasks.workunit.client.0.smithi073.stdout:[ RUN ] TestCls2PCQueue.GetCapacity
2025-08-14T21:51:42.042 INFO:tasks.workunit.client.0.smithi073.stdout:[ OK ] TestCls2PCQueue.GetCapacity (2892 ms)
2025-08-14T21:51:42.042 INFO:tasks.workunit.client.0.smithi073.stdout:[ RUN ] TestCls2PCQueue.AsyncGetCapacity
2025-08-14T21:51:45.061 INFO:tasks.workunit.client.0.smithi073.stdout:[ OK ] TestCls2PCQueue.AsyncGetCapacity (3018 ms)
2025-08-14T21:51:45.061 INFO:tasks.workunit.client.0.smithi073.stdout:[ RUN ] TestCls2PCQueue.Reserve
2025-08-14T21:51:48.100 INFO:tasks.workunit.client.0.smithi073.stdout:[ OK ] TestCls2PCQueue.Reserve (3039 ms)
2025-08-14T21:51:48.100 INFO:tasks.workunit.client.0.smithi073.stdout:[ RUN ] TestCls2PCQueue.AsyncReserve
2025-08-15T00:51:39.140 DEBUG:teuthology.orchestra.run:got remote process result: 124
2025-08-15T00:51:39.172 INFO:tasks.workunit:Stopping ['cls/test_cls_lock.sh', 'cls/test_cls_log.sh', 'cls/test_cls_refcount.sh', 'cls/test_cls_rgw.sh', 'cls/test_cls_rgw_gc.sh', 'cls/test_cls_rgw_stats.sh', 'cls/test_cls_cmpomap.sh', 'cls/test_cls_2pc_queue.sh', 'cls/test_cls_user.sh', 'cls/test_cls_sem_set.sh', 'rgw/test_rgw_gc_log.sh', 'rgw/test_rgw_obj.sh', 'rgw/test_rgw_datalog.sh', 'rgw/test_librgw_file.sh', 'rgw/test_awssdkv4_sig.sh', 'rgw/test_gosdk2.sh'] on client.0...
2025-08-13T06:05:59.176 DEBUG:teuthology.orchestra.run.smithi106:workunit test cls/test_cls_2pc_queue.sh> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=0894eb6392982ed14283c1b3e44fc313407454dc TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_2pc_queue.sh
2025-08-13T06:05:59.240 INFO:tasks.workunit.client.0.smithi106.stdout:Running main() from gmock_main.cc
2025-08-13T06:05:59.240 INFO:tasks.workunit.client.0.smithi106.stdout:[==========] Running 24 tests from 1 test suite.
2025-08-13T06:05:59.240 INFO:tasks.workunit.client.0.smithi106.stdout:[----------] Global test environment set-up.
2025-08-13T06:05:59.240 INFO:tasks.workunit.client.0.smithi106.stdout:[----------] 24 tests from TestCls2PCQueue
2025-08-13T06:05:59.240 INFO:tasks.workunit.client.0.smithi106.stdout:[ RUN ] TestCls2PCQueue.GetCapacity
2025-08-13T06:06:02.168 INFO:tasks.workunit.client.0.smithi106.stdout:[ OK ] TestCls2PCQueue.GetCapacity (2923 ms)
2025-08-13T06:06:02.168 INFO:tasks.workunit.client.0.smithi106.stdout:[ RUN ] TestCls2PCQueue.AsyncGetCapacity
2025-08-13T06:06:05.193 INFO:tasks.workunit.client.0.smithi106.stdout:[ OK ] TestCls2PCQueue.AsyncGetCapacity (3029 ms)
2025-08-13T06:06:05.193 INFO:tasks.workunit.client.0.smithi106.stdout:[ RUN ] TestCls2PCQueue.Reserve
2025-08-13T06:06:08.245 INFO:tasks.workunit.client.0.smithi106.stdout:[ OK ] TestCls2PCQueue.Reserve (3050 ms)
2025-08-13T06:06:08.246 INFO:tasks.workunit.client.0.smithi106.stdout:[ RUN ] TestCls2PCQueue.AsyncReserve
2025-08-13T06:06:11.269 INFO:tasks.workunit.client.0.smithi106.stdout:[ OK ] TestCls2PCQueue.AsyncReserve (3025 ms)
2025-08-13T06:06:11.269 INFO:tasks.workunit.client.0.smithi106.stdout:[ RUN ] TestCls2PCQueue.Commit
2025-08-13T09:05:59.231 DEBUG:teuthology.orchestra.run:got remote process result: 124
2025-08-13T09:05:59.275 INFO:tasks.workunit:Stopping ['cls/test_cls_lock.sh', 'cls/test_cls_log.sh', 'cls/test_cls_refcount.sh', 'cls/test_cls_rgw.sh', 'cls/test_cls_rgw_gc.sh', 'cls/test_cls_rgw_stats.sh', 'cls/test_cls_cmpomap.sh', 'cls/test_cls_2pc_queue.sh', 'cls/test_cls_user.sh', 'cls/test_cls_sem_set.sh', 'rgw/test_rgw_gc_log.sh', 'rgw/test_rgw_obj.sh', 'rgw/test_rgw_datalog.sh', 'rgw/test_librgw_file.sh', 'rgw/test_awssdkv4_sig.sh', 'rgw/test_gosdk2.sh'] on client.0...
the test doesn't fail or print a crash dump; it just exits? https://tracker.ceph.com/issues/72144 tracks something similar for test_cls_rgw_gc.sh
Updated by J. Eric Ivancich 7 months ago
- Related to Bug #72144: rgw: failure in cls_rgw_gc.init added
Updated by Casey Bodley 7 months ago
saw this from ceph_test_cls_log too in https://qa-proxy.ceph.com/teuthology/cbodley-2025-08-15_13:13:30-rgw-wip-21128-distro-default-smithi/8444513/teuthology.log
2025-08-15T13:46:18.880 DEBUG:teuthology.orchestra.run.smithi145:workunit test cls/test_cls_log.sh> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=3ca86b31102423cb7e571d4ef1d6c11e1e20dae8 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_log.sh
2025-08-15T13:46:18.928 INFO:tasks.workunit.client.0.smithi145.stdout:Running main() from gmock_main.cc
2025-08-15T13:46:18.928 INFO:tasks.workunit.client.0.smithi145.stdout:[==========] Running 4 tests from 1 test suite.
2025-08-15T13:46:18.928 INFO:tasks.workunit.client.0.smithi145.stdout:[----------] Global test environment set-up.
2025-08-15T13:46:18.928 INFO:tasks.workunit.client.0.smithi145.stdout:[----------] 4 tests from cls_log
2025-08-15T13:46:18.928 INFO:tasks.workunit.client.0.smithi145.stdout:[ RUN ] cls_log.test_log_add_same_time
2025-08-15T13:46:21.859 INFO:tasks.workunit.client.0.smithi145.stdout:[ OK ] cls_log.test_log_add_same_time (2930 ms)
2025-08-15T13:46:21.859 INFO:tasks.workunit.client.0.smithi145.stdout:[ RUN ] cls_log.test_log_add_different_time
2025-08-15T13:46:24.883 INFO:tasks.workunit.client.0.smithi145.stdout:[ OK ] cls_log.test_log_add_different_time (3024 ms)
2025-08-15T13:46:24.883 INFO:tasks.workunit.client.0.smithi145.stdout:[ RUN ] cls_log.trim_by_time
2025-08-15T16:46:18.921 DEBUG:teuthology.orchestra.run:got remote process result: 124
2025-08-15T16:46:18.953 INFO:tasks.workunit:Stopping ['cls/test_cls_lock.sh', 'cls/test_cls_log.sh', 'cls/test_cls_refcount.sh', 'cls/test_cls_rgw.sh', 'cls/test_cls_rgw_gc.sh', 'cls/test_cls_rgw_stats.sh', 'cls/test_cls_cmpomap.sh', 'cls/test_cls_2pc_queue.sh', 'cls/test_cls_user.sh', 'cls/test_cls_sem_set.sh', 'rgw/test_rgw_gc_log.sh', 'rgw/test_rgw_obj.sh', 'rgw/test_rgw_datalog.sh', 'rgw/test_librgw_file.sh', 'rgw/test_awssdkv4_sig.sh', 'rgw/test_gosdk2.sh'] on client.0...
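In each of these excerpts the last gtest line is a [ RUN ] with no matching result line. A minimal sketch for pulling that test name out of a teuthology log follows; the helper name unfinished_tests is hypothetical, and the regexes assume gtest markers look like the ones quoted above (padding inside the brackets may differ in the raw log).

```python
import re

# gtest markers as they appear in the excerpts; \s* tolerates bracket padding.
RUN = re.compile(r"\[\s*RUN\s*\]\s+(\S+)")
DONE = re.compile(r"\[\s*(?:OK|FAILED)\s*\]\s+(\S+)")


def unfinished_tests(lines):
    """Return tests that logged [ RUN ] but never logged [ OK ] or [ FAILED ]."""
    started = []
    for line in lines:
        m = RUN.search(line)
        if m:
            started.append(m.group(1))
            continue
        m = DONE.search(line)
        if m and m.group(1) in started:
            started.remove(m.group(1))
    return started


# Lines copied from the cls_log excerpt above.
log = [
    "2025-08-15T13:46:18.928 INFO:tasks.workunit.client.0.smithi145.stdout:[ RUN ] cls_log.test_log_add_same_time",
    "2025-08-15T13:46:21.859 INFO:tasks.workunit.client.0.smithi145.stdout:[ OK ] cls_log.test_log_add_same_time (2930 ms)",
    "2025-08-15T13:46:21.859 INFO:tasks.workunit.client.0.smithi145.stdout:[ RUN ] cls_log.test_log_add_different_time",
    "2025-08-15T13:46:24.883 INFO:tasks.workunit.client.0.smithi145.stdout:[ OK ] cls_log.test_log_add_different_time (3024 ms)",
    "2025-08-15T13:46:24.883 INFO:tasks.workunit.client.0.smithi145.stdout:[ RUN ] cls_log.trim_by_time",
    "2025-08-15T16:46:18.921 DEBUG:teuthology.orchestra.run:got remote process result: 124",
]

print(unfinished_tests(log))  # ['cls_log.trim_by_time']
```

The affected test differs between runs (AsyncReserve, Commit, trim_by_time), so the sketch only shows where each run stopped producing output, not why.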
Updated by Casey Bodley 7 months ago
looking more closely at the timestamps in these logs:
2025-08-14T21:51:48.100 INFO:tasks.workunit.client.0.smithi073.stdout:[ RUN ] TestCls2PCQueue.AsyncReserve
2025-08-15T00:51:39.140 DEBUG:teuthology.orchestra.run:got remote process result: 124
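the process doesn't exit until ~3 hours after the last line of test output, which matches the "timeout 3h" wrapper in the workunit command, and 124 is the exit status timeout returns when the command times out. these tests are hanging and being killed after the timeout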
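The timestamp arithmetic can be checked with a short script. This is only a sketch: parse_ts is a hypothetical helper, it assumes every teuthology line starts with an ISO timestamp in the format shown above, and the first log line is truncated here to its timestamp and prefix.

```python
from datetime import datetime


def parse_ts(line):
    """Parse the leading timestamp of a teuthology log line."""
    return datetime.strptime(line.split(" ", 1)[0], "%Y-%m-%dT%H:%M:%S.%f")


# Lines from the smithi073 run quoted above (command line shortened).
start_line = "2025-08-14T21:51:39.064 DEBUG:teuthology.orchestra.run.smithi073:workunit test cls/test_cls_2pc_queue.sh>"
last_output = "2025-08-14T21:51:48.100 INFO:tasks.workunit.client.0.smithi073.stdout:[ RUN ] TestCls2PCQueue.AsyncReserve"
exit_line = "2025-08-15T00:51:39.140 DEBUG:teuthology.orchestra.run:got remote process result: 124"

silent = parse_ts(exit_line) - parse_ts(last_output)   # time with no test output
runtime = parse_ts(exit_line) - parse_ts(start_line)   # total time under "timeout 3h"
status = int(exit_line.rsplit(": ", 1)[1])

print(silent)   # 2:59:51.040000
print(runtime)  # 3:00:00.076000
print(status)   # 124
```

The total runtime lands on the 3h limit almost exactly, while the test itself was silent for nearly all of it, which is what a hang followed by a timeout kill looks like.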
Updated by J. Eric Ivancich 6 months ago
- Status changed from New to Can't reproduce