QA Run #71057
aclamk-testing-phoebe-2025-04-24-1431
Description
https://github.com/ceph/ceph/pull/56975 - os/bluestore: Recompression, part 4. Scanner, Estimator and core recompression.
https://github.com/ceph/ceph/pull/62815 - os/bluestore: Add printing shards to Onode::printer
Updated by Adam Kupczyk 11 months ago · Edited
[8258022, 8258024, 8258039, 8258050, 8258056, 8258057, 8258067, 8258068]
cluster [WRN] OSD bench result of 746.692733 IOPS is not within the threshold limit range of 1000.000000 IOPS and 80000.000000 IOPS
[8258040]
rados/cephadm/workunits/{0-distro/centos_9.stream_runc agent/on mon_election/classic task/test_rgw_multisite}
"2025-04-24T18:08:13.021229+0000 mon.a (mon.0) 425 : cluster [WRN] Health check failed: 1 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)" in cluster log
2025-04-24T18:08:18.299 INFO:journalctl@ceph.mon.a.smithi103.stdout:Apr 24 18:08:18 smithi103 ceph-mon[35928]: Health check cleared: OSD_DOWN (was: 1 osds down)
2025-04-24T18:08:18.299 INFO:journalctl@ceph.mon.a.smithi103.stdout:Apr 24 18:08:18 smithi103 ceph-mon[35928]: Health check cleared: OSD_HOST_DOWN (was: 1 host (1 osds) down)
2025-04-24T18:08:18.299 INFO:journalctl@ceph.mon.a.smithi103.stdout:Apr 24 18:08:18 smithi103 ceph-mon[35928]: Health check cleared: OSD_ROOT_DOWN (was: 1 root (1 osds) down)
2025-04-24T18:08:18.299 INFO:journalctl@ceph.mon.a.smithi103.stdout:Apr 24 18:08:18 smithi103 ceph-mon[35928]: Cluster is now healthy
[8258053]
rados/dashboard/{0-single-container-host debug/mgr mon_election/classic random-objectstore$/{bluestore-comp-snappy} tasks/e2e}
Command failed on smithi047 with status 1: 'yes | sudo mkfs.xfs -f -i size=2048 /dev/vg_nvme/lv_2'
https://tracker.ceph.com/issues/68668
[8258055]
rados/cephadm/osds/{0-distro/ubuntu_22.04 0-nvme-loop 1-start 2-ops/repave-all}
"2025-04-24T18:23:50.425399+0000 mon.smithi104 (mon.0) 175 : cluster [WRN] Health check failed: Failed to place 1 daemon(s) (CEPHADM_DAEMON_PLACE_FAIL)" in cluster log
https://tracker.ceph.com/issues/70714
[8258064]
rados/verify/{centos_latest ceph clusters/fixed-4 d-thrash/default/{default thrashosds-health} mon_election/classic msgr-failures/few msgr/async objectstore/{bluestore/{alloc$/{stupid} base compr$/{yes$/{snappy}} mem$/{low} onode-segment$/{256K} write$/{write_random}}} rados read-affinity/balance tasks/rados_cls_all validater/valgrind}
"2025-04-24T19:07:20.787282+0000 mon.a (mon.0) 3616 : cluster [WRN] Health check failed: 1 OSD experiencing slow operations in BlueStore (BLUESTORE_SLOW_OP_ALERT)" in cluster log
[8258069]
rados/cephadm/workunits/{0-distro/centos_9.stream agent/on mon_election/connectivity task/test_ca_signed_key}
"2025-04-24T18:53:16.817810+0000 mon.a (mon.0) 317 : cluster [WRN] Health check failed: 1 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)" in cluster log
2025-04-24T18:53:38.218 INFO:journalctl@ceph.mon.a.smithi105.stdout:Apr 24 18:53:37 smithi105 ceph-mon[35196]: Health check cleared: CEPHADM_FAILED_DAEMON (was: 1 failed cephadm daemon(s))
2025-04-24T18:53:38.218 INFO:journalctl@ceph.mon.a.smithi105.stdout:Apr 24 18:53:37 smithi105 ceph-mon[35196]: Cluster is now healthy
[8258135, 8258156, 8258161, 8258178, 8258190, 8258199, 8258235, 8258242, 8258255, 8258275, 8258278, 8258279]
cluster [WRN] OSD bench result of 746.692733 IOPS is not within the threshold limit range of 1000.000000 IOPS and 80000.000000 IOPS
[8258162]
failed to deploy
[8258147]
rados/dashboard/{0-single-container-host debug/mgr mon_election/classic random-objectstore$/{bluestore-comp-snappy} tasks/e2e}
Command failed on smithi137 with status 1: 'yes | sudo mkfs.xfs -f -i size=2048 /dev/vg_nvme/lv_2'
https://tracker.ceph.com/issues/68668
[8258151]
rados/cephadm/workunits/{0-distro/centos_9.stream agent/on mon_election/classic task/test_iscsi_container/{centos_9.stream test_iscsi_container}}
hit max job timeout
https://tracker.ceph.com/issues/69803
[8258166]
rados/standalone/{supported-random-distro$/{ubuntu_latest} workloads/osd}
Command failed (workunit test osd/repeer-on-acting-back.sh) on smithi046 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=57781207339abbf62a41674ad0b8c031ee97be59 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/repeer-on-acting-back.sh'
NEW TRACKER: https://tracker.ceph.com/issues/71071
[8258167]
rados/thrash/{0-size-min-size-overrides/2-size-2-min-size 1-pg-log-overrides/normal_pg_log 2-recovery-overrides/{more-async-partial-recovery} 3-scrub-overrides/{max-simultaneous-scrubs-3} backoff/normal ceph clusters/{fixed-4} crc-failures/bad_map_crc_failure d-balancer/upmap-read mon_election/connectivity msgr-failures/fastclose msgr/async-v1only objectstore/{bluestore/{alloc$/{btree} base compr$/{yes$/{zlib}} mem$/{low} onode-segment$/{1M} write$/{write_random}}} rados supported-random-distro$/{centos_latest} thrashers/careful_host thrashosds-health workloads/radosbench}
reached maximum tries (500) after waiting for 3000 seconds
2025-04-24T21:25:30.745 INFO:tasks.radosbench.radosbench.0.smithi163.stdout:Cleaning up (deleting benchmark objects)
2025-04-24T21:29:11.695 INFO:tasks.radosbench.radosbench.0.smithi163.stdout:Removed 87230 objects
^ cleanup took very long (~3.5 minutes to remove 87230 objects)
2025-04-24T21:29:22.032 INFO:tasks.radosbench.radosbench.0.smithi163.stdout: sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
^ this only used osd.7 and osd.15, and stalled after ~100 objects
2025-04-24T21:25:48.129801+0000 mgr.x (mgr.4111) 1109 : cluster [DBG] pgmap v1392: 25 pgs: 1 active+recovery_wait+degraded, 2 active+recovering+remapped, 3 active+recovering, 19 active+clean; 4.8 GiB data, 2.7 GiB used, 537 GiB / 540 GiB avail; 0 B/s wr, 704 op/s; 2564/155690 objects degraded (1.647%); 8816/155690 objects misplaced (5.663%); 15 MiB/s, 247 objects/s recovering
2025-04-24T21:25:48.130794+0000 mon.a (mon.0) 3321 : cluster [DBG] osdmap e772: 16 total, 16 up, 6 in
^ recovery in progress, but overall ok!
2025-04-24T21:26:44.563018+0000 mon.a (mon.0) 3442 : cluster [INF] Health check cleared: OSDMAP_FLAGS (was: nodeep-scrub flag(s) set)
2025-04-24T21:26:44.563045+0000 mon.a (mon.0) 3443 : cluster [INF] Cluster is now healthy
2025-04-24T21:27:19.018 INFO:tasks.thrashosds.thrasher:in_osds: [15, 7] out_osds: [4, 8, 12, 3, 11, 0, 2, 6, 10, 14, 1, 5, 9, 13] dead_osds: [] live_osds: [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
NEEDS A NEW TRACKER, but looks unrelated to the PRs under test.
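If the same stall shows up again, a minimal triage sketch (assuming live access to the test cluster; none of these commands were run here) to confirm that the benchmark pool was confined to the two remaining in OSDs and that recovery was still churning:

ceph osd tree              # which OSDs are up/in vs. marked out by the thrasher
ceph pg ls-by-osd osd.7    # PGs whose acting set includes osd.7
ceph pg ls-by-osd osd.15   # same for osd.15; together these should cover all 25 PGs
ceph -s                    # degraded/misplaced counts and current recovery rate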
[8258173]
rados/verify/{centos_latest ceph clusters/fixed-4 d-thrash/default/{default thrashosds-health} mon_election/classic msgr-failures/few msgr/async objectstore/{bluestore/{alloc$/{hybrid} base compr$/{no$/{no}} mem$/{normal-1} onode-segment$/{512K-onoff} write$/{write_v2}}} rados read-affinity/balance tasks/rados_api_tests validater/valgrind}
Command failed (workunit test rados/test.sh) on smithi177 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=57781207339abbf62a41674ad0b8c031ee97be59 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 ALLOW_TIMEOUTS=1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 6h /home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test.sh'
-376> 2025-04-24T21:20:45.900+0000 2aa1a640 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/20.0.0-1710-g57781207/rpm/el9/BUILD/ceph-20.0.0-1710-g57781207/src/osd/PrimaryLogPG.cc: In function 'bool PrimaryLogPG::is_degraded_or_backfilling_object(const hobject_t&)' thread 2aa1a640 time 2025-04-24T21:20:45.842726+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/20.0.0-1710-g57781207/rpm/el9/BUILD/ceph-20.0.0-1710-g57781207/src/osd/PrimaryLogPG.cc: 665: FAILED ceph_assert(!get_acting_recovery_backfill().empty())
ceph version 20.0.0-1710-g57781207 (57781207339abbf62a41674ad0b8c031ee97be59) tentacle (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x11e) [0x548538]
2: ceph-osd(+0x3cca0d) [0x4d4a0d]
3: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x3ea7) [0x7d49a7]
4: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x197) [0x7098e7]
5: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x69) [0x95fdb9]
6: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xccf) [0x71451f]
7: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x2a9) [0xcb63d9]
8: ceph-osd(+0xbae994) [0xcb6994]
9: /lib64/libc.so.6(+0x8a3b2) [0x55bf3b2]
10: clone()
https://tracker.ceph.com/issues/70715
[8258195]
rados/standalone/{supported-random-distro$/{centos_latest} workloads/scrub}
Command failed (workunit test scrub/osd-scrub-repair.sh) on smithi149 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=57781207339abbf62a41674ad0b8c031ee97be59 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh'
2025-04-24T22:17:34.901 INFO:tasks.workunit.client.0.smithi149.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:4370: corrupt_scrub_erasure: multidiff td/osd-scrub-repair/checkcsjson td/osd-scrub-repair/csjson
2025-04-24T22:17:34.901 INFO:tasks.workunit.client.0.smithi149.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/ceph-helpers.sh:2513: multidiff: diff td/osd-scrub-repair/checkcsjson td/osd-scrub-repair/csjson
caused by extra element not in reference:
88ac6d938ff2 (Bill Scales 2025-03-06 09:44:00 +0000 6630) f->open_array_section("shard_versions");
NEW TRACKER: https://tracker.ceph.com/issues/71076
[8258225]
rados/dashboard/{0-single-container-host debug/mgr mon_election/classic random-objectstore$/{bluestore-bitmap} tasks/dashboard}
Test failure: test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest)
https://tracker.ceph.com/issues/62972
[8258226]
rados/encoder/{0-start 1-tasks supported-random-distro$/{centos_latest}}
Command failed (workunit test dencoder/test-dencoder.sh) on smithi019 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=57781207339abbf62a41674ad0b8c031ee97be59 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/dencoder/test-dencoder.sh'
...
2025-04-24T22:01:07.305 INFO:tasks.workunit.client.0.smithi019.stdout:Return code: 1 Command:['ceph-dencoder', 'type', 'CrushWrapper', 'import', PosixPath('/home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/19.2.0-404-g78ddc7f9027/objects/CrushWrapper/fc8463fb82d47a965ce22d157a7218a6'), 'decode', 'dump_json'] Output:
2025-04-24T22:01:07.305 INFO:tasks.workunit.client.0.smithi019.stdout:Return code: 1 Command:['ceph-dencoder', 'type', 'CrushWrapper', 'import', PosixPath('/home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/19.2.0-404-g78ddc7f9027/objects/CrushWrapper/fc8463fb82d47a965ce22d157a7218a6'), 'decode', 'encode', 'decode', 'dump_json'] Output:
2025-04-24T22:01:07.305 INFO:tasks.workunit.client.0.smithi019.stdout:Return code: 1 Command:['ceph-dencoder', 'type', 'CrushWrapper', 'import', PosixPath('/home/ubuntu/cephtest/mnt.0/client.0/tmp/ceph-object-corpus-master/archive/19.2.0-404-g78ddc7f9027/objects/CrushWrapper/ebbc008875fb249834735a78ae1235c0'), 'decode', 'dump_json'] Output:
....
NEW TRACKER: https://tracker.ceph.com/issues/71077
[8258232]
rados/upgrade/parallel/{0-random-distro$/{centos_9.stream} 0-start 1-tasks mon_election/classic overrides/ignorelist_health upgrade-sequence workload/{ec-rados-default rados_api rados_loadgenbig rbd_import_export test_rbd_api test_rbd_python}}
Command failed on smithi129 with status 128: 'rm -rf /home/ubuntu/cephtest/clone.client.0 && git clone --depth 1 --branch reef https://git.ceph.com/ceph.git /home/ubuntu/cephtest/clone.client.0 && cd /home/ubuntu/cephtest/clone.client.0'
Possibly some wonky network problem:
2025-04-24T22:45:41.200 DEBUG:teuthology.orchestra.run.smithi129:> rm -rf /home/ubuntu/cephtest/clone.client.0 && git clone --depth 1 --branch reef https://git.ceph.com/ceph-ci.git /home/ubuntu/cephtest/clone.client.0 && cd /home/ubuntu/cephtest/clone.client.0
2025-04-24T22:45:41.256 INFO:tasks.workunit.client.0.smithi129.stderr:Cloning into '/home/ubuntu/cephtest/clone.client.0'...
2025-04-24T22:45:41.347 INFO:journalctl@ceph.mon.a.smithi129.stdout:Apr 24 22:45:40 smithi129 ceph-mon[97162]: from='mgr.25115 172.21.15.129:0/2480328657' entity='mgr.y' cmd={"prefix": "osd blocklist ls", "format": "json"} : dispatch
2025-04-24T22:45:41.348 INFO:journalctl@ceph.mon.c.smithi129.stdout:Apr 24 22:45:40 smithi129 ceph-mon[99123]: from='mgr.25115 172.21.15.129:0/2480328657' entity='mgr.y' cmd={"prefix": "osd blocklist ls", "format": "json"} : dispatch
2025-04-24T22:45:41.377 INFO:tasks.workunit.client.0.smithi129.stderr:warning: Could not find remote branch reef to clone.
2025-04-24T22:45:41.377 INFO:tasks.workunit.client.0.smithi129.stderr:fatal: Remote branch reef not found in upstream origin
2025-04-24T22:45:41.378 DEBUG:teuthology.orchestra.run:got remote process result: 128
2025-04-24T22:45:41.379 INFO:tasks.workunit:failed to check out 'reef' from https://git.ceph.com/ceph-ci.git; will also try in https://git.ceph.com/ceph.git
2025-04-24T22:45:41.379 DEBUG:teuthology.orchestra.run.smithi129:> rm -rf /home/ubuntu/cephtest/clone.client.0 && git clone --depth 1 --branch reef https://git.ceph.com/ceph.git /home/ubuntu/cephtest/clone.client.0 && cd /home/ubuntu/cephtest/clone.client.0
2025-04-24T22:45:41.437 INFO:tasks.workunit.client.0.smithi129.stderr:Cloning into '/home/ubuntu/cephtest/clone.client.0'...
and ~30s later it simply fails:
2025-04-24T22:46:11.472 INFO:tasks.workunit.client.0.smithi129.stderr:error: RPC failed; curl 18 transfer closed with outstanding read data remaining
2025-04-24T22:46:11.472 INFO:tasks.workunit.client.0.smithi129.stderr:fatal: expected flush after ref listing
IS IT WORTH A NEW TICKET?
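If this recurs, a hedged reproduction sketch (the /tmp path below is just an example, not from the run): retry the same shallow clone from the affected smithi node with git's curl tracing enabled, to see whether git.ceph.com consistently drops the transfer (curl 18):

GIT_TRACE_CURL=1 git clone --depth 1 --branch reef https://git.ceph.com/ceph.git /tmp/clone-test 2>&1 | tail -n 50

If the manual clone succeeds reliably, this was most likely a transient network/server hiccup.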
[8258253]
rados/thrash-old-clients/{0-distro$/{centos_9.stream} 0-size-min-size-overrides/3-size-2-min-size 1-install/squid backoff/peering_and_degraded ceph clusters/{three-plus-one} d-balancer/on mon_election/connectivity msgr-failures/fastclose rados thrashers/morepggrow thrashosds-health workloads/test_rbd_api}
"2025-04-24T22:20:00.000241+0000 mon.a (mon.0) 329 : cluster [WRN] [WRN] OSD_HOST_DOWN: 1 host (1 osds) down" in cluster log
Reason:
2025-04-24T22:19:56.743+0000 7fa0d2d00640 -1 received signal: Terminated from /run/podman-init -- /usr/bin/ceph-osd -n osd.0 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false (PID: 1) UID: 0
2025-04-24T22:19:56.743+0000 7fa0d2d00640 -1 osd.0 8 *** Got signal Terminated ***
2025-04-24T22:19:56.743+0000 7fa0d2d00640 0 osd.0 8 Fast Shutdown: - cct->_conf->osd_fast_shutdown = 1, null-fm = 1
2025-04-24T22:19:56.743+0000 7fa0d2d00640 -1 osd.0 8 *** Immediate shutdown (osd_fast_shutdown=true) ***
2025-04-24T22:19:56.743+0000 7fa0d2d00640 0 osd.0 8 prepare_to_stop telling mon we are shutting down and dead
Can't find why the signal was delivered.
IS IT WORTH A NEW TICKET?
Updated by Adam Kupczyk 11 months ago
- Status changed from QA Testing to QA Approved
Updated by Adam Kupczyk 11 months ago
- Subject changed from aclamk-testing-phoebe-2025-04-23-1454 to aclamk-testing-phoebe-2025-04-24-1431
- Description updated (diff)
- Shaman Build changed from aclamk-testing-phoebe-2025-04-23-1454 to aclamk-testing-phoebe-2025-04-24-1431
- QA Runs changed from aclamk-testing-phoebe-2025-04-23-1454 to aclamk-testing-phoebe-2025-04-24-1431