QA Run #71503
aclamk-testing-nauvoo-2025-05-28-1713
Description
https://github.com/ceph/ceph/pull/62148 - os/bluestore: Implemented create-bdev-label
https://github.com/ceph/ceph/pull/62817 - os/bluestore: Debug code to make reshard fail faster
https://github.com/ceph/ceph/pull/62913 - qa: Add Teuthology test for BlueStore ESB assertion failure
https://github.com/ceph/ceph/pull/63188 - os/bluestore: Fix bluefs_fnode_t::seek
https://github.com/ceph/ceph/pull/63358 - os/bluestore: fix bluestore_volume_selection_reserved_factor usage
https://github.com/ceph/ceph/pull/63373 - os/bluestore/compression: Fix Estimator::split_and_compress
https://github.com/ceph/ceph/pull/63429 - qa/rados: Fix problem with recompression failing osd bench testing
Updated by Adam Kupczyk 9 months ago
- Status changed from QA Testing to QA Approved
[[ aclamk-testing-nauvoo-2025-05-28-1713 ]]
https://tracker.ceph.com/issues/71503
[8301159]
rados/cephadm/workunits/{0-distro/centos_9.stream agent/on mon_election/classic task/test_iscsi_container/{centos_9.stream test_iscsi_container}}
hit max job timeout
https://tracker.ceph.com/issues/69803
[8301162]
rados/thrash-old-clients/{0-distro$/{centos_9.stream} 0-size-min-size-overrides/2-size-2-min-size 1-install/squid backoff/peering ceph clusters/{three-plus-one} d-balancer/crush-compat mon_election/classic msgr-failures/few rados thrashers/careful thrashosds-health workloads/radosbench}
"2025-05-29T09:00:00.000134+0000 mon.a (mon.0) 1431 : cluster [WRN] pg 2.12 is active+remapped+backfill_toofull, acting [1,5]" in cluster log
Tracked by: https://tracker.ceph.com/issues/70716
[8301227]
rados/verify/{centos_latest ceph clusters/fixed-4 d-thrash/default/{default thrashosds-health} mon_election/connectivity msgr-failures/few msgr/async-v1only objectstore/{bluestore/{alloc$/{hybrid} base mem$/{normal-1} onode-segment$/{256K} write$/{random/{compr$/{yes$/{snappy}} random}}}} rados read-affinity/balance tasks/rados_cls_all validater/valgrind}
Command failed (workunit test cls/test_cls_2pc_queue.sh) on smithi191 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=091fad2b4c59f0d2789a74378cdc0defd97741d3 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_2pc_queue.sh'
Tracked by: https://tracker.ceph.com/issues/65314
[8301240]
rados/cephadm/workunits/{0-distro/centos_9.stream agent/on mon_election/classic task/test_cephadm}
Command failed (workunit test cephadm/test_cephadm.sh) on smithi077 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=091fad2b4c59f0d2789a74378cdc0defd97741d3 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cephadm/test_cephadm.sh'
2025-05-29T11:02:44.834 INFO:tasks.workunit.client.0.smithi077.stdout:May 29 10:59:31 smithi077 systemd[1]: ceph-00000000-0000-0000-0000-0000deadbeef@grafana.a.service: Main process exited, code=exited, status=1/FAILURE
2025-05-29T11:02:44.834 INFO:tasks.workunit.client.0.smithi077.stdout:May 29 10:59:31 smithi077 systemd[1]: ceph-00000000-0000-0000-0000-0000deadbeef@grafana.a.service: Failed with result 'exit-code'.
2025-05-29T11:02:44.834 INFO:tasks.workunit.client.0.smithi077.stdout:May 29 10:59:31 smithi077 systemd[1]: ceph-00000000-0000-0000-0000-0000deadbeef@grafana.a.service: Consumed 1.497s CPU time.
2025-05-29T11:02:44.834 INFO:tasks.workunit.client.0.smithi077.stdout:May 29 10:59:41 smithi077 systemd[1]: ceph-00000000-0000-0000-0000-0000deadbeef@grafana.a.service: Scheduled restart job, restart counter is at 5.
2025-05-29T11:02:44.834 INFO:tasks.workunit.client.0.smithi077.stdout:May 29 10:59:41 smithi077 systemd[1]: Stopped Ceph grafana.a for 00000000-0000-0000-0000-0000deadbeef.
2025-05-29T11:02:44.834 INFO:tasks.workunit.client.0.smithi077.stdout:May 29 10:59:41 smithi077 systemd[1]: ceph-00000000-0000-0000-0000-0000deadbeef@grafana.a.service: Consumed 1.497s CPU time.
2025-05-29T11:02:44.834 INFO:tasks.workunit.client.0.smithi077.stdout:May 29 10:59:41 smithi077 systemd[1]: ceph-00000000-0000-0000-0000-0000deadbeef@grafana.a.service: Start request repeated too quickly.
2025-05-29T11:02:44.834 INFO:tasks.workunit.client.0.smithi077.stdout:May 29 10:59:41 smithi077 systemd[1]: ceph-00000000-0000-0000-0000-0000deadbeef@grafana.a.service: Failed with result 'exit-code'.
2025-05-29T11:02:44.834 INFO:tasks.workunit.client.0.smithi077.stdout:May 29 10:59:41 smithi077 systemd[1]: Failed to start Ceph grafana.a for 00000000-0000-0000-0000-0000deadbeef.
2025-05-29T11:02:44.852 INFO:tasks.workunit.client.0.smithi077.stderr:+ rm -rf tmp.test_cephadm.sh.ks6oco
2025-05-29T11:02:44.853 DEBUG:teuthology.orchestra.run:got remote process result: 1
2025-05-29T11:02:44.854 INFO:tasks.workunit:Stopping ['cephadm/test_cephadm.sh'] on client.0...
NEW TRACKER: https://tracker.ceph.com/issues/71506
[8301285]
rados/cephadm/workunits/{0-distro/centos_9.stream_runc agent/on mon_election/classic task/test_iscsi_container/{centos_9.stream test_iscsi_container}}
hit max job timeout
Tracked by: https://tracker.ceph.com/issues/69803
[8301300]
rados/dashboard/{0-single-container-host debug/mgr mon_election/connectivity random-objectstore$/{bluestore-bitmap} tasks/e2e}
Command failed on smithi064 with status 1: 'yes | sudo mkfs.xfs -f -i size=2048 /dev/vg_nvme/lv_2'
https://tracker.ceph.com/issues/68668
[8301322]
rados/thrash-erasure-code/{ceph clusters/{fixed-4} fast/fast mon_election/connectivity msgr-failures/osd-dispatch-delay objectstore/{bluestore/{alloc$/{bitmap} base mem$/{normal-2} onode-segment$/{1M} write$/{v2/{compr$/{yes$/{zlib}} v2}}}} rados recovery-overrides/{more-partial-recovery} supported-random-distro$/{centos_latest} thrashers/fastread thrashosds-health workloads/ec-rados-plugin=clay-k=4-m=2}
Command failed on smithi031 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd dump --format=json'
ceph version 20.3.0-622-g091fad2b (091fad2b4c59f0d2789a74378cdc0defd97741d3) tentacle (dev - RelWithDebInfo)
1: /lib64/libc.so.6(+0x3ea60) [0x7fb819a3ea60]
2: (std::_Rb_tree_decrement(std::_Rb_tree_node_base const*)+0xe) [0x7fb819ec62de]
3: (OSD::tick_without_osd_lock()+0x544) [0x562991a48724]
4: ceph-osd(+0x547b7d) [0x5629919c2b7d]
5: (CommonSafeTimer<std::mutex>::timer_thread()+0x12b) [0x56299200365b]
6: ceph-osd(+0xb89091) [0x562992004091]
7: /lib64/libc.so.6(+0x8a3b2) [0x7fb819a8a3b2]
Tracked by: https://tracker.ceph.com/issues/66819
[8301323]
rados/valgrind-leaks/{1-start 2-inject-leak/none centos_latest}
valgrind error: Leak_StillReachable malloc CRYPTO_malloc
Tracked by: https://tracker.ceph.com/issues/71182
[8301338]
rados/singleton-nomsgr/{all/osd_stale_reads mon_election/classic rados supported-random-distro$/{ubuntu_latest}}
Exiting scrub checking -- not all pgs scrubbed.
The problem seems to stem from:
2025-05-29T13:50:48.751+0000 7f8aa8d46640 20 slow request osd_op(client.4198.0:3 2.6 2:602f83fe:::foo:head [read 0~3] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e16) initiated 2025-05-29T13:41:01.719489+0000 currently waiting for readable
2025-05-29T13:40:45.688172+0000 mon.a (mon.0) 91 : cluster [DBG] osdmap e16: 3 total, 3 up, 3 in
2025-05-29T13:40:46.618241+0000 mgr.x (mgr.4101) 10 : cluster [DBG] pgmap v18: 40 pgs: 32 unknown, 8 active+clean; 19 B data, 79 MiB used, 300 GiB / 300 GiB avail; 294 B/s wr, 0 op/s
2025-05-29T13:40:46.692995+0000 mon.a (mon.0) 94 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
2025-05-29T13:40:46.708160+0000 mon.a (mon.0) 96 : cluster [DBG] osdmap e17: 3 total, 2 up, 3 in
The client sends a write request to foo:head:
2025-05-29T13:40:45.715+0000 7f8a92a27640 20 osd.1 pg_epoch: 16 pg[2.6( empty local-lis/les=15/16 n=0 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=15) [1,0] r=0 lpr=15 crt=0'0 mlcod 0'0 active+clean] do_op: op osd_op(client.4196.0:2 2.6 2:602f83fe:::foo:head [writefull 0~3 in=3b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e15)
...
2025-05-29T13:40:45.715+0000 7f8a92a27640 10 osd.1 pg_epoch: 16 pg[2.6( empty local-lis/les=15/16 n=0 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=15) [1,0] r=0 lpr=15 crt=0'0 mlcod 0'0 active+clean] new_repop rep_tid 1 on osd_op(client.4196.0:2 2.6 2:602f83fe:::foo:head [writefull 0~3 in=3b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e15) v9
The repop is sent to osd.0:
2025-05-29T13:40:45.723+0000 7f704180c640 10 osd.0 pg_epoch: 16 pg[2.6( empty local-lis/les=15/16 n=0 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=15) [1,0] r=1 lpr=15 crt=0'0 active mbc={}] do_repop 2:602f83fe:::foo:head v 16'1 (transaction) 183
and finally osd.1 replies to the client:
2025-05-29T13:40:45.727+0000 7f8a92a27640 10 osd.1 pg_epoch: 16 pg[2.6( v 16'1 (0'0,16'1] local-lis/les=15/16 n=1 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=15) [1,0] r=0 lpr=15 crt=16'1 lcod 0'0 mlcod 0'0 active+clean] sending reply on osd_op(client.4196.0:2 2.6 2:602f83fe:::foo:head [writefull 0~3 in=3b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e15) 0x562e9a2e9400
Setting the wonderfully well documented --ms-blackhole-osd:
2025-05-29T13:40:45.731+0000 7f8aa2e2b640 20 osd.1 16 OSD::ms_dispatch: command(tid 3: {"prefix": "injectargs","injected_args":["--ms-blackhole-osd", "--ms-blackhole-mon"]})
2025-05-29T13:40:45.731+0000 7f8aa2e2b640 20 osd.1 16 _dispatch 0x562e9a634820 command(tid 3: {"prefix": "injectargs","injected_args":["--ms-blackhole-osd", "--ms-blackhole-mon"]})
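For reference, that injection corresponds to a tell command roughly like the one below (a sketch only; osd.1 matches the target seen in the log above, and the ms-blackhole-* options make the messenger silently drop messages addressed to OSDs and monitors respectively):
ceph tell osd.1 injectargs '--ms-blackhole-osd --ms-blackhole-mon'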
The client then reads its foo:head:
2025-05-29T13:40:46.707+0000 7f8a8ea1f640 20 osd.1 pg_epoch: 16 pg[2.6( v 16'1 (0'0,16'1] local-lis/les=15/16 n=1 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=15) [1,0] r=0 lpr=15 crt=16'1 lcod 0'0 mlcod 0'0 active+clean] do_op: op osd_op(client.4198.0:2 2.6 2:602f83fe:::foo:head [read 0~3] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e16)
osd.1 replies:
2025-05-29T13:41:01.715+0000 7f8a92a27640 20 osd.1 pg_epoch: 16 pg[2.6( v 16'1 (0'0,16'1] local-lis/les=15/16 n=1 ec=15/15 lis/c=15/15 les/c/f=16/16/0 sis=15) [1,0] r=0 lpr=15 crt=16'1 lcod 0'0 mlcod 0'0 active+clean] do_op: op osd_op(client.4198.0:3 2.6 2:602f83fe:::foo:head [read 0~3] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e16)
2025-05-29T13:41:01.715+0000 7f8a92a27640 10 osd.1 16 dequeue_op osd_op(client.4198.0:3 2.6 2:602f83fe:::foo:head [read 0~3] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e16) v9 finish
But obviously the client did not receive anything and keeps complaining:
2025-05-29T13:41:32.303+0000 7f8aa8d46640 20 slow request osd_op(client.4198.0:3 2.6 2:602f83fe:::foo:head [read 0~3] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e16) initiated 2025-05-29T13:41:01.719489+0000 currently waiting for readable
2025-05-29T14:00:11.577 INFO:tasks.ceph:pgid 2.17 last_scrub_stamp 2025-05-29T13:40:44.669910+0000 time.struct_time(tm_year=2025, tm_mon=5, tm_mday=29, tm_hour=13, tm_min=40, tm_sec=44, tm_wday=3, tm_yday=149, tm_isdst=-1) <= time.struct_time(tm_year=2025, tm_mon=5, tm_mday=29, tm_hour=13, tm_min=50, tm_sec=53, tm_wday=3, tm_yday=149, tm_isdst=0)
2025-05-29T14:00:11.577 INFO:tasks.ceph:pgid 1.7 last_scrub_stamp 2025-05-29T13:40:36.635731+0000 time.struct_time(tm_year=2025, tm_mon=5, tm_mday=29, tm_hour=13, tm_min=40, tm_sec=36, tm_wday=3, tm_yday=149, tm_isdst=-1) <= time.struct_time(tm_year=2025, tm_mon=5, tm_mday=29, tm_hour=13, tm_min=50, tm_sec=53, tm_wday=3, tm_yday=149, tm_isdst=0)
2025-05-29T14:00:11.577 INFO:tasks.ceph:pgid 1.3 last_scrub_stamp 2025-05-29T13:40:36.635731+0000 time.struct_time(tm_year=2025, tm_mon=5, tm_mday=29, tm_hour=13, tm_min=40, tm_sec=36, tm_wday=3, tm_yday=149, tm_isdst=-1) <= time.struct_time(tm_year=2025, tm_mon=5, tm_mday=29, tm_hour=13, tm_min=50, tm_sec=53, tm_wday=3, tm_yday=149, tm_isdst=0)
2025-05-29T14:00:11.577 INFO:tasks.ceph:pgid 2.3 last_scrub_stamp 2025-05-29T13:40:44.669910+0000 time.struct_time(tm_year=2025, tm_mon=5, tm_mday=29, tm_hour=13, tm_min=40, tm_sec=44, tm_wday=3, tm_yday=149, tm_isdst=-1) <= time.struct_time(tm_year=2025, tm_mon=5, tm_mday=29, tm_hour=13, tm_min=50, tm_sec=53, tm_wday=3, tm_yday=149, tm_isdst=0)
2025-05-29T14:00:11.577 INFO:tasks.ceph:pgid 2.d last_scrub_stamp 2025-05-29T13:40:44.669910+0000 time.struct_time(tm_year=2025, tm_mon=5, tm_mday=29, tm_hour=13, tm_min=40, tm_sec=44, tm_wday=3, tm_yday=149, tm_isdst=-1) <= time.struct_time(tm_year=2025, tm_mon=5, tm_mday=29, tm_hour=13, tm_min=50, tm_sec=53, tm_wday=3, tm_yday=149, tm_isdst=0)
2025-05-29T14:00:11.578 INFO:tasks.ceph:Still waiting for all pgs to be scrubbed.
State extracted from:
#grep '2025-05-29T14:00:11.574.*stdout:{"pg_ready"' teuthology.log |sed "s/^.*stdout://" |jq .
"pgid": "2.17",
"pgid": "1.7",
"pgid": "1.3",
"pgid": "2.3",
"pgid": "2.d",
All have:
"up": [1, 2 ],
"acting": [1, 2 ],
NEW TRACKER: https://tracker.ceph.com/issues/71508
[8301608]
rados/thrash-erasure-code/{ceph clusters/{fixed-4} fast/normal mon_election/classic msgr-failures/osd-delay objectstore/{bluestore/{alloc$/{stupid} base mem$/{normal-2} onode-segment$/{none} write$/{v2/{compr$/{no$/{no}} v2}}}} rados recovery-overrides/{more-async-partial-recovery} supported-random-distro$/{centos_latest} thrashers/minsize_recovery thrashosds-health workloads/ec-rados-plugin=jerasure-k=8-m=6-crush}
hit max job timeout
2025-05-29T18:49:12.870710+0000 mon.a (mon.0) 1942 : cluster [WRN] Health check failed: Low space hindering backfill (add storage if this doesn't resolve itself): 2 pgs backfill_toofull (PG_BACKFILL_FULL)
TODO?
[8301618]
rados/cephadm/workunits/{0-distro/centos_9.stream agent/on mon_election/classic task/test_iscsi_container/{centos_9.stream test_iscsi_container}}
hit max job timeout
Tracked by: https://tracker.ceph.com/issues/69803
[8301620]
rados/standalone/{supported-random-distro$/{centos_latest} workloads/osd}
Command failed (workunit test osd/repeer-on-acting-back.sh) on smithi133 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=091fad2b4c59f0d2789a74378cdc0defd97741d3 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/repeer-on-acting-back.sh'
Tracked by: https://tracker.ceph.com/issues/70949
[8301621]
rados/thrash-old-clients/{0-distro$/{centos_9.stream} 0-size-min-size-overrides/2-size-2-min-size 1-install/squid backoff/peering ceph clusters/{three-plus-one} d-balancer/crush-compat mon_election/classic msgr-failures/few rados thrashers/careful thrashosds-health workloads/radosbench}
"2025-05-29T19:20:00.000133+0000 mon.a (mon.0) 2998 : cluster [WRN] pg 6.4 is stuck inactive for 107s, current state undersized+degraded+peered, last acting [1]" in cluster log
Waits too long to scrub, but ultimately recovers.
[8301649]
rados/singleton-nomsgr/{all/pool-access mon_election/classic rados supported-random-distro$/{ubuntu_latest}}
Command failed on smithi134 with status 128: 'rm -rf /home/ubuntu/cephtest/clone.client.0 && git clone https://git.ceph.com/ceph.git /home/ubuntu/cephtest/clone.client.0 && cd /home/ubuntu/cephtest/clone.client.0 && git checkout 091fad2b4c59f0d2789a74378cdc0defd97741d3'
...
2025-05-29T19:16:19.743 INFO:tasks.workunit.client.0.smithi134.stderr:Cloning into '/home/ubuntu/cephtest/clone.client.0'...
2025-05-29T19:17:55.736 INFO:tasks.workunit.client.0.smithi134.stderr:Updating files: 98% (13278/13497)^MUpdating files: 99% (13363/13497)^MUpdating files: 100% (13497/13497)^MUpdating files: 100% (13497/13497), done.
2025-05-29T19:17:55.767 DEBUG:teuthology.orchestra.run:got remote process result: 128
2025-05-29T19:17:55.767 INFO:tasks.workunit.client.0.smithi134.stderr:fatal: reference is not a tree: 091fad2b4c59f0d2789a74378cdc0defd97741d3
Ignored.
[8301660]
rados/thrash-old-clients/{0-distro$/{centos_9.stream} 0-size-min-size-overrides/3-size-2-min-size 1-install/tentacle backoff/peering_and_degraded ceph clusters/{three-plus-one} d-balancer/on mon_election/connectivity msgr-failures/osd-delay rados thrashers/default thrashosds-health workloads/rbd_cls}
"2025-05-29T19:30:00.000236+0000 mon.a (mon.0) 346 : cluster [WRN] [WRN] OSD_HOST_DOWN: 1 host (1 osds) down" in cluster log
Each time we add a new OSD there is a 1 s window during which we get the above message.
[8301675]
rados/dashboard/{0-single-container-host debug/mgr mon_election/classic random-objectstore$/{bluestore-comp-lz4} tasks/dashboard}
Test failure: test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest)
https://tracker.ceph.com/issues/62972
[8301678]
rados/objectstore/{backends/ceph_objectstore_tool supported-random-distro$/{centos_latest}}
"2025-05-29T19:45:31.817382+0000 osd.5 (osd.5) 1 : cluster [ERR] map e48 had wrong cluster addr ([v2:0.0.0.0:6842/2309591447,v1:0.0.0.0:6843/2309591447] != my [v2:172.21.15.110:6842/2309591447,v1:172.21.15.110:6843/2309591447])" in cluster log
https://tracker.ceph.com/issues/69805
[8301683]
rados/upgrade/parallel/{0-random-distro$/{centos_9.stream_runc} 0-start 1-tasks mon_election/classic overrides/ignorelist_health upgrade-sequence workload/{ec-rados-default rados_api rados_loadgenbig rbd_import_export test_rbd_api test_rbd_python}}
Command failed on smithi046 with status 1: 'sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:reef shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid e91bdbc2-3cc4-11f0-8700-adfe0268badd -e sha1=091fad2b4c59f0d2789a74378cdc0defd97741d3 -- bash -c \'ceph versions | jq -e \'"\'"\'.overall | length == 1\'"\'"\'\''
2025-05-29T19:53:57.546 INFO:teuthology.orchestra.run.smithi046.stdout:{
2025-05-29T19:53:57.546 INFO:teuthology.orchestra.run.smithi046.stdout: "mon": {
2025-05-29T19:53:57.546 INFO:teuthology.orchestra.run.smithi046.stdout: "ceph version 18.2.7-323-g7773709c (7773709ccbdeba76bb990f46b32ebd8fac96f512) reef (stable)": 3
2025-05-29T19:53:57.546 INFO:teuthology.orchestra.run.smithi046.stdout: },
2025-05-29T19:53:57.546 INFO:teuthology.orchestra.run.smithi046.stdout: "mgr": {
2025-05-29T19:53:57.546 INFO:teuthology.orchestra.run.smithi046.stdout: "ceph version 18.2.7-323-g7773709c (7773709ccbdeba76bb990f46b32ebd8fac96f512) reef (stable)": 1,
2025-05-29T19:53:57.546 INFO:teuthology.orchestra.run.smithi046.stdout: "ceph version 20.3.0-622-g091fad2b (091fad2b4c59f0d2789a74378cdc0defd97741d3) tentacle (dev - RelWithDebInfo)": 1
2025-05-29T19:53:57.546 INFO:teuthology.orchestra.run.smithi046.stdout: },
2025-05-29T19:53:57.546 INFO:teuthology.orchestra.run.smithi046.stdout: "osd": {
2025-05-29T19:53:57.546 INFO:teuthology.orchestra.run.smithi046.stdout: "ceph version 18.2.7-323-g7773709c (7773709ccbdeba76bb990f46b32ebd8fac96f512) reef (stable)": 8
2025-05-29T19:53:57.546 INFO:teuthology.orchestra.run.smithi046.stdout: },
2025-05-29T19:53:57.547 INFO:teuthology.orchestra.run.smithi046.stdout: "mds": {
2025-05-29T19:53:57.547 INFO:teuthology.orchestra.run.smithi046.stdout: "ceph version 18.2.7-323-g7773709c (7773709ccbdeba76bb990f46b32ebd8fac96f512) reef (stable)": 2
2025-05-29T19:53:57.547 INFO:teuthology.orchestra.run.smithi046.stdout: },
2025-05-29T19:53:57.547 INFO:teuthology.orchestra.run.smithi046.stdout: "overall": {
2025-05-29T19:53:57.547 INFO:teuthology.orchestra.run.smithi046.stdout: "ceph version 18.2.7-323-g7773709c (7773709ccbdeba76bb990f46b32ebd8fac96f512) reef (stable)": 14,
2025-05-29T19:53:57.547 INFO:teuthology.orchestra.run.smithi046.stdout: "ceph version 20.3.0-622-g091fad2b (091fad2b4c59f0d2789a74378cdc0defd97741d3) tentacle (dev - RelWithDebInfo)": 1
2025-05-29T19:53:57.547 INFO:teuthology.orchestra.run.smithi046.stdout: }
2025-05-29T19:53:57.547 INFO:teuthology.orchestra.run.smithi046.stdout:}
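A quick way to see which daemon types are still mixed (a sketch, based only on the JSON shape shown above):
ceph versions | jq -r 'to_entries[] | select(.key != "overall" and (.value | length) > 1) | .key'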
Ask Laura to open an upgrade tracker for this one.
[8301686]
rados/verify/{centos_latest ceph clusters/fixed-4 d-thrash/default/{default thrashosds-health} mon_election/connectivity msgr-failures/few msgr/async-v1only objectstore/{bluestore/{alloc$/{btree} base mem$/{normal-1} onode-segment$/{512K-onoff} write$/{v2/{compr$/{no$/{no}} v2}}}} rados read-affinity/balance tasks/rados_cls_all validater/valgrind}
Command failed (workunit test cls/test_cls_2pc_queue.sh) on smithi187 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=091fad2b4c59f0d2789a74378cdc0defd97741d3 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_2pc_queue.sh'
https://tracker.ceph.com/issues/71271
[8301699]
rados/cephadm/workunits/{0-distro/centos_9.stream agent/on mon_election/classic task/test_cephadm}
Command failed (workunit test cephadm/test_cephadm.sh) on smithi026 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=091fad2b4c59f0d2789a74378cdc0defd97741d3 TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cephadm/test_cephadm.sh'