Bug #62235
Assert failure: test_ceph_osd_pool_create_utf8
Status: open
Description
FAIL: test_rados.TestCommand.test_ceph_osd_pool_create_utf8
Originally observed in Octopus: https://tracker.ceph.com/issues/48840.
That tracker was closed when Octopus became EOL.
The issue is again seen on Pacific as part of:
/a/yuriw-2023-07-26_15:54:22-rados-wip-yuri6-testing-2023-07-24-0819-pacific-distro-default-smithi/7353397
2023-07-27T02:19:37.511 INFO:tasks.workunit.client.0.smithi096.stderr:======================================================================
2023-07-27T02:19:37.512 INFO:tasks.workunit.client.0.smithi096.stderr:FAIL: test_rados.TestCommand.test_ceph_osd_pool_create_utf8
2023-07-27T02:19:37.512 INFO:tasks.workunit.client.0.smithi096.stderr:----------------------------------------------------------------------
2023-07-27T02:19:37.512 INFO:tasks.workunit.client.0.smithi096.stderr:Traceback (most recent call last):
2023-07-27T02:19:37.512 INFO:tasks.workunit.client.0.smithi096.stderr:  File "/usr/lib/python3/dist-packages/nose/case.py", line 197, in runTest
2023-07-27T02:19:37.512 INFO:tasks.workunit.client.0.smithi096.stderr:    self.test(*self.arg)
2023-07-27T02:19:37.512 INFO:tasks.workunit.client.0.smithi096.stderr:  File "/home/ubuntu/cephtest/clone.client.0/src/test/pybind/test_rados.py", line 1373, in test_ceph_osd_pool_create_utf8
2023-07-27T02:19:37.512 INFO:tasks.workunit.client.0.smithi096.stderr:    eq(u"pool '\u9ec5' created", out)
2023-07-27T02:19:37.513 INFO:tasks.workunit.client.0.smithi096.stderr:AssertionError: "pool '\u9ec5' created" != "pool '\u9ec5' already exists"
2023-07-27T02:19:37.513 INFO:tasks.workunit.client.0.smithi096.stderr:
2023-07-27T02:19:37.513 INFO:tasks.workunit.client.0.smithi096.stderr:----------------------------------------------------------------------
Updated by Sridhar Seshasayee over 2 years ago
- Copied from Bug #48840: Octopus: Assert failure: test_ceph_osd_pool_create_utf8 added
Updated by Laura Flores 4 months ago
- Subject changed from Pacific: Assert failure: test_ceph_osd_pool_create_utf8 to Assert failure: test_ceph_osd_pool_create_utf8
Updated by Laura Flores 4 months ago
/a/yaarit-2025-11-06_20:06:52-rados:basic-wip-rocky10-branch-of-the-day-2025-11-05-1762369819-distro-default-smithi/8587285
2025-11-06T20:37:26.373 INFO:tasks.workunit.client.0.smithi001.stdout:../../../clone.client.0/src/test/pybind/test_rados.py::TestCommand::test_ceph_osd_pool_create_utf8 FAILED [ 97%]
2025-11-06T20:37:30.365 INFO:tasks.workunit.client.0.smithi001.stdout:../../../clone.client.0/src/test/pybind/test_rados.py::TestWatchNotify::test PASSED [ 98%]
2025-11-06T20:37:34.235 INFO:tasks.workunit.client.0.smithi001.stdout:../../../clone.client.0/src/test/pybind/test_rados.py::TestWatchNotify::test_aio_notify PASSED [100%]
2025-11-06T20:37:34.235 INFO:tasks.workunit.client.0.smithi001.stdout:
2025-11-06T20:37:34.235 INFO:tasks.workunit.client.0.smithi001.stdout:=================================== FAILURES ===================================
2025-11-06T20:37:34.235 INFO:tasks.workunit.client.0.smithi001.stdout:__________________ TestCommand.test_ceph_osd_pool_create_utf8 __________________
2025-11-06T20:37:34.235 INFO:tasks.workunit.client.0.smithi001.stdout:
2025-11-06T20:37:34.235 INFO:tasks.workunit.client.0.smithi001.stdout:self = <test_rados.TestCommand object at 0x7f95094b9f00>
2025-11-06T20:37:34.235 INFO:tasks.workunit.client.0.smithi001.stdout:
2025-11-06T20:37:34.235 INFO:tasks.workunit.client.0.smithi001.stdout: def test_ceph_osd_pool_create_utf8(self):
2025-11-06T20:37:34.235 INFO:tasks.workunit.client.0.smithi001.stdout: poolname = "\u9ec5"
2025-11-06T20:37:34.236 INFO:tasks.workunit.client.0.smithi001.stdout:
2025-11-06T20:37:34.236 INFO:tasks.workunit.client.0.smithi001.stdout: cmd = {"prefix": "osd pool create", "pg_num": 16, "pool": poolname}
2025-11-06T20:37:34.236 INFO:tasks.workunit.client.0.smithi001.stdout: ret, buf, out = self.rados.mon_command(json.dumps(cmd), b'')
2025-11-06T20:37:34.236 INFO:tasks.workunit.client.0.smithi001.stdout: eq(ret, 0)
2025-11-06T20:37:34.236 INFO:tasks.workunit.client.0.smithi001.stdout: assert len(out) > 0
2025-11-06T20:37:34.236 INFO:tasks.workunit.client.0.smithi001.stdout:> eq(u"pool '\u9ec5' created", out)
2025-11-06T20:37:34.236 INFO:tasks.workunit.client.0.smithi001.stdout:
2025-11-06T20:37:34.236 INFO:tasks.workunit.client.0.smithi001.stdout:../../../clone.client.0/src/test/pybind/test_rados.py:1447:
2025-11-06T20:37:34.236 INFO:tasks.workunit.client.0.smithi001.stdout:_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2025-11-06T20:37:34.236 INFO:tasks.workunit.client.0.smithi001.stdout:
2025-11-06T20:37:34.237 INFO:tasks.workunit.client.0.smithi001.stdout:a = "pool '\u9ec5' created", b = "pool '\u9ec5' already exists"
2025-11-06T20:37:34.237 INFO:tasks.workunit.client.0.smithi001.stdout:
2025-11-06T20:37:34.237 INFO:tasks.workunit.client.0.smithi001.stdout: def assert_equal(a, b):
2025-11-06T20:37:34.237 INFO:tasks.workunit.client.0.smithi001.stdout:> assert a == b
2025-11-06T20:37:34.237 INFO:tasks.workunit.client.0.smithi001.stdout:E AssertionError
2025-11-06T20:37:34.237 INFO:tasks.workunit.client.0.smithi001.stdout:
2025-11-06T20:37:34.237 INFO:tasks.workunit.client.0.smithi001.stdout:../../../clone.client.0/src/test/pybind/assertions.py:2: AssertionError
2025-11-06T20:37:34.237 INFO:tasks.workunit.client.0.smithi001.stdout:=========================== short test summary info ============================
2025-11-06T20:37:34.238 INFO:tasks.workunit.client.0.smithi001.stdout:FAILED ../../../clone.client.0/src/test/pybind/test_rados.py::TestCommand::test_ceph_osd_pool_create_utf8
2025-11-06T20:37:34.238 INFO:tasks.workunit.client.0.smithi001.stdout:=================== 1 failed, 90 passed in 329.17s (0:05:29) ===================
2025-11-06T20:37:34.284 DEBUG:teuthology.orchestra.run:got remote process result: 1
2025-11-06T20:37:34.285 INFO:tasks.workunit:Stopping ['rados/test_python.sh'] on client.0...
We are testing new rocky 10 packages on main, and this issue came up again.
Updated by Laura Flores 4 months ago
- Assignee set to Nitzan Mordechai
@Nitzan Mordechai can you take a look? If not, we can reassign!
Updated by Yaarit Hatuka 4 months ago
QA ticket:
https://tracker.ceph.com/issues/73749
Updated by Nitzan Mordechai 4 months ago
I'm trying to recreate it, this time with more debug, but I still can't recreate it.
Updated by Radoslaw Zarzynski 4 months ago
If it can't be replicated, let's switch the status to Need More Info.
Updated by Laura Flores 4 months ago
@Nitzan Mordechai are you trying to reproduce this locally or in teuthology? It might be hard to reproduce locally; I wonder if it would come up if we ran the test 50 times on plain main.
Updated by Nitzan Mordechai 3 months ago · Edited
I searched the logs for all creations of that pool:
nmordech@folio02 /a/yaarit-2025-11-06_20:06:52-rados:basic-wip-rocky10-branch-of-the-day-2025-11-05-1762369819-distro-default-smithi/8587285 find . -type f -name "ceph-*.log*" -exec zgrep -aH "黅" {} \;
./remote/smithi001/log/ceph-mon.a.log.gz:2025-11-06T20:37:26.285+0000 7f2584806640 2 mon.a@0(leader) e1 send_reply 0x55f68df1f440 0x55f68e49ab40 mon_command_ack([{"prefix": "osd pool create", "pg_num": 16, "pool": "\u9ec5"}]=0 pool '黅' created v340)
./remote/smithi001/log/ceph-mon.a.log.gz:2025-11-06T20:37:26.285+0000 7f2584806640 1 -- v1:172.21.15.1:6789/0 --> v1:172.21.15.1:6790/0 -- route(mon_command_ack([{"prefix": "osd pool create", "pg_num": 16, "pool": "\u9ec5"}]=0 pool '黅' created v340) tid 1780) -- 0x55f68dd17040 con 0x55f68c50c800
./remote/smithi001/log/ceph-mon.a.log.gz:2025-11-06T20:37:26.297+0000 7f258700b640 2 mon.a@0(leader) e1 send_reply 0x55f68e6cf560 0x55f68d81d2c0 mon_command_ack([{"prefix": "osd pool create", "pg_num": 16, "pool": "\u9ec5"}]=0 pool '黅' already exists v340)
./remote/smithi001/log/ceph-mon.a.log.gz:2025-11-06T20:37:26.297+0000 7f258700b640 1 -- v1:172.21.15.1:6789/0 --> v1:172.21.15.1:6790/0 -- route(mon_command_ack([{"prefix": "osd pool create", "pg_num": 16, "pool": "\u9ec5"}]=0 pool '黅' already exists v340) tid 1782) -- 0x55f68ded4340 con 0x55f68c50c800
./remote/smithi001/log/ceph-mon.c.log.gz:2025-11-06T20:37:26.293+0000 7fb4b5279640 1 -- v1:172.21.15.1:6790/0 <== mon.0 v1:172.21.15.1:6789/0 5323 ==== route(mon_command_ack([{"prefix": "osd pool create", "pg_num": 16, "pool": "\u9ec5"}]=0 pool '黅' created v340) tid 1780) ==== 260+0+0 (unknown 2986139300 0 0) 0x557acf47fba0 con 0x557acec72000
./remote/smithi001/log/ceph-mon.c.log.gz:2025-11-06T20:37:26.293+0000 7fb4b5279640 10 mon.c@2(peon) e1 handle_route tid 1780 mon_command_ack([{"prefix": "osd pool create", "pg_num": 16, "pool": "\u9ec5"}]=0 pool '黅' created v340)
./remote/smithi001/log/ceph-mon.c.log.gz:2025-11-06T20:37:26.293+0000 7fb4b5279640 1 -- v1:172.21.15.1:6790/0 --> v1:172.21.15.1:0/3567985004 -- mon_command_ack([{"prefix": "osd pool create", "pg_num": 16, "pool": "\u9ec5"}]=0 pool '黅' created v340) -- 0x557acf44a780 con 0x557acf896c00
./remote/smithi001/log/ceph-mon.c.log.gz:2025-11-06T20:37:26.297+0000 7fb4b5279640 1 -- v1:172.21.15.1:6790/0 <== mon.0 v1:172.21.15.1:6789/0 5325 ==== route(mon_command_ack([{"prefix": "osd pool create", "pg_num": 16, "pool": "\u9ec5"}]=0 pool '黅' already exists v340) tid 1782) ==== 267+0+0 (unknown 3523449057 0 0) 0x557acf7e0000 con 0x557acec72000
./remote/smithi001/log/ceph-mon.c.log.gz:2025-11-06T20:37:26.297+0000 7fb4b5279640 10 mon.c@2(peon) e1 handle_route tid 1782 mon_command_ack([{"prefix": "osd pool create", "pg_num": 16, "pool": "\u9ec5"}]=0 pool '黅' already exists v340)
./remote/smithi001/log/ceph-mon.c.log.gz:2025-11-06T20:37:26.297+0000 7fb4b5279640 1 -- v1:172.21.15.1:6790/0 --> v1:172.21.15.1:0/3567985004 -- mon_command_ack([{"prefix": "osd pool create", "pg_num": 16, "pool": "\u9ec5"}]=0 pool '黅' already exists v340) -- 0x557aceee5680 con 0x557acee08400
./remote/smithi116/log/ceph-mgr.x.log.gz:2025-11-06T20:37:44.195+0000 7f205f6e1640 0 [balancer INFO root] pools ['rbd', '.mgr', '黅']
./remote/smithi116/log/ceph-mgr.x.log.gz:2025-11-06T20:37:44.259+0000 7f20595d5640 0 [pg_autoscaler INFO root] Pool '黅' root_id -1 using 0.0 of space, bias 1.0, pg target 0.0 quantized to 32 (current 16)
mon.a (leader) - acked 2 tids: tid 1780 'pool created', tid 1782 'pool already exists'
mon.c (peon) - acked 2 tids: tid 1780 'pool created', tid 1782 'pool already exists'
Let's check what happened and why we have two tids.
from ./remote/smithi001/log/ceph-mon.c.log.gz
<-- ack from leader mon.0 that the command created the pool:
2025-11-06T20:37:26.293+0000 7fb4b5279640 1 -- v1:172.21.15.1:6790/0 <== mon.0 v1:172.21.15.1:6789/0 5323 ==== route(mon_command_ack([{"prefix": "osd pool create", "pg_num": 16, "pool": "\u9ec5"}]=0 pool '黅' created v340) tid 1780) ==== 260+0+0 (unknown 2986139300 0 0) 0x557acf47fba0 con 0x557acec72000
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 mon.c@2(peon) e1 _ms_dispatch existing session 0x557aced87200 for mon.0
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 mon.c@2(peon) e1 entity_name global_id 0 (none) caps allow *
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 MonCap is_capable service=mon command= read addr v1:172.21.15.1:6789/0 on cap allow *
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 MonCap allow so far , doing grant allow *
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 MonCap allow all
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 MonCap is_capable service=mon command= exec addr v1:172.21.15.1:6789/0 on cap allow *
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 MonCap allow so far , doing grant allow *
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 MonCap allow all
2025-11-06T20:37:26.293+0000 7fb4b5279640 10 mon.c@2(peon) e1 handle_route tid 1780 mon_command_ack([{"prefix": "osd pool create", "pg_num": 16, "pool": "\u9ec5"}]=0 pool '黅' created v340)
*<-- ack from peon mon.c that the command created the pool*
2025-11-06T20:37:26.293+0000 7fb4b5279640 1 -- v1:172.21.15.1:6790/0 --> v1:172.21.15.1:0/3567985004 -- mon_command_ack([{"prefix": "osd pool create", "pg_num": 16, "pool": "\u9ec5"}]=0 pool '黅' created v340) -- 0x557acf44a780 con 0x557acf896c00
2025-11-06T20:37:26.293+0000 7fb4b5279640 1 -- v1:172.21.15.1:6790/0 <== mon.0 v1:172.21.15.1:6789/0 5324 ==== paxos(lease lc 823 fc 252 pn 0 opn 0) ==== 84+0+0 (unknown 3703216096 0 0) 0x557acee54480 con 0x557acec72000
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 mon.c@2(peon) e1 _ms_dispatch existing session 0x557aced87200 for mon.0
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 mon.c@2(peon) e1 entity_name global_id 0 (none) caps allow *
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 MonCap is_capable service=mon command= read addr v1:172.21.15.1:6789/0 on cap allow *
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 MonCap allow so far , doing grant allow *
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 MonCap allow all
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 MonCap is_capable service=mon command= exec addr v1:172.21.15.1:6789/0 on cap allow *
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 MonCap allow so far , doing grant allow *
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 MonCap allow all
2025-11-06T20:37:26.293+0000 7fb4b5279640 10 mon.c@2(peon).paxos(paxos active c 252..823) handle_lease on 823 now 2025-11-06T20:37:31.294822+0000
2025-11-06T20:37:26.293+0000 7fb4b5279640 1 -- v1:172.21.15.1:6790/0 --> v1:172.21.15.1:6789/0 -- paxos(lease_ack lc 823 fc 252 pn 0 opn 0) -- 0x557ad08d4fc0 con 0x557acec72000
2025-11-06T20:37:26.293+0000 7fb4b5279640 20 mon.c@2(peon).paxos(paxos active c 252..823) reset_lease_timeout - setting timeout event
2025-11-06T20:37:26.293+0000 7fb4b5279640 5 mon.c@2(peon).monmap v1 apply_mon_features wait for service to be writeable
2025-11-06T20:37:26.293+0000 7fb4b5279640 5 mon.c@2(peon).paxos(paxos active c 252..823) is_readable = 1 - now=2025-11-06T20:37:26.296579+0000 lease_expire=2025-11-06T20:37:31.294822+0000 has v0 lc 823
2025-11-06T20:37:26.293+0000 7fb4b9281640 10 _calc_signature seq 4290 front_crc_ = 1653099814 middle_crc = 0 data_crc = 0 sig = 2112904793629595632
2025-11-06T20:37:26.293+0000 7fb4b9281640 20 Putting signature in client message(seq # 4290): sig = 2112904793629595632
2025-11-06T20:37:26.293+0000 7fb4b3275640 1 -- v1:172.21.15.1:6790/0 >> v1:172.21.15.1:0/3567985004 conn(0x557acf896c00 legacy=0x557aced99000 unknown :6790 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 26
2025-11-06T20:37:26.293+0000 7fb4b3275640 1 -- v1:172.21.15.1:6790/0 >> v1:172.21.15.1:0/3567985004 conn(0x557acf896c00 legacy=0x557aced99000 unknown :6790 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2025-11-06T20:37:26.293+0000 7fb4b3275640 1 --1- v1:172.21.15.1:6790/0 >> v1:172.21.15.1:0/3567985004 conn(0x557acf896c00 0x557aced99000 :6790 s=OPENED pgs=1 gs=0 cs=1 l=1).handle_message read tag failed
2025-11-06T20:37:26.293+0000 7fb4b3275640 1 --1- v1:172.21.15.1:6790/0 >> v1:172.21.15.1:0/3567985004 conn(0x557acf896c00 0x557aced99000 :6790 s=OPENED pgs=1 gs=0 cs=1 l=1).fault on lossy channel, failing
2025-11-06T20:37:26.293+0000 7fb4b3275640 1 -- v1:172.21.15.1:6790/0 reap_dead start
<-- network failure that probably caused the second request: "read_bulk peer close file descriptor 26"
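The sequence above can be sketched as a toy model (all names here are illustrative, not librados internals): the monitor processes the first "osd pool create", but the client's connection drops before the ACK arrives, so the client resubmits the same non-idempotent command and receives the second monitor reply instead of the first.

```python
class ToyMonitor:
    """Toy stand-in for the mon: state changes persist across command retries,
    but the reply text differs between the first attempt and the retry."""
    def __init__(self):
        self.pools = set()

    def handle_pool_create(self, pool):
        if pool in self.pools:
            return f"pool '{pool}' already exists"
        self.pools.add(pool)
        return f"pool '{pool}' created"


def send_with_retry(mon, pool, drop_first_ack=False):
    """Toy client: if the ACK for the first attempt is lost (connection fault),
    resubmit the command, mimicking the automatic resend on reconnect."""
    reply = mon.handle_pool_create(pool)   # first attempt is fully processed...
    if drop_first_ack:                     # ...but its ACK never reaches the client
        reply = mon.handle_pool_create(pool)  # retry sees the pool already there
    return reply


mon = ToyMonitor()
print(send_with_retry(mon, "\u9ec5", drop_first_ack=True))
# -> pool '黅' already exists
```

The pool does get created either way; only the reply string the client finally observes changes, which is exactly the mismatch the test asserts on.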
Updated by Nitzan Mordechai 3 months ago
I was able to recreate the issue by injecting socket failures while the rados client was waiting for a mon_command ACK. If the error hits in that specific window, librados automatically resubmits the command. Because the first command was already processed by the monitor, the pool was created, so when the retry executes it correctly reports that the pool already exists.
This confirms it is a testing issue, not a librados bug. We should update the test to check that the pool exists rather than failing on the 'already exists' message.
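One way the test check could be relaxed (a sketch only; the helper name is hypothetical, not the actual test_rados.py change): treat both replies a retried create can legitimately return as success, since either one means the pool exists afterwards.

```python
def pool_create_succeeded(out, poolname):
    """Accept both monitor replies a retried 'osd pool create' can return.
    'created' and 'already exists' both imply the pool now exists."""
    acceptable = {
        f"pool '{poolname}' created",
        f"pool '{poolname}' already exists",
    }
    return out in acceptable


poolname = "\u9ec5"
print(pool_create_succeeded(f"pool '{poolname}' created", poolname))         # True
print(pool_create_succeeded(f"pool '{poolname}' already exists", poolname))  # True
print(pool_create_succeeded("Error EINVAL: bad pg_num", poolname))           # False
```

A stricter variant could additionally list pools via a follow-up mon_command and assert the pool name appears, instead of matching the reply string at all.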
Updated by Radoslaw Zarzynski 2 months ago
Let's observe for a while. The current replication rate is very small.