Bug #70142
openRotating keyring disabled access
0%
Description
Detected in
https://tracker.ceph.com/issues/69920
Test:
https://pulpito.ceph.com/akupczyk-2025-02-17_09:28:03-rados-aclamk-testing-phoebe-2025-02-12-0909-distro-default-smithi/8134393
2025-02-17T10:37:39.363 INFO:tasks.workunit.client.0.smithi165.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:42: expect_true: diff -au keyring1 keyring2 2025-02-17T10:37:39.364 INFO:tasks.workunit.client.0.smithi165.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:42: expect_true: return 0 2025-02-17T10:37:39.364 INFO:tasks.workunit.client.0.smithi165.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:619: test_auth: env CEPH_KEYRING=keyring1 ceph -n client.admin2 auth rotate client.admin2 2025-02-17T10:52:39.486 INFO:tasks.workunit.client.0.smithi165.stderr:2025-02-17T10:52:39.486+0000 7f9625a78640 0 --1- 172.21.15.165:0/1067412820 >> v1:172.21.15.165:6812/3612510196 conn(0x7f9620059a10 0x7f95f00c1420 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER 2025-02-17T10:52:40.486 INFO:tasks.workunit.client.0.smithi165.stderr:2025-02-17T10:52:40.486+0000 7f9625a78640 0 --1- 172.21.15.165:0/1067412820 >> v1:172.21.15.165:6812/3612510196 conn(0x7f9620059a10 0x7f95ec023640 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER ..... 2025-02-17T13:34:40.999 INFO:tasks.workunit.client.0.smithi165.stderr:2025-02-17T13:34:40.995+0000 7f9625a78640 0 --1- 172.21.15.165:0/1067412820 >> v1:172.21.15.165:6812/3612510196 conn(0x7f9620059a10 0x7f95ec023620 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER 2025-02-17T13:34:41.459 INFO:tasks.workunit.client.0.smithi165.stderr://home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1: test_auth: rm -fr /tmp/cephtool.Q6y 2025-02-17T13:34:41.461 DEBUG:teuthology.orchestra.run:got remote process result: 124
The test rotated keyring, and it failed with repeated BADAUTHORIZER.
Updated by Radoslaw Zarzynski about 1 year ago
- Assignee set to Nitzan Mordechai
Hi Nitzan, would you mind taking a look on this?
Updated by Nitzan Mordechai about 1 year ago
- Status changed from New to In Progress
the mgr show:
2025-02-17T10:52:39.486+0000 7f5a9f6d9640 1 -- v1:172.21.15.165:6812/3612510196 >> v1:172.21.15.165:0/1067412820 conn(0x55eebabbfc00 legacy=0x55eebad81800 unknown :6812 s=STATE_CONNECTION_ESTABLISHED l=1).tick idle (900003839) for more than 900000000 us, fault. 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 1 --1- v1:172.21.15.165:6812/3612510196 >> v1:172.21.15.165:0/1067412820 conn(0x55eebabbfc00 0x55eebad81800 :6812 s=OPENED pgs=4 cs=1 l=1).fault on lossy channel, failing 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 1 --1- v1:172.21.15.165:6812/3612510196 >> conn(0x55eebaa39000 0x55eebaa45000 :6812 s=ACCEPTING pgs=0 cs=0 l=0).send_server_banner sd=31 legacy v1:172.21.15.165:6812/3612510196 socket_addr v1:172.21.15.165:6812/3612510196 target_addr v1:172.21.15.165:44948/0 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 20 AuthRegistry(0x7ffd588b7958) get_handler peer_type 8 method 2 cluster_methods [2] service_methods [2] client_methods [2] 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 cephx: verify_authorizer decrypted service mgr secret_id=2 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 0 auth: could not find secret_id=2 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 auth: dump_rotating: 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 auth: id 4 AQBNErNnPR49DhAArd+F0+v1255bZdD68V4VfA== expires 2025-02-17T10:49:17.238884+0000 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 auth: id 5 AQA4E7NnZ6mREBAAkbmArv+YPn42f69JVIZfHw== expires 2025-02-17T10:53:17.238884+0000 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 auth: id 6 AQAtFLNnUeayEhAABdaePziwoxCFAZ9e4v9h4A== expires 2025-02-17T10:57:17.313712+0000 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 0 cephx: verify_authorizer could not get service secret for service mgr secret_id=2 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 monclient: handle_auth_request bad authorizer on 0x55eebaa39000 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 0 --1- v1:172.21.15.165:6812/3612510196 >> v1:172.21.15.165:0/1067412820 conn(0x55eebaa39000 0x55eebaa45000 :6812 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_message_2: got bad authorizer, auth_reply_len=0 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 1 -- v1:172.21.15.165:6812/3612510196 >> v1:172.21.15.165:0/1067412820 conn(0x55eebaa39000 legacy=0x55eebaa45000 unknown :6812 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 31 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 1 -- v1:172.21.15.165:6812/3612510196 >> v1:172.21.15.165:0/1067412820 conn(0x55eebaa39000 legacy=0x55eebaa45000 unknown :6812 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 1 --1- v1:172.21.15.165:6812/3612510196 >> v1:172.21.15.165:0/1067412820 conn(0x55eebaa39000 0x55eebaa45000 :6812 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_message_1 read connect msg failed 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 1 --1- v1:172.21.15.165:6812/3612510196 >> v1:172.21.15.165:0/1067412820 conn(0x55eebaa39000 0x55eebaa45000 :6812 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).fault on lossy channel, failing
the monitor shows:
Updated by Laura Flores about 1 year ago
Nitzan Mordechai wrote in #note-3:
the mgr show:
[...]the monitor shows:
Bump up; also, @Nitzan Mordechai did you miss pasting a snippet about the monitor?
Updated by Nitzan Mordechai about 1 year ago
@Laura Flores yes.. i was not able to recreate it or find any other related bugs. so it's weird that the loop over secret_id=2 that was rotated and no longer available.
Updated by Laura Flores about 1 year ago
- Status changed from In Progress to Need More Info
Ok, changing to Need More Info. Update as needed.
Updated by Naveen Naidu 12 months ago
/a/yuriw-2025-03-18_01:22:46-rados-wip-yuri5-testing-2025-03-17-1441-reef-distro-default-smithi
4 jobs: ['8196407', '8196610', '8196542', '8196473']
Updated by Radoslaw Zarzynski 12 months ago
@Nitzan Mordechai: can this a timeskew issue?
Updated by Laura Flores 11 months ago
Let's monitor for at least another week. (note from bug scrub)
Updated by Nitzan Mordechai 11 months ago
Naveen Naidu wrote in #note-7:
/a/yuriw-2025-03-18_01:22:46-rados-wip-yuri5-testing-2025-03-17-1441-reef-distro-default-smithi
4 jobs: ['8196407', '8196610', '8196542', '8196473']
it doesn't relate to that issue. still didn't see any re-occur of that issue
Updated by Laura Flores 6 months ago ยท Edited
/a/yuriw-2025-09-18_21:29:32-rados-tentacle-release-distro-default-smithi/8509721
2025-09-19T02:05:31.186 INFO:tasks.workunit.client.0.smithi193.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:42: expect_true: return 0 2025-09-19T02:05:31.186 INFO:tasks.workunit.client.0.smithi193.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:619: test_auth: env CEPH_KEYRING=keyring1 ceph -n client.admin2 auth rotate client.admin2 2025-09-19T02:05:31.805 INFO:tasks.workunit.client.0.smithi193.stderr:2025-09-19T02:05:31.803+0000 7f2bf58f8640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2] 2025-09-19T02:05:31.805 INFO:tasks.workunit.client.0.smithi193.stderr:2025-09-19T02:05:31.803+0000 7f2bf68fa640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2] 2025-09-19T02:05:31.805 INFO:tasks.workunit.client.0.smithi193.stderr:2025-09-19T02:05:31.803+0000 7f2bf60f9640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2] 2025-09-19T02:05:41.271 INFO:tasks.workunit.client.0.smithi193.stderr:2025-09-19T02:05:41.268+0000 7f2bf58f8640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2] 2025-09-19T02:05:41.272 INFO:tasks.workunit.client.0.smithi193.stderr:2025-09-19T02:05:41.268+0000 7f2bf68fa640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2] 2025-09-19T02:05:41.272 INFO:tasks.workunit.client.0.smithi193.stderr:2025-09-19T02:05:41.268+0000 7f2bf60f9640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2] 2025-09-19T02:30:42.219 INFO:tasks.workunit.client.0.smithi193.stderr:2025-09-19T02:30:42.214+0000 7f2bf60f9640 0 -- 172.21.15.193:0/2681247443 >> [v2:172.21.15.193:6824/1606473246,v1:172.21.15.193:6825/1606473246] conn(0x7f2bf011cde0 msgr2=0x7f2bb4081b80 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until injecting socket failure 2025-09-19T02:33:57.389 INFO:tasks.workunit.client.0.smithi193.stderr:2025-09-19T02:33:57.385+0000 7f2bf60f9640 0 -- 172.21.15.193:0/2681247443 >> [v2:172.21.15.193:6824/1606473246,v1:172.21.15.193:6825/1606473246] conn(0x7f2bf011cde0 msgr2=0x7f2bb4081b80 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until injecting socket failure 2025-09-19T05:02:27.050 INFO:tasks.workunit.client.0.smithi193.stderr://home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1: test_auth: rm -fr /tmp/cephtool.sLb 2025-09-19T05:02:27.082 DEBUG:teuthology.orchestra.run:got remote process result: 124 2025-09-19T05:02:27.084 INFO:tasks.workunit:Stopping ['cephtool'] on client.0... 2025-09-19T05:02:27.085 DEBUG:teuthology.orchestra.run.smithi193:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0 2025-09-19T05:02:27.431 ERROR:teuthology.run_tasks:Saw exception from tasks.
Updated by Laura Flores 6 months ago
Laura Flores wrote in #note-12:
/a/yuriw-2025-09-18_21:29:32-rados-tentacle-release-distro-default-smithi/8509721
[...]
I think this one will need to be re-categorized in a new ticket since the root cause seems different.
Updated by Nitzan Mordechai about 1 month ago
/a/yaarit-2026-02-10_02:34:33-rados-wip-rocky10-branch-of-the-day-2026-02-09-1770676549-distro-default-trial/42133/
Updated by Radoslaw Zarzynski about 1 month ago
2025-02-17T10:52:39.486+0000 7f5a9f6d9640 20 AuthRegistry(0x7ffd588b7958) get_handler peer_type 8 method 2 cluster_methods [2] service_methods [2] client_methods [2] 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 cephx: verify_authorizer decrypted service mgr secret_id=2 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 0 auth: could not find secret_id=2 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 auth: dump_rotating: 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 auth: id 4 AQBNErNnPR49DhAArd+F0+v1255bZdD68V4VfA== expires 2025-02-17T10:49:17.238884+0000 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 auth: id 5 AQA4E7NnZ6mREBAAkbmArv+YPn42f69JVIZfHw== expires 2025-02-17T10:53:17.238884+0000 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 auth: id 6 AQAtFLNnUeayEhAABdaePziwoxCFAZ9e4v9h4A== expires 2025-02-17T10:57:17.313712+0000 2025-02-17T10:52:39.486+0000 7f5a9f6d9640 0 cephx: verify_authorizer could not get service secret for service mgr secret_id=2
Time drift between nodes?