Project

General

Profile

Actions

Bug #70142

open

Rotating keyring disabled access

Added by Adam Kupczyk about 1 year ago. Updated about 1 month ago.

Status:
Need More Info
Priority:
Normal
Category:
-
Target version:
-
% Done:

0%

Source:
Backport:
reef
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

Detected in
https://tracker.ceph.com/issues/69920
Test:
https://pulpito.ceph.com/akupczyk-2025-02-17_09:28:03-rados-aclamk-testing-phoebe-2025-02-12-0909-distro-default-smithi/8134393

2025-02-17T10:37:39.363 INFO:tasks.workunit.client.0.smithi165.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:42: expect_true:  diff -au keyring1 keyring2
2025-02-17T10:37:39.364 INFO:tasks.workunit.client.0.smithi165.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:42: expect_true:  return 0
2025-02-17T10:37:39.364 INFO:tasks.workunit.client.0.smithi165.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:619: test_auth:  env CEPH_KEYRING=keyring1 ceph -n client.admin2 auth rotate client.admin2
2025-02-17T10:52:39.486 INFO:tasks.workunit.client.0.smithi165.stderr:2025-02-17T10:52:39.486+0000 7f9625a78640  0 --1- 172.21.15.165:0/1067412820 >> v1:172.21.15.165:6812/3612510196 conn(0x7f9620059a10 0x7f95f00c1420 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2025-02-17T10:52:40.486 INFO:tasks.workunit.client.0.smithi165.stderr:2025-02-17T10:52:40.486+0000 7f9625a78640  0 --1- 172.21.15.165:0/1067412820 >> v1:172.21.15.165:6812/3612510196 conn(0x7f9620059a10 0x7f95ec023640 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
.....
2025-02-17T13:34:40.999 INFO:tasks.workunit.client.0.smithi165.stderr:2025-02-17T13:34:40.995+0000 7f9625a78640  0 --1- 172.21.15.165:0/1067412820 >> v1:172.21.15.165:6812/3612510196 conn(0x7f9620059a10 0x7f95ec023620 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2025-02-17T13:34:41.459 INFO:tasks.workunit.client.0.smithi165.stderr://home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1: test_auth:  rm -fr /tmp/cephtool.Q6y
2025-02-17T13:34:41.461 DEBUG:teuthology.orchestra.run:got remote process result: 124

The test rotated keyring, and it failed with repeated BADAUTHORIZER.

Actions #1

Updated by Radoslaw Zarzynski about 1 year ago

  • Assignee set to Nitzan Mordechai

Hi Nitzan, would you mind taking a look on this?

Actions #2

Updated by Laura Flores about 1 year ago

Bump up

Actions #3

Updated by Nitzan Mordechai about 1 year ago

  • Status changed from New to In Progress

the mgr show:

2025-02-17T10:52:39.486+0000 7f5a9f6d9640  1 -- v1:172.21.15.165:6812/3612510196 >> v1:172.21.15.165:0/1067412820 conn(0x55eebabbfc00 legacy=0x55eebad81800 unknown :6812 s=STATE_CONNECTION_ESTABLISHED l=1).tick idle (900003839) for more than 900000000 us, fault.
2025-02-17T10:52:39.486+0000 7f5a9f6d9640  1 --1- v1:172.21.15.165:6812/3612510196 >> v1:172.21.15.165:0/1067412820 conn(0x55eebabbfc00 0x55eebad81800 :6812 s=OPENED pgs=4 cs=1 l=1).fault on lossy channel, failing
2025-02-17T10:52:39.486+0000 7f5a9f6d9640  1 --1- v1:172.21.15.165:6812/3612510196 >>  conn(0x55eebaa39000 0x55eebaa45000 :6812 s=ACCEPTING pgs=0 cs=0 l=0).send_server_banner sd=31 legacy v1:172.21.15.165:6812/3612510196 socket_addr v1:172.21.15.165:6812/3612510196 target_addr v1:172.21.15.165:44948/0
2025-02-17T10:52:39.486+0000 7f5a9f6d9640 20 AuthRegistry(0x7ffd588b7958) get_handler peer_type 8 method 2 cluster_methods [2] service_methods [2] client_methods [2]
2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 cephx: verify_authorizer decrypted service mgr secret_id=2
2025-02-17T10:52:39.486+0000 7f5a9f6d9640  0 auth: could not find secret_id=2
2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 auth: dump_rotating:
2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 auth:  id 4 AQBNErNnPR49DhAArd+F0+v1255bZdD68V4VfA== expires 2025-02-17T10:49:17.238884+0000
2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 auth:  id 5 AQA4E7NnZ6mREBAAkbmArv+YPn42f69JVIZfHw== expires 2025-02-17T10:53:17.238884+0000
2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 auth:  id 6 AQAtFLNnUeayEhAABdaePziwoxCFAZ9e4v9h4A== expires 2025-02-17T10:57:17.313712+0000
2025-02-17T10:52:39.486+0000 7f5a9f6d9640  0 cephx: verify_authorizer could not get service secret for service mgr secret_id=2
2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 monclient: handle_auth_request bad authorizer on 0x55eebaa39000
2025-02-17T10:52:39.486+0000 7f5a9f6d9640  0 --1- v1:172.21.15.165:6812/3612510196 >> v1:172.21.15.165:0/1067412820 conn(0x55eebaa39000 0x55eebaa45000 :6812 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_message_2: got bad authorizer, auth_reply_len=0
2025-02-17T10:52:39.486+0000 7f5a9f6d9640  1 -- v1:172.21.15.165:6812/3612510196 >> v1:172.21.15.165:0/1067412820 conn(0x55eebaa39000 legacy=0x55eebaa45000 unknown :6812 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 31
2025-02-17T10:52:39.486+0000 7f5a9f6d9640  1 -- v1:172.21.15.165:6812/3612510196 >> v1:172.21.15.165:0/1067412820 conn(0x55eebaa39000 legacy=0x55eebaa45000 unknown :6812 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2025-02-17T10:52:39.486+0000 7f5a9f6d9640  1 --1- v1:172.21.15.165:6812/3612510196 >> v1:172.21.15.165:0/1067412820 conn(0x55eebaa39000 0x55eebaa45000 :6812 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_message_1 read connect msg failed
2025-02-17T10:52:39.486+0000 7f5a9f6d9640  1 --1- v1:172.21.15.165:6812/3612510196 >> v1:172.21.15.165:0/1067412820 conn(0x55eebaa39000 0x55eebaa45000 :6812 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).fault on lossy channel, failing

the monitor shows:

Actions #4

Updated by Laura Flores about 1 year ago

Nitzan Mordechai wrote in #note-3:

the mgr show:
[...]

the monitor shows:

Bump up; also, @Nitzan Mordechai did you miss pasting a snippet about the monitor?

Actions #5

Updated by Nitzan Mordechai about 1 year ago

@Laura Flores yes.. i was not able to recreate it or find any other related bugs. so it's weird that the loop over secret_id=2 that was rotated and no longer available.

Actions #6

Updated by Laura Flores about 1 year ago

  • Status changed from In Progress to Need More Info

Ok, changing to Need More Info. Update as needed.

Actions #7

Updated by Naveen Naidu 12 months ago

/a/yuriw-2025-03-18_01:22:46-rados-wip-yuri5-testing-2025-03-17-1441-reef-distro-default-smithi

4 jobs: ['8196407', '8196610', '8196542', '8196473']

Actions #8

Updated by Naveen Naidu 12 months ago

  • Backport set to reef
Actions #9

Updated by Radoslaw Zarzynski 12 months ago

@Nitzan Mordechai: can this a timeskew issue?

Actions #10

Updated by Laura Flores 11 months ago

Let's monitor for at least another week. (note from bug scrub)

Actions #11

Updated by Nitzan Mordechai 11 months ago

Naveen Naidu wrote in #note-7:

/a/yuriw-2025-03-18_01:22:46-rados-wip-yuri5-testing-2025-03-17-1441-reef-distro-default-smithi

4 jobs: ['8196407', '8196610', '8196542', '8196473']

it doesn't relate to that issue. still didn't see any re-occur of that issue

Actions #12

Updated by Laura Flores 6 months ago ยท Edited

/a/yuriw-2025-09-18_21:29:32-rados-tentacle-release-distro-default-smithi/8509721

2025-09-19T02:05:31.186 INFO:tasks.workunit.client.0.smithi193.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:42: expect_true:  return 0
2025-09-19T02:05:31.186 INFO:tasks.workunit.client.0.smithi193.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:619: test_auth:  env CEPH_KEYRING=keyring1 ceph -n client.admin2 auth rotate client.admin2
2025-09-19T02:05:31.805 INFO:tasks.workunit.client.0.smithi193.stderr:2025-09-19T02:05:31.803+0000 7f2bf58f8640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2025-09-19T02:05:31.805 INFO:tasks.workunit.client.0.smithi193.stderr:2025-09-19T02:05:31.803+0000 7f2bf68fa640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2025-09-19T02:05:31.805 INFO:tasks.workunit.client.0.smithi193.stderr:2025-09-19T02:05:31.803+0000 7f2bf60f9640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2025-09-19T02:05:41.271 INFO:tasks.workunit.client.0.smithi193.stderr:2025-09-19T02:05:41.268+0000 7f2bf58f8640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2025-09-19T02:05:41.272 INFO:tasks.workunit.client.0.smithi193.stderr:2025-09-19T02:05:41.268+0000 7f2bf68fa640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2025-09-19T02:05:41.272 INFO:tasks.workunit.client.0.smithi193.stderr:2025-09-19T02:05:41.268+0000 7f2bf60f9640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2025-09-19T02:30:42.219 INFO:tasks.workunit.client.0.smithi193.stderr:2025-09-19T02:30:42.214+0000 7f2bf60f9640  0 -- 172.21.15.193:0/2681247443 >> [v2:172.21.15.193:6824/1606473246,v1:172.21.15.193:6825/1606473246] conn(0x7f2bf011cde0 msgr2=0x7f2bb4081b80 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until injecting socket failure
2025-09-19T02:33:57.389 INFO:tasks.workunit.client.0.smithi193.stderr:2025-09-19T02:33:57.385+0000 7f2bf60f9640  0 -- 172.21.15.193:0/2681247443 >> [v2:172.21.15.193:6824/1606473246,v1:172.21.15.193:6825/1606473246] conn(0x7f2bf011cde0 msgr2=0x7f2bb4081b80 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until injecting socket failure
2025-09-19T05:02:27.050 INFO:tasks.workunit.client.0.smithi193.stderr://home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1: test_auth:  rm -fr /tmp/cephtool.sLb
2025-09-19T05:02:27.082 DEBUG:teuthology.orchestra.run:got remote process result: 124
2025-09-19T05:02:27.084 INFO:tasks.workunit:Stopping ['cephtool'] on client.0...
2025-09-19T05:02:27.085 DEBUG:teuthology.orchestra.run.smithi193:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2025-09-19T05:02:27.431 ERROR:teuthology.run_tasks:Saw exception from tasks.

Actions #13

Updated by Laura Flores 6 months ago

Laura Flores wrote in #note-12:

/a/yuriw-2025-09-18_21:29:32-rados-tentacle-release-distro-default-smithi/8509721
[...]

I think this one will need to be re-categorized in a new ticket since the root cause seems different.

Actions #14

Updated by Radoslaw Zarzynski 5 months ago

Bump up.

Actions #15

Updated by Radoslaw Zarzynski 5 months ago

Bump up.

Actions #16

Updated by Nitzan Mordechai about 1 month ago

/a/yaarit-2026-02-10_02:34:33-rados-wip-rocky10-branch-of-the-day-2026-02-09-1770676549-distro-default-trial/42133/

Actions #17

Updated by Radoslaw Zarzynski about 1 month ago

2025-02-17T10:52:39.486+0000 7f5a9f6d9640 20 AuthRegistry(0x7ffd588b7958) get_handler peer_type 8 method 2 cluster_methods [2] service_methods [2] client_methods [2]
2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 cephx: verify_authorizer decrypted service mgr secret_id=2
2025-02-17T10:52:39.486+0000 7f5a9f6d9640  0 auth: could not find secret_id=2
2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 auth: dump_rotating:
2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 auth:  id 4 AQBNErNnPR49DhAArd+F0+v1255bZdD68V4VfA== expires 2025-02-17T10:49:17.238884+0000
2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 auth:  id 5 AQA4E7NnZ6mREBAAkbmArv+YPn42f69JVIZfHw== expires 2025-02-17T10:53:17.238884+0000
2025-02-17T10:52:39.486+0000 7f5a9f6d9640 10 auth:  id 6 AQAtFLNnUeayEhAABdaePziwoxCFAZ9e4v9h4A== expires 2025-02-17T10:57:17.313712+0000
2025-02-17T10:52:39.486+0000 7f5a9f6d9640  0 cephx: verify_authorizer could not get service secret for service mgr secret_id=2

Time drift between nodes?

Actions

Also available in: Atom PDF