Skip to content

msg, auth: multiple cleanups for msgr v2#62

Closed
rzarzynski wants to merge 42 commits intoliewegas:wip-v2-authfrom
rzarzynski:wip-v2-auth-cleanups
Closed

msg, auth: multiple cleanups for msgr v2#62
rzarzynski wants to merge 42 commits intoliewegas:wip-v2-authfrom
rzarzynski:wip-v2-auth-cleanups

Conversation

@rzarzynski
Copy link

No description provided.

Someone else may have bound to the same port.  If they identify as someone
else, fault and back off.

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
We're the only user, and no Dispatchers override.

Signed-off-by: Sage Weil <sage@redhat.com>
No need to special-case auth methods at this layer.

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
- return error code, not type (which never changes)
- take const ref for input args
- pointers for output args

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
We are sending the monmap prematurely (before we finish the authentication
handshake).

Signed-off-by: Sage Weil <sage@redhat.com>
Previously we would call ms_handle_authentication() possibly multiple
times, and without knowning whether it might succeed. Instead, only call
it when start_session() or handle_request() returns >0 to indicate that
we should.

Signed-off-by: Sage Weil <sage@redhat.com>
We're potentially handing out ids that haven't committed by increasing
max_global_id before it commits.

Signed-off-by: Sage Weil <sage@redhat.com>
We should get some runway even if there isn't an auth request coming in.

Signed-off-by: Sage Weil <sage@redhat.com>
- We can now construct a session before we know who it is
- We can later call _ident to identify it
- and also later register it in the session map

Signed-off-by: Sage Weil <sage@redhat.com>
We were passing CephXServiceTicketInfo in, but the only part of it we
needed was the embedded AuthTicket.  Pass that instead.

No functional change.

Signed-off-by: Sage Weil <sage@redhat.com>
Allow these methods to populate session and connection secrets.

No functional change (yet).

Signed-off-by: Sage Weil <sage@redhat.com>
No functional change.

Signed-off-by: Sage Weil <sage@redhat.com>
This will hold all of the authentication-related state in an easy-to-find
section that can be accessed via a Connection* or by the protocol stack
(as needed).

Signed-off-by: Sage Weil <sage@redhat.com>
Previously, we would give the client the auth ticket, like a rbd TGT
(ticket granting ticket), and the client would then ask for all of the
other tickets it wants in a separate message.

Instead, have the client specify which tickets it wants up front and pass
them all at the same time.

Also, generate and share the connection_secret, which will be used for
encryption.

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
const bufferlists in, pointers for output args.

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
The AuthAuthorizer encoding always begins with byte 0x01.  Codify that
as AUTH_MODE_AUTHORIZER so that we can distinguish an authorizer from
something else (e.g., an attempt to authenticate and get an initial auth
ticket with the mon).

Signed-off-by: Sage Weil <sage@redhat.com>
These will be the primary interfaces consumed by the messenger and
implemented by either MonClient (regular client, or service daemon) or
Monitor for doing authentication.

Signed-off-by: Sage Weil <sage@redhat.com>
With msgr2 the initial kickoff of an authentication handshake is client ->
server, while with msgr1 it was server -> client.  So existing
implementations have an empty initial message (outside of the messenger's
envelope).  Future auth implementations that are msgr2 only (e.g., krb)
may want to make use of this initial payload.

Signed-off-by: Sage Weil <sage@redhat.com>
- MonClient implements AuthClient to authenticate as a client
- MonClient implements AuthServer to allow daemons to verify authorizers
- Monitor implements AuthServer to allow clients to authenticate with
  an exchange of msgr2 frames
- Monitor implements AuthClient to authenticate with other monitors

After this change ProtocolV1 and SimpleMessenger still use all of the
old Dispatcher-based callbacks, but ProtocolV2 doesn't need them at
all (except for ms_handle_authentication when we finish).

Signed-off-by: Sage Weil <sage@redhat.com>
These are handled by AuthClient and AuthServer now.

Signed-off-by: Sage Weil <sage@redhat.com>
- simplify/consolidate my type and peer type effects on auth method
- watch for runtime config changes

Signed-off-by: Sage Weil <sage@redhat.com>
liewegas and others added 12 commits January 24, 2019 10:41
The modes are:

- crc: crc32c checksums to protect against bit errors.  No secrecy or
  authenticity guarantees, so a MITM could alter traffic in flight.
- secure: cryptographic secrecy and authenticity proection (i.e, encrypted
  and signed).

We do not include a 'signed' mode that provides authenticity without
secrecy because the cryptographic protocols appear to be faster than
SHA-2.

New settings:

- ms_cluster_mode  : mode(s list) for intra-cluster connections
- ms_service_mode  : mode(s list) for daemons to allow
- ms_client_mode   : mode(s list) for clients to allow

Also,

- ms_mon_cluster_mode  : mon <-> mon connections
- ms_mon_service_mode  : mon <-> daemon or client connections

The msgr2 protocol is expanded slightly to negotiate a mode.  Client
shares it's allowed/preferred modes, and server picks one as auth finishes.
Negotiation is independent of the authentication, except that the
authentiction mode may precluse certain choices. Specifically, AUTH_NONE
does not support 'secure', only 'crc'.

Signed-off-by: Sage Weil <sage@redhat.com>
Move connection mode decision to initial auth_request point so that it
can inform auth implementation how big the connection secret should be.
Pass that value through where appropriate.

The connection_secret is now a std::string filled with random bytes.

For now the v2 protocol just uses the session_key CryptoKey to encrypt,
but this is about to change.

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
This is possible because the dropped state is/was required only by:
 * CephxSessionHandler,
 * methods removed by the previous commits.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Yeah, this means wire encryption isn't supposed for V1 anymore.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
@liewegas liewegas force-pushed the wip-v2-auth branch 7 times, most recently from e565519 to ded6a5f Compare February 3, 2019 20:42
@liewegas liewegas force-pushed the wip-v2-auth branch 5 times, most recently from ca35217 to 43b8a77 Compare February 7, 2019 18:10
@rzarzynski
Copy link
Author

Merged as a part of the on-wire crypto PR. Closing.

@rzarzynski rzarzynski closed this Mar 14, 2019
liewegas pushed a commit that referenced this pull request Jun 2, 2021
There is a scenario when the `active_con` is properly
chosen but isn't marked as `ready_to_send`.
If `renew_subs()` is called during the `on_session_opened()`,
the flag will be turned on after the subscriptions are
renewed which cannot happen as it requires the flag to be
already set. In other words: there is a circular data dependency.

The net result is stalling the subscription machinery,
particularly the `OSDMap` subs. This caused a nasty peering
issue at Sepia [1] where PG 2.7 got stuck in the `GetInfo`
state.

```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136908$ less ./remote/smithi039/log/ceph-osd.1.log.gz
...
DEBUG 2021-05-26 20:19:48,134 [shard 0] osd -  pg_epoch 14 pg[2.7( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0) [] r=
-1 lpr=0 crt=0'0 mlcod 0'0 unknown enter Initial
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0]
r=0 lpr=0 crt=0'0 mlcod 0'0 unknown enter Reset
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Started
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Start
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Started/Primary
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating enter Started/Primary/Peering
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetInfo
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering build_prior all_probe
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering build_prior final: probe 0,1 down  blocked_by {}
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering up_thru 0 < same_since 14, must notify monitor
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>:  no prior_set down osds, clearing prior_readable_until_ub
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>:  querying info from osd.0
...
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering  got osd.0 2.7( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0)
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Adding osd: 0 peer features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common peer features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common acting features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common upacting features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering exit Started/Primary/Peering/GetInfo 0.099480 4 2021-05-26T20:19:48.146172+0000
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetLog
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetMissing
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/WaitUpThru
...
DEBUG 2021-05-26 20:19:49,139 [shard 0] osd -  pg_epoch 15 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating enter Started/Primary/Active
...
DEBUG 2021-05-26 20:19:49,142 [shard 0] osd -  pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+activating enter Started/Primary/Active/Activating
...
DEBUG 2021-05-26 20:19:49,204 [shard 0] osd -  pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Started/Primary/Active/Recovered
...
DEBUG 2021-05-26 20:19:49,204 [shard 0] osd -  pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Started/Primary/Active/Clean
...
DEBUG 2021-05-26 20:22:31,223 [shard 0] osd -  pg_epoch 86 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Reset
...
<a lot of flipping>
...
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163
) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown activate_map
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163
) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown exit Reset 0.035744 1 2021-05-26T20:24:07.817331+0000
INFO  2021-05-26 20:24:07,851 [shard 0] osd - Exiting state: Reset, entered at 1622060647.8158188, 1622060647.8173316 spent on 1 events
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163
) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started
INFO  2021-05-26 20:24:07,851 [shard 0] osd - Entering state: Started
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163
) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Start
INFO  2021-05-26 20:24:07,851 [shard 0] osd - Entering state: Start
INFO  2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163
) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown state<Start>: transitioning to Primary
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163
) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown exit Start 0.000041 0 0.000000
INFO  2021-05-26 20:24:07,851 [shard 0] osd - Exiting state: Start, entered at 1622060647.8516333, 0.0 spent on 0 events
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163
) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started/Primary
INFO  2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163
) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started/Primary/Peering
INFO  2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary/Peering
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering enter Started/Primary/Peering/GetInfo
INFO  2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary/Peering/GetInfo
...
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior all_probe 0,1,4
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior maybe_rw interval:139, acting: 0
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior final: probe 0,1,4 down  blocked_by {}
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering up_thru 125 < same_since 163, must notify monitor
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>:  no prior_set down osds, clearing prior_readable_until_ub
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>:  querying info from osd.0
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>:  querying info from osd.4
...
DEBUG 2021-05-26 20:24:07,924 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] connect to existing
DEBUG 2021-05-26 20:24:07,924 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] --> #62 === pg_query2(2.7 2.7 query(info 0'0 epoch_sent 163) e163/163) v1 (131)
...
DEBUG 2021-05-26 20:24:07,942 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] GOT AckFrame: seq=62
...
<plenty of osd_ping messanging but no reply to the pg_query for 2.7>
...
DEBUG 2021-05-26 20:58:19,829 [shard 0] ms - [osd.1(hb_front) v2:172.21.15.39:6807/34727 >> osd.4 v2:172.21.15.62:6807/34686@54816] <== ceph#772 =
== osd_ping(ping e17 up_from 10 ping_stamp 2021-05-26T20:58:19.825573+0000/2319.780029297s send_stamp 2319.780029297s) v5 (70)
DEBUG 2021-05-26 20:58:19,829 [shard 0] ms - [osd.1(hb_front) v2:172.21.15.39:6807/34727 >> osd.4 v2:172.21.15.62:6807/34686@54816] --> ceph#772 === osd_ping(ping_reply e249 up_from 10 ping_stamp 2021-05-26T20:58:19.825573+0000/2319.780029297s send_stamp 2320.039062500s) v5 (70
```

The peering request got stuck due to awaiting for `OSDMap`.

```
DEBUG 2021-05-26 20:24:07,930 [shard 0] ms - [osd.4(cluster) v2:172.21.15.62:6802/34686 >> osd.1 v2:172.21.15.39:6803/34727@61064] <== #62 === pg_query2(2.7 2.7 query(info 0'0 epoch_sent 163) e163/163) v1 (131)
DEBUG 2021-05-26 20:24:07,930 [shard 0] osd - handle_peering_op on 2.7 from 1
DEBUG 2021-05-26 20:24:07,930 [shard 0] osd - peering_event(id=517, detail=PeeringEvent(from=1 pgid=2.7 sent=163 requested=163 evt=epoch_sent: 163 epoch_requested: 163 MQuery 2.7 from 1 query_epoch 163 query: query(info 0'0 epoch_sent 163))): star
```

```
INFO  2021-05-26 20:19:49,127 [shard 0] osd - evt epoch is 15, i have 14, will wait
INFO  2021-05-26 20:19:49,128 [shard 0] osd - osdmap_subscribe(14)
DEBUG 2021-05-26 20:19:49,128 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #9 === mon_s
ubscribe({osdmap=14}) v3 (15)
...
INFO  2021-05-26 20:19:49,131 [shard 0] osd - handle_osd_map osd_map(14..15 src has 1..15) v4
INFO  2021-05-26 20:19:49,131 [shard 0] osd - handle_osd_map epochs [14..15], i have 15, src has [1..15]
...
INFO  2021-05-26 20:19:49,138 [shard 0] osd - handle_osd_map osd_map(14..15 src has 1..15) v4
INFO  2021-05-26 20:19:49,138 [shard 0] osd - handle_osd_map epochs [14..15], i have 15, src has [1..15]
...
INFO  2021-05-26 20:19:49,139 [shard 0] osd - evt epoch is 15, i have 14, will wait
INFO  2021-05-26 20:19:49,141 [shard 0] osd - osdmap_subscribe(14)
WARN  2021-05-26 20:19:49,141 [shard 0] monc - renew_subs - empty
...
INFO  2021-05-26 20:19:50,140 [shard 0] osd - handle_osd_map osd_map(15..16 src has 1..16) v4
INFO  2021-05-26 20:19:50,140 [shard 0] osd - handle_osd_map epochs [15..16], i have 15, src has [1..16]
DEBUG 2021-05-26 20:19:50,141 [shard 0] bluestore - do_transaction
INFO  2021-05-26 20:19:50,145 [shard 0] osd - osd.4: committed_osd_maps(16, 16)
...
INFO  2021-05-26 20:20:42,881 [shard 0] osd - handle_osd_map epochs [16..17], i have 16, src has [1..17]
DEBUG 2021-05-26 20:20:42,882 [shard 0] bluestore - do_transaction
INFO  2021-05-26 20:20:42,886 [shard 0] osd - osd.4: committed_osd_maps(17, 17)
...
INFO  2021-05-26 20:20:43,941 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO  2021-05-26 20:20:43,941 [shard 0] osd - osdmap_subscribe(17)
...
INFO  2021-05-26 20:20:43,957 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO  2021-05-26 20:20:43,957 [shard 0] osd - osdmap_subscribe(17)
...
INFO  2021-05-26 20:20:43,969 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO  2021-05-26 20:20:43,969 [shard 0] osd - osdmap_subscribe(17)
...
DEBUG 2021-05-26 20:20:46,930 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@57288 >> mon.2 v2:172.21.15.39:3301/0] <== #4 === osd_m
ap(20..21 src has 1..21) v4 (41)
INFO  2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map osd_map(20..21 src has 1..21) v4
INFO  2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map epochs [20..21], i have 17, src has [1..21]
INFO  2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map message skips epochs 18..19
INFO  2021-05-26 20:20:46,930 [shard 0] osd - osdmap_subscribe(18)
...
DEBUG 2021-05-26 20:20:47,936 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@57288 >> mon.2 v2:172.21.15.39:3301/0] <== #5 === osd_m
ap(21..22 src has 1..22) v4 (41)
INFO  2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map osd_map(21..22 src has 1..22) v4
INFO  2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map epochs [21..22], i have 17, src has [1..22]
INFO  2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map message skips epochs 18..20
INFO  2021-05-26 20:20:47,936 [shard 0] osd - osdmap_subscribe(18)
...
<osdmap_subscribe(18) over and over>
```

```
2021-05-26T20:19:42.048+0000 7f4712ffd700  1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] <== osd.4 v2:172.21.15.62:6801/34686 4 ==== mon_subscribe({mgrmap=0+,osd_pg_creates=0+,osdmap=0+}) v3 ==== 82+0+0 (secure 0 0 0) 0x7f46fc04e150 con 0x7f470401c480
2021-05-26T20:19:42.048+0000 7f4712ffd700 20 mon.b@1(peon) e1 _ms_dispatch existing session 0x7f46fc02f500 for osd.4
2021-05-26T20:19:42.048+0000 7f4712ffd700 20 mon.b@1(peon) e1  entity_name osd.4 global_id 4168 (new_ok) caps allow *
2021-05-26T20:19:42.048+0000 7f4712ffd700 10 mon.b@1(peon) e1 handle_subscribe mon_subscribe({mgrmap=0+,osd_pg_creates=0+,osdmap=0+}) v3
...
2021-05-26T20:19:49.129+0000 7f4712ffd700  1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] <== osd.4 v2:172.21.15.62:6801/34686 9 ==== mo
n_subscribe({osdmap=14}) v3 ==== 36+0+0 (secure 0 0 0) 0x7f46e8556210 con 0x7f470401c480
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon) e1 _ms_dispatch existing session 0x7f46fc02f500 for osd.4
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon) e1  entity_name osd.4 global_id 4168 (new_ok) caps allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon) e1 handle_subscribe mon_subscribe({osdmap=14}) v3
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 is_capable service=mon command= read addr v2:172.21.15.62:6801/34686 on cap allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20  allow so far , doing grant allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20  allow all
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 is_capable service=osd command= read addr v2:172.21.15.62:6801/34686 on cap allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20  allow so far , doing grant allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20  allow all
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon).osd e15 check_osdmap_sub 0x7f46e84f0150 next 14 (onetime)
2021-05-26T20:19:49.129+0000 7f4712ffd700  5 mon.b@1(peon).osd e15 send_incremental [14..15] to osd.4
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon).osd e15 build_incremental [14..15] with features 3f01cfbb7ffdffff
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon).osd e15 build_incremental    inc 15 622 bytes
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon).osd e15 build_incremental    inc 14 578 bytes
2021-05-26T20:19:49.129+0000 7f4712ffd700  1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] --> v2:172.21.15.62:6801/34686 -- osd_map(14..
15 src has 1..15) v4 -- 0x7f46e856a100 con 0x7f470401c480
```

```
seastar::future<> Client::renew_subs()
{
  if (!sub.have_new()) {
    logger().warn("{} - empty", __func__);
    return seastar::now();
  }
  logger().trace("{}", __func__);

  auto m = crimson::make_message<MMonSubscribe>();
  m->what = sub.get_subs();
  m->hostname = ceph_get_short_hostname();
  return send_message(std::move(m)).then([this] {
    sub.renewed();
  });
}
```

```
INFO  2021-05-26 20:19:42,081 [shard 0] osd - osdmap_subscribe(1)
DEBUG 2021-05-26 20:19:42,081 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #6 === mon_s
ubscribe({osdmap=1}) v3 (15)
...
INFO  2021-05-26 20:19:49,128 [shard 0] osd - osdmap_subscribe(14)
DEBUG 2021-05-26 20:19:49,128 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #9 === mon_subscribe({osdmap=14}) v3 (15)
...
INFO  2021-05-26 20:19:49,141 [shard 0] osd - osdmap_subscribe(14)
WARN  2021-05-26 20:19:49,141 [shard 0] monc - renew_subs - empty
<no MMonSubcribe>
...
INFO  2021-05-26 20:20:43,941 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO  2021-05-26 20:20:43,941 [shard 0] osd - osdmap_subscribe(17)
<no MMonSubcribe>
...
INFO  2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map message skips epochs 18..19
INFO  2021-05-26 20:20:46,930 [shard 0] osd - osdmap_subscribe(18)
<no MMonSubcribe>
```

[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136908

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
liewegas pushed a commit that referenced this pull request Nov 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants