Bug #66231

closed

msg/AsyncMessenger: l_msgr_active_connections numerical anomaly

Added by Yite Gu almost 2 years ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 100%
Source: Development
Backport: quincy,reef,squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Messenger
Pull request ID:
Tags (freeform): backport_processed
Fixed In: v19.3.0-3679-gc9f8088413
Released In: v20.2.0~2412
Upkeep Timestamp: 2025-11-01T01:33:34+00:00

Description

This issue occurs after cluster startup; no special conditions are needed to reproduce it. It was observed in the mon process:

[root@rook-ceph-tools-5994c8d987-mnbhj /]# ceph tell mon.al perf dump | grep msgr_active_connections
        "msgr_active_connections": 18446744073709489389,
        "msgr_active_connections": 18446744073707700685,
        "msgr_active_connections": 18446744073709110714,
[root@rook-ceph-tools-5994c8d987-mnbhj /]# ceph tell mon.ak perf dump | grep msgr_active_connections
        "msgr_active_connections": 18446744073706728600,
        "msgr_active_connections": 18446744073709551609,
        "msgr_active_connections": 18446744073709550369,
[root@rook-ceph-tools-5994c8d987-mnbhj /]# ceph tell mon.am perf dump | grep msgr_active_connections
        "msgr_active_connections": 18446744073706156105,
        "msgr_active_connections": 18446744073706545507,
        "msgr_active_connections": 18446744073709480831,


Related issues 3 (0 open, 3 closed)

Copied to RADOS - Backport #68663: reef: msg/AsyncMessenger: l_msgr_active_connections numerical anomaly (Resolved, MOHIT AGRAWAL)
Copied to RADOS - Backport #68664: quincy: msg/AsyncMessenger: l_msgr_active_connections numerical anomaly (Rejected, MOHIT AGRAWAL)
Copied to RADOS - Backport #68665: squid: msg/AsyncMessenger: l_msgr_active_connections numerical anomaly (Resolved, MOHIT AGRAWAL)
#1

Updated by Yite Gu almost 2 years ago

  • Component(RADOS) Messenger added
#2

Updated by Radoslaw Zarzynski almost 2 years ago

Note from scrub: uninitialized memory under the counter?

#3

Updated by Radoslaw Zarzynski almost 2 years ago

  • Assignee set to MOHIT AGRAWAL
#4

Updated by MOHIT AGRAWAL almost 2 years ago

  • Pull request ID set to 57951

I was able to reproduce the issue with the following steps:

1) Kill all ceph processes.
2) Start ceph.mon; at this point the mon has no connections with anyone.
3) In gdb, set a breakpoint on accept_conn for ceph.mon and continue.
4) From another terminal, run the CLI command "ceph tell mon.a perf dump | grep msgr_active_connections".
5) Wait for a while at the gdb prompt so that the client sends a stop request to the mon, which calls the dec operation for the connection.
6) Type c at the gdb prompt to let all pending accept_conn requests finish. Once they complete, the other terminal prints high values like the following:
ceph tell mon.a perf dump | grep msgr_active_connections
"msgr_active_connections": 18446744073709551613,
"msgr_active_connections": 18446744073709551614,
"msgr_active_connections": 18446744073709551615,

The daemon shows the high value only when it receives an unregister_conn request before an accept_conn request has completed successfully. Because of the breakpoint on accept_conn, the client gets no response and sends a disconnect; the server daemon then handles the unregister_conn request without checking whether the connection was ever accepted, which makes the issue easy to reproduce this way.

#5

Updated by Radoslaw Zarzynski almost 2 years ago

Approved the PR.

#6

Updated by Laura Flores almost 2 years ago

  • Status changed from New to Fix Under Review
#7

Updated by Dan van der Ster over 1 year ago · Edited

  • Backport set to quincy,reef,squid

I've seen this as far back as pacific (see the perf dump below). Can we backport this, please?

    "AsyncMessenger::Worker-1": {
        "msgr_recv_messages": 55616369430,
        "msgr_send_messages": 47821870033,
        "msgr_recv_bytes": 52943301798829,
        "msgr_send_bytes": 44484760296991,
        "msgr_created_connections": 1199096,
        "msgr_active_connections": 18446744073708857178,
        "msgr_running_total_time": 1439133.231844527,
        "msgr_running_send_time": 579385.165429560,
        "msgr_running_recv_time": 4353254.942779597,
        "msgr_running_fast_dispatch_time": 99387.256271816,
        "msgr_send_messages_queue_lat": {
            "avgcount": 47805865674,
            "sum": 365781017.221541607,
            "avgtime": 0.007651383
        },
        "msgr_handle_ack_lat": {
            "avgcount": 74599913403,
            "sum": 6636.247815068,
            "avgtime": 0.000000088
        }
    },
#8

Updated by MOHIT AGRAWAL over 1 year ago

  • Status changed from Fix Under Review to Pending Backport
#9

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #68663: reef: msg/AsyncMessenger: l_msgr_active_connections numerical anomaly added
#10

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #68664: quincy: msg/AsyncMessenger: l_msgr_active_connections numerical anomaly added
#11

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #68665: squid: msg/AsyncMessenger: l_msgr_active_connections numerical anomaly added
#12

Updated by Upkeep Bot over 1 year ago

  • Tags (freeform) set to backport_processed
#13

Updated by Konstantin Shalygin about 1 year ago

  • Status changed from Pending Backport to Resolved
  • Target version set to v20.0.0
  • % Done changed from 0 to 100
  • Source set to Development
#14

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to c9f8088413ab65a0c50ac053cba55273de804ff6
  • Fixed In set to v19.3.0-3679-gc9f8088413a
  • Upkeep Timestamp set to 2025-07-11T08:43:26+00:00
#15

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-3679-gc9f8088413a to v19.3.0-3679-gc9f8088413
  • Upkeep Timestamp changed from 2025-07-11T08:43:26+00:00 to 2025-07-14T22:43:39+00:00
#16

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~2412
  • Upkeep Timestamp changed from 2025-07-14T22:43:39+00:00 to 2025-11-01T01:33:34+00:00