Bug #66231
closed
msg/AsyncMessenger: l_msgr_active_connections numerical anomaly
100%
Description
This issue occurs after cluster startup; no special conditions are needed to reproduce it. It was observed in the mon processes:
[root@rook-ceph-tools-5994c8d987-mnbhj /]# ceph tell mon.al perf dump | grep msgr_active_connections
"msgr_active_connections": 18446744073709489389,
"msgr_active_connections": 18446744073707700685,
"msgr_active_connections": 18446744073709110714,
[root@rook-ceph-tools-5994c8d987-mnbhj /]# ceph tell mon.ak perf dump | grep msgr_active_connections
"msgr_active_connections": 18446744073706728600,
"msgr_active_connections": 18446744073709551609,
"msgr_active_connections": 18446744073709550369,
[root@rook-ceph-tools-5994c8d987-mnbhj /]# ceph tell mon.am perf dump | grep msgr_active_connections
"msgr_active_connections": 18446744073706156105,
"msgr_active_connections": 18446744073706545507,
"msgr_active_connections": 18446744073709480831,
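All of the values above sit just below 2^64 = 18446744073709551616, which is what an unsigned 64-bit counter looks like after it has been decremented below zero. That reading is an assumption at this point in the report, although it is consistent with the analysis later in this ticket. A minimal C++ sketch, using a hypothetical helper named excess_decrements (not part of Ceph), turns such a reading back into the number of decrements in excess of increments:

```cpp
#include <cstdint>

// Hypothetical helper, not part of Ceph: given a raw perf counter reading
// that is assumed to have wrapped below zero, return how many more
// decrements than increments were applied to it.
// Unsigned subtraction is defined modulo 2^64, so (0 - raw) == 2^64 - raw
// for any nonzero raw. The result is only meaningful for readings in the
// upper half of the range (raw > 2^63), i.e. for wrapped values.
constexpr std::int64_t excess_decrements(std::uint64_t raw) {
    return static_cast<std::int64_t>(std::uint64_t{0} - raw);
}

// Reading 18446744073709551609 from mon.ak above is 7 below 2^64.
static_assert(excess_decrements(18446744073709551609ULL) == 7,
              "reading is 2^64 - 7");
```

Under the same assumption, the first mon.al reading (18446744073709489389) corresponds to 62227 excess decrements.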
Updated by Radoslaw Zarzynski almost 2 years ago
Note from scrub: uninitialized memory under the counter?
Updated by MOHIT AGRAWAL almost 2 years ago
- Pull request ID set to 57951
I was able to reproduce the issue by following these steps:
1) Kill all ceph processes.
2) Start ceph.mon. At this point the mon has no connections to anyone.
3) Set a breakpoint on accept_conn in gdb for ceph.mon and continue.
4) From another terminal, run the CLI command "ceph tell mon.a perf dump | grep msgr_active_connections".
5) Wait for some time at the gdb prompt so that the client sends a stop request to the mon, which will then call
the dec operation for the connection.
6) Type c at the gdb prompt to let all pending accept_conn requests finish. Once all requests have completed, the
high value is printed on the other terminal, something like below:
ceph tell mon.a perf dump | grep msgr_active_connections
"msgr_active_connections": 18446744073709551613,
"msgr_active_connections": 18446744073709551614,
"msgr_active_connections": 18446744073709551615,
The daemon shows the high value only when it receives an unregister_conn
request before the corresponding accept_conn request has completed successfully.
With the breakpoint on accept_conn, the client gets no response and sends a disconnect.
The server daemon then handles the unregister_conn request without checking whether
the connection was ever accepted, so in that case the issue is easily reproducible.
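The ordering problem described above can be sketched in a few lines of C++. This is an illustrative model only: Counter, Conn, the accepted flag, and the functions below are hypothetical stand-ins rather than Ceph's actual AsyncMessenger or PerfCounters code, and the checked variant is one possible guard, not necessarily what the pull request does.

```cpp
#include <cstdint>

// Hypothetical stand-in for an unsigned 64-bit perf counter.
struct Counter {
    std::uint64_t value = 0;
    constexpr void inc() { ++value; }
    constexpr void dec() { --value; }  // wraps to 2^64 - 1 when value is 0
};

// Hypothetical connection state: was it ever counted as active?
struct Conn {
    bool accepted = false;
};

constexpr void accept_conn(Counter& c, Conn& conn) {
    conn.accepted = true;
    c.inc();
}

// Behaviour described in the report: decrement unconditionally.
constexpr void unregister_unchecked(Counter& c, Conn&) {
    c.dec();
}

// Assumed guard: decrement only if the connection was counted.
constexpr void unregister_checked(Counter& c, Conn& conn) {
    if (conn.accepted) {
        conn.accepted = false;
        c.dec();
    }
}

// Accept `accepted` connections that stay open, then unregister
// `never_accepted` connections whose accept never completed.
constexpr std::uint64_t simulate(bool checked, int accepted, int never_accepted) {
    Counter c;
    for (int i = 0; i < accepted; ++i) {
        Conn conn;
        accept_conn(c, conn);
    }
    for (int i = 0; i < never_accepted; ++i) {
        Conn conn;  // client disconnected before accept_conn finished
        if (checked) {
            unregister_checked(c, conn);
        } else {
            unregister_unchecked(c, conn);
        }
    }
    return c.value;
}

// Three early disconnects on an idle daemon give 2^64 - 3 without the guard.
static_assert(simulate(false, 0, 3) == 18446744073709551613ULL, "underflow");
static_assert(simulate(true, 0, 3) == 0, "guarded counter stays at zero");
```

In this model the unguarded path yields 18446744073709551613 after three early disconnects, the same magnitude as the first value in the reproduction output, while the guarded path leaves the counter at the number of connections that are really active.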
Updated by Laura Flores almost 2 years ago
- Status changed from New to Fix Under Review
Updated by Dan van der Ster over 1 year ago · Edited
- Backport set to quincy,reef,squid
I've seen this as far back as pacific (e.g. see the perf dump below). Can we backport this please?
"AsyncMessenger::Worker-1": {
    "msgr_recv_messages": 55616369430,
    "msgr_send_messages": 47821870033,
    "msgr_recv_bytes": 52943301798829,
    "msgr_send_bytes": 44484760296991,
    "msgr_created_connections": 1199096,
    "msgr_active_connections": 18446744073708857178,
    "msgr_running_total_time": 1439133.231844527,
    "msgr_running_send_time": 579385.165429560,
    "msgr_running_recv_time": 4353254.942779597,
    "msgr_running_fast_dispatch_time": 99387.256271816,
    "msgr_send_messages_queue_lat": {
        "avgcount": 47805865674,
        "sum": 365781017.221541607,
        "avgtime": 0.007651383
    },
    "msgr_handle_ack_lat": {
        "avgcount": 74599913403,
        "sum": 6636.247815068,
        "avgtime": 0.000000088
    }
},
Updated by MOHIT AGRAWAL over 1 year ago
- Status changed from Fix Under Review to Pending Backport
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #68663: reef: msg/AsyncMessenger: l_msgr_active_connections numerical anomaly added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #68664: quincy: msg/AsyncMessenger: l_msgr_active_connections numerical anomaly added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #68665: squid: msg/AsyncMessenger: l_msgr_active_connections numerical anomaly added
Updated by Upkeep Bot over 1 year ago
- Tags (freeform) set to backport_processed
Updated by Konstantin Shalygin about 1 year ago
- Status changed from Pending Backport to Resolved
- Target version set to v20.0.0
- % Done changed from 0 to 100
- Source set to Development
Updated by Upkeep Bot 8 months ago
- Merge Commit set to c9f8088413ab65a0c50ac053cba55273de804ff6
- Fixed In set to v19.3.0-3679-gc9f8088413a
- Upkeep Timestamp set to 2025-07-11T08:43:26+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-3679-gc9f8088413a to v19.3.0-3679-gc9f8088413
- Upkeep Timestamp changed from 2025-07-11T08:43:26+00:00 to 2025-07-14T22:43:39+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~2412
- Upkeep Timestamp changed from 2025-07-14T22:43:39+00:00 to 2025-11-01T01:33:34+00:00