Bug #58049

closed

mon:stretch-cluster: mishandled removed_ranks -> inconsistent peer_tracker leading to unable to form quorum

Added by Kamoltat (Junior) Sirivadhna over 3 years ago. Updated 8 months ago.

Status: Resolved
Priority: Urgent
Category: Stretch Clusters
Target version: -
% Done: 0%
Source:
Backport: pacific,quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Tags (freeform):
Fixed In: v18.0.0-1520-g4aa8af29aec
Released In: v18.2.0~796
Upkeep Timestamp: 2025-07-12T22:08:32+00:00

Description

First encountered in the downstream: https://bugzilla.redhat.com/show_bug.cgi?id=2142674

When monitors fail over repeatedly in a stretch cluster, there are instances where Ceph becomes
unresponsive because the monitors cannot form a quorum.

We investigated and concluded that the cause is our mishandling of `removed_ranks` in the MonMap. This
leaves `peer_tracker` inconsistent, which in turn deadlocks the monitors' election state. The monitors
then cannot form a quorum, and Ceph becomes unresponsive.
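
To illustrate the kind of inconsistency described above, here is a minimal toy sketch. It is not Ceph code: the `remove_rank` helper, the rank-to-score dictionary, and the "replayed removal" scenario are all assumptions made for illustration. It only shows how two monitors that apply the same rank removal a different number of times can end up with rank-keyed state that no longer agrees.

```python
# Toy model only: a rank-keyed score table standing in for per-peer state.
# All names here are hypothetical and do not mirror Ceph's actual classes.

def remove_rank(scores, removed_rank):
    """Return a new rank -> score map with one rank removed.

    Ranks above the removed one shift down by one, as happens when a
    monitor leaves the map and the remaining monitors are renumbered.
    """
    shifted = {}
    for rank, score in scores.items():
        if rank == removed_rank:
            continue
        new_rank = rank - 1 if rank > removed_rank else rank
        shifted[new_rank] = score
    return shifted


initial = {0: 1.0, 1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6}

# Monitor A applies the removal of rank 1 exactly once.
mon_a = remove_rank(initial, 1)

# Monitor B applies the same removal twice (assumed here: a stale
# removed-rank entry is processed again on a later map update).
mon_b = remove_rank(remove_rank(initial, 1), 1)

print(mon_a)  # {0: 1.0, 1: 0.8, 2: 0.7, 3: 0.6}
print(mon_b)  # {0: 1.0, 1: 0.7, 2: 0.6}
```

In this sketch the two monitors disagree both on how many peers exist and on which score belongs to which rank, which is the sort of mismatch that could plausibly keep an election from converging.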


Related issues 3 (0 open, 3 closed)

Related to RADOS - Bug #58107: mon-stretch: old stretch_marked_down_mons leads to ceph unresponsive | Closed | Kamoltat (Junior) Sirivadhna
Copied to RADOS - Backport #58380: pacific: mon:stretch-cluster: mishandled removed_ranks -> inconsistent peer_tracker leading to unable to form quorum | Resolved | Kamoltat (Junior) Sirivadhna
Copied to RADOS - Backport #58381: quincy: mon:stretch-cluster: mishandled removed_ranks -> inconsistent peer_tracker leading to unable to form quorum | Resolved | Kamoltat (Junior) Sirivadhna

Actions #1

Updated by Radoslaw Zarzynski over 3 years ago

  • Pull request ID set to 48991
Actions #2

Updated by Kamoltat (Junior) Sirivadhna over 3 years ago

  • Related to Bug #58107: mon-stretch: old stretch_marked_down_mons leads to ceph unresponsive added
Actions #3

Updated by Radoslaw Zarzynski about 3 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to pacific,quincy
Actions #4

Updated by Upkeep Bot about 3 years ago

  • Copied to Backport #58380: pacific: mon:stretch-cluster: mishandled removed_ranks -> inconsistent peer_tracker leading to unable to form quorum added
Actions #5

Updated by Upkeep Bot about 3 years ago

  • Copied to Backport #58381: quincy: mon:stretch-cluster: mishandled removed_ranks -> inconsistent peer_tracker leading to unable to form quorum added
Actions #7

Updated by Kamoltat (Junior) Sirivadhna about 3 years ago

  • Status changed from Pending Backport to Resolved
Actions #8

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to 4aa8af29aec32c3378bc796e7b1fd2170f41df8f
  • Fixed In set to v18.0.0-1520-g4aa8af29aec
  • Released In set to v18.2.0~796
  • Upkeep Timestamp set to 2025-07-12T22:08:32+00:00