Bug #67230

open

mgr: should be declared available only after all python modules have been loaded

Added by Milind Changire over 1 year ago. Updated about 2 months ago.

Status: Fix Under Review
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Q/A
Backport: reef,squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Tags (freeform): temp-assign
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

The mgr only checks that a module.py file exists in each module directory, and it declares itself available before actually loading module.py and instantiating the Module subclass.

As a result, requests that depend on mgr modules can still be denied (a denial of service) even after mgr dump reports that the mgr is available for service.

This inconsistency needs to be fixed.
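To make the gap concrete, here is a minimal toy model of the two readiness checks. It is a hypothetical illustration, not ceph-mgr code: the class `ToyMgr` and its methods are invented for this sketch, and "loaded" simply stands for "module.py imported and Module subclass instantiated".

```python
from dataclasses import dataclass, field


@dataclass
class ToyMgr:
    """Toy model of the readiness gap (hypothetical; not ceph-mgr code)."""

    # Module directories found to contain a module.py file.
    discovered: set[str] = field(default_factory=set)
    # Modules whose module.py was imported and whose Module subclass was instantiated.
    loaded: set[str] = field(default_factory=set)

    def available_if_discovered(self) -> bool:
        # Readiness as described in this report: module.py files merely exist.
        return bool(self.discovered)

    def available_if_loaded(self) -> bool:
        # Readiness as requested: every discovered module is actually loaded.
        return bool(self.discovered) and self.discovered <= self.loaded

    def run_command(self, module: str) -> str:
        # A command routed to a module only works once that module is loaded.
        if module not in self.loaded:
            raise RuntimeError(f"module {module!r} is not loaded yet")
        return f"{module}: ok"


if __name__ == "__main__":
    mgr = ToyMgr(discovered={"volumes", "status"})
    print(mgr.available_if_discovered())  # True: advertised as available...
    print(mgr.available_if_loaded())      # False: ...but nothing is loaded yet
    try:
        mgr.run_command("volumes")
    except RuntimeError as err:
        print(err)                        # module 'volumes' is not loaded yet
    mgr.loaded |= mgr.discovered          # simulate the loaders finishing
    print(mgr.available_if_loaded())      # True
    print(mgr.run_command("volumes"))     # volumes: ok
```

In this model the first check turns true as soon as the directories are scanned, while commands keep failing until the second check would also be true; the report asks that availability follow the second check.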


Related issues 3 (2 open, 1 closed)

Related to CephFS - Bug #68747: fs:upgrade failure due to ceph-mgr not being ready (Duplicate, Milind Changire)

Related to CephFS - Bug #70456: qa: Command failed on smithi012 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs volume ls' (Triaged, Mahesh Mohan)

Related to mgr - Bug #71631: Commands using Mgr Modules fail if run immediately post a mgr failover/ restart (Pending Backport, Laura Flores)

#1

Updated by Milind Changire over 1 year ago

  • Description updated (diff)
#2

Updated by Patrick Donnelly over 1 year ago

Few things:

- this should be a mgr ticket, not cephfs
- there has been discussion about this recently, I think beginning here: https://github.com/ceph/ceph/pull/51169#issuecomment-1574039892
- make sure to sync with Ilya before proposing any changes.

#3

Updated by Venky Shankar over 1 year ago

Patrick Donnelly wrote in #note-2:

Few things:

- this should be a mgr ticket, not cephfs
- there has been discussion about this recently, I think beginning here: https://github.com/ceph/ceph/pull/51169#issuecomment-1574039892
- make sure to sync with Ilya before proposing any changes.

I added this for CDM discussion under core: https://tracker.ceph.com/projects/ceph/wiki/CDM_07-AUG-2024#Core

This is in the CephFS project for tracking purposes.

#4

Updated by Venky Shankar over 1 year ago

  • Category set to Correctness/Safety
  • Status changed from New to Triaged
  • Assignee set to Milind Changire
  • Target version set to v20.0.0
  • Source set to Q/A
  • Backport set to quincy,reef,squid
#5

Updated by Venky Shankar over 1 year ago

@Milind Changire I suggest updating this tracker with the proposed way forward so that all updates from the discussions are in one place. Right now the discussion lives here [0], and it was also discussed in CDM.

[0]: https://github.com/ceph/ceph/pull/51169#issuecomment-1574039892

#6

Updated by Milind Changire over 1 year ago

My PR is an attempt to send the beacon from the mgr to the mon only after all the lambda functions that load individual modules have each made one pass at loading their module.

Other teams should use this PR as a baseline to evaluate whether it is sufficient for their testing, without resorting to arbitrary sleep() calls to work around module unavailability.
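The ordering proposed in the PR (one loading pass per module first, beacon afterwards) can be sketched with a thread pool. This is a hypothetical Python model, not the PR's C++ change: `load_all_then_beacon`, `load_module`, and `send_beacon` are invented stand-ins for the per-module loader lambdas and the mgr-to-mon beacon.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, Iterable


def load_all_then_beacon(
    modules: Iterable[str],
    load_module: Callable[[str], None],
    send_beacon: Callable[[dict], None],
) -> Dict[str, str]:
    """Give every module loader one pass, then send a single beacon.

    Hypothetical sketch: `load_module` and `send_beacon` stand in for the
    daemon's per-module loader and its beacon to the mon; they are not Ceph APIs.
    """
    with ThreadPoolExecutor() as pool:
        # One task per module, mirroring one loader lambda per module.
        futures = {pool.submit(load_module, name): name for name in modules}
        # Block until every loader has finished its pass, whether it succeeded or raised.
        wait(futures)

    status = {
        name: "failed" if fut.exception() is not None else "loaded"
        for fut, name in futures.items()
    }
    # Availability is advertised only now, after all loaders have run once.
    send_beacon({"available": True, "modules": status})
    return status


if __name__ == "__main__":
    events = []

    def fake_loader(name: str) -> None:
        events.append(f"load {name}")
        if name == "broken":
            raise ImportError(name)

    def fake_beacon(payload: dict) -> None:
        events.append("beacon")

    print(load_all_then_beacon(["volumes", "broken"], fake_loader, fake_beacon))
    print(events[-1])  # beacon
```

This sketch counts a failed load as a completed pass, which is one reading of "had one pass at loading"; whether a failed module should instead keep the mgr from advertising availability is a design choice left to the PR discussion.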

#7

Updated by Milind Changire over 1 year ago

  • Pull request ID set to 59089
#8

Updated by Venky Shankar over 1 year ago

  • Related to Bug #68747: fs:upgrade failure due to ceph-mgr not being ready added
#9

Updated by Konstantin Shalygin about 1 year ago

  • Backport changed from quincy,reef,squid to reef,squid
#10

Updated by Konstantin Shalygin about 1 year ago

  • Status changed from Triaged to Fix Under Review
#11

Updated by Milind Changire 12 months ago

  • Related to Bug #70456: qa: Command failed on smithi012 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs volume ls' added
#12

Updated by Laura Flores 9 months ago

  • Related to Bug #71631: Commands using Mgr Modules fail if run immediately post a mgr failover/ restart added
#13

Updated by Laura Flores 9 months ago

  • Related to deleted (Bug #71631: Commands using Mgr Modules fail if run immediately post a mgr failover/ restart)
#14

Updated by Laura Flores 9 months ago

  • Related to Bug #71631: Commands using Mgr Modules fail if run immediately post a mgr failover/ restart added
#15

Updated by Venky Shankar about 2 months ago

  • Assignee changed from Milind Changire to Mahesh Mohan
  • Tags (freeform) set to temp-assign

@Milind Changire is moving away from CephFS development. Assigning this to @Mahesh Mohan in the interim.
