
mgr: ensure that all modules have started before advertising active mgr #63859

Merged
batrick merged 5 commits into ceph:main from ljflores:wip-bz-2314146
Mar 20, 2026

Conversation


@ljflores ljflores commented Jun 10, 2025

When the mgr is restarted or failed over via ceph mgr fail or during an
upgrade, mgr modules sometimes take longer to start up (this includes
loading their class, commands, and module options, and being removed
from the pending_modules map structure). This startup delay can happen
due to a cluster's specific hardware or if a code bottleneck is triggered in
a module’s serve() function (each mgr module has a serve() function that
performs initialization tasks right when the module is loaded).

When this startup delay occurs, any mgr module command issued against the
cluster around the same time fails with an error saying that the command is not
supported:

$ ceph mgr fail; ceph fs volume ls
Error ENOTSUP: Warning: due to ceph-mgr restart, some PG states may not be up to date
Module 'volumes' is not enabled/loaded (required by command 'fs volume ls'): use `ceph mgr module enable volumes` to enable it

We should try to lighten any bottlenecks in the mgr module serve()
functions wherever possible, but the root cause of this failure is that the
mgr sends a beacon to the mon too early, indicating that it is active before
the module loading has completed. Specifically, some of the mgr modules
have loaded their class but have not yet been deleted from the pending_modules
structure, indicating that they have not finished starting up.

This commit improves the criteria for sending the “active” beacon to the mon so
the mgr does not signal that it’s active too early. We do this through the following additions:

  1. A new context ActivePyModules::recheck_modules_start that will be set if not all modules
    have finished startup.

  2. A new function ActivePyModules::check_all_modules_started() that checks if modules are
    still pending startup; if all have started up (pending_modules is empty), then we send
    the beacon right away. But if some are still pending, we pass the beacon task on to the new
    recheck context ActivePyModules::recheck_modules_start so we know to send the beacon later.

  3. Logic in ActivePyModules::start_one() that only gets triggered if the modules did not all finish
    startup the first time we checked. We know this is the case if the new recheck context
    recheck_modules_start was set from nullptr. The beacon is only sent once pending_modules is
    confirmed to be empty, which means that all the modules have started up and are ready to support commands.

  4. Adjustment of when the booleans initializing and initialized are set. These booleans come into play in
    MgrStandby::send_beacon(), where we check that the active mgr has been initialized (and is thus
    available); the beacon only advertises availability when initialized is set. Currently, these booleans
    are set at the end of Mgr::init(), which means they are set before pending_modules is cleared. With this
    adjustment, they are set only after we confirm that all modules have started up. The send_beacon code is
    triggered on mgr failover AND on every Mgr::tick(), which occurs by default every two seconds, so if we
    did not adjust when these booleans are set, we would fix only the failover path while the mgr would still
    send the beacon too early via Mgr::tick(). Below is the relevant code from MgrStandby::send_beacon(),
    which is triggered in Mgr::background_init() AND in Mgr::tick():

  // Whether I think I am available (request MgrMonitor to set me
  // as available in the map)
  bool available = active_mgr != nullptr && active_mgr->is_initialized();

  auto addrs = available ? active_mgr->get_server_addrs() : entity_addrvec_t();
  dout(10) << "sending beacon as gid " << monc.get_global_id() << dendl;
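Taken together, the additions above gate the "active" beacon on pending_modules being empty. The following is a hypothetical Python simulation of that control flow (the real implementation is C++ in ActivePyModules and MgrStandby); the names check_all_modules_started, start_one, pending_modules, and recheck_modules_start mirror the PR's identifiers, but the structure of this class is illustrative only:

```python
class MgrSim:
    """Toy model of the beacon gating: the active beacon is sent only
    once every module has been removed from pending_modules."""

    def __init__(self, modules):
        self.pending_modules = set(modules)  # modules not yet started
        self.recheck_modules_start = None    # deferred "send beacon" context
        self.initialized = False
        self.beacons = []

    def check_all_modules_started(self):
        """Send the active beacon now if nothing is pending, else defer it."""
        if not self.pending_modules:
            self._send_active_beacon()
        else:
            # remember that a beacon is owed once startup finishes
            self.recheck_modules_start = self._send_active_beacon

    def start_one(self, name):
        """Called as each module finishes startup (modules load sequentially)."""
        self.pending_modules.discard(name)
        # only fires if the first check found modules still pending
        if self.recheck_modules_start is not None and not self.pending_modules:
            cb, self.recheck_modules_start = self.recheck_modules_start, None
            cb()

    def _send_active_beacon(self):
        self.initialized = True  # mirrors the initializing/initialized booleans
        self.beacons.append("active")
```

Under this model, a mgr whose modules are all started beacons immediately, while one with stragglers beacons exactly once, after the last start_one() completes.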

At face value, this issue reproduces nondeterministically, since it
can depend on environmental factors or specific cluster workloads.
However, I was able to deterministically reproduce it by injecting a
bottleneck into the balancer module:

diff --git a/src/pybind/mgr/balancer/module.py b/src/pybind/mgr/balancer/module.py
index d12d69ffb01..91c83fa8023 100644
--- a/src/pybind/mgr/balancer/module.py
+++ b/src/pybind/mgr/balancer/module.py
@@ -772,10 +772,10 @@ class Module(MgrModule):
                     self.update_pg_upmap_activity(plan)  # update pg activity in `balancer status detail`
                 self.optimizing = False
+                # causing a bottleneck
+                for i in range(0, 1000):
+                    for j in range (0, 1000):
+                        x = i + j
+                        self.log.debug("hitting the bottleneck in the balancer module")
             self.log.debug('Sleeping for %d', sleep_interval)
             self.event.wait(sleep_interval)
             self.event.clear()

Then, the error reproduces every time by running:

$ ./bin/ceph mgr fail; ./bin/ceph telemetry show
Error ENOTSUP: Warning: due to ceph-mgr restart, some PG states may not be up to date
Module 'telemetry' is not enabled/loaded (required by command 'telemetry show'): use `ceph mgr module enable telemetry` to enable it

With this fix, the active mgr is marked as "initialized" only after all
the modules have started up, and this error goes away. The command may
take a bit longer to execute depending on the extent of the delay.

Part 2 -- Adding a "max expiration" mechanism + health warning to address extreme loading cases

During a mgr failover, the active mgr is marked available if:
1. The mon has chosen a standby to be active
2. The chosen active mgr has all of its modules initialized

Now that we've tightened the criteria for sending the "active" beacon by making the mgr wait for its modules to finish initializing, we need to account for extreme cases in which the modules are stuck loading for a very long time, or even indefinitely. In these extreme cases, we don't want to delay the "active" beacon for too long, since that would block other important mgr functionality, such as reporting PG availability in the health status. In particular, we want to avoid warning about unknown PGs in the health status when that's not ultimately the problem.

To account for an exceptionally long module loading time, I added a new configurable, mgr_module_load_max_expiration. If, after this expiration (by default, 20 seconds), the mgr modules still haven't finished initializing, the standby proceeds to mark itself "available" and sends the "active" beacon to the mon, which unblocks other critical mgr functionality.

For added clarity, a health error will be issued at this time, indicating which mgr modules got stuck initializing (See src/mgr/PyModuleRegistry.cc). The idea is to unblock the rest of the mgr's critical functionality while making it clear to Ceph operators that some modules are unusable.
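The expiration check can be sketched as follows. This is an illustrative Python stand-in for the C++ logic, assuming a monotonic timestamp recorded when activation began; only the config name mgr_module_load_max_expiration and its 20-second default come from the PR, the function and parameter names are hypothetical:

```python
import time

MGR_MODULE_LOAD_MAX_EXPIRATION = 20.0  # seconds, the PR's stated default

def should_go_active(pending_modules, activate_started_at, now=None):
    """Return (go_active, stuck_modules).

    Go active immediately once all modules have started; after the
    expiration, go active anyway and report which modules are still
    stuck so a health error can be raised for them.
    """
    now = time.monotonic() if now is None else now
    if not pending_modules:
        return True, []
    if now - activate_started_at >= MGR_MODULE_LOAD_MAX_EXPIRATION:
        # unblock critical mgr functionality, but flag the stragglers
        return True, sorted(pending_modules)
    return False, []
```

The returned stuck_modules list corresponds to the modules named in the MGR_MODULE_ERROR health detail below.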

This is what the health error might look like:

$ ceph health detail

HEALTH_ERR 4 mgr modules have failed
[ERR] MGR_MODULE_ERROR: 4 mgr modules have failed
    Module 'rbd_support' has failed: Module failed to initialize.
    Module 'status' has failed: Module failed to initialize.
    Module 'telemetry' has failed: Module failed to initialize.
    Module 'volumes' has failed: Module failed to initialize.

Links to rendered documentation:

Note: All of the new configurations added in this PR are marked as "dev" level, and are to be used only in testing scenarios or in out-of-the-norm troubleshooting scenarios. I have chosen not to document them for this reason.

Fixes: https://tracker.ceph.com/issues/71631
Signed-off-by: Laura Flores lflores@ibm.com

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@ljflores ljflores requested a review from a team as a code owner June 10, 2025 23:32
@ljflores ljflores requested review from batrick and rzarzynski and removed request for a team June 10, 2025 23:33
@ljflores

jenkins test windows

@ljflores

jenkins test submodules

@ljflores ljflores requested review from athanatos and mchangir June 11, 2025 18:18

@athanatos athanatos left a comment


LGTM, but someone more familiar with the manager should probably have a look as well.

@ljflores

Changed the commit description; code is the same.

@ljflores

Further context for testing:

With the current solution, I used the following reproducer, which injects a time-complex bottleneck into the balancer module to cause its startup time to take longer:

diff --git a/src/pybind/mgr/balancer/module.py b/src/pybind/mgr/balancer/module.py
index 476304275c1..d2dfa01e99a 100644
--- a/src/pybind/mgr/balancer/module.py
+++ b/src/pybind/mgr/balancer/module.py
@@ -771,6 +771,11 @@ class Module(MgrModule):
                 if pg_upmap_activity:
                     self.update_pg_upmap_activity(plan)  # update pg activity in `balancer status detail`
                 self.optimizing = False
+            # causing a bottleneck
+            for i in range(0, 1000):
+                for j in range (0, 1000):
+                    x = i + j
+                    self.log.debug("hitting the bottleneck in the balancer module")
             self.log.debug('Sleeping for %d', sleep_interval)
             self.event.wait(sleep_interval)
             self.event.clear()

After injecting the bottleneck, I ran this command on a vstart cluster, which shows that the mgr declared availability too early:

$ ./bin/ceph mgr fail; ./bin/ceph telemetry show --debug_mgrc=20
2025-06-11T18:44:49.771+0000 7f2ed6ffd640 20 mgrc handle_mgr_map mgrmap(e 6)
2025-06-11T18:44:49.772+0000 7f2ed6ffd640  4 mgrc handle_mgr_map Got map version 6
2025-06-11T18:44:49.772+0000 7f2ed6ffd640  4 mgrc handle_mgr_map Active mgr is now 
2025-06-11T18:44:49.772+0000 7f2ed6ffd640  4 mgrc reconnect No active mgr available yet
2025-06-11T18:44:50.117+0000 7f2ef4805640 20 mgrc start_command cmd: [{"prefix": "telemetry show", "target": ["mon-mgr", ""]}]
2025-06-11T18:44:50.117+0000 7f2ef4805640  5 mgrc start_command no mgr session (no running mgr daemon?), waiting
2025-06-11T18:44:54.116+0000 7f2ed6ffd640 20 mgrc handle_mgr_map mgrmap(e 7)
2025-06-11T18:44:54.116+0000 7f2ed6ffd640  4 mgrc handle_mgr_map Got map version 7
2025-06-11T18:44:54.116+0000 7f2ed6ffd640  4 mgrc handle_mgr_map Active mgr is now 
2025-06-11T18:44:54.116+0000 7f2ed6ffd640  4 mgrc reconnect No active mgr available yet
2025-06-11T18:44:56.132+0000 7f2ed6ffd640 20 mgrc handle_mgr_map mgrmap(e 8)
2025-06-11T18:44:56.132+0000 7f2ed6ffd640  4 mgrc handle_mgr_map Got map version 8
2025-06-11T18:44:56.132+0000 7f2ed6ffd640  4 mgrc handle_mgr_map Active mgr is now [v2:127.0.0.1:6800/3720421852,v1:127.0.0.1:6801/3720421852]
2025-06-11T18:44:56.132+0000 7f2ed6ffd640  4 mgrc reconnect Starting new session with [v2:127.0.0.1:6800/3720421852,v1:127.0.0.1:6801/3720421852]
2025-06-11T18:44:56.132+0000 7f2ed6ffd640 10 mgrc reconnect resending 0 (cli)
2025-06-11T18:44:56.135+0000 7f2ed6ffd640 20 mgrc handle_command_reply tid 0 r -95
Error ENOTSUP: Warning: due to ceph-mgr restart, some PG states may not be up to date
Module 'telemetry' is not enabled/loaded (required by command 'telemetry show'): use `ceph mgr module enable telemetry` to enable it

In the mgr log, we can see that mgr send_beacon active occurred before the telemetry module had finished loading, which resulted in the ENOTSUP error:

$ cat out/mgr.x.log | grep "send_beacon active\|I am now activating\|telemetry show"
2025-06-11T18:46:11.817+0000 7f8f5fc20200 20 mgr[py] loaded command telemetry show name=channels,req=false,n=N,type=CephString
2025-06-11T18:46:11.817+0000 7f8f5fc20200 20 mgr[py] loaded command telemetry show-device 
2025-06-11T18:46:11.817+0000 7f8f5fc20200 20 mgr[py] loaded command telemetry show-all 
2025-06-11T18:46:15.905+0000 7f8f5adb8640  1 mgr handle_mgr_map I am now activating
2025-06-11T18:46:17.845+0000 7f8f57db2640 20 mgr send_beacon active
2025-06-11T18:46:17.921+0000 7f8cbf60d640  1 -- [v2:127.0.0.1:6800/976509187,v1:127.0.0.1:6801/976509187] <== client.4402 127.0.0.1:0/273425155 1 ==== mgr_command(tid 0: {"prefix": "telemetry show", "target": ["mon-mgr", ""]}) ==== 79+0+0 (secure 0 0 0) 0x55cd1423a820 con 0x55cd14711400
2025-06-11T18:46:17.921+0000 7f8cbf60d640 10 mgr.server _handle_command decoded-size=2 prefix=telemetry show
2025-06-11T18:46:17.922+0000 7f8cbf60d640 20 is_capable service=py module=telemetry command=telemetry show read addr - on cap allow *
2025-06-11T18:46:17.922+0000 7f8cbf60d640  0 log_channel(audit) log [DBG] : from='client.4402 -' entity='client.admin' cmd=[{"prefix": "telemetry show", "target": ["mon-mgr", ""]}]: dispatch
Module 'telemetry' is not enabled/loaded (required by command 'telemetry show'): use `ceph mgr module enable telemetry` to enable it
Module 'telemetry' is not enabled/loaded (required by command 'telemetry show'): use `ceph mgr module enable telemetry` to enable it
Module 'telemetry' is not enabled/loaded (required by command 'telemetry show'): use `ceph mgr module enable telemetry` to enable it) -- 0x55cd14311380 con 0x55cd14711400
2025-06-11T18:46:18.503+0000 7f8f58db4640 10 log_client  will send 2025-06-11T18:46:17.923865+0000 mgr.x (mgr.4396) 1 : audit [DBG] from='client.4402 -' entity='client.admin' cmd=[{"prefix": "telemetry show", "target": ["mon-mgr", ""]}]: dispatch
2025-06-11T18:46:19.847+0000 7f8f57db2640 20 mgr send_beacon active
2025-06-11T18:46:21.847+0000 7f8f57db2640 20 mgr send_beacon active

In my patch, the mgr declares availability only after pending_modules is empty. Here, the mgr waits to declare availability properly, and the command succeeds:

$ ./bin/ceph mgr fail; ./bin/ceph telemetry show --debug_mgrc=20
2025-06-11T18:48:32.969+0000 7f67277fe640 20 mgrc handle_mgr_map mgrmap(e 19)
2025-06-11T18:48:32.969+0000 7f67277fe640  4 mgrc handle_mgr_map Got map version 19
2025-06-11T18:48:32.969+0000 7f67277fe640  4 mgrc handle_mgr_map Active mgr is now 
2025-06-11T18:48:32.969+0000 7f67277fe640  4 mgrc reconnect No active mgr available yet
2025-06-11T18:48:33.317+0000 7f674d6b9640 20 mgrc start_command cmd: [{"prefix": "telemetry show", "target": ["mon-mgr", ""]}]
2025-06-11T18:48:33.317+0000 7f674d6b9640  5 mgrc start_command no mgr session (no running mgr daemon?), waiting
2025-06-11T18:48:37.368+0000 7f67277fe640 20 mgrc handle_mgr_map mgrmap(e 20)
2025-06-11T18:48:37.368+0000 7f67277fe640  4 mgrc handle_mgr_map Got map version 20
2025-06-11T18:48:37.368+0000 7f67277fe640  4 mgrc handle_mgr_map Active mgr is now 
2025-06-11T18:48:37.368+0000 7f67277fe640  4 mgrc reconnect No active mgr available yet
2025-06-11T18:48:39.378+0000 7f67277fe640 20 mgrc handle_mgr_map mgrmap(e 21)
2025-06-11T18:48:39.378+0000 7f67277fe640  4 mgrc handle_mgr_map Got map version 21
2025-06-11T18:48:39.378+0000 7f67277fe640  4 mgrc handle_mgr_map Active mgr is now 
2025-06-11T18:48:39.378+0000 7f67277fe640  4 mgrc reconnect No active mgr available yet
2025-06-11T18:48:50.818+0000 7f67277fe640 20 mgrc handle_mgr_map mgrmap(e 22)
2025-06-11T18:48:50.818+0000 7f67277fe640  4 mgrc handle_mgr_map Got map version 22
2025-06-11T18:48:50.818+0000 7f67277fe640  4 mgrc handle_mgr_map Active mgr is now [v2:127.0.0.1:6800/3826094627,v1:127.0.0.1:6801/3826094627]
2025-06-11T18:48:50.818+0000 7f67277fe640  4 mgrc reconnect Starting new session with [v2:127.0.0.1:6800/3826094627,v1:127.0.0.1:6801/3826094627]
2025-06-11T18:48:50.818+0000 7f67277fe640 10 mgrc reconnect resending 0 (cli)
2025-06-11T18:48:50.821+0000 7f67277fe640 20 mgrc handle_command_reply tid 0 r 0
Telemetry is off. Please consider opting-in with `ceph telemetry on`.
Preview sample reports with `ceph telemetry preview`.

In the mgr log, we can see that the mgr is sending a "starting" beacon to indicate that it is still starting, and it only sends the "active" beacon after pending_modules is empty. So, the command is ready to be supported:

$ cat out/mgr.x.log | grep "send_beacon active\|I am now activating\|telemetry show"
2025-06-11T18:48:33.278+0000 7f8f6f753200 20 mgr[py] loaded command telemetry show name=channels,req=false,n=N,type=CephString
2025-06-11T18:48:33.278+0000 7f8f6f753200 20 mgr[py] loaded command telemetry show-device 
2025-06-11T18:48:33.278+0000 7f8f6f753200 20 mgr[py] loaded command telemetry show-all 
2025-06-11T18:48:37.368+0000 7f8f6aed3640  1 mgr handle_mgr_map I am now activating
2025-06-11T18:48:39.312+0000 7f8f67ecd640 20 mgr send_beacon active (starting)
2025-06-11T18:48:41.313+0000 7f8f67ecd640 20 mgr send_beacon active (starting)
2025-06-11T18:48:43.314+0000 7f8f67ecd640 20 mgr send_beacon active (starting)
2025-06-11T18:48:45.315+0000 7f8f67ecd640 20 mgr send_beacon active (starting)
2025-06-11T18:48:47.316+0000 7f8f67ecd640 20 mgr send_beacon active (starting)
2025-06-11T18:48:49.316+0000 7f8f67ecd640 20 mgr send_beacon active (starting)
2025-06-11T18:48:50.756+0000 7f8ccff3c640 20 mgr send_beacon active
2025-06-11T18:48:50.819+0000 7f8ccf73b640  1 -- [v2:127.0.0.1:6800/3826094627,v1:127.0.0.1:6801/3826094627] <== client.4469 127.0.0.1:0/3847426830 1 ==== mgr_command(tid 0: {"prefix": "telemetry show", "target": ["mon-mgr", ""]}) ==== 79+0+0 (secure 0 0 0) 0x5645b30ecb60 con 0x5645b3936400
2025-06-11T18:48:50.819+0000 7f8ccf73b640 10 mgr.server _handle_command decoded-size=2 prefix=telemetry show
2025-06-11T18:48:50.821+0000 7f8ccf73b640 20 is_capable service=py module=telemetry command=telemetry show read addr - on cap allow *
2025-06-11T18:48:50.821+0000 7f8ccf73b640  0 log_channel(audit) log [DBG] : from='client.4469 -' entity='client.admin' cmd=[{"prefix": "telemetry show", "target": ["mon-mgr", ""]}]: dispatch
2025-06-11T18:48:50.821+0000 7f8ccf73b640 10 mgr.server _handle_command passing through command 'telemetry show' size 2
2025-06-11T18:48:50.821+0000 7f8cba611640 10 mgr.server operator() dispatching command 'telemetry show' size 2
2025-06-11T18:48:50.964+0000 7f8f68ecf640 10 log_client  will send 2025-06-11T18:48:50.822638+0000 mgr.x (mgr.4461) 1 : audit [DBG] from='client.4469 -' entity='client.admin' cmd=[{"prefix": "telemetry show", "target": ["mon-mgr", ""]}]: dispatch
2025-06-11T18:48:51.317+0000 7f8f67ecd640 20 mgr send_beacon active
2025-06-11T18:48:51.821+0000 7f8f6aed3640 10 log_client  logged 2025-06-11T18:48:50.822638+0000 mgr.x (mgr.4461) 1 : audit [DBG] from='client.4469 -' entity='client.admin' cmd=[{"prefix": "telemetry show", "target": ["mon-mgr", ""]}]: dispatch
2025-06-11T18:48:53.318+0000 7f8f67ecd640 20 mgr send_beacon active


@batrick batrick left a comment


I haven't studied this code in detail yet. I have a question on the overall approach here: why not just have the ceph-mgr sit on MMgrCommands and retry all tabled commands whenever a module becomes active. If a module never becomes active after some timeout, return -ENOSYS (not -ENOTSUP).

That way the ceph-mgr is available immediately for modules which load quickly and others just delay the command delivery until a module that understands the command successfully loads.
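The retry-on-activation idea floated here could be sketched roughly as below. This is a hypothetical Python simulation only, not how MMgrCommand handling actually works in Ceph (which is C++); all names in it are illustrative:

```python
class CommandTable:
    """Park commands whose module has not loaded yet; replay them
    whenever a module becomes active, without blocking any thread."""

    def __init__(self):
        self.active_modules = set()
        self.tabled = []      # (module, command) pairs waiting on a module
        self.dispatched = []

    def handle_command(self, module, command):
        if module in self.active_modules:
            self.dispatched.append(command)
        else:
            # table it for retry; the messenger thread never sleeps
            self.tabled.append((module, command))

    def module_became_active(self, module):
        self.active_modules.add(module)
        still_waiting = []
        for mod, cmd in self.tabled:
            if mod == module:
                self.dispatched.append(cmd)
            else:
                still_waiting.append((mod, cmd))
        self.tabled = still_waiting
```

A timeout sweep over self.tabled (returning -ENOSYS for entries older than some limit) would complete the picture described above.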

@ljflores

I haven't studied this code in detail yet. I have a question on the overall approach here: why not just have the ceph-mgr sit on MMgrCommands and retry all tabled commands whenever a module becomes active. If a module never becomes active after some timeout, return -ENOSYS (not -ENOTSUP).

That way the ceph-mgr is available immediately for modules which load quickly and others just delay the command delivery until a module that understands the command successfully loads.

I see what you’re saying and how that might make mgr command handling more resilient. However, your suggestion assumes that mgr modules load independently and in parallel, but that’s not the case. Mgr modules load sequentially, and the load time of one can directly impact the next. For instance, if the orchestrator module takes longer to load, the telemetry module won’t have loaded yet, since modules load in alphabetical order and start one by one. So retrying tabled commands on a per-module basis introduces unnecessary complexity, especially since we know that if one earlier module is still pending, all subsequent ones will be as well.

If we did want to check for availability on a per-module basis, we could add logic to the DaemonServer code where mgr commands are handled. However, I’ve already implemented that type of approach here: #61325
As you pointed out though, there are concerns about blocking the messenger thread. This solution is non-blocking to the messenger thread, works cleanly within the existing finisher thread framework, and doesn’t add multithreading which has a higher potential for race conditions.

Another important detail is that this startup issue only applies to enabled modules, not disabled ones. If a module is enabled, it’s reasonable to assume it’s part of the cluster’s intended functionality. In my opinion, the mgr simply should not be considered “available” until all enabled modules have fully started. If users didn’t intend to use those modules, they’d disable them. Marking the mgr as active before those modules are ready signals false availability to the cluster. Since the cluster expects enabled modules to be usable, it’s cleaner to wait for them to finish starting up than to mark the mgr active early and try to recover later on a per-module basis.

If you're interested in reviewing the module startup sequence, the relevant logic can be found in:

Mgr::background_init(...)
Mgr::init()
PyModuleRegistry::active_start(...)
ActivePyModules::start_one(...)
PyModule::load(...)

Here, you can see how modules are loaded one by one, where each is dependent on the completion of the previous.
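The sequential-startup property argued for above can be made concrete with a small sketch. This is a toy Python model, not the actual Ceph call chain; the function name and timing inputs are hypothetical:

```python
def active_start(modules, serve_times):
    """Start modules in alphabetical order, one at a time; return the
    cumulative time at which each module becomes ready.

    Because startup is sequential, a slow module delays every module
    that sorts after it."""
    ready_at = {}
    clock = 0.0
    for name in sorted(modules):
        clock += serve_times.get(name, 0.0)  # each load blocks the next
        ready_at[name] = clock
    return ready_at
```

For example, a balancer module that takes 10 seconds to start pushes telemetry's readiness past 10 seconds even if telemetry itself loads in 1 second, which is why the pending state of an earlier module implies the pending state of all later ones.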

One aspect I’m unsure about is what the mgr’s behavior should be if the loading somehow gets stalled indefinitely. My solution is currently implemented so if pending_modules never becomes empty, the mgr hangs indefinitely and never sends the active beacon. So, the mgr has the potential to never become active in this extreme case. I could see improving this by adding some logic to say “if the active beacon retries X number of times, issue an error message that the mgr got stuck on a certain module and that the user should try disabling it”.

@ljflores

jenkins test api

@athanatos

I haven't studied this code in detail yet. I have a question on the overall approach here: why not just have the ceph-mgr sit on MMgrCommands and retry all tabled commands whenever a module becomes active. If a module never becomes active after some timeout, return -ENOSYS (not -ENOTSUP).

That way the ceph-mgr is available immediately for modules which load quickly and others just delay the command delivery until a module that understands the command successfully loads.

I don't see any real advantage to allowing the manager to be "active" without the modules being loaded -- seems a lot more complicated for little advantage. Yes, it allows a slightly faster startup, but that hardly seems critical on the manager.

@batrick

batrick commented Jun 13, 2025

I haven't studied this code in detail yet. I have a question on the overall approach here: why not just have the ceph-mgr sit on MMgrCommands and retry all tabled commands whenever a module becomes active. If a module never becomes active after some timeout, return -ENOSYS (not -ENOTSUP).
That way the ceph-mgr is available immediately for modules which load quickly and others just delay the command delivery until a module that understands the command successfully loads.

I see what you’re saying and how that might make mgr command handling more resilient. However, your suggestion assumes that mgr modules load independently and in parallel, but that’s not the case. Mgr modules load sequentially, and the load time of one can directly impact the next. For instance, if the orchestrator module takes longer to load, the telemetry module won’t have loaded yet, since modules load in alphabetical order and start one by one. So retrying tabled commands on a per-module basis introduces unnecessary complexity, especially since we know that if one earlier module is still pending, all subsequent ones will be as well.

If we did want to check for availability on a per-module basis, we could add logic to the DaemonServer code where mgr commands are handled. However, I’ve already implemented that type of approach here: #61325 As you pointed out though, there are concerns about blocking the messenger thread. This solution is non-blocking to the messenger thread, works cleanly within the existing finisher thread framework, and doesn’t add multithreading which has a higher potential for race conditions.

My thought was that you'd keep a vector of MMgrCommands to retry whenever a module becomes active. There would be no need to sleep the messenger thread.

Another important detail is that this startup issue only applies to enabled modules, not disabled ones. If a module is enabled, it’s reasonable to assume it’s part of the cluster’s intended functionality. In my opinion, the mgr simply should not be considered “available” until all enabled modules have fully started. If users didn’t intend to use those modules, they’d disable them. Marking the mgr as active before those modules are ready signals false availability to the cluster. Since the cluster expects enabled modules to be usable, it’s cleaner to wait for them to finish starting up than to mark the mgr active early and try to recover later on a per-module basis.

If you're interested in reviewing the module startup sequence, the relevant logic can be found in:

Mgr::background_init(...)
Mgr::init()
PyModuleRegistry::active_start(...)
ActivePyModules::start_one(...)
PyModule::load(...)

Here, you can see how modules are loaded one by one, where each is dependent on the completion of the previous.

One aspect I’m unsure about is what the mgr’s behavior should be if the loading somehow gets stalled indefinitely. My solution is currently implemented so if pending_modules never becomes empty, the mgr hangs indefinitely and never sends the active beacon. So, the mgr has the potential to never become active in this extreme case. I could see improving this by adding some logic to say “if the active beacon retries X number of times, issue an error message that the mgr got stuck on a certain module and that the user should try disabling it”.

The larger problem is that the ceph-mgr doesn't actually decide when it's active. The monitors tell the standby it's active. If the ceph-mgr doesn't immediately start sending beacons then the mons will assume it's dead and replace it.

This seems to be the reason why the "available" flag was invented but its meaning is controversial and recently discussed in e.g.

#51169 (comment)

I'd argue the "available" flag is half-baked and we keep churning on this continually. I think a complete solution requires adding recovery states to the ceph-mgr starting with something like "STATE_LOADING_MODULES". The ceph-mgr eventually should tell the mons when it's ready to switch to active in the same way that the MDS do.

What do you think?

@ljflores

My thought was that you'd keep a vector of MMgrCommands to retry whenever a module becomes active. There would be no need to sleep the messenger thread.

I still don’t think this would offer any significant benefit. Mgr modules load sequentially and depend on each other, so we can already infer a module’s availability based on whether the ones before it have loaded. Retrying tabled commands individually adds extra complexity, especially when we know every enabled module is expected to be loaded and functional in the cluster.

The larger problem is that the ceph-mgr doesn't actually decide when it's active. The monitors tell the standby it's active.

I disagree with you on this. The monitor AND the active mgr it chooses both have a hand in determining the mgr’s availability. In MgrStandby::send_beacon(), the “available” status is set based on two conditions:

  1. The mon must choose a new active mgr → decision made by mon
  2. The new active mgr must be initialized. → decision made by the chosen active mgr

The chosen active mgr ultimately tells the mon when it’s done initializing, and it is actually an expectation that the python modules be loaded per the comment written under the “if (available)” condition.

The code that I’m referencing is here:

void MgrStandby::send_beacon()
{
 …
  // Whether I think I am available (request MgrMonitor to set me
  // as available in the map)
  bool available = active_mgr != nullptr && active_mgr->is_initialized();
…
if (available) {
    if (!available_in_map) {
      // We are informing the mon that we are done initializing: inform
      // it of our command set.  This has to happen after init() because
      // it needs the python modules to have loaded.
      std::vector<MonCommand> commands = mgr_commands;
      std::vector<MonCommand> py_commands = py_module_registry.get_commands();
      commands.insert(commands.end(), py_commands.begin(), py_commands.end());
      if (monc.monmap.min_mon_release < ceph_release_t::quincy) {
        dout(10) << " stripping out positional=false quincy-ism" << dendl;
        for (auto& i : commands) {
          boost::replace_all(i.cmdstring, ",positional=false", "");
        }
      }
      m->set_command_descs(commands);
      dout(4) << "going active, including " << m->get_command_descs().size()
              << " commands in beacon" << dendl;
    }

    m->set_services(active_mgr->get_services());
  }

If the ceph-mgr doesn't immediately start sending beacons then the mons will assume it's dead and replace it.

That’s not quite what’s happening here. In my solution, the mgr does keep sending beacons, just not the final “active” beacon until all modules have started. While it's waiting for initialization to complete, it sends a beacon indicating that it’s “active (starting)”. So the mon is being notified continuously, and doesn’t consider the mgr dead. Even if a module is stalled loading forever, the mon doesn’t replace the chosen active mgr. It waits for the mgr to signal that initialization is complete.

See this comment for a clear demonstration of that: #63859 (comment)

This seems to be the reason why the "available" flag was invented but its meaning is controversial and recently discussed in e.g.

#51169 (comment)

I'd argue the "available" flag is half-baked and we keep churning on this continually. I think a complete solution requires adding recovery states to the ceph-mgr starting with something like "STATE_LOADING_MODULES".

Looking at the MgrStandby::send_beacon code, it seems straightforward to me that the “available” condition is set by 1) the mon choosing a new active_mgr and 2) the new active mgr signaling initialization. Introducing module-by-module availability would again break these expectations and make communication with the mon and mgr command handler much more difficult. That said, I'm open to understanding more about why the condition might be half-baked and how we can make it clearer.

The ceph-mgr eventually should tell the mons when it's ready to switch to active in the same way that the MDS do.

Can you please elaborate on the active sequence in the MDS and how it differs from the mgr?

@athanatos
Contributor

I'm not actually sure this is a big deal. I haven't actually seen evidence that loading modules takes a long time, merely that there's a possible race condition if we send the beacon before it's actually happened. Perhaps we try this solution and see if the delay is actually a problem?

@ljflores
Member Author

I'm not actually sure this is a big deal. I haven't actually seen evidence that loading modules takes a long time, merely that there's a possible race condition if we send the beacon before it's actually happened. Perhaps we try this solution and see if the delay is actually a problem?

I know you and I have gone over this scenario many times, but I want to point out the original example that motivated this fix, which is documented in https://bugzilla.redhat.com/show_bug.cgi?id=2314146. In this report, the QE team was testing upgrades from Reef/Quincy to Squid, and their scripts were erroring out as soon as the mgrs upgraded to Squid. We had not seen this race condition occur before Squid, and after digging into it, I found that it is likely due to us gradually adding more load on the various serve() functions that are triggered for every enabled module upon initialization.

One might suggest optimizing these serve() functions. Sometimes that is the right call; in a related issue (https://tracker.ceph.com/issues/68657), an extra function was added to the balancer module's serve() function in Squid, which caused the module to load more slowly and exacerbate this error. We fixed the bottleneck since it wasn't necessary to the balancer's functionality. However, what happens when we want to add extra tasks to a module's serve() function out of necessity? That's a very valid scenario that should be accounted for in the module loading sequence.

In the example from https://bugzilla.redhat.com/show_bug.cgi?id=2314146, the QE team is still experiencing the error after we fixed the bottleneck in the balancer, although to a lesser extent. I experimented with disabling cephadm to remove it from the loading sequence, and then the error "went away". This indicates that cephadm might have a heavy serve() function as well, and sure enough there were some additions to cephadm's serve() function that got added to Squid that may be related. However, even if we attempt to optimize that, those additions might be necessary to cephadm's functionality. Also, that doesn't address the root cause, which is that the mgr is declaring availability too early.

Ultimately, we likely didn't see this problem before Squid since we got lucky. But, with the introduction of the finisher thread framework, which changed the module loading sequence to an async background task, and by us gradually adding more tasks to the serve() functions, it was only a matter of time before this race occurred. This fix will be most beneficial in scenarios where we are running mgr commands against mgrs that are in the middle of upgrading, failing over, or otherwise initializing.

@batrick
Member

batrick commented Jun 13, 2025

My thought was that you'd keep a vector of MMgrCommands to retry whenever a module becomes active. There would be no need to sleep the messenger thread.

I still don’t think this would offer any significant benefit. Mgr modules load sequentially and depend on each other, so we can already infer a module’s availability based on whether the ones before it have loaded. Retrying tabled commands individually adds extra complexity, especially when we know every enabled module is expected to be loaded and functional in the cluster.

The larger problem is that the ceph-mgr doesn't actually decide when it's active. The monitors tell the standby it's active.

I disagree with you on this. The monitor AND the active mgr it chooses both have a hand in determining the mgr’s availability. In MgrStandby::send_beacon(), the “available” status is set based on two conditions:

1. The mon must choose a new active mgr → decision made by mon

2. The new active mgr must be initialized. → decision made by the chosen active mgr

The chosen active mgr ultimately tells the mon when it’s done initializing, and it is actually an expectation that the python modules be loaded per the comment written under the “if (available)” condition.

The code that I’m referencing is here:

void MgrStandby::send_beacon()
{
 …
  // Whether I think I am available (request MgrMonitor to set me
  // as available in the map)
  bool available = active_mgr != nullptr && active_mgr->is_initialized();
…
if (available) {
    if (!available_in_map) {
      // We are informing the mon that we are done initializing: inform
      // it of our command set.  This has to happen after init() because
      // it needs the python modules to have loaded.
      std::vector<MonCommand> commands = mgr_commands;
      std::vector<MonCommand> py_commands = py_module_registry.get_commands();
      commands.insert(commands.end(), py_commands.begin(), py_commands.end());
      if (monc.monmap.min_mon_release < ceph_release_t::quincy) {
        dout(10) << " stripping out positional=false quincy-ism" << dendl;
        for (auto& i : commands) {
          boost::replace_all(i.cmdstring, ",positional=false", "");
        }
      }
      m->set_command_descs(commands);
      dout(4) << "going active, including " << m->get_command_descs().size()
              << " commands in beacon" << dendl;
    }

    m->set_services(active_mgr->get_services());
  }

If the ceph-mgr doesn't immediately start sending beacons then the mons will assume it's dead and replace it.

That’s not quite what’s happening here. In my solution, the mgr does keep sending beacons, just not the final “active” beacon until all modules have started. While it's waiting for initialization to complete, it sends a beacon indicating that it’s “active (starting)”. So the mon is being notified continuously, and doesn’t consider the mgr dead. Even if a module is stalled loading forever, the mon doesn’t replace the chosen active mgr. It waits for the mgr to signal that initialization is complete.

See this comment for a clear demonstration of that: #63859 (comment)

This seems to be the reason why the "available" flag was invented but its meaning is controversial and recently discussed in e.g.

Thanks for patiently explaining and reminding me of the details Laura. I think you're right.

I would also ask @idryomov to chime in here as I think this PR is doing what he previously asked we do.

#51169 (comment)

I'd argue the "available" flag is half-baked and we keep churning on this continually. I think a complete solution requires adding recovery states to the ceph-mgr starting with something like "STATE_LOADING_MODULES".

Looking at the MgrStandby::send_beacon code, it seems straightforward to me that the “available” condition is set by 1) the mon choosing a new active_mgr and 2) the new active mgr signaling initialization. Introducing module-by-module availability would again break these expectations and make communication with the mon and mgr command handler much more difficult. That said, I'm open to understanding more about why the condition might be half-baked and how we can make it clearer.

No sorry, I didn't mean that the ceph-mgr should indicate via a state change for each individual module. It would just be a recovery state. The "available" flag approximates that to a degree. What I don't like is that operators won't understand why the mgr may not be working when it's clearly "active". Perhaps that's something that can be included in this PR: the mons can log a health warning in MgrMonitor::encode_pending that the mgr is active but not yet available. (This is normal as we log similar warnings when the MDS is in a degraded state.)

In any case, any necessary redesign with recovery states can be done in the future.

The ceph-mgr eventually should tell the mons when it's ready to switch to active in the same way that the MDS do.

Can you please elaborate on the active sequence in the MDS and how it differs from the mgr?

@ljflores
Member Author

I would also ask @idryomov to chime in here as I think this PR is doing what he previously asked we do.

Sure, I'll request Ilya as a reviewer.

No sorry, I didn't mean that the ceph-mgr should indicate via a state change for each individual module. It would just be a recovery state. The "available" flag approximates that to a degree. What I don't like is that operators won't understand why the mgr may not be working when it's clearly "active". Perhaps that's something that can be included in this PR: the mons can log a health warning in MgrMonitor::encode_pending that the mgr is active but not yet available. (This is normal as we log similar warnings when the MDS is in a degraded state.)

FWIW, the ceph status does already indicate that the new active mgr is still starting. This snippet was taken during the same "initialization sequence delay via balancer bottleneck" that I illustrated in #63859 (comment):

$ ./bin/ceph -s
  cluster:
    id:     b150f540-745a-460c-a566-376b28b95ac3
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,c (age 47m) [leader: a]
    mgr: x(active, starting, since 3s)
    mds: 1/1 daemons up, 2 standby
    osd: 4 osds: 4 up (since 47m), 4 in (since 47m)
 
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 177 pgs
    objects: 24 objects, 451 KiB
    usage:   4.0 GiB used, 400 GiB / 404 GiB avail
    pgs:     177 active+clean

The MgrStandby code defines the three possible states: "standby", "active", and "active (starting)":

std::string MgrStandby::state_str()
{
  if (active_mgr == nullptr) {
    return "standby";
  } else if (active_mgr->is_initialized()) {
    return "active";
  } else {
    return "active (starting)";
  }
}

active (starting) is synonymous with "pending", and should indicate to an operator that the mgr is still starting up, but perhaps this can be better documented. A quick grep of "active (starting)" didn't reveal anything in the documentation, so we could place a section there explaining what each of these three mgr states means. I can also look into adding a cluster warning, but it's worth noting that in most field examples of module loading delay, the delay has historically only been a second or two, so any new warning would only appear for a short time. Still, I could foresee a case where the loading sequence gets stuck (this has not yet happened in the field), but we should probably warn about that.
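The three states and the beacon's "available" bit derive from the same two facts (was this mgr chosen, and has it finished initializing). A small Python sketch can mirror that relationship — this is illustrative only; the real logic lives in the C++ `MgrStandby` code, and the function names here are hypothetical:

```python
# Hypothetical mirror of MgrStandby::state_str() and the "available"
# computation in MgrStandby::send_beacon(); names are illustrative.

def mgr_state(active_mgr_exists: bool, initialized: bool) -> str:
    """Return the state string shown in `ceph -s`."""
    if not active_mgr_exists:
        return "standby"            # the mon has not chosen this mgr
    if initialized:
        return "active"             # all startup work is done
    return "active (starting)"      # chosen, but still initializing

def beacon_available(active_mgr_exists: bool, initialized: bool) -> bool:
    """The beacon only advertises availability in the fully active state."""
    return active_mgr_exists and initialized

# During startup the mgr keeps beaconing, just without the available bit:
assert mgr_state(True, False) == "active (starting)"
assert beacon_available(True, False) is False
assert beacon_available(True, True) is True
```

The point the sketch makes is that "active (starting)" and "not yet available" are two views of the same condition, so the mon never mistakes a slow-starting mgr for a dead one.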

In any case, any necessary redesign with recovery states can be done in the future.

Sure, it would be helpful to have a design proposal in the form of an Enhancement tracker so all the criteria and intended benefits are clearly stated.

@ljflores ljflores requested review from idryomov June 13, 2025 20:02
@github-actions github-actions bot added the stale label Feb 1, 2026
@github-actions

github-actions bot commented Mar 3, 2026

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Mar 3, 2026
@vshankar vshankar reopened this Mar 4, 2026
@vshankar
Contributor

vshankar commented Mar 4, 2026

reopening since this is something we want to make available.

@github-actions github-actions bot removed the stale label Mar 4, 2026
@ljflores
Member Author

ljflores commented Mar 4, 2026

reopening since this is something we want to make available.

Thanks Venky! I'm retesting this one more time and then going to seek final reviews.

Laura Flores added 4 commits March 11, 2026 13:23
----------------- Explanation of Problem ----------------

When the mgr is restarted or failed over via `ceph mgr fail` or during an
upgrade, mgr modules sometimes take longer to start up (this includes
loading their class, commands, and module options, and being removed
from the `pending_modules` map structure). This startup delay can happen
due to a cluster's specific hardware or if a code bottleneck is triggered in
a module’s `serve()` function (each mgr module has a `serve()` function that
performs initialization tasks right when the module is loaded).

When this startup delay occurs, any mgr module command issued against the
cluster around the same time fails with an error saying that the command is not
supported:
```
$ ceph mgr fail; ceph fs volume ls
Error ENOTSUP: Warning: due to ceph-mgr restart, some PG states may not be up to date
Module 'volumes' is not enabled/loaded (required by command 'fs volume ls'): use `ceph mgr module enable volumes` to enable it
```

We should try to lighten any bottlenecks in the mgr module `serve()`
functions wherever possible, but the root cause of this failure is that the
mgr sends a beacon to the mon too early, indicating that it is active before
the module loading has completed. Specifically, some of the mgr modules
have loaded their class but have not yet been deleted from the `pending_modules`
structure, indicating that they have not finished starting up.

--------------------- Explanation of Fix  --------------------

This commit improves the criteria for sending the “active” beacon to the mon so
the mgr does not signal that it’s active too early. We do this through the following additions:

1. A new context `ActivePyModules::recheck_modules_start` that will be set if not all modules
   have finished startup.

2. A new function `ActivePyModules::check_all_modules_started()` that checks if modules are
   still pending startup; if all have started up (`pending_modules` is empty), then we send
   the beacon right away. But if some are still pending, we pass the beacon task on to the new
   recheck context `ActivePyModules::recheck_modules_start` so we know to send the beacon later.

3. Logic in `ActivePyModules::start_one()` that only gets triggered if the modules did not all finish
   startup the first time we checked. We know this is the case if the new recheck context
   `recheck_modules_start` was changed from `nullptr` to a valid context. The beacon is only sent once
   `pending_modules` is confirmed to be empty, which means that all the modules have started up and
   are ready to support commands.

4. Adjustment of when the booleans `initializing` and `initialized` are set. These booleans come into play in
   `MgrStandby::send_beacon()` when we check that the active mgr has been initialized (thus, it is available).
   We only send the "active" beacon once `initialized` is set. Currently, we set these booleans at the end of
   `Mgr::init()`, which means they get set before `pending_modules` is clear. With this adjustment, the booleans
   are set only after we check that all modules have started up. The send_beacon code is triggered on mgr
   failover AND on every `Mgr::tick()`, which occurs by default every two seconds. If we don't adjust when these
   booleans are set, we only fix the mgr failover part, but the mgr still sends the beacon too early via
   `Mgr::tick()`. Below is the relevant code from `MgrStandby::send_beacon()`, which is triggered in
   `Mgr::background_init()` AND in `Mgr::tick()`:
```
  // Whether I think I am available (request MgrMonitor to set me
  // as available in the map)
  bool available = active_mgr != nullptr && active_mgr->is_initialized();

  auto addrs = available ? active_mgr->get_server_addrs() : entity_addrvec_t();
  dout(10) << "sending beacon as gid " << monc.get_global_id() << dendl;

```
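The four additions above can be modeled with a short Python sketch. This is a simplified stand-in for the C++ implementation, not the actual mgr code: `ModuleRegistry`, `pending`, and the `send_beacon` callback are hypothetical names mirroring `pending_modules`, `recheck_modules_start`, and the beacon send.

```python
from typing import Callable, Optional

class ModuleRegistry:
    """Toy model of the deferred-beacon logic described above."""

    def __init__(self, modules):
        self.pending = set(modules)          # models pending_modules
        self.initialized = False             # models the adjusted boolean
        # Models the recheck_modules_start context: set only when a check
        # found modules still pending.
        self._recheck: Optional[Callable[[], None]] = None

    def check_all_modules_started(self, send_beacon: Callable[[], None]) -> bool:
        """If everything started, beacon now; otherwise defer the beacon."""
        if not self.pending:
            self.initialized = True
            send_beacon()
            return True
        self._recheck = send_beacon          # start_one() will fire it later
        return False

    def start_one(self, name: str) -> None:
        """A module finished starting; retry the beacon if one was deferred."""
        self.pending.discard(name)
        if self._recheck is not None and not self.pending:
            self.initialized = True
            cb, self._recheck = self._recheck, None
            cb()

beacons = []
reg = ModuleRegistry(["balancer", "telemetry"])
reg.check_all_modules_started(lambda: beacons.append("active"))
assert beacons == [] and not reg.initialized   # beacon deferred
reg.start_one("balancer")
assert beacons == []                            # one module still pending
reg.start_one("telemetry")
assert beacons == ["active"] and reg.initialized
```

The key property the sketch demonstrates is that the "active" beacon fires exactly once, and only after the last pending module has started.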

--------------------- Reproducing the Bug ----------------------

At face value, this issue reproduces nondeterministically, since it can
depend on environmental factors or specific cluster workloads.
However, I was able to deterministically reproduce it by injecting a
bottleneck into the balancer module:
```
diff --git a/src/pybind/mgr/balancer/module.py b/src/pybind/mgr/balancer/module.py
index d12d69f..91c83fa8023 100644
--- a/src/pybind/mgr/balancer/module.py
+++ b/src/pybind/mgr/balancer/module.py
@@ -772,5 +772,10 @@ class Module(MgrModule):
                     self.update_pg_upmap_activity(plan)  # update pg activity in `balancer status detail`
                 self.optimizing = False
+                # causing a bottleneck
+                for i in range(0, 1000):
+                    for j in range (0, 1000):
+                        x = i + j
+                        self.log.debug("hitting the bottleneck in the balancer module")
             self.log.debug('Sleeping for %d', sleep_interval)
             self.event.wait(sleep_interval)
             self.event.clear()
```

Then, the error reproduces every time by running:
```
$ ./bin/ceph mgr fail; ./bin/ceph telemetry show
Error ENOTSUP: Warning: due to ceph-mgr restart, some PG states may not be up to date
Module 'telemetry' is not enabled/loaded (required by command 'telemetry show'): use `ceph mgr module enable telemetry` to enable it
```

With this fix, the active mgr is marked as "initialized" only after all
the modules have started up, and this error goes away. The command may
take a bit longer to execute depending on the extent of the delay.

---------------------- Integration Testing ---------------------

This commit adds a dev-only config that can inject a longer
loading time into the mgr module loading sequence so we can
simulate this scenario in a test.

The config is 0 ms by default since we do not add any delay
outside of testing scenarios. The config can be adjusted
with the following command:
  `ceph config set mgr mgr_module_load_delay <ms>`

A second dev-only config also allows you to specify which
module you want to be delayed in loading time. You may change
this with the following command:
  `ceph config set mgr mgr_module_load_delay_name <module name>`

The workunit added here tests a simulated slow loading module
scenario to ensure that this case is properly handled.

--------------------- Documentation --------------------

The new documentation describes the three existing mgr states so Ceph
operators can better interpret their Ceph status output.

Fixes: https://tracker.ceph.com/issues/71631
Signed-off-by: Laura Flores <lflores@ibm.com>
…eded

----------------- Enhancement to the Original Fix -----------------

During a mgr failover, the active mgr is marked available if:
  1. The mon has chosen a standby to be active
  2. The chosen active mgr has all of its modules initialized

Now that we've improved the criteria for sending the "active" beacon
by waiting for all mgr modules to initialize, we need to account
for extreme cases in which the modules are stuck loading for a very long
time, or even indefinitely. In these cases, we don't want to delay sending
the "active" beacon for too long, since that blocks other important mgr
functionality, such as reporting PG availability in the health status. We
want to avoid sending warnings about PGs being unknown in the health status
when that's not ultimately the problem.

To account for an exceptionally long module loading time, I added a new
config option, `mgr_module_load_expiration`: the maximum amount of time
(in ms) allotted for the active mgr to load its modules before declaring
availability. If this time is exceeded, the mgr marks itself "available"
anyway and sends the "active" beacon to the mon, unblocking other critical
mgr functionality.

If this happens, a health error will be issued at this time, indicating
which mgr modules got stuck initializing (See src/mgr/PyModuleRegistry.cc). The
idea is to unblock the rest of the mgr's critical functionality while making it
clear to Ceph operators that some modules are unusable.
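A minimal sketch of this escape hatch, assuming a helper consulted on each beacon tick (the names are illustrative, not the actual mgr code; only the decision shape follows the description above):

```python
def should_declare_available(pending_modules, elapsed_ms, load_expiration_ms):
    """Decide availability, and report any modules to flag in a health error.

    Returns (available, stuck_modules). Normally we wait for pending_modules
    to drain; past the expiration we go available anyway and surface the
    stuck modules so operators can see why some commands fail.
    """
    if not pending_modules:
        return True, []                        # normal path: all started
    if elapsed_ms >= load_expiration_ms:
        return True, sorted(pending_modules)   # unblock the mgr, warn loudly
    return False, []                           # keep waiting

assert should_declare_available(set(), 100, 5000) == (True, [])
assert should_declare_available({"cephadm"}, 100, 5000) == (False, [])
assert should_declare_available({"cephadm"}, 6000, 5000) == (True, ["cephadm"])
```

The design trade-off is explicit here: past the deadline, availability wins, and the stuck-module list becomes the health error rather than a blocked mgr.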

--------------------- Integration Testing --------------------

The workunit was rewritten so it tests for these scenarios:

1. Normal module loading behavior (no health error should be issued)
2. Acceptable delay in module loading behavior (no health error should be
   issued)
3. Unacceptable delay in module loading behavior (a health error should be
   issued)

--------------------- Documentation --------------------

This documentation explains the "Module failed to initialize"
cluster error.

Users are advised to try failing over
the mgr to restart the module initialization process,
and, if the error persists, to file a bug report. I decided
to write it this way instead of providing more complex
debugging tips, such as advising users to disable some mgr
modules, since every case will be different depending on
which modules failed to initialize.

In the bug report, developers can ask for the health detail
output to narrow down which module is causing a bottleneck,
and then ask the user to try disabling certain modules until
the mgr is able to fully initialize.

Fixes: https://tracker.ceph.com/issues/71631
Signed-off-by: Laura Flores <lflores@ibm.com>
Now, the command `ceph tell mgr mgr_status` will show a
"pending_modules" field. This is another way for Ceph operators
to check which modules haven't been initialized yet (in addition
to the health error).

This command was also added to testing scenarios in the workunit.

Fixes: https://tracker.ceph.com/issues/71631
Signed-off-by: Laura Flores <lflores@ibm.com>
The current check groups modules not being
enabled together with modules failing to
initialize. In this commit, we reorder the checks:

1. Screen for whether a module is enabled. If it's
   not, issue an EOPNOTSUPP error with instructions
   on how to enable it.

2. Screen for whether a module is active. If a module
   is enabled, then the cluster expects it to
   be active to support commands. If the module
   took too long to initialize, though, we will
   catch this and issue an ETIMEDOUT error with
   a link for troubleshooting.

Now, these two separate issues are no longer grouped
together, and they are checked in the right order.
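The reordered screening can be sketched as follows (a hypothetical helper, not the actual command handler; only the errno constants are real, and the error messages paraphrase the ones described above):

```python
import errno

def screen_command_module(module_name, enabled_modules, active_modules):
    """Return (errcode, message); (0, "") means the command may proceed."""
    # 1. Enabled? If not, tell the operator how to enable it.
    if module_name not in enabled_modules:
        return (-errno.EOPNOTSUPP,
                f"Module '{module_name}' is not enabled: use "
                f"`ceph mgr module enable {module_name}` to enable it")
    # 2. Enabled but not yet active: it took too long to initialize.
    if module_name not in active_modules:
        return (-errno.ETIMEDOUT,
                f"Module '{module_name}' is enabled but has not finished "
                f"initializing; see the troubleshooting docs")
    return (0, "")

rc, _ = screen_command_module("telemetry", {"balancer"}, set())
assert rc == -errno.EOPNOTSUPP          # not enabled: operator action needed
rc, _ = screen_command_module("telemetry", {"telemetry"}, set())
assert rc == -errno.ETIMEDOUT           # enabled but stuck initializing
rc, _ = screen_command_module("telemetry", {"telemetry"}, {"telemetry"})
assert rc == 0                          # enabled and active: proceed
```

Because the enabled check runs first, an operator who simply forgot to enable a module never sees the timeout error, and vice versa.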

Fixes: https://tracker.ceph.com/issues/71631
Signed-off-by: Laura Flores <lflores@ibm.com>
@ljflores ljflores requested a review from batrick March 11, 2026 18:25
@ljflores
Member Author

Squashed all the commits. Everything is now in order and ready for final reviews.

I also ran some QA tests on the workunit, and confirmed it passed:
https://pulpito.ceph.com/lflores-2026-03-04_22:16:23-rados:mgr-wip-lflores-testing-3-2026-03-04-1243-distro-default-trial/

@ljflores
Member Author

/config check ok

Contributor

@NitzanMordhai NitzanMordhai left a comment


I reviewed all the changes, and it looks good!

@yaarith
Contributor

yaarith commented Mar 17, 2026

adding the rocky10 label to make testing easier; however, this fix is not specific to Rocky 10.

fi

echo "Check mgr_status to ensure 'pending_modules' is populated with modules we expect..."
expected='["balancer","cephadm","crash","devicehealth","iostat","nfs","orchestrator","pg_autoscaler","progress","rbd_support","status","telemetry","volumes"]'
Contributor


@ljflores need to add nvmeof as well, it was added as an "always on" module

Member Author


Done

Post the merge of this: ceph#67641

Fixes: https://tracker.ceph.com/issues/71631
Signed-off-by: Laura Flores <lflores@ibm.com>
@ljflores ljflores requested a review from NitzanMordhai March 19, 2026 14:48
@ljflores
Member Author

jenkins test make check

Contributor

@NitzanMordhai NitzanMordhai left a comment


LGTM

@batrick batrick merged commit bbfedaf into ceph:main Mar 20, 2026
13 checks passed