Bug #71631

open

Commands using Mgr Modules fail if run immediately post a mgr failover/ restart

Added by Laura Flores 9 months ago. Updated about 15 hours ago.

Status: Pending Backport
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Backport: squid,tentacle
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform): backport_processed
Fixed In: v20.3.0-6266-gbbfedafcf5
Released In:
Upkeep Timestamp: 2026-03-20T21:42:02+00:00

Description

Reproduced by:

$ ceph mgr fail; ceph fs volume ls
Error ENOTSUP: Warning: due to ceph-mgr restart, some PG states may not be up to date
Module 'volumes' is not enabled/loaded (required by command 'fs volume ls'): use `ceph mgr module enable volumes` to enable it

Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2314146
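
Until a build with the fix is available, a caller can tolerate the window right after a failover by retrying. The following is only a minimal sketch of that idea, not the fix from the linked PR: `retry_cmd` is a hypothetical helper name, the attempt count and delay are arbitrary, and it assumes the `ceph` CLI exits non-zero when it prints the ENOTSUP error above. It does not help if the command blocks instead of failing.

```shell
# Hypothetical helper (not part of Ceph): run a command up to
# max_attempts times, sleeping `delay` seconds between failed attempts.
retry_cmd() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Run the command exactly as given; stop on the first success.
        if "$@"; then
            return 0
        fi
        # Sleep only if another attempt is still to come.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example usage against a live cluster (left commented out):
# ceph mgr fail
# retry_cmd 10 2 ceph fs volume ls
```

Wrapping the retried command in `timeout` would additionally bound each attempt.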


Related issues 7 (6 open, 1 closed)

Related to CephFS - Bug #67230: mgr: should be declared available only after all python modules have been loaded (Fix Under Review, Mahesh Mohan)
Related to mgr - Bug #68657: squid: mgr/balancer preventing orchestrator and dashboard functionality (Resolved, Laura Flores)
Related to CephFS - Bug #70456: qa: Command failed on smithi012 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs volume ls' (Triaged, Mahesh Mohan)
Related to Orchestrator - Bug #71830: Upgrade tests stuck when upgrading ceph-mgr daemon (New, Redouane Kachach Elhicou)
Related to mgr - Bug #75422: Rocky10 - Module 'orchestrator' is not enabled/loaded (New)
Copied to mgr - Backport #75564: tentacle: Commands using Mgr Modules fail if run immediately post a mgr failover/ restart (In Progress, Laura Flores)
Copied to mgr - Backport #75565: squid: Commands using Mgr Modules fail if run immediately post a mgr failover/ restart (New, Laura Flores)

Actions #1

Updated by Laura Flores 9 months ago

  • Assignee set to Laura Flores
Actions #2

Updated by Laura Flores 9 months ago

  • Description updated (diff)
Actions #3

Updated by Laura Flores 9 months ago

  • Description updated (diff)
Actions #4

Updated by Laura Flores 9 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 63859
Actions #5

Updated by Laura Flores 9 months ago

  • Related to Bug #67230: mgr: should be declared available only after all python modules have been loaded added
Actions #6

Updated by Laura Flores 9 months ago

  • Related to deleted (Bug #67230: mgr: should be declared available only after all python modules have been loaded)
Actions #7

Updated by Laura Flores 9 months ago

  • Related to Bug #67230: mgr: should be declared available only after all python modules have been loaded added
Actions #8

Updated by Laura Flores 9 months ago

  • Related to Bug #68657: squid: mgr/balancer preventing orchestrator and dashboard functionality added
Actions #9

Updated by Venky Shankar 9 months ago

  • Related to Bug #70456: qa: Command failed on smithi012 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs volume ls' added
Actions #10

Updated by Venky Shankar 9 months ago

https://pulpito.ceph.com/vshankar-2025-06-13_17:03:06-fs-wip-vshankar-testing-20250613.134551-debug-testing-default-smithi/8327080/ is likely another instance of this issue.

$ zgrep -v "client\." ./remote/smithi159/log/ceph-mgr.x.log.gz | egrep "_handle_command|ceph-mgr, pid" 
...
...
...
2025-06-15T01:22:33.693+0000 7f07749cd100  0 ceph version 20.3.0-896-g1a8c963f (1a8c963f6e5d0aa68a79fd7c4ea3e0bb861d7d90) tentacle (dev - Debug), process ceph-mgr, pid 62269
2025-06-15T01:22:38.107+0000 7f07132f2640 10 mgr.server _handle_command decoded-size=4 prefix=fs subvolumegroup create
2025-06-15T01:22:38.108+0000 7f07132f2640 10 mgr.server _handle_command passing through command 'fs subvolumegroup create' size 4
2025-06-15T01:23:36.228+0000 7f07132f2640 10 mgr.server _handle_command decoded-size=7 prefix=fs snap-schedule add
2025-06-15T01:23:36.229+0000 7f07132f2640 10 mgr.server _handle_command passing through command 'fs snap-schedule add' size 7
2025-06-15T01:23:36.697+0000 7f07132f2640 10 mgr.server _handle_command decoded-size=6 prefix=fs snap-schedule retention add
2025-06-15T01:23:36.698+0000 7f07132f2640 10 mgr.server _handle_command passing through command 'fs snap-schedule retention add' size 6
2025-06-15T01:23:37.069+0000 7f07132f2640 10 mgr.server _handle_command decoded-size=7 prefix=fs snap-schedule remove
2025-06-15T01:23:37.069+0000 7f07132f2640 10 mgr.server _handle_command passing through command 'fs snap-schedule remove' size 7
2025-06-15T01:23:37.536+0000 7f07132f2640 10 mgr.server _handle_command decoded-size=5 prefix=fs subvolume getpath
2025-06-15T01:23:37.536+0000 7f07132f2640 10 mgr.server _handle_command passing through command 'fs subvolume getpath' size 5
2025-06-15T01:23:40.762+0000 7f07132f2640 10 mgr.server _handle_command decoded-size=5 prefix=fs subvolume rm
2025-06-15T01:23:40.763+0000 7f07132f2640 10 mgr.server _handle_command passing through command 'fs subvolume rm' size 5
2025-06-15T01:23:41.153+0000 7f07132f2640 10 mgr.server _handle_command decoded-size=4 prefix=fs subvolumegroup rm
2025-06-15T01:23:41.154+0000 7f07132f2640 10 mgr.server _handle_command passing through command 'fs subvolumegroup rm' size 4
2025-06-15T01:23:42.365+0000 7f16e2a0d100  0 ceph version 20.3.0-896-g1a8c963f (1a8c963f6e5d0aa68a79fd7c4ea3e0bb861d7d90) tentacle (dev - Debug), process ceph-mgr, pid 62269
2025-06-15T01:24:02.460+0000 7f16811a4640 10 mgr.server _handle_command decoded-size=3 prefix=pg dump
2025-06-15T01:24:02.842+0000 7f16811a4640 10 mgr.server _handle_command decoded-size=3 prefix=pg dump
2025-06-15T01:24:03.222+0000 7f16811a4640 10 mgr.server _handle_command decoded-size=3 prefix=pg dump
2025-06-15T01:24:04.030+0000 7f16811a4640 10 mgr.server _handle_command decoded-size=3 prefix=pg dump
2025-06-15T01:24:07.390+0000 7f16811a4640 10 mgr.server _handle_command decoded-size=3 prefix=pg dump
2025-06-15T01:24:10.816+0000 7f16811a4640 10 mgr.server _handle_command decoded-size=2 prefix=fs volume ls
2025-06-15T01:24:10.816+0000 7f16811a4640 10 mgr.server _handle_command passing through command 'fs volume ls' size 2

In this case the 'fs volume ls' command made no progress after ceph-mgr restarted. The command timed out (120 seconds), which failed the test.

Actions #11

Updated by Venky Shankar 9 months ago

@Laura Flores I see that a command run just after a ceph-mgr restart could fail. However, as I mentioned in note-10, in that case the command was blocked rather than failing. Is that also a possibility?

Actions #12

Updated by Laura Flores 12 days ago

  • Related to Bug #71830: Upgrade tests stuck when upgrading ceph-mgr daemon added
Actions #13

Updated by Laura Flores 10 days ago

Commits were cleaned up, and the PR is ready for final reviews.

Actions #14

Updated by Laura Flores 10 days ago

  • Related to Bug #75422: Rocky10 - Module 'orchestrator' is not enabled/loaded added
Actions #15

Updated by Laura Flores 4 days ago

  • Copied to Backport #75564: tentacle: Commands using Mgr Modules fail if run immediately post a mgr failover/ restart added
Actions #16

Updated by Laura Flores 4 days ago

  • Copied to Backport #75565: squid: Commands using Mgr Modules fail if run immediately post a mgr failover/ restart added
Actions #17

Updated by Laura Flores 4 days ago

  • Tags (freeform) set to backport_processed
Actions #18

Updated by Upkeep Bot about 15 hours ago

  • Status changed from Fix Under Review to Pending Backport
  • Merge Commit set to bbfedafcf532f649edc771d5d03fcc8207b806f4
  • Fixed In set to v20.3.0-6266-gbbfedafcf5
  • Upkeep Timestamp set to 2026-03-20T21:42:02+00:00