
mgr/mirroring: Display mon_host and fsid in daemon status command#67335

Merged
vshankar merged 4 commits into ceph:main from kotreshhr:fs_mirror_daemon_status on Mar 3, 2026

Conversation

@kotreshhr
Contributor

@kotreshhr kotreshhr commented Feb 12, 2026

The cephfs_mirror daemon returns only basic remote info for each peer.
Fetch the remote fsid and mon_host from the monitor
config map and append them to the daemon status output.

Sample output:

[
  {
    "daemon_id": 4153,
    "filesystems": [
      {
        "filesystem_id": 1,
        "name": "a",
        "directory_count": 0,
        "peers": [
          {
            "uuid": "29304477-1fd7-4709-b9f7-8153acebbafd",
            "remote": {
              "client_name": "client.mirror_remote",
              "cluster_name": "remote-site",
              "fs_name": "a",
              "mon_host": "[v2:192.168.64.5:40183,v1:192.168.64.5:40184]",
              "fsid": "5682c8e5-50cd-4cfd-b75c-5354dcdda487"
            },
            "stats": {
              "failure_count": 0,
              "recovery_count": 0
            }
          }
        ]
      }
    ]
  }
]

Fixes: https://tracker.ceph.com/issues/73455
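
The approach described above can be sketched roughly as follows. This is only an illustration, not the code merged in this PR: `append_remote_details` and the `lookup` callable are hypothetical names, and `lookup` stands in for whatever query the mgr/mirroring module issues against the monitor config map.

```python
import copy
from typing import Callable, Dict, List, Optional

# Hypothetical signature: (fs_name, peer_uuid) -> {"mon_host": ..., "fsid": ...} or None.
RemoteLookup = Callable[[str, str], Optional[Dict[str, str]]]


def append_remote_details(status: List[dict], lookup: RemoteLookup) -> List[dict]:
    """Return a copy of the daemon status with mon_host/fsid merged into each peer's remote."""
    enriched = copy.deepcopy(status)  # leave the daemon's original reply untouched
    for daemon in enriched:
        for fs in daemon.get("filesystems", []):
            for peer in fs.get("peers", []):
                # A missing entry is treated as "nothing to append" (an assumption).
                extra = lookup(fs["name"], peer["uuid"]) or {}
                remote = peer.setdefault("remote", {})
                for key in ("mon_host", "fsid"):
                    if key in extra:
                        remote[key] = extra[key]
    return enriched
```

Copying the status before modifying it keeps the daemon's reply intact, so a failed lookup can simply fall back to the basic remote info.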


@kotreshhr kotreshhr requested review from a team and vshankar February 12, 2026 19:45
@vshankar vshankar requested a review from joscollin February 13, 2026 05:19
@vshankar
Contributor

@kotreshhr - It's been a while since I've closely looked at the mirroring-related structures in FSMap, but once a key is imported (via bootstrapping), the remote cluster details are stored in the monitor config map. Can't the relevant details be fetched from there instead?

@kotreshhr
Contributor Author

> @kotreshhr - It's been a while since I've closely looked at the mirroring-related structures in FSMap, but once a key is imported (via bootstrapping), the remote cluster details are stored in the monitor config map. Can't the relevant details be fetched from there instead?

@vshankar I thought of that too, but the daemon status code fetches the remote info directly from the FSMap's Peer. See the code at:

f.dump_object("remote", peer.remote);

Fetching it from the mon config map on every status update requires an extra monitor command, which I thought was unnecessary and adds a possible point of failure. Also, having it in FSMap keeps all peer information consistently in one place. Let me know your thoughts.

@kotreshhr kotreshhr force-pushed the fs_mirror_daemon_status branch from e67ae1e to 5914d19 Compare February 16, 2026 09:59
@kotreshhr
Contributor Author

kotreshhr commented Feb 16, 2026

Fixed the flake8 error that was causing the make check failure, and rebased.

@kotreshhr kotreshhr force-pushed the fs_mirror_daemon_status branch from 5914d19 to 33cd7a5 Compare February 16, 2026 10:09
@vshankar
Contributor

> Fetching it from the mon config map on every status update requires an extra monitor command, which I thought was unnecessary and adds a possible point of failure. Also, having it in FSMap keeps all peer information consistently in one place. Let me know your thoughts.

The thing is, including it in FSMap is redundant. If we ever have to modify these attributes in the future, they would have to be updated in two places, which I think is a good enough reason not to store the information additionally in FSMap.

I would say, let's query the mon to get the attributes. If we are concerned about making mon queries over and over, maybe we can cache the attributes in mgr/mirroring. But for now, I would just fetch it from the mon every time.
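
The caching idea mentioned above could look something like the sketch below. It is not part of this PR as far as the conversation shows (the suggestion is to fetch from the mon every time for now); `PeerAttrCache`, its `fetch` callable, and the 60-second default TTL are all hypothetical choices made for illustration.

```python
import time
from typing import Callable, Dict, Tuple

PeerKey = Tuple[str, str]  # (fs_name, peer_uuid)


class PeerAttrCache:
    """Cache per-peer attributes for a limited time to avoid repeated mon queries."""

    def __init__(self,
                 fetch: Callable[[str, str], Dict[str, str]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical: performs the actual mon query
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries: Dict[PeerKey, Tuple[float, Dict[str, str]]] = {}

    def get(self, fs_name: str, peer_uuid: str) -> Dict[str, str]:
        key = (fs_name, peer_uuid)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        attrs = self._fetch(fs_name, peer_uuid)  # one query on a miss or after expiry
        self._entries[key] = (now, attrs)
        return attrs

    def invalidate(self, fs_name: str, peer_uuid: str) -> None:
        """Drop a cached entry, e.g. when a peer is removed or re-imported."""
        self._entries.pop((fs_name, peer_uuid), None)
```

A time-based expiry is only one option; invalidating on peer add or remove would avoid serving stale attributes, at the cost of hooking into those code paths.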

@kotreshhr
Contributor Author

> > Fetching it from the mon config map on every status update requires an extra monitor command, which I thought was unnecessary and adds a possible point of failure. Also, having it in FSMap keeps all peer information consistently in one place. Let me know your thoughts.

> The thing is, including it in FSMap is redundant. If we ever have to modify these attributes in the future, they would have to be updated in two places, which I think is a good enough reason not to store the information additionally in FSMap.

> I would say, let's query the mon to get the attributes. If we are concerned about making mon queries over and over, maybe we can cache the attributes in mgr/mirroring. But for now, I would just fetch it from the mon every time.

But why do we have Peer information in FSMap in that case? Is there a historical reason?

@vshankar
Contributor

> But why do we have Peer information in FSMap in that case? Is there a historical reason?

The mirror daemon subscribes to FSMap and thereby gets notified on Peer updates.

@kotreshhr kotreshhr force-pushed the fs_mirror_daemon_status branch from 33cd7a5 to 0d7effd Compare February 17, 2026 19:34
@kotreshhr kotreshhr changed the title from "mds: Store fsid and mon_host of remote peer into FSMap" to "mgr/mirroring: Display mon_host and fsid in daemon status command" Feb 17, 2026
@kotreshhr
Contributor Author

> > But why do we have Peer information in FSMap in that case? Is there a historical reason?

> The mirror daemon subscribes to FSMap and thereby gets notified on Peer updates.

Done. Removed all FSMap changes and modified only the mgr/mirroring module to achieve the desired result.

@kotreshhr
Contributor Author

jenkins test make check

@kotreshhr
Contributor Author

jenkins test make check arm64

Contributor

@vshankar vshankar left a comment

LGTM

@kotreshhr
Contributor Author

Scheduled QA Run - https://pulpito.ceph.com/khiremat-2026-02-18_03:26:12-fs-wip-khiremat-mirror-status-67335-distro-default-trial/

@vshankar The relevant test ran fine, but I saw the following failure. The same failure was seen at #67305 (comment) (2nd test failure). I initially thought it was unrelated to the new retry logic PR because it was intermittent. After spending more time digging into it, I think I have root-caused the issue. Please find the analysis at https://tracker.ceph.com/issues/74998#note-2 and the fix in #67385.

Do we need one more QA run with the fix above, or are we good since it's tracked separately and a fix is available?

2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:
2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:======================================================================
2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:ERROR: test_cephfs_mirror_restart_sync_on_blocklist (tasks.cephfs.test_mirroring.TestMirroring.test_cephfs_mirror_restart_sync_on_blocklist)
2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_0d7effdd84f2ee0889c92326827ea07dffe3bed9/qa/tasks/cephfs/test_mirroring.py", line 839, in test_cephfs_mirror_restart_sync_on_blocklist
2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:    self.check_peer_status(self.primary_fs_name, self.primary_fs_id,
2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_0d7effdd84f2ee0889c92326827ea07dffe3bed9/qa/tasks/cephfs/test_mirroring.py", line 37, in wrapper
2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:    return func(*args, **kwargs)
2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:           ^^^^^^^^^^^^^^^^^^^^^
2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_0d7effdd84f2ee0889c92326827ea07dffe3bed9/qa/tasks/cephfs/test_mirroring.py", line 275, in check_peer_status
2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:    res = self.mirror_daemon_command(f'peer status for fs: {fs_name}',
2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_0d7effdd84f2ee0889c92326827ea07dffe3bed9/qa/tasks/cephfs/test_mirroring.py", line 397, in mirror_daemon_command
2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:    p = self.mount_a.client_remote.run(args=
2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T04:21:16.766 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_fa17720d0088c3ac28e473468bfc79eeaff5cd38/teuthology/orchestra/remote.py", line 596, in run
2026-02-18T04:21:16.767 INFO:tasks.cephfs_test_runner:    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
2026-02-18T04:21:16.767 INFO:tasks.cephfs_test_runner:        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-18T04:21:16.767 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_fa17720d0088c3ac28e473468bfc79eeaff5cd38/teuthology/orchestra/run.py", line 461, in run
2026-02-18T04:21:16.767 INFO:tasks.cephfs_test_runner:    r.wait()
2026-02-18T04:21:16.767 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_fa17720d0088c3ac28e473468bfc79eeaff5cd38/teuthology/orchestra/run.py", line 161, in wait
2026-02-18T04:21:16.767 INFO:tasks.cephfs_test_runner:    self._raise_for_status()
2026-02-18T04:21:16.767 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_fa17720d0088c3ac28e473468bfc79eeaff5cd38/teuthology/orchestra/run.py", line 181, in _raise_for_status
2026-02-18T04:21:16.767 INFO:tasks.cephfs_test_runner:    raise CommandFailedError(
2026-02-18T04:21:16.767 INFO:tasks.cephfs_test_runner:teuthology.exceptions.CommandFailedError: Command failed (peer status for fs: cephfs) on trial012 with status 22: 'ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@32 8879b1b2-91a7-441c-b90e-848a8f2f9989'

@vshankar
Contributor

> Do we need one more QA run with the fix above, or are we good since it's tracked separately and a fix is available?

Let me check the QA fix. We can include that, as it does not require rebuilding.

@kotreshhr
Contributor Author

> > Do we need one more QA run with the fix above, or are we good since it's tracked separately and a fix is available?

> Let me check the QA fix. We can include that, as it does not require rebuilding.

I triggered a new QA run with the fix for https://tracker.ceph.com/issues/74998: https://pulpito.ceph.com/khiremat-2026-02-18_07:31:59-fs-wip-khiremat-mirror-status-67335-distro-default-trial/

Everything passed, but unfortunately the race in https://tracker.ceph.com/issues/74998 wasn't hit this time.

@kotreshhr
Contributor Author

kotreshhr commented Feb 18, 2026

> > > Do we need one more QA run with the fix above, or are we good since it's tracked separately and a fix is available?

> > Let me check the QA fix. We can include that, as it does not require rebuilding.

> I triggered a new QA run with the fix for https://tracker.ceph.com/issues/74998: https://pulpito.ceph.com/khiremat-2026-02-18_07:31:59-fs-wip-khiremat-mirror-status-67335-distro-default-trial/

> Everything passed, but unfortunately the race in https://tracker.ceph.com/issues/74998 wasn't hit this time.

No luck with the second iteration either :) https://pulpito.ceph.com/khiremat-2026-02-18_10:17:40-fs-wip-khiremat-mirror-status-67335-distro-default-trial/ I grepped for CommandFailedError in the teuthology logs, and the issue wasn't hit.
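
For reference, the kind of log check described above can be sketched in a few lines. `find_command_failures` is a hypothetical helper, not a teuthology API; it only assumes the "with status N" wording seen in the traceback earlier in this thread. The mapping to an errno name is a hint rather than a guarantee, since an exit status is not necessarily an errno value.

```python
import errno
import re
from typing import Iterable, List, Optional, Tuple

# Matches the "with status N" part of a teuthology CommandFailedError line.
_STATUS_RE = re.compile(r"CommandFailedError: .*?with status (\d+)")


def find_command_failures(lines: Iterable[str]) -> List[Tuple[int, Optional[str], str]]:
    """Return (exit status, errno name or None, line) for each CommandFailedError line."""
    hits = []
    for line in lines:
        match = _STATUS_RE.search(line)
        if match:
            status = int(match.group(1))
            # errno.errorcode maps numbers to names such as "EINVAL"; unknown values give None.
            hits.append((status, errno.errorcode.get(status), line.rstrip("\n")))
    return hits
```

It can be fed an open log file directly, for example `find_command_failures(open(path))` with `path` pointing at a downloaded teuthology log; an empty result means no matching failure line was found.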

Fixes: https://tracker.ceph.com/issues/73455
Signed-off-by: Kotresh HR <khiremat@redhat.com>
@kotreshhr kotreshhr force-pushed the fs_mirror_daemon_status branch from 017c3c0 to 7db2efc Compare February 25, 2026 17:33
@kotreshhr
Contributor Author

Rebased, as there was a conflict with PendingNotes after #66572 was merged.

Contributor

@vshankar vshankar left a comment

@vshankar vshankar merged commit 8ca4bca into ceph:main Mar 3, 2026
13 checks passed
@github-actions

github-actions bot commented Mar 3, 2026

This is an automated message by src/script/redmine-upkeep.py.

I have resolved the following tracker ticket due to the merge of this PR:

No backports are pending for the ticket. If this is incorrect, please update the tracker
ticket and reset to Pending Backport state.

Update Log: https://github.com/ceph/ceph/actions/runs/22619816669

5 participants