mon: verify data pool is already not in use by any file system #44900
Conversation
mchangir
left a comment
small changes requested
Done. Updated PR |
src/mon/FSCommands.cc
    ss << "Filesystem '" << fs->mds_map.get_fs_name()
       << "' is already using specified RADOS pools.";
    return -EINVAL;
  }
}
This check would probably be better in _check_pool. See also:
Lines 1573 to 1579 in e63f146
We have "application metadata" (a trivial k/v store) on pools to indicate their intended use. We're already marking meta/data pools:
Lines 924 to 926 in e63f146
You should check that as well. Consider also that application metadata is a newer feature in RADOS. So, we need to make sure the application metadata is consistent with the pools described in the FSMap. I'd also recommend a task in the MDSMonitor which sets the application metadata on the pool if unset. If the application metadata is wrong, then the MDSMonitor should raise a cluster warning.
I have used application metadata to determine whether the pool is already in use.
I have also moved that code under the _check_pool() function.
@batrick I'm just not able to understand why we need a task in MDSMonitor::tick().
Could you please help me understand what exactly is expected in MDSMonitor::tick()?
I'm also a bit concerned about the possibility of existing users who have a single data pool they're sharing across FSes, with a disciplined process to make sure they're using namespaces to avoid collisions, and who may need this in the future. So we should probably add a force flag to override any block we put in place? EDIT: Actually, I forgot about https://docs.ceph.com/en/latest/dev/cephfs-snapshots/#multi-fs and that makes me think we shouldn't worry about users doing this anyway. Go forth and block this misbehavior!
Thanks for providing the reference to the documentation.
Please rerun the failed jobs from the test run (all failed/dead) just to be sure that this does not break anything else. EDIT: and post the link here.
I had a discussion with @vshankar. (Please update if you find any point missing.)
vshankar
left a comment
@nmshelke PTAL - https://pulpito.ceph.com/vshankar-2022-06-03_10:03:27-fs-wip-vshankar-testing1-20220603-134300-testing-default-smithi/6862171/
2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:FAIL: test_fs_rename_fs_new_fails_with_old_fsname_existing_pools (tasks.cephfs.test_admin.TestRenameCommand)
2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_44457532553f59daa44f901930401f299f7ef2a5/qa/tasks/cephfs/test_admin.py", line 676, in test_fs_rename_fs_new_fails_with_old_fsname_existing_pools
2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner: self.fail("expected creating new file system with old name, "
2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:AssertionError: expected creating new file system with old name, existing pools, and --force flag to fail.
2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:
2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
@nmshelke could you please run this through vstart_runner and share the run?
A pool should not be shared between multiple file systems. Hence adding a check to verify the pool is not already in use.

Fixes: https://tracker.ceph.com/issues/54111
Signed-off-by: Nikhilkumar Shelke <nshelke@redhat.com>
@vshankar Fixed the failure.
I suggested that you run this through vstart_runner and share the result.
@vshankar Please find the test results from vstart_runner:
@nmshelke - include in my test branch.
@neesingh-rh @vshankar this has caused a regression in the rados suite: https://tracker.ceph.com/issues/56384. We caught the bug on the Pacific backport since that branch went through the rados suite, but it looks like this one only went through the fs suite.
Sorry for that. (BTW, we are not backporting this change to the p/q releases.)
Good to know, @vshankar!
The 'ceph fs add_data_pool' command adds a data pool to a file system.
A data pool should not be shared between multiple file systems.
Hence adding a check to verify the data pool is not already in use.

Fixes: https://tracker.ceph.com/issues/54111
Signed-off-by: Nikhilkumar Shelke <nshelke@redhat.com>
Checklist