
mon: verify data pool is already not in use by any file system#44900

Merged
vshankar merged 3 commits intoceph:mainfrom
nmshelke:fix-54111
Jun 20, 2022

Conversation

@nmshelke
Contributor

@nmshelke nmshelke commented Feb 4, 2022

The 'ceph fs add_data_pool' command adds a data pool to a file system.
A data pool should not be shared between multiple file systems.
Hence, add a check to verify that the data pool is not already in use.

Fixes: https://tracker.ceph.com/issues/54111
Signed-off-by: Nikhilkumar Shelke nshelke@redhat.com
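The check described in the commit message can be sketched as a small Python model (illustrative only; the real implementation is C++ in src/mon/FSCommands.cc, and the dict-based FSMap representation and function names below are assumptions):

```python
# Simplified model of the proposed check: a data pool may belong to at
# most one file system, so 'add_data_pool' is rejected if any file
# system in the FSMap already uses the pool.

def pool_in_use(fsmap, pool_id):
    """Return the name of the file system already using pool_id, or None."""
    for fs in fsmap:
        if pool_id == fs["metadata_pool"] or pool_id in fs["data_pools"]:
            return fs["name"]
    return None

def add_data_pool(fsmap, fs_name, pool_id):
    """Attach pool_id to fs_name, refusing pools that are already in use."""
    owner = pool_in_use(fsmap, pool_id)
    if owner is not None:
        raise ValueError(
            f"RADOS pool {pool_id} is already used by filesystem '{owner}'")
    for fs in fsmap:
        if fs["name"] == fs_name:
            fs["data_pools"].append(pool_id)
            return
    raise KeyError(f"no such filesystem: {fs_name}")
```

The model deliberately checks both the metadata pool and all data pools of every file system, mirroring the "not in use by any file system" wording of the PR title.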

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@github-actions github-actions bot added cephfs Ceph File System core mon labels Feb 4, 2022
@nmshelke nmshelke requested review from a team and mchangir February 4, 2022 18:10
Contributor

@mchangir mchangir left a comment


small changes requested

@nmshelke
Contributor Author

nmshelke commented Feb 8, 2022

small changes requested

Done. Updated PR

@nmshelke nmshelke requested review from lxbsz and mchangir February 8, 2022 13:37
ss << "Filesystem '" << fs->mds_map.get_fs_name() << "' is already using specified RADOS pools.";
return -EINVAL;
}
}
Member


This check would probably be better in _check_pool. See also:

ceph/src/mon/FSCommands.cc

Lines 1573 to 1579 in e63f146

if (!force && !pool->application_metadata.empty() &&
pool->application_metadata.count(
pg_pool_t::APPLICATION_NAME_CEPHFS) == 0) {
*ss << " pool '" << pool_name << "' (id '" << pool_id
<< "') has a non-CephFS application enabled.";
return -EINVAL;
}

We have "application metadata" (a trivial k/v store) on pools to indicate their intended use. We're already marking meta/data pools:

ceph/src/mon/FSCommands.cc

Lines 924 to 926 in e63f146

mon->osdmon()->do_application_enable(poolid,
pg_pool_t::APPLICATION_NAME_CEPHFS,
"data", fs_name, true);

You should check that as well. Consider also that application metadata is a newer feature in RADOS. So, we need to make sure the application metadata is consistent with the pools described in the FSMap. I'd also recommend a task in the MDSMonitor which sets the application metadata on the pool if unset. If the application metadata is wrong, then the MDSMonitor should raise a cluster warning.
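A rough Python model of the application-metadata check being suggested here (the dict-based pool representation and helper name are assumptions for illustration; the real code operates on pg_pool_t in C++):

```python
import errno

APPLICATION_NAME_CEPHFS = "cephfs"

def check_pool(pool, pool_name, pool_id, force=False):
    """Model of the suggested _check_pool logic: reject pools whose
    application metadata marks them for a non-CephFS application, or
    that CephFS has already tagged as some file system's pool."""
    app = pool.get("application_metadata", {})
    # Existing check: pool carries application metadata, none of it CephFS.
    if not force and app and APPLICATION_NAME_CEPHFS not in app:
        return (-errno.EINVAL,
                f"pool '{pool_name}' (id '{pool_id}') has a "
                "non-CephFS application enabled.")
    # Suggested addition: do_application_enable() records e.g.
    # {"data": <fs_name>} under the cephfs application key, so a
    # non-empty entry means the pool is already taken.
    if not force and app.get(APPLICATION_NAME_CEPHFS):
        return (-errno.EINVAL,
                f"pool '{pool_name}' (id '{pool_id}') is already "
                "in use by CephFS.")
    return (0, "")
```

As the review notes, application metadata is a newer RADOS feature, so the real check also has to tolerate pools referenced by the FSMap whose metadata was never set.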

Contributor Author


I have used application metadata to determine whether the pool is already in use.
I have also moved that code under the _check_pool() function.
@batrick I'm just not able to understand why we need a task in MDSMonitor::tick().
Could you please help me understand what exactly is expected in MDSMonitor::tick()?

@gregsfortytwo
Member

gregsfortytwo commented Feb 8, 2022

I'm also a bit concerned about the possibility of existing users who have a single data pool they're sharing across FSes with a disciplined process to make sure they're using namespaces to avoid collision, and maybe needing this in the future. So we should probably add a force flag to override any block we put in place?

EDIT: Actually, I forgot about https://docs.ceph.com/en/latest/dev/cephfs-snapshots/#multi-fs and that makes me think we shouldn't worry about users doing this anyway. Go forth and block this misbehavior!

@nmshelke
Contributor Author

nmshelke commented Feb 8, 2022

I'm also a bit concerned about the possibility of existing users who have a single data pool they're sharing across FSes with a disciplined process to make sure they're using namespaces to avoid collision, and maybe needing this in the future. So we should probably add a force flag to override any block we put in place?

EDIT: Actually, I forgot about https://docs.ceph.com/en/latest/dev/cephfs-snapshots/#multi-fs and that makes me think we shouldn't worry about users doing this anyway. Go forth and block this misbehavior!

Thanks for providing reference of documentation.

@nmshelke nmshelke force-pushed the fix-54111 branch 2 times, most recently from ed7533d to a00daf3 Compare February 9, 2022 18:30
Member

@lxbsz lxbsz left a comment


I think you should discard both 4622393 and ef4ed03, and fold them into a00daf3 directly.

@nmshelke nmshelke force-pushed the fix-54111 branch 2 times, most recently from 63178f3 to bf2ae22 Compare February 10, 2022 06:35
@nmshelke nmshelke requested a review from lxbsz February 10, 2022 06:51
@nmshelke
Contributor Author

I think you should discard both 4622393 and ef4ed03, and fold them into a00daf3 directly.

Squashed commits.

@nmshelke nmshelke requested a review from batrick February 11, 2022 15:53
@vshankar
Contributor

vshankar commented May 12, 2022

Reverting the code that disallows users from overlaying data pools, because it breaks some existing use cases/test cases.
Eg: https://pulpito.ceph.com/vshankar-2022-05-01_13:18:44-fs-wip-vshankar-testing1-20220428-204527-testing-default-smithi/6817573/

Please rerun the failed jobs from test run

    https://pulpito.ceph.com/vshankar-2022-05-01_13:18:44-fs-wip-vshankar-testing1-20220428-204527-testing-default-smithi/

all failed/dead just to be sure that this does not break anything else.

EDIT: and post the link here.

@nmshelke
Contributor Author

I had a discussion with @vshankar (please add anything I've missed).
Regarding failure: https://pulpito.ceph.com/nshelke-2022-05-18_07:35:50-fs-wip-nshelke-pools-testing-testing-default-smithi/6839508/
The points discussed are as follows:

  1. The ceph fs new command has the options --force and --allow_dangerous_metadata_overlay. These options provide the flexibility to reuse data/metadata pools when creating a new fs. Hence we should skip checking whether a pool is in use if either of the above options is provided.

  2. Uses of the --force option:
    a. --force is useful if we want to use a metadata pool that already has objects.
    b. An erasure-coded pool is not encouraged as the default data pool, but --force allows you to set an EC pool as the default data pool.

  3. If RADOS pools are already in use, then --allow_dangerous_metadata_overlay permits you to reuse the pools for a new fs. This should ONLY be done in emergencies and after careful reading of the documentation.

  4. The documentation for --force and --allow_dangerous_metadata_overlay should have more details about their usage.

  5. Also update the tests to provide the --allow_dangerous_metadata_overlay option wherever an already in-use data pool is used.
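Point 1 above amounts to a simple guard in front of the in-use check; a minimal sketch (flag names are taken from the discussion, the function name is hypothetical):

```python
def skip_in_use_check(force: bool, allow_dangerous_metadata_overlay: bool) -> bool:
    """'ceph fs new' may deliberately reuse pools when either flag is
    supplied, so the pool-in-use verification is bypassed in that case."""
    return force or allow_dangerous_metadata_overlay
```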

@nmshelke nmshelke requested a review from a team as a code owner May 25, 2022 07:51
@nmshelke nmshelke changed the base branch from master to main May 25, 2022 11:04
Contributor

@vshankar vshankar left a comment


Nice work @nmshelke

Contributor

@vshankar vshankar left a comment


@nmshelke PTAL - https://pulpito.ceph.com/vshankar-2022-06-03_10:03:27-fs-wip-vshankar-testing1-20220603-134300-testing-default-smithi/6862171/

2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:FAIL: test_fs_rename_fs_new_fails_with_old_fsname_existing_pools (tasks.cephfs.test_admin.TestRenameCommand)
2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_44457532553f59daa44f901930401f299f7ef2a5/qa/tasks/cephfs/test_admin.py", line 676, in test_fs_rename_fs_new_fails_with_old_fsname_existing_pools
2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:    self.fail("expected creating new file system with old name, "
2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:AssertionError: expected creating new file system with old name, existing pools, and --force flag to fail.
2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:
2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------

@vshankar
Contributor

vshankar commented Jun 9, 2022

@nmshelke could you please run this through vstart_runner and share the run?

nmshelke added 3 commits June 12, 2022 16:24
A pool should not be shared between multiple file systems.
Hence, add a check to verify that the pool is not already in use.

Fixes: https://tracker.ceph.com/issues/54111
Signed-off-by: Nikhilkumar Shelke <nshelke@redhat.com>
@nmshelke
Contributor Author

@nmshelke could you please run this through vstart_runner and share the run?

@vshankar Updated code, now tests are passing (confirmed through vstart_runner)

@nmshelke
Contributor Author

@nmshelke PTAL - https://pulpito.ceph.com/vshankar-2022-06-03_10:03:27-fs-wip-vshankar-testing1-20220603-134300-testing-default-smithi/6862171/

2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:FAIL: test_fs_rename_fs_new_fails_with_old_fsname_existing_pools (tasks.cephfs.test_admin.TestRenameCommand)
[...]

@vshankar Fixed the failure.
Could you please include the changes in your test branch again?

@vshankar
Contributor

@nmshelke PTAL - https://pulpito.ceph.com/vshankar-2022-06-03_10:03:27-fs-wip-vshankar-testing1-20220603-134300-testing-default-smithi/6862171/

2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:FAIL: test_fs_rename_fs_new_fails_with_old_fsname_existing_pools (tasks.cephfs.test_admin.TestRenameCommand)
[...]

@vshankar Fixed the failure. Could you please include the changes in your test branch again?

I suggested that you run this through vstart_runner and share the result.

@nmshelke
Contributor Author

nmshelke commented Jun 13, 2022

@nmshelke PTAL - https://pulpito.ceph.com/vshankar-2022-06-03_10:03:27-fs-wip-vshankar-testing1-20220603-134300-testing-default-smithi/6862171/

2022-06-03T13:25:54.555 INFO:tasks.cephfs_test_runner:FAIL: test_fs_rename_fs_new_fails_with_old_fsname_existing_pools (tasks.cephfs.test_admin.TestRenameCommand)
[...]

@vshankar Fixed the failure. Could you please include the changes in your test branch again?

I suggested that you run this through vstart_runner and share the result.

@vshankar Please find test results through vstart_runner:

2022-06-13 11:39:23,988.988 INFO:__main__:Stopped test: test_fs_rename_fs_new_fails_with_old_fsname_existing_pools (tasks.cephfs.test_admin.TestRenameCommand)
That after renaming a file system, creating a file system with in 32.401316s
2022-06-13 11:39:23,989.989 INFO:__main__:test_fs_rename_fs_new_fails_with_old_fsname_existing_pools (tasks.cephfs.test_admin.TestRenameCommand)
2022-06-13 11:39:23,989.989 INFO:__main__:That after renaming a file system, creating a file system with ... ok
2022-06-13 11:39:23,989.989 INFO:__main__:
2022-06-13 11:39:23,989.989 INFO:__main__:----------------------------------------------------------------------
2022-06-13 11:39:23,989.989 INFO:__main__:Ran 1 test in 32.402s
2022-06-13 11:39:23,989.989 INFO:__main__:
2022-06-13 11:39:23,989.989 INFO:__main__:
2022-06-13 11:39:23,990.990 DEBUG:__main__:> ip netns list
2022-06-13 11:39:23,993.993 DEBUG:__main__:> sudo ip link delete ceph-brx
Cannot find device "ceph-brx"
2022-06-13 11:39:24,028.028 INFO:__main__:OK
2022-06-13 11:39:24,028.028 INFO:__main__:

@vshankar
Contributor

@nmshelke - include in my test branch.

Contributor

@vshankar vshankar left a comment


@ljflores
Member

@neesingh-rh @vshankar this has caused a regression in the rados suite: https://tracker.ceph.com/issues/56384

We caught the bug on the Pacific backport since that branch went through the rados suite, but it looks like this one only went through the fs suite.

@vshankar
Contributor

@neesingh-rh @vshankar this has caused a regression in the rados suite: https://tracker.ceph.com/issues/56384

We caught the bug on the Pacific backport since that branch went through the rados suite, but it looks like this one only went through the fs suite.

Sorry for that. (BTW, we are not backporting this change to p/q releases).

@ljflores
Member

Good to know, @vshankar!

@ljflores
Member

@vshankar is there an ETA for a fix/revert? Just so I can communicate this to @yuriw as he tests main PRs.

@vshankar
Contributor

@vshankar is there an ETA for a fix/revert? Just so I can communicate this to @yuriw as he tests main PRs.

@nmshelke PTAL at the failure. I'm a bit tied up this morning with some other high-priority stuff.
