mgr/vol: include group name in subvolume's pool namespace name #62668

Merged
rishabh-d-dave merged 5 commits into ceph:main from rishabh-d-dave:vols-namespace
Jun 26, 2025

Conversation

@rishabh-d-dave (Contributor)

Include the subvolume group name along with the subvolume name in the
pool namespace string for the subvolume. IOW, include it in the string
that serves as the value of the extended attribute
"ceph.dir.layout.pool_namespace", which is set for a subvolume during
its creation.

Right now, because the group name is not included, two different
subvolumes with the same name but located in different subvolume groups
undesirably end up in the same namespace.

Fixes: https://tracker.ceph.com/issues/70668
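The effect of the change can be sketched with a toy model. The `fsvolumens_` prefix and the exact joining format below are illustrative assumptions, not the literal strings used by the volumes plugin:

```python
def old_namespace(subvol: str) -> str:
    # Pre-fix behaviour (illustrative): only the subvolume name is
    # used, so same-named subvolumes in different groups collide.
    return f"fsvolumens_{subvol}"

def new_namespace(group: str, subvol: str) -> str:
    # Post-fix behaviour (illustrative): the group name is part of the
    # string, so the namespace is unique per (group, subvolume) pair.
    return f"fsvolumens_{group}_{subvol}"

# Two subvolumes named "sv" in groups "grp1" and "grp2":
assert old_namespace("sv") == old_namespace("sv")                  # collision
assert new_namespace("grp1", "sv") != new_namespace("grp2", "sv")  # distinct
```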

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@gregsfortytwo (Member)

I presume this vol_spec.fs_namespace eventually gets turned into a directory layout. That layout is used when first writing to a file to determine which rados namespace to place it in — and rados namespaces impact placement and are basically just a special zone of the object name. (Object "foo" in namespace "bar" is a completely different object than one named "foo" that has no namespace!)

That means that any newly-created files will be written to the new namespace, but any old files will be in the old namespace. Changing directory layouts can't actively move any data.
If at any point we use this xattr to actually try and look at a rados object (unlikely), that will break on pre-change objects.

Rishabh brought up a good point to me as well about the cephx access being granted per-namespace: if you change the namespace in use, any previously-generated cephx keys that are granted access to the old namespace won't be able to access the new one, and vice-versa.

I don't have any good solutions for how we would move data within rados, nor for changing namespaces. I'm not sure from the security perspective what our best options are here, but we definitely need to plan so that we don't inadvertently report a namespace different from the one a subvolume was created with.
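Greg's point about object identity can be illustrated with a minimal sketch; this is a toy key-value model of a pool, not the RADOS API, but it captures the keying behaviour he describes:

```python
# Toy model of a RADOS pool: objects are keyed by (namespace, name),
# so "foo" in namespace "bar" and "foo" in the default (empty)
# namespace are two completely different objects.
pool: dict = {}

def write_obj(ns: str, name: str, data: bytes) -> None:
    pool[(ns, name)] = data

def read_obj(ns: str, name: str):
    return pool.get((ns, name))

write_obj("", "foo", b"pre-change data")      # file written under old layout
write_obj("bar", "foo", b"post-change data")  # same name, new namespace

# Looking up "foo" in the new namespace never finds the old object:
assert read_obj("bar", "foo") != read_obj("", "foo")
```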

@vshankar (Contributor)

I presume this vol_spec.fs_namespace eventually gets turned into a directory layout. That layout is used when first writing to a file to determine which rados namespace to place it in — and rados namespaces impact placement and are basically just a special zone of the object name. (Object "foo" in namespace "bar" is a completely different object than one named "foo" that has no namespace!)

That means that any newly-created files will be written to the new namespace, but any old files will be in the old namespace. Changing directory layouts can't actively move any data. If at any point we use this xattr to actually try and look at a rados object (unlikely), that will break on pre-change objects.

That is what I was looking for as an answer to my question here: #62668 (comment)

So, unless we do some data copy magic with the volumes plugin, changing namespace on-the-fly isn't the way to go forward. What options do we have?

@vshankar (Contributor)

vshankar commented May 6, 2025

So, it looks like the only way forward is to apply this fix to newly created subvolumes. There isn't any security leakage, since the cephx keys are path restricted.

cc @kotreshhr @mchangir for review and testing.
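The "path restricted" argument can be sketched with a simplified capability check. The model below is illustrative only; it is not the real cephx cap grammar, and the `fsvolumens_` namespace strings are assumptions:

```python
from dataclasses import dataclass

@dataclass
class ClientCaps:
    path_prefix: str  # MDS-side restriction to a subtree
    namespace: str    # OSD-side restriction to a pool namespace

def may_access(caps: ClientCaps, path: str, ns: str) -> bool:
    # A client needs both the subtree grant and the namespace grant.
    return path.startswith(caps.path_prefix) and ns == caps.namespace

# A key minted against the old, group-less namespace cannot reach data
# written under a new, group-qualified namespace (and vice versa);
# hence the fix only applies to newly created subvolumes.
old_key = ClientCaps("/volumes/grp1/sv", "fsvolumens_sv")
assert not may_access(old_key, "/volumes/grp1/sv/file", "fsvolumens_grp1_sv")
assert may_access(old_key, "/volumes/grp1/sv/file", "fsvolumens_sv")
```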

@rishabh-d-dave (Contributor, Author) retriggered CI several times with the following commands:

jenkins test make check
jenkins test make check arm64
jenkins test windows

@kotreshhr (Contributor) left a review comment:

Looks good otherwise

@kotreshhr (Contributor)

So, it looks like the only way forward is to apply this fix to newly created subvolumes. There isn't any security leakage, since the cephx keys are path restricted.

This makes sense to me.

cc @kotreshhr @mchangir for review and testing.

@rishabh-d-dave (Contributor, Author)

@kotreshhr PTAL

@rishabh-d-dave (Contributor, Author)

jenkins test make check arm64

@rishabh-d-dave (Contributor, Author)

jenkins test windows

@rishabh-d-dave rishabh-d-dave added the wip-rishabh-testing Rishabh's testing label label May 29, 2025
@vshankar (Contributor) left a review comment:

We need to mention this in the docs - in the volumes document and in the PRN. It should go as a NOTE and not as a warning, since there aren't any security aspects to this.

namespaces.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
@rishabh-d-dave rishabh-d-dave requested a review from a team as a code owner June 3, 2025 19:00
@rishabh-d-dave (Contributor, Author)

rishabh-d-dave commented Jun 3, 2025

@vshankar PTAL

@github-actions

github-actions bot commented Jun 4, 2025

Config Diff Tool Output

- removed: rgw_bucket_eexist_override
- removed: breakpad
- removed: bluestore_allocator_lookup_policy
! changed: seastore_max_concurrent_transactions: old: maximum concurrent transactions that seastore allows (per reactor)
! changed: seastore_max_concurrent_transactions: new: maximum concurrent transactions that seastore allows
! changed: seastore_max_concurrent_transactions: old: 128
! changed: seastore_max_concurrent_transactions: new: 8
! changed: seastore_cache_lru_size: old: Size in bytes of extents to keep in cache (per reactor).
! changed: seastore_cache_lru_size: new: Size in bytes of extents to keep in cache.
! changed: seastore_cache_lru_size: old: 2_G
! changed: seastore_cache_lru_size: new: 64_M
! changed: rgw_sts_key: old: Key used for encrypting/ decrypting role session tokens. This key must consist of 16 hexadecimal characters, which can be generated by the command 'openssl rand -hex 16'. All radosgw instances in a zone should use the same key. In multisite configurations, all zones in a realm should use the same key.
! changed: rgw_sts_key: new: Key used for encrypting/ decrypting session token.
! changed: osd_deep_scrub_interval: old: Deep scrub each PG (i.e., verify data checksums) at least this often. Note that this option must be set at ``global`` scope, or for both ``mgr`` and``osd``.
! changed: osd_deep_scrub_interval: new: Deep scrub each PG (i.e., verify data checksums) at least this often
! changed: osd_deep_scrub_interval_cv: old: deep scrub intervals are varied by a random amount to prevent stampedes. This parameter determines the amount of variation. Technically ``osd_deep_scrub_interval_cv`` is the coefficient of variation for the deep scrub interval.
! changed: osd_deep_scrub_interval_cv: new: deep scrub intervals are varied by a random amount to prevent stampedes. This parameter determines the amount of variation. Technically - osd_deep_scrub_interval_cv is the coefficient of variation for the deep scrub interval.
! changed: osd_deep_scrub_interval_cv: old: The coefficient of variation for the deep scrub interval, specified as a ratio. On average, the next deep scrub for a PG is scheduled osd_deep_scrub_interval after the last deep scrub . The actual time is randomized to a normal distribution with a standard deviation of osd_deep_scrub_interval * osd_deep_scrub_interval_cv (clamped to within 2 standard deviations). The default value guarantees that 95% of deep scrubs will be scheduled in the range [0.8 * osd_deep_scrub_interval, 1.2 * osd_deep_scrub_interval].
! changed: osd_deep_scrub_interval_cv: new: The coefficient of variation for the deep scrub interval, specified as a ratio. On average, the next deep scrub for a PG is scheduled osd_deep_scrub_interval after the last deep scrub . The actual time is randomized to a normal distribution with a standard deviation of osd_deep_scrub_interval * osd_deep_scrub_interval_cv (clamped to within 2 standard deviations). The default value guarantees that 95% of the deep scrubs will be scheduled in the range [0.8 * osd_deep_scrub_interval, 1.2 * osd_deep_scrub_interval].
! changed: osd_scrub_min_interval: old: The desired interval between scrubs of a specific PG. Note that this option must be set at ``global`` scope, or for both ``mgr`` and``osd``.
! changed: osd_scrub_min_interval: new: The desired interval between scrubs of a specific PG.
! changed: osd_scrub_max_interval: old: Scrub each PG no less often than this interval. Note that this option must be set at ``global`` scope, or for both ``mgr`` and``osd``.
! changed: osd_scrub_max_interval: new: Scrub each PG no less often than this interval

The above configuration changes are found in the PR. Please update the relevant release documentation if necessary.

@rishabh-d-dave (Contributor, Author)

jenkins test api

@rishabh-d-dave rishabh-d-dave force-pushed the vols-namespace branch 2 times, most recently from 181821e to 8bd4f22 Compare June 5, 2025 12:03
@vshankar (Contributor) left a review comment:

Otherwise LGTM.

namespace of CephFS volumes.

Signed-off-by: Rishabh Dave <ridave@redhat.com>
@rishabh-d-dave (Contributor, Author)

jenkins test make check

@vshankar (Contributor) left a review comment:

LGTM

@rishabh-d-dave (Contributor, Author)

This PR is under test in https://tracker.ceph.com/issues/71663.

@rishabh-d-dave rishabh-d-dave removed the wip-rishabh-testing Rishabh's testing label label Jun 16, 2025
@rishabh-d-dave (Contributor, Author)

This PR is under test in https://tracker.ceph.com/issues/71663.

A large number of jobs failed due to an unrelated error; deferring the QA run until it is fixed.

@rishabh-d-dave rishabh-d-dave added the wip-rishabh-testing Rishabh's testing label label Jun 17, 2025
@rishabh-d-dave (Contributor, Author)

This PR is under test in https://tracker.ceph.com/issues/71703.


@rishabh-d-dave rishabh-d-dave merged commit eb46abf into ceph:main Jun 26, 2025
12 checks passed
@rishabh-d-dave rishabh-d-dave deleted the vols-namespace branch June 26, 2025 08:23

5 participants