Skip to content

debian: recursively adjust permissions of /var/lib/ceph/crash#55917

Merged
tchaikov merged 1 commit intoceph:mainfrom
Aequitosh:fix-ceph-crash-permissions
Jul 8, 2024
Merged

debian: recursively adjust permissions of /var/lib/ceph/crash#55917
tchaikov merged 1 commit intoceph:mainfrom
Aequitosh:fix-ceph-crash-permissions

Conversation

@Aequitosh
Copy link
Contributor

A rather recent PR made ceph-crash run as ceph user instead of root1. However, because /var/lib/ceph/crash/posted belongs to root, ceph-crash cannot actually post any crash logs now.

This commit fixes this by recursively updating the permissions of /var/lib/ceph/crash, which ensures that all files and directories used by ceph-crash.service are actually owned by the user configured for Ceph.

The previously existing loop has also been replaced by an invocation of find | xargs.

Fixes: https://tracker.ceph.com/issues/64548

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

Further Notes

I'm not 100% sure if this is the best way to fix this issue, so please let me know if you prefer a different solution. This patch is originally from our mailing list2 - we've decided that it's what works best for us, due to find | xargs being a bit more fine-grained than, let's say, something like for FILE in /var/lib/ceph/**/*; do ...; done.

Also, should this get merged (in whatever form), I'll also gladly provide backports for Reef and Quincy.

Footnotes

  1. https://github.com/ceph/ceph/pull/48713

  2. https://lists.proxmox.com/pipermail/pve-devel/2024-February/061803.html

ProxBot pushed a commit to proxmox/ceph that referenced this pull request Mar 11, 2024
Ceph has a postinst hook that sets the ownership of '/var/lib/ceph/*'
to ceph:ceph (in our case), but misses out on the contents of
'/var/lib/ceph/crash'.

This patch therefore also recursively updates the permissions of
'/var/lib/ceph/crash'.

The change was also proposed upstream [0].

[0]: ceph/ceph#55917

Signed-off-by: Max Carrara <m.carrara@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
ProxBot pushed a commit to proxmox/ceph that referenced this pull request Mar 11, 2024
Ceph has a postinst hook that sets the ownership of '/var/lib/ceph/*'
to ceph:ceph (in our case), but misses out on the contents of
'/var/lib/ceph/crash'.

This patch therefore also recursively updates the permissions of
'/var/lib/ceph/crash'.

The change was also proposed upstream [0].

[0]: ceph/ceph#55917

Signed-off-by: Max Carrara <m.carrara@proxmox.com>
ProxBot pushed a commit to proxmox/ceph that referenced this pull request Mar 12, 2024
Ceph has a postinst hook that sets the ownership of '/var/lib/ceph/*'
to ceph:ceph (in our case), but misses out on the contents of
'/var/lib/ceph/crash'.

This patch therefore also recursively updates the permissions of
'/var/lib/ceph/crash'.

The change was also proposed upstream [0].

[0]: ceph/ceph#55917

Signed-off-by: Max Carrara <m.carrara@proxmox.com>
(cherry picked from commit d52158f)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
@tchaikov tchaikov self-requested a review March 21, 2024 04:06
@frittentheke
Copy link
Contributor

@tchaikov could this please be backported to at least Reef to be fixed with the next point release?

@Aequitosh
Copy link
Contributor Author

@tchaikov could this please be backported to at least Reef to be fixed with the next point release?

I can also provide backports for Reef and Quincy if you don't mind - either once this gets merged or whenever you wish.

@tchaikov
Copy link
Contributor

tchaikov commented Apr 3, 2024

@tchaikov could this please be backported to at least Reef to be fixed with the next point release?

@frittentheke hi Christian, i think so. since https://tracker.ceph.com/issues/64548 includes Reef in its "Backport" field.

and sorry for the latency, gentlemen. i will try to squeeze more time for a proper review asap.

@tchaikov could this please be backported to at least Reef to be fixed with the next point release?

I can also provide backports for Reef and Quincy if you don't mind - either once this gets merged or whenever you wish.

better off waiting until this PR gets merged, so that we can reference it with the correct sha1 in the backport PR.

@frittentheke
Copy link
Contributor

frittentheke commented Apr 3, 2024

@frittentheke hi Christian, i think so. since https://tracker.ceph.com/issues/64548 includes Reef in its "Backport" field.
and sorry for the latency, gentlemen. i will try to squeeze more time for a proper review asap.

Thanks @tchaikov for the quick update!

While we certainly adjusted this on our clusters already, I am just really afraid this causes less crash reports to come in from other installs out there. Less crash reports, means potentially more undiscovered bugs. Selfish me 🙈

[ -x /sbin/start ] && start ceph-all || :

# adjust file and directory permissions
for DIR in /var/lib/ceph/* ; do
Copy link
Contributor

@frittentheke frittentheke Apr 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just add the missing /var/lib/ceph/crash folder here? The new code treats this folder individually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, what we need to do is to fix the permissions of the directories listed by debian/ceph-base.dirs, right? if that's the case, i concur with @frittentheke , that probably we can just add var/lib/ceph/crash here, like

for DIR in /var/lib/ceph/* /var/lib/ceph/crash/posted; do
    if ! dpkg-statoverride --list $DIR >/dev/null
    then
        chown $SERVER_USER:$SERVER_GROUP $DIR
    fi
done

please note, i am listing /var/lib/ceph/crash/posted here, as /var/lib/ceph/crash has been covered by /var/lib/ceph/*.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just add the missing /var/lib/ceph/crash folder here? The new code treats this folder individually.

That's actually a very fair point. In some iteration of this patch I ended up using the find+xargs approach, as that seemed just more logical at the time. (IIRC it was a globbing thing first that then became redundant anyway; the find+xargs combo remained anyway.)

for DIR in /var/lib/ceph/* /var/lib/ceph/crash/posted; do
    if ! dpkg-statoverride --list $DIR >/dev/null
    then
        chown $SERVER_USER:$SERVER_GROUP $DIR
    fi
done

please note, i am listing /var/lib/ceph/crash/posted here, as /var/lib/ceph/crash has been covered by /var/lib/ceph/*.

While this does obviously chown the crash dir, it doesn't update the already existing entries in crash/posted that might've been created as root prior to #48713. That's why I chose to treat /var/lib/ceph/crash/ separately.


# adjust file and directory permissions
find /var/lib/ceph -mindepth 1 -maxdepth 1 -print0 \
| xargs -0 -I '{}' sh -c "${PERM_COMMAND}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why restrict find to maxdepth 1 here and not let it recurse to cover all the files?
Or maybe do a chown -r to ensure all dirs and files below are owned by cheph:ceph?
This avoids the need to deal with /var/lib/ceph/crash in yet another find + xargs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't 100% sure what's appropriate here - I simply wanted to preserve the original behavior, as I wasn't certain whether something deep inside /var/lib/ceph would perhaps require different ownership (at some point).

@frittentheke
Copy link
Contributor

@tchaikov sorry to nag about this again, but I just noticed 18.2.3 QA phase already started (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/KRC33HNRBCZC5Z2GUDN5DQO43I2RJBJJ/).

This PR is "just" about fixing some file permissions and we are likely discussion implementations styles.
But FWIW, I left a few lines of review myself.

Copy link
Contributor

@tchaikov tchaikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd suggest just add /var/lib/ceph/crash to /var/lib/ceph/*, without adding the find commands. simpler this way.

[ -x /sbin/start ] && start ceph-all || :

# adjust file and directory permissions
for DIR in /var/lib/ceph/* ; do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, what we need to do is to fix the permissions of the directories listed by debian/ceph-base.dirs, right? if that's the case, i concur with @frittentheke , that probably we can just add var/lib/ceph/crash here, like

for DIR in /var/lib/ceph/* /var/lib/ceph/crash/posted; do
    if ! dpkg-statoverride --list $DIR >/dev/null
    then
        chown $SERVER_USER:$SERVER_GROUP $DIR
    fi
done

please note, i am listing /var/lib/ceph/crash/posted here, as /var/lib/ceph/crash has been covered by /var/lib/ceph/*.

@tchaikov
Copy link
Contributor

@tchaikov sorry to nag about this again, but I just noticed 18.2.3 QA phase already started (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/KRC33HNRBCZC5Z2GUDN5DQO43I2RJBJJ/).

This PR is "just" about fixing some file permissions and we are likely discussion implementations styles. But FWIW, I left a few lines of review myself.

@frittentheke my apologies. indeed.

@frittentheke
Copy link
Contributor

@frittentheke my apologies. indeed.

No need @tchaikov ... I am still just a selfish dude looking for other installations' crash reports to make my Ceph installations more stable ;-)

Speaking of Reef ... any remote chance this change can still make it into 18.2.3 ?

@ljflores ljflores added the core label Apr 18, 2024
@ljflores ljflores modified the milestones: 18.2.14, v18.2.4 Apr 18, 2024
@yuriw yuriw modified the milestone: v18.2.4 Apr 18, 2024
@ljflores ljflores removed this from the v18.2.4 milestone Apr 18, 2024
@ljflores
Copy link
Member

I should actually add the milestone to the Reef backport when it's ready. Instead, I've added the "needs-reef-backport" label.

@Aequitosh
Copy link
Contributor Author

@frittentheke @tchaikov Thanks for the reviews, much appreciated! I left some more comments inline.

What's now obvious to me is that the original loop can just stay - the find+xargs replacement isn't necessary.

However, I'd personally suggest to do something like chown -R /var/lib/ceph/crash/*, as some entries in /var/lib/ceph/crash/posted might still belong to root e.g. - entries that were posted prior to #48713.

I'll update the PR accordingly, if you don't object.

@frittentheke
Copy link
Contributor

However, I'd personally suggest to do something like chown -R /var/lib/ceph/crash/*, as some entries in /var/lib/ceph/crash/posted might still belong to root e.g. - entries that were posted prior to #48713.

I'll update the PR accordingly, if you don't object.

FWIW, I like the idea to "ensure" (and potentially fix) ownership also for existing files.

@frittentheke
Copy link
Contributor

@Aequitosh @tchaikov could I nag you once more to finalize the discussion on how to implement the chown ?

@tchaikov
Copy link
Contributor

@frittentheke @tchaikov Thanks for the reviews, much appreciated! I left some more comments inline.

What's now obvious to me is that the original loop can just stay - the find+xargs replacement isn't necessary.

However, I'd personally suggest to do something like chown -R /var/lib/ceph/crash/*, as some entries in /var/lib/ceph/crash/posted might still belong to root e.g. - entries that were posted prior to #48713.

I'll update the PR accordingly, if you don't object.

@Aequitosh hi Max, please go ahead. i didn't take the "existing files" use case into consideration. but if that case exists, then, yeah, i concur with you.

A rather recent PR made ceph-crash run as "ceph" user instead of
root [0]. However, because /var/lib/ceph/crash/posted belongs to root,
ceph-crash cannot actually post any crash logs now.

This commit fixes this by recursively updating the permissions of
'/var/lib/ceph/crash', which ensures that all files and directories
used by 'ceph-crash.service' are actually owned by the user configured
for Ceph. This also accounts for existing installations.

Additionally, quote interpolated variables and use curly braces [1].

[0]: ceph#48713
[1]: https://www.shellcheck.net/wiki/SC2086

Fixes: https://tracker.ceph.com/issues/64548
Signed-off-by: Max Carrara <m.carrara@proxmox.com>
@Aequitosh Aequitosh force-pushed the fix-ceph-crash-permissions branch from 76df6f5 to 97a1ec4 Compare April 30, 2024 15:45
@Aequitosh
Copy link
Contributor Author

Updated the PR. See the commit message for details.

Basically did as I described, but also quoted variables where they're interpolated. Also added curly braces there.

I highly doubt that globbing & word splitting are ever going to be issues, but still wanted to include it for good measure. (Though, should that not be desired, I can quickly update the PR again.)

Copy link
Contributor

@tchaikov tchaikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@frittentheke
Copy link
Contributor

Is this fix on track to be merged?

@ljflores
Copy link
Member

ljflores commented Jul 3, 2024

@frittentheke I'm not sure what happened with the testing. It will be included in the next main batch.

Copy link
Contributor

@pereman2 pereman2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tchaikov tchaikov merged commit 85d4bc8 into ceph:main Jul 8, 2024
NitzanMordhai pushed a commit to NitzanMordhai/ceph that referenced this pull request Aug 1, 2024
debian: recursively adjust permissions of /var/lib/ceph/crash

Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
Reviewed-by: Kefu Chai <tchaikov@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants