
mon/MonCap: Update osd profile to allow cmd to set iops capacity on mon db #42853

Merged
yuriw merged 2 commits into ceph:master from sseshasa:wip-fix-vstart-mon-permissions
Sep 9, 2021

@sseshasa sseshasa commented Aug 19, 2021

The default mon caps for OSDs are set to "allow profile osd", which grants
only the "rw" capability. OSDs with the mclock scheduler enabled store their
max IOPS capacity in the mon config store by executing the "config set"
command. However, since OSDs do not have the execute permission by default,
the command fails with a "Permission denied" error.

Therefore, modify the default osd profile to allow running the "config set"
command, restricted to setting keys whose name matches (by regex) either
"osd_mclock_max_capacity_iops_hdd" or "osd_mclock_max_capacity_iops_ssd",
so that the OSD has permission to update the mon config store with the
desired information.

Fixes: https://tracker.ceph.com/issues/52329
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
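
The restriction described above can be pictured with a small model. The sketch below is illustrative Python, not the MonCap implementation: the `ALLOWED_COMMANDS` table, the `is_capable` helper, the combined regex `osd_mclock_max_capacity_iops_(hdd|ssd)`, and the choice of full-string matching are all assumptions made for the example.

```python
import re

# Hypothetical model of a profile grant: a command prefix plus regex
# constraints on named arguments. This is NOT Ceph's MonCap code.
ALLOWED_COMMANDS = {
    "config set": {"name": r"osd_mclock_max_capacity_iops_(hdd|ssd)"},
}


def is_capable(prefix, args):
    """Return True if the command prefix and its arguments satisfy a grant."""
    constraints = ALLOWED_COMMANDS.get(prefix)
    if constraints is None:
        # No grant for this command at all.
        return False
    for arg_name, pattern in constraints.items():
        value = args.get(arg_name)
        # Assumption: the whole argument value must match the pattern.
        if value is None or re.fullmatch(pattern, value) is None:
            return False
    return True
```

Under these assumptions, `is_capable("config set", {"name": "osd_mclock_max_capacity_iops_hdd"})` is accepted, while the same command with any other key name, or any other command prefix, is rejected.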

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

@sseshasa sseshasa requested review from jdurgin and neha-ojha August 19, 2021 14:33
@sseshasa sseshasa force-pushed the wip-fix-vstart-mon-permissions branch from 043482f to 21a7a0d Compare August 19, 2021 14:35

sseshasa commented Aug 19, 2021

Mon logs showing that the command passes the capability check ("_allowed_command capable") after the fix is applied:

2021-08-23T17:37:18.136+0000 7fb6154cc700  1 -- [v2:172.21.3.216:40276/0,v1:172.21.3.216:40277/0] <== osd.0 v2:172.21.3.216:6836/1680917 9 ==== mon_command([{prefix=config set, name=osd_mclock_max_capacity_iops_hdd}] v 0) v1 ==== 149+0+0 (secure 0 0 0) 0x561b5a5c3e00 con 0x561b5a950800
2021-08-23T17:37:18.136+0000 7fb6154cc700  0 mon.a@0(leader) e1 handle_command mon_command([{prefix=config set, name=osd_mclock_max_capacity_iops_hdd}] v 0) v1
2021-08-23T17:37:18.136+0000 7fb6154cc700 10 mon.a@0(leader) e1 _allowed_command capable
2021-08-23T17:37:18.136+0000 7fb6154cc700  7 mon.a@0(leader).config prepare_update mon_command([{prefix=config set, name=osd_mclock_max_capacity_iops_hdd}] v 0) v1 from osd.0 v2:172.21.3.216:6836/1680917
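
When scanning a mon log for this kind of evidence, a small helper can pull out the command prefix and key name and check for the "capable" decision. This is a minimal sketch: the regex is inferred only from the log excerpt above, and `parse_mon_command` and `command_was_capable` are hypothetical helper names, not part of any Ceph tool.

```python
import re

# Matches the "mon_command([{prefix=..., name=...}]" fragment as it appears
# in the excerpt above. The log format is assumed from that excerpt only.
MON_COMMAND = re.compile(
    r"mon_command\(\[\{prefix=(?P<prefix>[^,}]+)(?:, name=(?P<name>[^,}]+))?"
)


def parse_mon_command(line):
    """Return (prefix, name) for a mon_command log line, or None if absent."""
    match = MON_COMMAND.search(line)
    if match is None:
        return None
    return match.group("prefix"), match.group("name")


def command_was_capable(lines):
    """Return True if any line records a positive capability decision."""
    return any("_allowed_command capable" in line for line in lines)
```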

@sseshasa sseshasa force-pushed the wip-fix-vstart-mon-permissions branch from 21a7a0d to ed191e9 Compare August 20, 2021 13:13
@sseshasa sseshasa changed the title src/vstart: Update mon caps for osds to "rwx" mon/AuthMonitor: Update mon caps for osds to "rwx" Aug 20, 2021
@sseshasa sseshasa force-pushed the wip-fix-vstart-mon-permissions branch from ed191e9 to 471a175 Compare August 23, 2021 08:59
@sseshasa sseshasa changed the title mon/AuthMonitor: Update mon caps for osds to "rwx" mon/AuthMonitor: Update mon caps to allow osds to set their max capacity Aug 23, 2021
@sseshasa sseshasa force-pushed the wip-fix-vstart-mon-permissions branch from 471a175 to fd30128 Compare August 23, 2021 17:46
@sseshasa sseshasa changed the title mon/AuthMonitor: Update mon caps to allow osds to set their max capacity mon/MonCap: Update osd profile to allow cmd to set iops capacity on mon db Aug 23, 2021
@neha-ojha
Member

Shouldn't we use default caps in teuthology testing to discover future issues like https://tracker.ceph.com/issues/52329, instead of overriding them in the ceph task?

sseshasa commented Aug 24, 2021

> Shouldn't we use default caps in teuthology testing to discover future issues like https://tracker.ceph.com/issues/52329, instead of overriding them in the ceph task?

Yes, I can try disabling the generation of the caps and run some teuthology jobs to see what effect it has.

@neha-ojha
Member

> > Shouldn't we use default caps in teuthology testing to discover future issues like https://tracker.ceph.com/issues/52329, instead of overriding them in the ceph task?
>
> Yes, I can try disabling the generation of the caps and run some teuthology jobs to see what effect it has.

@sseshasa sounds good, we can address this in a separate PR, if your test uncovers more issues

@github-actions github-actions bot added the tests label Sep 1, 2021
Assign the default caps for OSDs to be the same as what the AuthMonitor
sets for a new OSD. See AuthMonitor::validate_osd_new(), which sets the
following caps for a new OSD:

 mon='allow profile osd'
 mgr='allow profile osd'
 osd='allow *'

When a real-world cluster is deployed, the above caps are applied.
Unless the user modifies the defaults, a cluster will operate with the
above caps. Therefore, it makes sense to use the defaults when testing
Ceph so that any issues caused by the default settings can be caught and
fixed.

Therefore, the caps for the 'osd' type are reset to the defaults in
generate_caps(). The caps for 'mgr' already reflect the system defaults.
The caps for the 'mds' type are not changed in this commit and will be
investigated and changed later if necessary.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
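
As a rough illustration of the idea in this commit message, the default OSD caps can be kept in one table and expanded into command-line arguments. This is a sketch only and not the actual teuthology code: `DEFAULT_CAPS` and `caps_args` are hypothetical names, and the `--cap <subsystem> <capability>` argument shape is an assumption modeled on ceph-authtool usage.

```python
# Default caps for a new OSD, copied from the commit message above.
DEFAULT_CAPS = {
    "osd": {
        "mon": "allow profile osd",
        "mgr": "allow profile osd",
        "osd": "allow *",
    },
}


def caps_args(daemon_type):
    """Yield '--cap', subsystem, capability triples for a daemon type.

    Assumption: the consumer expects ceph-authtool style '--cap' arguments.
    """
    for subsystem, capability in DEFAULT_CAPS[daemon_type].items():
        yield "--cap"
        yield subsystem
        yield capability
```

Keeping the test harness table identical to the monitor's defaults is the point of the change: a test cluster then fails in the same way a freshly deployed cluster would.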
@sseshasa sseshasa force-pushed the wip-fix-vstart-mon-permissions branch from 9cb64fe to 4b0dba2 Compare September 1, 2021 08:16

sseshasa commented Sep 2, 2021

Pulpito run (with the changes to the default caps in the teuthology script):
http://pulpito.front.sepia.ceph.com/sseshasa-2021-09-01_13:45:19-rados-wip-sseshasa-testing-2021-09-01-1349-distro-basic-smithi/

None of the failures appear to be related. There were many dead jobs.

@neha-ojha @jdurgin Please review the changes. Once you review, this can go through another round of testing. Thanks.

jdurgin commented Sep 7, 2021

@yuriw can you run this through another round of tests? the baseline looked quite clean this week so we should be able to tell if this is causing any problems easily

sseshasa commented Sep 9, 2021

Teuthology Testing Result:

Original Run:
https://pulpito.ceph.com/yuriw-2021-09-07_21:38:25-rados-wip-yuri2-testing-2021-09-07-1258-distro-basic-smithi/

Re-run of dead and failed jobs from the above run:
https://pulpito.ceph.com/yuriw-2021-09-08_15:10:21-rados-wip-yuri2-testing-2021-09-07-1258-distro-basic-smithi/

Failures from the Re-run (Unrelated):

  1. https://pulpito.ceph.com/yuriw-2021-09-08_15:10:21-rados-wip-yuri2-testing-2021-09-07-1258-distro-basic-smithi/6379886

    osd.0 didn't respond to the following command after it was revived as part of _do_thrash().
    First error:

  • 2021-09-08T15:47:47.945 DEBUG:teuthology.orchestra.run.smithi064:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph -- tell osd.0 injectargs --filestore_debug_random_read_err=0.0

  • 2021-09-08T15:47:48.067 INFO:teuthology.orchestra.run.smithi064.stderr:Error ENXIO: problem getting command descriptions from osd.0
    New tracker added: https://tracker.ceph.com/issues/52562

  1. https://pulpito.ceph.com/yuriw-2021-09-08_15:10:21-rados-wip-yuri2-testing-2021-09-07-1258-distro-basic-smithi/6379887

    test_dashboard_e2e.sh failure in orchestrator/01-hosts-force-maintenance.e2e-spec.ts
    Tracker: https://tracker.ceph.com/issues/51728 (Fix under review)

  2. https://pulpito.ceph.com/yuriw-2021-09-08_15:10:21-rados-wip-yuri2-testing-2021-09-07-1258-distro-basic-smithi/6379888

    Rook related: 'check osd count' reached maximum tries
    Tracked by: https://tracker.ceph.com/issues/52321

  3. https://pulpito.ceph.com/yuriw-2021-09-08_15:10:21-rados-wip-yuri2-testing-2021-09-07-1258-distro-basic-smithi/6379899

    Rook related: 'check osd count' reached maximum tries
    Tracked by: https://tracker.ceph.com/issues/52321

  4. https://pulpito.ceph.com/yuriw-2021-09-08_15:10:21-rados-wip-yuri2-testing-2021-09-07-1258-distro-basic-smithi/6379901

    Cephadm related:
    2021-09-08T15:25:50.215 INFO:teuthology.orchestra.run.smithi043.stderr:Error: OCI runtime error: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: Unit libpod 86831cf6c897f95a9d72d8d1b74dab77bc37c86c0b5dceba03fc16656e6f160d.scope not found.
    2021-09-08T15:25:50.235 DEBUG:teuthology.orchestra.run:got remote process result: 127
    Tracked by: https://tracker.ceph.com/issues/49287

  5. https://pulpito.ceph.com/yuriw-2021-09-08_15:10:21-rados-wip-yuri2-testing-2021-09-07-1258-distro-basic-smithi/6379908

    First error reported: raise RuntimeError("Synthetic exception in serve")
    Test that failed: tasks.mgr.test_module_selftest.TestModuleSelftest.
    Tracked by: https://tracker.ceph.com/issues/38455

  6. https://pulpito.ceph.com/yuriw-2021-09-08_15:10:21-rados-wip-yuri2-testing-2021-09-07-1258-distro-basic-smithi/6379917

    Same as 2 above.

@yuriw yuriw merged commit 3b779e7 into ceph:master Sep 9, 2021
@sseshasa sseshasa deleted the wip-fix-vstart-mon-permissions branch September 9, 2021 17:16
sseshasa added a commit to sseshasa/ceph that referenced this pull request May 16, 2023
…tion

This is a follow-up to PR: ceph#48703.
Modify the mon caps to allow OSDs to run the "config rm" command with
restriction to remove only the following config keys from the mon store:
- osd_max_backfills
- osd_recovery_max_active(.*)
  - osd_recovery_max_active and
  - osd_recovery_max_active_(hdd|ssd)
- osd_mclock_scheduler_(.*) -> all the QoS specific config options.

The above is similar to the change in mon caps to run the "config set"
command as implemented in PR: ceph#42853.

Fixes: https://tracker.ceph.com/issues/61155
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
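
To see which config keys the patterns listed in this commit message would cover, they can be tried against a few key names. This is a minimal sketch, not the monitor's matching code: `REMOVABLE_KEY_PATTERNS` and `may_remove` are hypothetical names, and full-string matching is an assumption about how the patterns are applied.

```python
import re

# Patterns copied from the commit message above.
REMOVABLE_KEY_PATTERNS = [
    r"osd_max_backfills",
    r"osd_recovery_max_active(.*)",
    r"osd_mclock_scheduler_(.*)",
]


def may_remove(key):
    """Return True if the key fully matches one of the listed patterns."""
    # Assumption: the whole key must match, not just a prefix or substring.
    return any(re.fullmatch(p, key) is not None for p in REMOVABLE_KEY_PATTERNS)
```

Under full-string matching, `osd_recovery_max_active(.*)` accepts any suffix, which is broader than the two forms the message lists (`osd_recovery_max_active` and `osd_recovery_max_active_(hdd|ssd)`).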
sseshasa added a commit to sseshasa/ceph that referenced this pull request May 18, 2023
sseshasa added a commit to sseshasa/ceph that referenced this pull request May 22, 2023
(cherry picked from commit 6916008)
sseshasa added a commit to sseshasa/ceph that referenced this pull request May 22, 2023
(cherry picked from commit 6916008)
esmaeil-mirvakili pushed a commit to esmaeil-mirvakili/ceph that referenced this pull request May 22, 2023
rsacherer pushed a commit to rsacherer/ceph that referenced this pull request May 26, 2023
joscollin pushed a commit to joscollin/ceph that referenced this pull request Jul 31, 2023
yuvalif pushed a commit to yuvalif/ceph that referenced this pull request Aug 14, 2023
(cherry picked from commit 6916008)
(cherry picked from commit 431e3ed)

Resolves: rhbz#2124137