[RFC] ServiceSpec validation health check #36393

Closed
mgfritch wants to merge 3 commits into ceph:master from mgfritch:cephadm-warn-spec-valid

Contributor

@mgfritch mgfritch commented Jul 31, 2020

report a health check error when a ServiceSpec contained in the spec store fails validation

$ ceph -s
  cluster:
    id:     85c196c3-e837-4c32-8aa6-cd0c2e6870d6
    health: HEALTH_ERR
            1 service spec(s) are not valid
            10 stray daemons(s) not managed by cephadm
 
  services:
    mon: 3 daemons, quorum a,b,c (age 17h)
    mgr: x(active, since 7m)
    mds: a:1 {0=a=up:active} 3 up:standby
    osd: 3 osds: 3 up (since 17h), 3 in (since 17h)
 
  data:
    pools:   3 pools, 65 pgs
    objects: 23 objects, 111 KiB
    usage:   6.0 GiB used, 297 GiB / 303 GiB avail
    pgs:     65 active+clean
 
$ ceph health detail
HEALTH_ERR 1 service spec(s) are not valid; 10 stray daemons(s) not managed by cephadm
[ERR] CEPHADM_SPEC_VALIDATE: 1 service spec(s) are not valid
    ServiceSpec nfs.foo is not valid: Cannot add NFS: No Pool specified
[WRN] CEPHADM_STRAY_DAEMON: 10 stray daemons(s) not managed by cephadm
    stray daemon mds.a on host host3 not managed by cephadm
    stray daemon mds.b on host host3 not managed by cephadm
    stray daemon mds.c on host host3 not managed by cephadm
    stray daemon mgr.x on host host3 not managed by cephadm
    stray daemon mon.a on host host3 not managed by cephadm
    stray daemon mon.b on host host3 not managed by cephadm
    stray daemon mon.c on host host3 not managed by cephadm
    stray daemon osd.0 on host host3 not managed by cephadm
    stray daemon osd.1 on host host3 not managed by cephadm
    stray daemon osd.2 on host host3 not managed by cephadm

Signed-off-by: Michael Fritch <mfritch@suse.com>
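
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `FakeSpec` and `SpecValidationError` are hypothetical stand-ins for the real spec classes, and the returned dict only mirrors the fields visible in the `ceph health detail` output (check name, severity, summary, detail lines).

```python
from typing import Dict, List


class SpecValidationError(Exception):
    """Hypothetical stand-in for the error a spec raises when invalid."""


class FakeSpec:
    """Hypothetical stand-in for a ServiceSpec; not the real Ceph class."""

    def __init__(self, service_name: str, pool: str = "") -> None:
        self.service_name = service_name
        self.pool = pool

    def validate(self) -> None:
        # Assumed rule for illustration: NFS specs need a pool.
        if self.service_name.startswith("nfs.") and not self.pool:
            raise SpecValidationError("Cannot add NFS: No Pool specified")


def build_spec_health_check(specs: Dict[str, FakeSpec]) -> Dict[str, dict]:
    """Validate every stored spec and fold the failures into one check."""
    detail: List[str] = []
    for name, spec in specs.items():
        try:
            spec.validate()
        except Exception as e:
            detail.append(f"ServiceSpec {name} is not valid: {e}")
    if not detail:
        return {}  # nothing to report, so no health check is raised
    return {
        "CEPHADM_SPEC_VALIDATE": {
            "severity": "error",
            "summary": f"{len(detail)} service spec(s) are not valid",
            "count": len(detail),
            "detail": detail,
        }
    }


checks = build_spec_health_check({
    "nfs.foo": FakeSpec("nfs.foo"),  # no pool, fails validation
    "mds.a": FakeSpec("mds.a"),      # passes
})
```

Collecting all failures before building the check keeps one bad spec from hiding the others, which matches the "N service spec(s) are not valid" summary shown in the output.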

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

mgfritch added 3 commits July 31, 2020 08:04
report a health check error when a ServiceSpec contained in the spec
store fails validation

Signed-off-by: Michael Fritch <mfritch@suse.com>
- allow ctor of a ServiceSpec to succeed
- explicitly validate the spec during spec_store save
  (apply service, migrate, etc.)

Signed-off-by: Michael Fritch <mfritch@suse.com>
@mgfritch
Contributor Author

jenkins test make check

Comment on lines -70 to +74
-    self.spec.validate()
+    try:
+        self.spec.validate()
+    except Exception as e:
+        raise ServiceSpecValidationError(str(e))
Contributor Author

Any thoughts on how to better pickle these exceptions so that there is no need to catch/rethrow?
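
The thread does not say why these exceptions fail to pickle. One common cause in Python generally is an exception class whose `__init__` takes required arguments that it does not forward to `Exception.__init__`: pickle rebuilds the object by calling the class with `exc.args`, so the call fails. The sketch below replays that rebuild step through `__reduce__` instead of calling `pickle.dumps`/`pickle.loads`; both class names are hypothetical and unrelated to the Ceph code.

```python
class DropsArgs(Exception):
    """Hypothetical: service_name is required but never reaches self.args."""

    def __init__(self, message: str, service_name: str) -> None:
        super().__init__(message)  # self.args == (message,)
        self.service_name = service_name


class ForwardsArgs(Exception):
    """Hypothetical: every constructor argument is forwarded to self.args."""

    def __init__(self, message: str, service_name: str) -> None:
        super().__init__(message, service_name)
        self.service_name = service_name

    def __str__(self) -> str:
        # Keep str(e) readable even though args now holds two items.
        return self.args[0]


def rebuild_like_pickle(exc: BaseException) -> BaseException:
    """Replay the reconstruction step pickle performs on load."""
    factory, args, *rest = exc.__reduce__()
    clone = factory(*args)  # raises TypeError if args do not fit __init__
    if rest and rest[0]:
        clone.__dict__.update(rest[0])
    return clone


def survives_rebuild(exc: BaseException) -> bool:
    try:
        clone = rebuild_like_pickle(exc)
    except TypeError:
        return False
    return type(clone) is type(exc) and clone.args == exc.args
```

If this is the cause here, forwarding all constructor arguments would let the original exception cross the boundary without being wrapped in a new one via `str(e)`.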

@stale

stale bot commented Oct 22, 2020

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Oct 22, 2020
@sebastian-philipp
Contributor

unstale

@stale stale bot removed the stale label Dec 9, 2020
@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@sebastian-philipp
Contributor

ping?

spec_errors = []
for service_name, spec in self.spec_store.specs.items():
    try:
        spec.validate()
Contributor

from ceph.deplo... import ServiceSpec

ServiceSpec.validate(spec)

should circumvent the sub-interpreter issues
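
How this call interacts with the sub-interpreter problem is not spelled out in the thread. The sketch below only shows the plain-Python semantics of calling a method through the class, using hypothetical `BaseSpec` and `NfsSpec` classes: the named class's implementation runs, and any subclass override is skipped.

```python
class BaseSpec:
    """Hypothetical base class standing in for a generic spec."""

    def validate(self) -> str:
        return "base checks"


class NfsSpec(BaseSpec):
    """Hypothetical subclass that overrides validate()."""

    def validate(self) -> str:
        return "nfs checks"


spec = NfsSpec()

via_instance = spec.validate()       # dispatches on the instance's type
via_class = BaseSpec.validate(spec)  # runs BaseSpec's implementation only
```

If the real subclasses add their own checks in `validate()`, a call through the base class would not run them, which may be worth confirming before adopting the suggestion.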

@stale

stale bot commented Jul 21, 2021

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Jul 21, 2021
@sebastian-philipp
Contributor

unstale

@stale stale bot removed the stale label Jul 21, 2021
@sebastian-philipp
Contributor

Actually I'm -1 here. In case we have invalid specs in our store, we should bail out and let users report a bug instead of adding a new kind of health warning.
