
mgr/cephadm: rework osd removals/replacements #36151

Merged
sebastian-philipp merged 5 commits into ceph:master from jschmid1:osd_support_reweight_impl on Jul 31, 2020

Conversation


@jschmid1 jschmid1 commented Jul 17, 2020

Fixes: https://tracker.ceph.com/issues/44548
Fixes: https://tracker.ceph.com/issues/45594

Also removes the osd_support module to simplify the removal logic.

  • Added ceph orch osd rm stop
  • Added more info in ceph orch osd rm status


# ceph orch osd rm 1

Scheduled OSD(s) for removal

# ceph orch osd rm 2 --replace

Scheduled OSD(s) for removal

# ceph orch osd rm 3 --force

Scheduled OSD(s) for removal

# ceph orch osd rm status

OSD_ID  HOST         STATE                    PG_COUNT  REPLACE  FORCE  STARTED_AT
1       cephadm-dev  done, waiting for purge  40        False    False  2020-07-17 13:01:44.134509
2       cephadm-dev  done, waiting for purge  9         True     False  2020-07-17 13:01:44.147684
3       cephadm-dev  draining                 17        False    True   2020-07-17 13:01:44.162158

# ceph orch osd rm stop 2

Stopped OSD(s) removal

# ceph orch osd rm status

OSD_ID  HOST         STATE                    PG_COUNT  REPLACE  FORCE  STARTED_AT
1       cephadm-dev  done, waiting for purge  40        False    False  2020-07-17 13:01:44.134509
3       cephadm-dev  draining                 17        False    True   2020-07-17 13:01:44.162158

TODO:

  • React to osd_stats changes using the notify() mechanism
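The notify() hook referenced in the TODO is the standard ceph-mgr module callback. A minimal sketch of the idea, assuming a `pg_summary` notification; the stub base class, the `RemovalAwareOrchestrator` name, and the handler body are illustrative stand-ins, not the actual cephadm code:

```python
class MgrModule:  # stand-in for ceph's mgr_module.MgrModule base class
    def notify(self, notify_type: str, notify_id: str) -> None:
        pass


class RemovalAwareOrchestrator(MgrModule):
    def __init__(self) -> None:
        self.wakeups = 0

    def notify(self, notify_type: str, notify_id: str) -> None:
        # ceph-mgr invokes this on cluster-state changes (e.g. 'pg_summary').
        if notify_type == 'pg_summary':
            # Kick the removal loop so draining progress is re-checked
            # immediately instead of waiting for the next polling tick.
            self.wakeups += 1


mod = RemovalAwareOrchestrator()
mod.notify('pg_summary', '')
mod.notify('mon_map', '')  # unrelated notification: ignored
print(mod.wakeups)  # 1
```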

Signed-off-by: Joshua Schmid jschmid@suse.de

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard backend
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@jschmid1 jschmid1 force-pushed the osd_support_reweight_impl branch 2 times, most recently from a279e10 to 6b1d0ad Compare July 17, 2020 13:19
@sebastian-philipp
Contributor

________________ ERROR collecting cephadm/tests/test_cephadm.py ________________
cephadm/tests/test_cephadm.py:9: in <module>
    from cephadm.services.osd import OSDRemoval
E   ImportError: cannot import name 'OSDRemoval'
________________ ERROR collecting cephadm/tests/test_cephadm.py ________________
ImportError while importing test module '/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/cephadm/tests/test_cephadm.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
cephadm/tests/test_cephadm.py:9: in <module>
    from cephadm.services.osd import OSDRemoval
E   ImportError: cannot import name 'OSDRemoval'

https://jenkins.ceph.com/job/ceph-pull-requests/55815/consoleFull#-524050101e840cee4-f4a4-4183-81dd-42855615f2c1

@jschmid1
Contributor Author

> ________________ ERROR collecting cephadm/tests/test_cephadm.py ________________
> E   ImportError: cannot import name 'OSDRemoval'

Currently in the process of working on the tests. Note that this is a draft pull request.

@jschmid1 jschmid1 force-pushed the osd_support_reweight_impl branch 5 times, most recently from 2275ad5 to 4485daf Compare July 23, 2020 10:55
@jschmid1 jschmid1 marked this pull request as ready for review July 23, 2020 12:03
@jschmid1 jschmid1 requested a review from a team as a code owner July 23, 2020 12:03
@sebastian-philipp
Contributor

sigh. This would be easier to review if there were multiple commits in this PR

@jschmid1 jschmid1 force-pushed the osd_support_reweight_impl branch 5 times, most recently from 9019f20 to 00f6903 Compare July 24, 2020 13:53
@jschmid1 jschmid1 force-pushed the osd_support_reweight_impl branch from 00f6903 to 8261c9c Compare July 28, 2020 07:01
@jschmid1
Contributor Author

jenkins render docs

@ceph-jenkins
Collaborator

Doc render available at http://docs.ceph.com/ceph-prs/36151/


@jschmid1
Contributor Author

jenkins test make check

@sebastian-philipp sebastian-philipp added the wip-swagner-testing My Teuthology tests label Jul 28, 2020

@sebastian-philipp
Contributor

        cp --reflink=auto -a debian/tmp/usr/share/ceph/mgr/diskprediction_cloud debian/ceph-mgr-diskprediction-cloud//usr/share/ceph/mgr/
dh_install: warning: Cannot find (any matches for) "usr/share/ceph/mgr/osd_support" (tried in ., debian/tmp)

dh_install: warning: ceph-mgr-modules-core missing files: usr/share/ceph/mgr/osd_support
        install -d debian/.debhelper/generated/ceph-mgr-diskprediction-cloud
        install -d debian/ceph-mgr-modules-core//usr/share/ceph/mgr
        cp --reflink=auto -a debian/tmp/usr/share/ceph/mgr/alerts debian/tmp/usr/share/ceph/mgr/balancer debian/tmp/usr/share/ceph/mgr/crash debian/tmp/usr/share/ceph/mgr/devicehealth debian/tmp/usr/share/ceph/mgr/influx debian/tmp/usr/share/ceph/mgr/insights debian/tmp/usr/share/ceph/mgr/iostat debian/tmp/usr/share/ceph/mgr/localpool debian/tmp/usr/share/ceph/mgr/orchestrator debian/tmp/usr/share/ceph/mgr/osd_perf_query debian/ceph-mgr-modules-core//usr/share/ceph/mgr/
dh_install: error: missing files, aborting
make: *** [debian/rules:45: binary] Error 25
dpkg-buildpackage: error: fakeroot debian/rules binary subprocess returned exit status 2
E: Failed autobuilding of package
I: unmounting dev/ptmx filesystem
I: unmounting dev/pts filesystem
I: unmounting dev/shm filesystem
I: unmounting proc filesystem
I: unmounting sys filesystem
I: cleaning the build env 
I: removing directory /var/cache/pbuilder/build/15188 and its subdirectories
+ rm -fr /tmp/install-deps.14837

https://jenkins.ceph.com/job/ceph-dev-new-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=focal,DIST=focal,MACHINE_SIZE=gigantic/43791//consoleFull

RPM will likely fail, too: https://shaman.ceph.com/builds/ceph/wip-swagner-testing-2020-07-28-1211/3b57d22029fee37a1a4eca44065089f9847e4b5b/default/220075/

@sebastian-philipp sebastian-philipp removed the wip-swagner-testing My Teuthology tests label Jul 28, 2020
@jschmid1
Contributor Author

RPM will likely fail, too: https://shaman.ceph.com/builds/ceph/wip-swagner-testing-2020-07-28-1211/3b57d22029fee37a1a4eca44065089f9847e4b5b/default/220075/

RPMs seem to build just fine. I'll dig into the deb part

@Devp00l Devp00l left a comment

Just tested it, it works fine, but I get a strange result when passing it to JSON.

# ceph orch osd rm status json     
["{\"osd_id\": 10, \"started\": true, \"draining\": true, \"stopped\": false, \"replace\": true, \"force\": false, \"nodename\": \"osd0\", \"drain_started_at\": \"2020-07-28T14:55:51.173164\", \"drain_stopped_at\": null, \"drain_done_at\": null, \"process_started_at\": \"2020-07-28T14:55:46.989791\"}"]

# ceph orch osd rm status json | jq
[
  "{\"osd_id\": 10, \"started\": true, \"draining\": true, \"stopped\": false, \"replace\": true, \"force\": false, \"nodename\": \"osd0\", \"drain_started_at\": \"2020-07-28T14:55:51.173164\", \"drain_stopped_at\": null, \"drain_done_at\": null, \"process_started_at\": \"2020-07-28T14:55:46.989791\"}"
]

But what I actually wanted to see is this:

# ceph orch osd rm status json | sed 's/"{/{/;s/}"/}/;s/\\"/"/g' | jq
[
  {
    "osd_id": 10,
    "started": true,
    "draining": true,
    "stopped": false,
    "replace": true,
    "force": false,
    "nodename": "osd0",
    "drain_started_at": "2020-07-28T14:55:51.173164",
    "drain_stopped_at": null,
    "drain_done_at": null,
    "process_started_at": "2020-07-28T14:55:46.989791"
  }
]

@jschmid1
Contributor Author

@Devp00l

def to_format(what, format: str, many: bool, cls):
    def to_json_1(obj):
        if hasattr(obj, 'to_json'):
            return obj.to_json()
        return obj

    def to_json_n(objs):
        return [to_json_1(o) for o in objs]

    to_json = to_json_n if many else to_json_1

This is responsible for creating JSON objects. I'm passing a List[OSD] in, and this is what we get. I agree though, the formatting is pretty much broken.
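The double encoding can be reproduced in isolation. A minimal sketch, assuming a simplified OSD stand-in and to_format helper (not the actual cephadm code): when to_json() returns an already-serialized string, the outer json.dumps() escapes it a second time, producing the quoted blobs shown above; returning a dict and serializing once at the edge yields proper nested JSON.

```python
import json


class OSD:
    def __init__(self, osd_id: int) -> None:
        self.osd_id = osd_id

    def to_json(self) -> str:
        # Returns a string -> the outer dumps() below encodes it again.
        return json.dumps({'osd_id': self.osd_id})


def to_format(objs):
    # Mirrors to_json_n: collect each object's to_json() result, then
    # serialize the whole list for the CLI output.
    return json.dumps([o.to_json() for o in objs])


print(to_format([OSD(10)]))
# ["{\"osd_id\": 10}"]  <- escaped inner JSON, as in the output above


class OSDFixed(OSD):
    def to_json(self) -> dict:
        # Return a plain dict; serialize exactly once at the edge.
        return {'osd_id': self.osd_id}


print(to_format([OSDFixed(10)]))
# [{"osd_id": 10}]  <- properly nested JSON
```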


@sebastian-philipp
Contributor

https://pulpito.ceph.com/jschmid-2020-07-29_07:09:49-rados:cephadm-wip-jschmid1-testing-2020-07-28-1453-distro-basic-smithi/

can you run a cephadm smoke test as well? orchestrator_cli doesn't use cephadm

@jschmid1
Contributor Author

https://pulpito.ceph.com/jschmid-2020-07-29_07:09:49-rados:cephadm-wip-jschmid1-testing-2020-07-28-1453-distro-basic-smithi/

can you run a cephadm smoke test as well? orchestrator_cli doesn't use cephadm

right, right.. I used the wrong command from my history :>

https://pulpito.ceph.com/jschmid-2020-07-29_12:37:09-rados:cephadm-wip-jschmid1-testing-2020-07-28-1453-distro-basic-smithi/

@jschmid1
Contributor Author

1/31 failed with:

2020-07-29T13:02:46.631 INFO:teuthology.orchestra.run.smithi140:> sudo DEBIAN_FRONTEND=noninteractive apt-get -y install linux-image-current-generic
2020-07-29T13:02:46.678 INFO:teuthology.orchestra.run.smithi140.stderr:E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
2020-07-29T13:02:46.679 INFO:teuthology.orchestra.run.smithi140.stderr:E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
2020-07-29T13:02:46.680 DEBUG:teuthology.orchestra.run:got remote process result: 100

which seems unrelated

Joshua Schmid added 2 commits July 31, 2020 10:11
@jschmid1 jschmid1 force-pushed the osd_support_reweight_impl branch from 9c436dc to d83f7b2 Compare July 31, 2020 08:15
Joshua Schmid added 3 commits July 31, 2020 10:23
Signed-off-by: Joshua Schmid <jschmid@suse.de>
Signed-off-by: Joshua Schmid <jschmid@suse.de>
Signed-off-by: Joshua Schmid <jschmid@suse.de>
@sebastian-philipp
Contributor

Comment on lines +608 to +623

def to_json(self) -> str:
    out = dict()
    out['osd_id'] = self.osd_id
    out['started'] = self.started
    out['draining'] = self.draining
    out['stopped'] = self.stopped
    out['replace'] = self.replace
    out['force'] = self.force
    out['nodename'] = self.nodename  # type: ignore

    for k in ['drain_started_at', 'drain_stopped_at', 'drain_done_at', 'process_started_at']:
        if getattr(self, k):
            out[k] = getattr(self, k).strftime(DATEFMT)
        else:
            out[k] = getattr(self, k)
    return json.dumps(out)

Suggested change

def to_json(self) -> str:  # before
    out = dict()
    out['osd_id'] = self.osd_id
    out['started'] = self.started
    out['draining'] = self.draining
    out['stopped'] = self.stopped
    out['replace'] = self.replace
    out['force'] = self.force
    out['nodename'] = self.nodename  # type: ignore
    for k in ['drain_started_at', 'drain_stopped_at', 'drain_done_at', 'process_started_at']:
        if getattr(self, k):
            out[k] = getattr(self, k).strftime(DATEFMT)
        else:
            out[k] = getattr(self, k)
    return json.dumps(out)

def to_json(self) -> dict:  # after
    out = dict()
    out['osd_id'] = self.osd_id
    out['started'] = self.started
    out['draining'] = self.draining
    out['stopped'] = self.stopped
    out['replace'] = self.replace
    out['force'] = self.force
    out['nodename'] = self.nodename  # type: ignore
    for k in ['drain_started_at', 'drain_stopped_at', 'drain_done_at', 'process_started_at']:
        if getattr(self, k):
            out[k] = getattr(self, k).strftime(DATEFMT)
        else:
            out[k] = getattr(self, k)
    return out

Let's keep the types aligned with the rest of the orchestrator. This is the reason the output looks so wrong.

Contributor Author

ah, right! Let's fix this with a followup PR


