osd: add --osdspec-affinity flag#34835

Merged
tchaikov merged 2 commits intoceph:masterfrom
jschmid1:osdspec_affinity
May 28, 2020
Conversation

@jschmid1
Contributor

@jschmid1 jschmid1 commented Apr 29, 2020

This is useful for tracking osds when using OSDSpecs in cephadm
and later on Rook.
Please see
https://docs.ceph.com/docs/master/cephadm/drivegroups/#osd-service-specification

for details
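To illustrate the idea of the flag, here is a minimal sketch of how an orchestrator could use an osdspec affinity tag to map OSDs back to the OSDSpec (drive group) that created them. The names below (`Osd`, `find_osds_for_spec`) are illustrative only, not Ceph's actual API:

```python
# Hypothetical sketch: tracking OSDs back to the OSDSpec that created them
# via an affinity tag set at creation time (e.g. via --osdspec-affinity).
from dataclasses import dataclass
from typing import List

@dataclass
class Osd:
    osd_id: int
    osdspec_affinity: str  # value recorded when the OSD was created

def find_osds_for_spec(osds: List[Osd], spec_name: str) -> List[Osd]:
    """Return the OSDs whose affinity tag matches the given OSDSpec name."""
    return [o for o in osds if o.osdspec_affinity == spec_name]

osds = [
    Osd(0, "default_drive_group"),
    Osd(1, "ssd_group"),
    Osd(2, "default_drive_group"),
]
print([o.osd_id for o in find_osds_for_spec(osds, "default_drive_group")])  # [0, 2]
```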

This patch is needed by other pull-requests:

Signed-off-by: Joshua Schmid <jschmid@suse.de>

@jschmid1
Contributor Author

@jdurgin
Member

jdurgin commented May 1, 2020

will the docs for osdspec affinity in general be updated in a separate PR? the man page for this (doc/man/8/ceph-osd.rst) could be updated with the new option
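For reference, an entry for the new option in doc/man/8/ceph-osd.rst could look something like the following (the wording here is a sketch, not the merged text):

```rst
.. option:: --osdspec-affinity

   Set an affinity to a certain OSDSpec, so that the OSD can be tracked
   back to the OSD service specification that created it.
```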

This is useful for tracking osds when using OSDSpecs in cephadm
and later on Rook.
Please see
https://docs.ceph.com/docs/master/cephadm/drivegroups/#osd-service-specification

for details

Signed-off-by: Joshua Schmid <jschmid@suse.de>
@jschmid1 jschmid1 force-pushed the osdspec_affinity branch from 72e34ac to 1b44b67 Compare May 4, 2020 12:19
@jschmid1
Contributor Author

jschmid1 commented May 4, 2020

will the docs for osdspec affinity in general be updated in a separate PR? the man page for this (doc/man/8/ceph-osd.rst) could be updated with the new option

Once we've agreed on naming and usage, I'll open a separate doc PR that can be merged together with all the relevant patches.
edit: added it to this PR.

@jschmid1 jschmid1 requested a review from jdurgin May 4, 2020 12:51
Signed-off-by: Joshua Schmid <jschmid@suse.de>
@jschmid1
Contributor Author

jschmid1 commented May 4, 2020

jenkins test make check

@jschmid1
Contributor Author

jschmid1 commented May 5, 2020

@jschmid1
Contributor Author

jschmid1 commented May 7, 2020

@tchaikov Anything that would hold us back from merging this?

@tchaikov
Contributor

tchaikov commented May 7, 2020

@tchaikov Anything that would hold us back from merging this?

nothing specific. if you are sure that the failures in http://pulpito.ceph.com/jschmid-2020-05-05_06:55:50-rados-wip-jschmid1-testing-2020-05-04-1738-distro-basic-smithi/ are not related, i think it's good to go since Josh has approved this PR.

my test batch has a lot of noise http://pulpito.ceph.com/kchai-2020-05-05_13:45:45-rados-wip-kefu-testing-2020-05-05-1333-distro-basic-smithi/ , i am still waiting for ceph/teuthology#1465 to get pulled by the cron job so the failed tests can be rerun.

@jschmid1
Contributor Author

jschmid1 commented May 7, 2020

while it is fishy that the OSDs are not coming back after an upgrade, I can't immediately see why this would be related to this patch:

2020-05-05T07:20:50.154 INFO:ceph.mon.b.smithi163.stdout:May 05 07:20:49 smithi163 bash[21839]: cluster 2020-05-05T07:20:48.612246+0000 mgr.x (mgr.34547) 107 : cluster [DBG] pgmap v47: 1 pgs: 1 active+undersized+degraded; 0 B data, 4.1 MiB used, 707 GiB / 715 GiB avail; 1/3 objects degraded (33.333%)
2020-05-05T07:20:50.851 INFO:ceph.osd.0.smithi112.stdout:May 05 07:20:50 smithi112 bash[453]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
2020-05-05T07:20:50.852 INFO:ceph.osd.0.smithi112.stdout:May 05 07:20:50 smithi112 bash[453]: Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/vg_nvme/lv_4 --path /var/lib/ceph/osd/ceph-0 --no-mon-config
2020-05-05T07:20:50.852 INFO:ceph.osd.0.smithi112.stdout:May 05 07:20:50 smithi112 bash[453]: Running command: /usr/bin/ln -snf /dev/vg_nvme/lv_4 /var/lib/ceph/osd/ceph-0/block
2020-05-05T07:20:50.852 INFO:ceph.osd.0.smithi112.stdout:May 05 07:20:50 smithi112 bash[453]: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block
2020-05-05T07:20:50.852 INFO:ceph.osd.0.smithi112.stdout:May 05 07:20:50 smithi112 bash[453]: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-3
2020-05-05T07:20:50.852 INFO:ceph.osd.0.smithi112.stdout:May 05 07:20:50 smithi112 bash[453]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
2020-05-05T07:20:50.853 INFO:ceph.osd.0.smithi112.stdout:May 05 07:20:50 smithi112 bash[453]: --> ceph-volume lvm activate successful for osd ID: 0
2020-05-05T07:20:52.102 INFO:ceph.mon.c.smithi112.stdout:May 05 07:20:51 smithi112 bash[29375]: cluster 2020-05-05T07:20:50.612941+0000 mgr.x (mgr.34547) 108 : cluster [DBG] pgmap v48: 1 pgs: 1 active+undersized+degraded; 0 B data, 4.1 MiB used, 707 GiB / 715 GiB avail; 1/3 objects degraded (33.333%)
2020-05-05T07:20:52.102 INFO:ceph.mon.a.smithi112.stdout:May 05 07:20:51 smithi112 bash[28482]: cluster 2020-05-05T07:20:50.612941+0000 mgr.x (mgr.34547) 108 : cluster [DBG] pgmap v48: 1 pgs: 1 active+undersized+degraded; 0 B data, 4.1 MiB used, 707 GiB / 715 GiB avail; 1/3 objects degraded (33.333%)
2020-05-05T07:20:52.170 INFO:ceph.mon.b.smithi163.stdout:May 05 07:20:51 smithi163 bash[21839]: cluster 2020-05-05T07:20:50.612941+0000 mgr.x (mgr.34547) 108 : cluster [DBG] pgmap v48: 1 pgs: 1 active+undersized+degraded; 0 B data, 4.1 MiB used, 707 GiB / 715 GiB avail; 1/3 objects degraded (33.333%)
2020-05-05T07:20:52.851 INFO:ceph.osd.1.smithi112.stdout:May 05 07:20:52 smithi112 systemd[1]: ceph-529464e0-8e9f-11ea-a068-001a4aab830c@osd.1.service: Failed with result 'exit-code'.
2020-05-05T07:20:53.852 INFO:ceph.mon.a.smithi112.stdout:May 05 07:20:53 smithi112 bash[28482]: cluster 2020-05-05T07:20:52.613646+0000 mgr.x (mgr.34547) 109 : cluster [DBG] pgmap v49: 1 pgs: 1 active+undersized+degraded; 0 B data, 4.1 MiB used, 707 GiB / 715 GiB avail; 1/3 objects degraded (33.333%)
2020-05-05T07:20:53.852 INFO:ceph.mon.c.smithi112.stdout:May 05 07:20:53 smithi112 bash[29375]: cluster 2020-05-05T07:20:52.613646+0000 mgr.x (mgr.34547) 109 : cluster [DBG] pgmap v49: 1 pgs: 1 active+undersized+degraded; 0 B data, 4.1 MiB used, 707 GiB / 715 GiB avail; 1/3 objects degraded (33.333%)
2020-05-05T07:20:54.154 INFO:ceph.mon.b.smithi163.stdout:May 05 07:20:53 smithi163 bash[21839]: cluster 2020-05-05T07:20:52.613646+0000 mgr.x (mgr.34547) 109 : cluster [DBG] pgmap v49: 1 pgs: 1 active+undersized+degraded; 0 B data, 4.1 MiB used, 707 GiB / 715 GiB avail; 1/3 objects degraded (33.333%)
2020-05-05T07:20:54.523 INFO:ceph.osd.0.smithi112.stdout:May 05 07:20:54 smithi112 bash[453]: debug 2020-05-05T07:20:54.189+0000 7f8ebe92dec0 -1 Falling back to public interface
2020-05-05T07:20:54.851 INFO:ceph.osd.0.smithi112.stdout:May 05 07:20:54 smithi112 bash[453]: debug 2020-05-05T07:20:54.517+0000 7f8ebe92dec0 -1 rocksdb: verify_sharding mismatch on sharding. requested = [(L,1,0-,),(O,3,0-13,),(m,3,0-,)] stored = []
2020-05-05T07:20:54.852 INFO:ceph.osd.0.smithi112.stdout:May 05 07:20:54 smithi112 bash[453]: debug 2020-05-05T07:20:54.517+0000 7f8ebe92dec0 -1 bluestore(/var/lib/ceph/osd/ceph-0) _open_db erroring opening db:
2020-05-05T07:20:55.351 INFO:ceph.osd.0.smithi112.stdout:May 05 07:20:55 smithi112 bash[453]: debug 2020-05-05T07:20:55.081+0000 7f8ebe92dec0 -1 osd.0 0 OSD:init: unable to mount object store
2020-05-05T07:20:55.352 INFO:ceph.osd.0.smithi112.stdout:May 05 07:20:55 smithi112 bash[453]: debug 2020-05-05T07:20:55.081+0000 7f8ebe92dec0 -1  ** ERROR: osd init failed: (5) Input/output error
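The `verify_sharding mismatch` line above is the key failure: the OSD requests a RocksDB column-family sharding layout, and BlueStore refuses to open the database when the layout recorded on disk does not match. A toy model of that check (this is not the actual BlueStore/RocksDB code, just an illustration of the comparison that fails):

```python
# Toy model of the sharding verification that fails in the log above:
# a pre-upgrade store with no sharding metadata recorded ([]) does not
# match the sharding definition the new OSD code requests.
def verify_sharding(requested, stored):
    """Raise if the stored sharding definition differs from the requested one."""
    if requested != stored:
        raise IOError(
            f"verify_sharding mismatch on sharding. "
            f"requested = {requested} stored = {stored}")

requested = ["(L,1,0-,)", "(O,3,0-13,)", "(m,3,0-,)"]
stored = []  # nothing recorded on disk before the upgrade
try:
    verify_sharding(requested, stored)
except IOError as e:
    print("_open_db error opening db:", e)
```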

@jschmid1
Contributor Author

jschmid1 commented May 7, 2020

I do remember that we caught a bluestore/rocksdb bug during osd upgrades. See https://tracker.ceph.com/issues/45335

This looks very much like the issue we're seeing here.

@jschmid1
Contributor Author

any news?

@jschmid1
Contributor Author

ping @tchaikov ?

@tchaikov
Contributor

jschmid1 pushed a commit to jschmid1/ceph that referenced this pull request May 12, 2020
Revert this commit when ceph#34835 is
merged.

Signed-off-by: Joshua Schmid <jschmid@suse.de>
@jschmid1
Contributor Author

ping @tchaikov ?

@tchaikov
Contributor

@jschmid1 i am still struggling with the test failures:

http://pulpito.ceph.com/kchai-2020-05-17_05:02:01-rados-wip-kefu-testing-2020-05-15-0016-distro-basic-smithi/

i am hence removing the wip-kefu-testing label; hopefully this can unblock you.
