Skip to content

mgr/cephadm:Enable cephadm device scan to use libstoragemgmt#39599

Merged
sebastian-philipp merged 2 commits intoceph:masterfrom
pcuzner:cephadm-with-lsm
Mar 10, 2021
Merged

mgr/cephadm:Enable cephadm device scan to use libstoragemgmt#39599
sebastian-philipp merged 2 commits intoceph:masterfrom
pcuzner:cephadm-with-lsm

Conversation

@pcuzner
Copy link
Contributor

@pcuzner pcuzner commented Feb 22, 2021

Using libstoragemgmt (LSM) in ceph-volume was disabled by default,
(nov 2020) which meant cephadm's inventory never had a way to
request the LSM data. This patch adds a module option called
'device_enhanced_scan' (bool), that if set will append the
--with-lsm parameter to the ceph-volume inventory call.

Signed-off-by: Paul Cuzner pcuzner@redhat.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@pcuzner pcuzner requested a review from a team as a code owner February 22, 2021 01:37
@sebastian-philipp
Copy link
Contributor

#38149

Copy link
Member

@tserong tserong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me. This means sites that don't have the crusty old hardware mentioned in #38149 have a way of getting extra metadata from LSM

@dmick
Copy link
Member

dmick commented Feb 23, 2021

Seems okayish to me with caveats, as I said to Paul in private email; it seems to me that we ought to, at a minimum:

  1. be very very careful about how we document this, since it can wedge an entire server immediately if things go wrong, and perhaps
  2. even supply a small test program (a little python snippet is plenty; I may even have put it in the original bug) to allow people to test an offline server, not running ceph, or at least not involving it, to find out if they are going to have issues.

I had also suggested looking for a way to use sg directly (or more likely through libsgutils2.so.2), but I can't find any healthy Python bindings, or even documentation (other than the header files). The only advantage is trial-by-fire; sg has been through a lot of buggy hardware, and with SCSI that's a lot of hardware.

I know it may seem overcautious, but the consequences are large, and the affected hardware is really unknown. We've seen one instance but that definitely doesn't mean there aren't others (and I've got past experience at Sun with a lot of others).

@pcuzner
Copy link
Contributor Author

pcuzner commented Feb 24, 2021

@dmick the test could even be cephadm itself - i.e. cephadm ceph-volume inventory --with-lsm --format=json

@dmick
Copy link
Member

dmick commented Feb 24, 2021

Yeah, probably so

@sebastian-philipp
Copy link
Contributor

@tasleson
Copy link

@pcuzner @dmick @tbzatek

I had also suggested looking for a way to use sg directly (or more likely through libsgutils2.so.2), but I can't find any healthy
Python bindings, or even documentation (other than the header files). The only advantage is trial-by-fire; sg has been through a lot of buggy hardware, and with SCSI that's a lot of hardware.

If you want the benefit of trial-by-fire with regards to sg utils I believe you need to leverage the sg command line utilities. For example, looking at the header files for issuing an inquiry command sg_ll_inquiry the caller needs to specify the return buffer size. If you look at the sg_inq utility source code there is a good comment on the approach one should take when issuing an inquiry, e.g. initial inquiry should be limited to 36 bytes. Thus the acquired knowledge of safely talking to devices appears lost if one utilizes the sg library calls.

@dmick
Copy link
Member

dmick commented Feb 25, 2021

@tasleson good point. Perhaps opt-in with libstoragemgmt is our safest option at this point.

@pcuzner
Copy link
Contributor Author

pcuzner commented Mar 2, 2021

@sebastian-philipp where should this be documented? I was thinking cephadm/osd.rst ?

@sebastian-philipp
Copy link
Contributor

@sebastian-philipp where should this be documented? I was thinking cephadm/osd.rst ?

yeah, maybe somewhere around https://docs.ceph.com/en/latest/cephadm/osd/#list-devices ?

pcuzner added 2 commits March 3, 2021 09:24
Using libstoragemgmt (LSM) in ceph-volume was disabled by default,
(nov 2020) which meant cephadm's inventory never had a way to
request the LSM data. This patch adds a module option called
'device_enhanced_scan' (bool), that if set will append the
--with-lsm parameter to the ceph-volume inventory call.

Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
Updates the cephadm osd documentation to include details about
including integration with libstoragement - including the potential
hardware issues that may arise.

Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
@pcuzner
Copy link
Contributor Author

pcuzner commented Mar 3, 2021

@sebastian-philipp @dmick docs added to cover the benefit and potential issues that turning libstoragemgmt introduces.

@pcuzner pcuzner requested a review from tserong March 3, 2021 00:17
@sebastian-philipp sebastian-philipp added the wip-swagner-testing My Teuthology tests label Mar 5, 2021
@sebastian-philipp
Copy link
Contributor

@sebastian-philipp sebastian-philipp merged commit d93d1ed into ceph:master Mar 10, 2021
@pcuzner pcuzner deleted the cephadm-with-lsm branch March 11, 2021 00:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants