ceph-volume add device_id to inventory listing by jan--f · Pull Request #25201 · ceph/ceph

jan--f · 2018-11-21T12:53:13Z

This PR changes the device_id implementation from using ID_SERIAL to ID_MODEL + _ + ID_SERIAL_SHORT (1) and adds an equivalent python implementation to ceph-volume (2).

The reasoning is that the new implementation seem more reliable. ID_SERIAL often contains the model and serial, but sometimes only the serial.
The python implementation is not equivalent yet for the FreeBSD case. In C++ this uses sysfs and ioctl with a (Free?)BSD specific constant. I'm not sure yet of the same call can be made with python's ioctl implementation. If so this should also be a RFC if we should move this to python/ceph-volume and call it from C++ to avoid the duplicate implementation.

Fixes: https://tracker.ceph.com/issues/37083

This is hopefully more stable then using ID_SERIAL. While often ID_SERIAL contains the model name and serial, in some cases it only contains something close to the actual serial number. Signed-off-by: Jan Fajerski <jfajerski@suse.com>

ErwanAliasr1 · 2018-11-21T13:02:40Z

src/ceph-volume/ceph_volume/util/device.py

        output['lvs'] = [lv.report() for lv in self.lvs]
        return output

+    def _get_device_id(self):


I would suggest adding a comment explaining the fact we mimic the blkdev implementation so IDs collected in the inventory can be used in inside ceph.

Right...lets see where this is going in regards to duplicate implementations.

leseb · 2018-11-21T13:04:57Z

src/ceph-volume/ceph_volume/util/disk.py

+    Optionally pass a list of properties to return. A requested property might
+    not be returned if not present.
+    """
+    cmd = ['udevadm', 'info', '--query=property', device]


Something good to keep in mind for containers, this will only work if the container is privileged (--privileged).

Actually not really, my container setup (highly privileged) returns only a few fields and not the one you're looking for:

[root@60a8b848d66e /]# udevadm info --query=property /dev/sdb DEVNAME=/dev/sdb DEVPATH=/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:1:0/0:1:0:1/block/sdb DEVTYPE=disk MAJOR=8 MINOR=16 SUBSYSTEM=block

Ok answering my own question :)
We must bindmount /var/run/udev/ to get what we want, see -v /var/run/udev/:/var/run/udev/

Right. I kept the implementation close to what c++ land was doing. I think something based on /sys/block/$dev/ would also work.

Also afaiu its yet unclear if and when we can run Ceph in unpreviliged containers.

I don't think we can achieve non-privileged containers yet, but if we can retry the number of bindmount to apply by doing a "walkpath" in /sys approach that'd be easier for everyone using containers I believe :).

ceph-volume has to require super user permissions in a ton of calls (look for @needs_root decorator in class methods). Not only because LVM requires it, but also because there are a lot of utilities that need it. I doubt this can be simplified for only a subset of paths

IMO we should rely on udev here. Getting the same info ourselves out of /sys is complex and error-prone--that's why we are using udev in the first place.

I have put in a fair bit of thought to running Ceph without privileged containers, and my thoughts have been that in the short-term, privileged containers are the best option for getting main functionality in place initially. Once things are working well, paring down privileges in containers are still extremely valuable optimizations to make.

Thoughts I have around OSDs in Rook are that the relatively short-lived rook-ceph-osd-prepare-... pods in rook are at less risk for security concerns due to privilege than are the constantly-running rook-ceph-osd-<id> pods. If ceph-volume can be limited to the prepare pod (which I believe is the intent), focusing effort on reducing permissions in the osd pods themselves is the more important place to focus on privilege needs. OSD daemons also likely represent the largest percentage of running Ceph daemons by far, so to my reasoning, security enhancements to the OSD daemons have the biggest return on investment for developer time spent.

A related writeup of my now probably outdated investigation as pertains to Rook is here: rook/rook#1944

alfredodeza · 2018-11-21T14:46:06Z

src/ceph-volume/ceph_volume/util/disk.py

+    out, _err, _rc = process.call(cmd)
+    ret = {}
+    for line in out:
+        p, v = line.split('=', 1)


this should catch a TypeError as that can be the case when the split doesn't have the two items being assigned here.

I would recommend splitting the system call with a parser, similar to what we've done with blkid and lsblk because it is easier to test the parsing vs. mixing it with the system call (this PR needs a few of those to ensure that this function is working as expected).

Curious that you have implemented this, I'm working on removing partitions as part of lvm zap which would need udevadm, here is where I left that implementation at:

def udevadm_info(device): """ Capture device information with udevadm, useful in situations where blkid and lsblk aren't producing enough useful output. """ command = ['udevadm', 'info', '--query=property', '--name', device] out, err, rc = process.call(command) return _udevadm_parser(out) def _udevadm_parser(output): """ Parses the output from a udevadm call, requires output to be produced using the ``info``, specifying the actual --query type, to remove prefixes in the output. Expected output:: # udevadm info --query=property --name /dev/sdb DEVLINKS=/dev/disk/by-path/pci-0000:00:14.0-scsi-0:0:3:0 DEVNAME=/dev/sdb DEVPATH=/devices/pci0000:00/0000:00:14.0/host3/target3:0:3/3:0:3:0/block/sdb DEVTYPE=disk ID_BUS=scsi ID_MODEL=HARDDISK ID_MODEL_ENC=HARDDISK\x20\x20\x20\x20\x20\x20\x20\x20 ID_PART_TABLE_TYPE=gpt ID_PART_TABLE_UUID=bea7a874-c406-46b2-98bd-414f062f2eb5 ID_PATH=pci-0000:00:14.0-scsi-0:0:3:0 ID_PATH_TAG=pci-0000_00_14_0-scsi-0_0_3_0 ID_REVISION=1.0 ID_SCSI=1 ID_TYPE=disk ID_VENDOR=VBOX ID_VENDOR_ENC=VBOX\x20\x20\x20\x20 MAJOR=8 ... """ processed = {} for line in output: try: key, value = line.split('=') except ValueError: continue processed[key] = value return processed

I would strongly argue against catching an error here.

It's highly unlikely the output of udevadm is ever going to change.

You propose to silently eat an error in the udevadm output. What is the benefit of that? Fail early is imho much better here as this is an error (udevadm giving output that is not dividable at a =) that seems impossible to recover from. Catching exceptions is meant for error handling, not error hiding.

The output of udevadm might not change at all for a while, but the parser must be able to withstand lines that don't comply exactly to what it needs.

There is no actual error eating that is silent here, as all system calls that fail are logged to the configured file (would be caused by a non-zero exit status). This is one of the benefits of implementing a parser separately: it doesn't care about error handling from a non-zero exit status, makes testing far easier (just a list of strings is needed).

If we wanted to fail hard when udevadm is not giving us exactly what we need because of incomplete/invalid data or a non-zero exit status we could improve the handling in the function that does the system call.

My example is crufty (work in progress) and you are catching nits that need to be improved for sure! The main idea is to separate the concerns for easier/better unit testing.

My 2 cents:

My reading of the code posted by @alfredodeza continuing after ValueError is that I would expect that code to silently eat the failure if the output changes, which also seems a bad idea to me. If that is not the actual behavior of the code, then it is obscured from me, and I believe that section can be improved to be more clear about the real-world behavior.

Separation of concerns (i.e., parsing output separate from calling udevadm) does seem like a good thing to do for unit testing, and I like that the expected output is spelled out clearly in Alfredo's example. It may be verbose, but as a newcomer to this codebase, it's incredibly helpful to me.

Allowing the parser to withstand lines that don't exactly match the expected input seems a double-edged sword to me. It may be able to continue functioning in the case of some output change, and it also may begin to function erroneously in an unexpected way in such a case. In my opinion, matching the line exactly is a better solution, as it will be much easier to debug an error at the location of the parser failure than to debug an error which is caused later in the program's lifetime due to an invalid value being parsed and passed on.

@BlaineEXE I think we are in agreement :) Like I mentioned, the patch should not be taken as final/production quality because it is a work-in-progress on another branch that hasn't gone out for review.

I agree that parsing shouldn't eat things up (although failed udev calls are logged), and logging them sounds reasonable.

In short, my ask is to separate concerns for better unit testing, just like the other system calls/functions that exist in that module.

The output of udevadm might not change at all for a while, but the parser must be able to withstand lines that don't comply exactly to what it needs.

There is no actual error eating that is silent here, as all system calls that fail are logged to the configured file (would be caused by a non-zero exit status). This is one of the benefits of implementing a parser separately: it doesn't care about error handling from a non-zero exit status, makes testing far easier (just a list of strings is needed).

@alfredodeza We are starting to mix issues here. A non-zero exit status is (and should not) be handled here. That is what process.call does. The issue here is udevadm returning something other then $KEY=$VALUE with a zero exit status. My code will raise a ValueError (in case split returns a list of length 1...i.e. the output has no =) and I think its bad to catch this and not handle the error. If we could handle the error so the code can continue I'd be all for it, but I don't see a way to handle an unknown udevadm return format.

If we wanted to fail hard when udevadm is not giving us exactly what we need because of incomplete/invalid data or a non-zero exit status we could improve the handling in the function that does the system call.

How and why would process.call care about the output format of udevadm? We could add a function parameter to call so it does call an injected parse method, but that would not change the fact, that we can not recover from udevadm returning an unknown format.

@BlaineEXE sees my point I think:

Allowing the parser to withstand lines that don't exactly match the expected input seems a double-edged sword to me. It may be able to continue functioning in the case of some output change, and it also may begin to function erroneously in an unexpected way in such a case. In my opinion, matching the line exactly is a better solution, as it will be much easier to debug an error at the location of the parser failure than to debug an error which is caused later in the program's lifetime due to an invalid value being parsed and passed on.

Just split it into two parts: a parser, and a CLI handler, like the rest of the module. This is easier to test. I'll leave the other details on error handling to you, happy to review that.

Seems like I caused more confusion with my wip example than it help clarify intent. Sorry about that

Oh certainly, a function split makes sense. I was really just arguing about error handling. And no worries, I like a good techinical discussion.

jan--f · 2018-11-22T08:15:50Z

@wjwithagen any new info from the FreeBSD side?

In order to be able to retrieve udev information, we must expose its socket. As per, ceph/ceph#25201 ceph-volume will start consuming udev output. Signed-off-by: Sébastien Han <seb@redhat.com>

ErwanAliasr1 · 2018-11-26T17:04:23Z

We'll end up exposing / to get the whole thing working. That starts to be worrying. Containers were made to run simple tasks and we endup with a nearly full host exposing :(

sebastian-philipp · 2018-11-26T17:07:25Z

Containers were made to run simple tasks and we endup with a nearly full host exposing :(

Yeah, Containers were not designed with OSDs in mind.

In order to be able to retrieve udev information, we must expose its socket. As per, ceph/ceph#25201 ceph-volume will start consuming udev output. Signed-off-by: Sébastien Han <seb@redhat.com>

This intends to mimic the C++ implementation in src/common/blkdev.cc. Signed-off-by: Jan Fajerski <jfajerski@suse.com>

Signed-off-by: Jan Fajerski <jfajerski@suse.com>

jan--f · 2018-11-30T15:25:26Z

@Sage @ErwanAliasr1 @leseb @alfredodeza I think we can go ahead with this if everybody is on board to switch to using ID_MODEL + _ + ID_SERIAL_SHORT instead of only ID_SERIAL. The python implementation returns an empty string on FreeBSD but it seems like ceph-volume isn't fully functional there anyway.

liewegas

LGTM

ErwanAliasr1 · 2018-11-30T15:49:52Z

Looks good to me. Thx for your work. Le ven. 30 nov. 2018 à 16:29, Jan Fajerski <notifications@github.com> a écrit :

…

@Sage <https://github.com/Sage> @ErwanAliasr1 <https://github.com/ErwanAliasr1> @leseb <https://github.com/leseb> @alfredodeza <https://github.com/alfredodeza> I think we can go ahead with this if everybody is on board to switch to using ID_MODEL + _ + ID_SERIAL_SHORT instead of only ID_SERIAL. The python implementation returns an empty string on FreeBSD but it seems like ceph-volume isn't fully functional there anyway. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#25201 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AEVEHfmgWkcBL4BUH3eUXKtS5UA0qvGUks5u0U7wgaJpZM4YtD2w> .

leseb

ID_ from udev is fine, thanks!

alfredodeza · 2018-11-30T16:12:33Z

@jan--f one last question (the PR looks fine), we have been backporting ceph-volume PRs all the way back to Luminous. Is the C++ change able to be backported to Luminous along with this PR?

It might get complicated to get this branch in Luminous otherwise

liewegas · 2018-11-30T16:15:21Z

On Fri, 30 Nov 2018, Alfredo Deza wrote: @jan--f one last question (the PR looks fine), we have been backporting ceph-volume PRs all the way back to Luminous. Is the C++ change able to be backported to Luminous along with this PR? It might get complicated to get this branch in Luminous otherwise

Nope... the device_id code is new in nautilus. As long as it's in a separate commit message it should be trivial to skip that patch!

In order to be able to retrieve udev information, we must expose its socket. As per, ceph/ceph#25201 ceph-volume will start consuming udev output. Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit 997667a)

jan--f added common DNM ceph-volume labels Nov 21, 2018

jan--f requested review from ErwanAliasr1, alfredodeza and liewegas November 21, 2018 12:53

jan--f force-pushed the c-v-inventory-add-device_id branch from d46eec5 to 56b3186 Compare November 21, 2018 12:54

ErwanAliasr1 reviewed Nov 21, 2018

View reviewed changes

leseb reviewed Nov 21, 2018

View reviewed changes

alfredodeza suggested changes Nov 21, 2018

View reviewed changes

leseb mentioned this pull request Nov 26, 2018

osd: expose udev into the container ceph/ceph-ansible#3360

Merged

Jan Fajerski added 2 commits November 28, 2018 10:21

ceph-volume: add device_id field to inventory output

41da5e9

This intends to mimic the C++ implementation in src/common/blkdev.cc. Signed-off-by: Jan Fajerski <jfajerski@suse.com>

ceph-volume: add test for disk.udevadm_property

58316e3

Signed-off-by: Jan Fajerski <jfajerski@suse.com>

jan--f force-pushed the c-v-inventory-add-device_id branch from 56b3186 to 58316e3 Compare November 28, 2018 09:23

alfredodeza mentioned this pull request Nov 29, 2018

ceph-volume zap: improve zapping to remove all partitions and all LVs, encrypted or not #25330

Merged

jan--f changed the title ~~[DNM] ceph-volume add device_id to inventory listing~~ ceph-volume add device_id to inventory listing Nov 30, 2018

jan--f removed the DNM label Nov 30, 2018

liewegas approved these changes Nov 30, 2018

View reviewed changes

leseb approved these changes Nov 30, 2018

View reviewed changes

alfredodeza approved these changes Nov 30, 2018

View reviewed changes

alfredodeza merged commit b8a64b5 into ceph:master Nov 30, 2018

This was referenced Nov 30, 2018

mimic ceph-volume add device_id to inventory listing #25349

Merged

luminous ceph-volume add device_id to inventory listing #25350

Merged

alfredodeza mentioned this pull request Jan 2, 2019

common/blkdev, ceph-volume: improve get_device_id #25425

Merged

Conversation

jan--f commented Nov 21, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jan--f Nov 27, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jan--f commented Nov 22, 2018

Uh oh!

ErwanAliasr1 commented Nov 26, 2018

Uh oh!

sebastian-philipp commented Nov 26, 2018

Uh oh!

jan--f commented Nov 30, 2018

Uh oh!

liewegas left a comment

Choose a reason for hiding this comment

Uh oh!

ErwanAliasr1 commented Nov 30, 2018 via email

Uh oh!

leseb left a comment

Choose a reason for hiding this comment

Uh oh!

alfredodeza commented Nov 30, 2018

Uh oh!

liewegas commented Nov 30, 2018 via email

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

jan--f Nov 27, 2018 •

edited

Loading