Skip to content

common/blkdev, ceph-volume: improve get_device_id#25425

Merged
liewegas merged 5 commits intoceph:masterfrom
liewegas:wip-better-devid
Dec 8, 2018
Merged

common/blkdev, ceph-volume: improve get_device_id#25425
liewegas merged 5 commits intoceph:masterfrom
liewegas:wip-better-devid

Conversation

@liewegas
Copy link
Member

@liewegas liewegas commented Dec 6, 2018

Sigh... the Areca controllers in the mira machines have weird results.

Also, my desktop doesn't have ID_SERIAL_SHORT, just ID_SERIAL. So, let's try a few different things!

@liewegas liewegas requested review from alfredodeza and jan--f and removed request for alfredodeza December 6, 2018 15:44
@liewegas liewegas added the core label Dec 6, 2018
@ErwanAliasr1
Copy link
Contributor

That situation will be present on many versions of controllers. Why not having some sample udev output and test them in a TDD fashion so we can add more cases later while checking we don't break the format ?

@ErwanAliasr1
Copy link
Contributor

looks like there is some. oups.

@ErwanAliasr1
Copy link
Contributor

I have a couple of some devices from various hardware to offer : http://pastebin.test.redhat.com/679978

@leseb
Copy link
Member

leseb commented Dec 6, 2018

Looks like we need the output with --property instead to remove the first field, based on https://github.com/ceph/ceph/blob/master/src/ceph-volume/ceph_volume/util/disk.py#L226-L227

@alfredodeza
Copy link
Contributor

Seems like there is some room to still get this incorrectly for some manufacturers. Are we fine by being correct in a bunch of cases and be uncertain about the rest?

@liewegas
Copy link
Member Author

liewegas commented Dec 6, 2018

I agree. I don't think we really have a choice, though! It's either this, or we give up and use the WWID, which is just some hex gibberish.

I think this will work for the vast majority of cases, and we can whittle away at the outliers over time.

@alfredodeza
Copy link
Contributor

jenkins test ceph-volume tox

Copy link
Contributor

@jan--f jan--f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should've been surprised if things would just work...

This looks good to me. I suppose its good enough as long as we end up with something in dev_id.

Copy link
Contributor

@ErwanAliasr1 ErwanAliasr1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This samples are about megaraid, perc & hpa controllers.

I added the Shared_PERC8 because its a pretty uncommon setup.
The PERC (raid controller) is part of the chassis of the blade server and configured on the chassis too.
Once the RAID array is setup, the logical volume is attached via the pci-express via an SR-IOV virtual device to the selected compute blade server.
The compute server detects a new megaraid controller that have a single logical volume attached too it.

This is where the shared is coming from. A single real raid controller on the chassis, shared to compute blades via an SR-IOV.

A few interesting cases:

1. autriche (my desktop)

$ sudo udevadm info /dev/nvme0n1 | grep ID_
E: ID_PART_TABLE_TYPE=gpt
E: ID_PART_TABLE_UUID=c83d5616-676b-4667-bcf3-c82fd4fc7e64
E: ID_SERIAL=Samsung SSD 960 EVO 250GB_S3ESNX0J958081E
E: ID_SERIAL_SHORT=S3ESNX0J958081E

- no ID_MODEL or ID_VENDOR
-> use ID_SERIAL

2. my dev box

gnit:~ (master) 09:09 AM $ udevadm info /dev/nvme0n1 | grep ID_
E: ID_FS_TYPE=xfs
E: ID_FS_USAGE=filesystem
E: ID_FS_UUID=860d4503-9c9d-4c24-af09-4266b7717a5c
E: ID_FS_UUID_ENC=860d4503-9c9d-4c24-af09-4266b7717a5c
E: ID_MODEL=INTEL SSDPEDMD400G4
E: ID_PATH=pci-0000:82:00.0-nvme-1
E: ID_PATH_TAG=pci-0000_82_00_0-nvme-1
E: ID_SERIAL=INTEL SSDPEDMD400G4_CVFT520200G7400BGN
E: ID_SERIAL_SHORT=CVFT520200G7400BGN
E: ID_WWN=nvme.8086-43564654353230323030473734303042474e-494e54454c205353445045444d443430304734-00000001

- no ID_VENDOR
-> ID_MODEL + ID_SERIAL_SHORT

gnit:~ (master) 09:12 AM $ udevadm info /dev/sda | grep ID_
E: ID_ATA=1
E: ID_ATA_DOWNLOAD_MICROCODE=1
E: ID_ATA_FEATURE_SET_HPA=1
E: ID_ATA_FEATURE_SET_HPA_ENABLED=1
E: ID_ATA_FEATURE_SET_PM=1
E: ID_ATA_FEATURE_SET_PM_ENABLED=1
E: ID_ATA_FEATURE_SET_SECURITY=1
E: ID_ATA_FEATURE_SET_SECURITY_ENABLED=0
E: ID_ATA_FEATURE_SET_SECURITY_ENHANCED_ERASE_UNIT_MIN=2
E: ID_ATA_FEATURE_SET_SECURITY_ERASE_UNIT_MIN=2
E: ID_ATA_FEATURE_SET_SECURITY_FROZEN=1
E: ID_ATA_FEATURE_SET_SMART=1
E: ID_ATA_FEATURE_SET_SMART_ENABLED=1
E: ID_ATA_ROTATION_RATE_RPM=0
E: ID_ATA_SATA=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN1=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN2=1
E: ID_ATA_WRITE_CACHE=1
E: ID_ATA_WRITE_CACHE_ENABLED=1
E: ID_BUS=ata
E: ID_MODEL=INTEL_SSDSC2BB240G4
E: ID_MODEL_ENC=INTEL\x20SSDSC2BB240G4\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20
E: ID_PART_TABLE_TYPE=dos
E: ID_PART_TABLE_UUID=bb35118c
E: ID_PATH=pci-0000:00:1f.2-ata-1
E: ID_PATH_TAG=pci-0000_00_1f_2-ata-1
E: ID_REVISION=D2010355
E: ID_SERIAL=INTEL_SSDSC2BB240G4_BTWL3414034J240NGN
E: ID_SERIAL_SHORT=BTWL3414034J240NGN
E: ID_TYPE=disk
E: ID_WWN=0x55cd2e404b4e47d8
E: ID_WWN_WITH_EXTENSION=0x55cd2e404b4e47d8

- no ID_VENDOR
-> ID_MODEL + ID_SERIAL_SHORT

3. mira lab machine (old areca controller in JBOD mode, I think)

root@mira055:~# udevadm info /dev/sdb | grep ID_
E: ID_BUS=scsi
E: ID_MODEL=HUS724040ALA640
E: ID_MODEL_ENC=HUS724040ALA640\x20
E: ID_PART_TABLE_TYPE=gpt
E: ID_PART_TABLE_UUID=957b2db6-de5c-46cb-a672-243fa12d55b2
E: ID_PATH=pci-0000:01:00.0-scsi-0:0:0:1
E: ID_PATH_TAG=pci-0000_01_00_0-scsi-0_0_0_1
E: ID_REVISION=R001
E: ID_SCSI=1
E: ID_SCSI_SERIAL=PN1334PBH5JMJS
E: ID_SERIAL=2001b4d2058da3a00
E: ID_SERIAL_SHORT=001b4d2058da3a00
E: ID_TYPE=disk
E: ID_VENDOR=HGST
E: ID_VENDOR_ENC=HGST\x20\x20\x20\x20

- ID_VENDOR and ID_MODEL
- ID_MODEL doesn't include vendor name + _
- ID_SERIAL and _SHORT are junk
- ID_SCSI_SERIAL has the serial number!  wth
-> ID_VENDOR + ID_MODEL + ID_SCSI_SHORT

Added a bunch of udevadm output samples.

So,

- if ID_VENDOR + ID_MODEL + ID_SCSI_SERIAL are present, use them.
- if ID_MODEL + ID_SERIAL_SHORT are present, use them
- if ID_SERIAL is present, use it.
- fail

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
According to Ryan Meredith <rmeredith@micron.com>, the NVMes begin with
MTFD, but their other devices use the standard Micron_$model_$serial.

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
The erwan1 one unfortunately has no model information :(.  The other two
work with the current implementation, although erwan.v1.sdb has
"Shared_PERC8" for the model, and I suspect the Shared_ prefix is not
part of the real model?

Signed-off-by: Sage Weil <sage@redhat.com>
@liewegas liewegas merged commit eef391d into ceph:master Dec 8, 2018
liewegas added a commit that referenced this pull request Dec 8, 2018
* refs/pull/25425/head:
	test/common/blkdev-udevadm-info-samples: add a few test cases
	ceph-volume: Micron SSDs don't include vendor name in ID_SERIAL
	common/blkdev: micron SSDs don't include vendor name in ID_SERIAL
	ceph-volume: update get_device_id to match in-tree implementation
	common/blkdev: improve device_id generation

Reviewed-by: Jan Fajerski <jfajerski@suse.com>
@alfredodeza
Copy link
Contributor

The lack of tests for this is killing us :( These changes just broke everything in master for ceph-volume:

[root@osd0 vagrant]# CEPH_VOLUME_DEBUG=1 ceph-volume lvm batch --bluestore --report /dev/sdb
usage: ceph-volume lvm batch [-h] [--bluestore] [--filestore] [--report]
                             [--yes] [--format {json,pretty}] [--dmcrypt]
                             [--crush-device-class CRUSH_DEVICE_CLASS]
                             [--no-systemd]
                             [--osds-per-device OSDS_PER_DEVICE]
                             [--block-db-size BLOCK_DB_SIZE]
                             [--journal-size JOURNAL_SIZE] [--prepare]
 ceph-volume lvm batch: error: argument DEVICES: invalid <ceph_volume.util.arg_validators.ValidDevice object at 0x7f68072497d0> value: '/dev/sdb'

The error is eaten up by argparse:

>>> from ceph_volume.util import device
>>> device.Device('/dev/sdb')
 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/ceph_volume/util/device.py", line 92, in __init__
    self.device_id = self._get_device_id()
  File "/usr/lib/python2.7/site-packages/ceph_volume/util/device.py", line 198, in _get_device_id
    dev_id = '_'.join(p['ID_MODEL'], p['ID_SERIAL_SHORT'])
TypeError: join() takes exactly one argument (2 given)

@LenzGr
Copy link
Contributor

LenzGr commented Dec 10, 2018

These changes just broke everything in master for ceph-volume

Looks like @sebastian-philipp has just submitted #25469 to fix this.

@alfredodeza
Copy link
Contributor

@liewegas just want to confirm that we shouldn't backport the C++ changes for this right? (similar to #25201 )

@liewegas
Copy link
Member Author

liewegas commented Jan 2, 2019

@liewegas just want to confirm that we shouldn't backport the C++ changes for this right? (similar to #25201 )

Correct!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants