rbd-nbd: generate and send device cookie with netlink connect request #41323
trociny merged 8 commits into ceph:master
I think it is important to provide reasonable behavior for the case where the kernel does not support the cookie, because I expect there will be a long period when this is the common situation.
I suppose we can't just forbid attach in this case, because users may already be using it. There may also be third-party software that already uses it and would break after an rbd-nbd upgrade.
Making --cookie optional would resolve this, or maybe it would just delay the issue if we eventually want to make it mandatory. I don't have a good idea for a solution. Any thoughts?
Also, I am not sure I like the --all option for the list command. I was thinking we would just make list always print it. Actually, I don't have a strong opinion. I see some advantages in printing it only with --all, e.g. we would keep the same default list format for all device types. But if we want the --all option, we will also need to add it to rbd device list.
Also, I think you will need to update qa/workunits/rbd/rbd-nbd.sh and make sure it works with both new and old kernels.
With kernel changes:
Without kernel changes:
Maybe we can allow
I don't have a strong opinion either, was just implementing it as per our discussion from the previous PR :-)
I will check on what needs to be updated in Thanks for your thoughts @trociny
With the latest V2 patches, the changes respect backward compatibility and work with both old and new kernels:
...
Maybe we should forbid
That will break existing users, Xiubo, which IMHO is something we do not want. The latest changes here behave as usual when the kernel changes are not available, and when the kernel changes are available, they demand a cookie at Thanks!
Hi Prasanna, As we discussed, maybe we should do: 1, use the
Hello Xiubo,
Example: Move the image to a different pool. Now claim the same device, but for the migrated destination (target) image, on the same host node. In this case, one needs to supply the md5sum cookie of the source image spec (pool1/image1) manually. Now, the question is, do we want to provide two ways?
Thinking about automation scripts, they still have to save the cookie and keep it ready, at least to address this special case. Maybe @idryomov and @trociny can add more thoughts here? Thanks!
Yes, it will not work for migration. That is why we need something other than the image spec for the identifier.
Don't use
Actually you may/should attach after this step. This is the main reason why we have migration in several steps -- to make the downtime small.
I don't think it is necessary, as it does not improve it radically, but I would not object.
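To make the migration point above concrete, here is a minimal sketch of a cookie derived from the image spec, assuming the cookie is simply the md5sum of the `pool/image` string as floated in this thread. `spec_cookie` is a hypothetical helper for illustration and is not part of rbd-nbd.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a cookie by hashing the image spec.
# This mirrors the md5sum-of-image-spec idea discussed above; it is not
# what rbd-nbd ships.
spec_cookie() {
    # printf avoids the trailing newline that echo would add to the hash input.
    printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# After migrating pool1/image1 to pool2/image1 the spec changes, so the
# derived cookie changes too and no longer matches what the device recorded.
src_cookie=$(spec_cookie "pool1/image1")
dst_cookie=$(spec_cookie "pool2/image1")
if [ "$src_cookie" != "$dst_cookie" ]; then
    echo "cookie changed after migration"
fi
```

Because the derived value follows the spec rather than the device, an attach for the migrated image would have to present the source spec's cookie by hand, which is why an identifier independent of the image spec is preferable.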
Cool, makes sense, I didn't know this part. Thanks!
Okay, I didn't notice the migration case.
trociny left a comment
Just a note for discussion. We could allow specifying --cookie on the map command, so a user could set it to any value she likes. I am not sure we need this functionality; it might be useful in some cases, but it seems to introduce yet another chance to shoot oneself in the foot.
I agree it is useful for some cases, and I'm happy to provide the changes. But I will wait to hear @idryomov's thoughts on this before making them. Thanks @trociny for the review.
The kernel patch adding a way to identify device backends is applied and should be part of kernel version 5.14. I have yet to work on enabling users to specify --cookie on the map command; this is something I'm waiting on @idryomov to decide. Many thanks!
trociny left a comment
LGTM.
BTW, what is the status of the kernel patch? Just wondering if there is a way to try it already without building a custom kernel...
idryomov left a comment
Looks good! (pending rbd-nbd.sh fixup)
@idryomov please help move this to the finish line. Thanks!
@pkalever Do you want this to be backported to pacific? If you do, I suggest opening a tracker ticket, linking this PR there, and setting the backport field to "pacific", so that the backport ticket is created automatically after this is merged.
And for some reason the "Pull Request Needs Rebase?" check failed. Maybe it indeed needs a rebase?
[root@linux-vm1]# rbd-nbd map rbd-pool/image0 --try-netlink
/dev/nbd0
[root@linux-vm1]# cat /sys/block/nbd0/backend
c704cb91-c6cf-466e-a335-0e935c0d5e47
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
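The sysfs read in the commit above can be wrapped in a small helper that also tolerates a missing attribute. This is a minimal sketch, assuming the cookie lives at /sys/block/<dev>/backend as shown; `nbd_backend_cookie` and its optional sysfs-root argument are hypothetical names used for illustration, not part of rbd-nbd.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cookie recorded for an nbd device, or print
# nothing if no backend attribute is exposed for it.
# $1: device name such as nbd0 or /dev/nbd0
# $2: sysfs block root (defaults to /sys/block; overridable for testing)
nbd_backend_cookie() {
    local dev="${1##*/}"              # strip a leading /dev/ if present
    local root="${2:-/sys/block}"
    local attr="$root/$dev/backend"
    if [ -r "$attr" ]; then
        cat "$attr"
    fi
}
```

A caller could run `nbd_backend_cookie /dev/nbd0` and treat empty output as "no cookie recorded", which may simply mean the device was mapped without one or the kernel does not carry the patch.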
[root@linux-vm1]# rbd-nbd map rbd-pool/image0 --try-netlink --show-cookie
/dev/nbd0 c704cb91-c6cf-466e-a335-0e935c0d5e47
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
[root@linux-vm1]# rbd-nbd attach rbd-pool/image0 --device /dev/nbd0 \
--cookie c704cb91-c6cf-466e-a335-0e935c0d5e47
/dev/nbd0
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
[root@linux-vm1]# rbd-nbd list-mapped
id    pool      namespace  image   snap  device     cookie
8133  rbd-pool             image0  -     /dev/nbd0  c704cb91-c6cf-466e-a335-0e935c0d5e47
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
For backward compatibility allow attach without --cookie option:
[root@linux-vm1]# rbd-nbd attach rbd-pool/image0 --device /dev/nbd0
/dev/nbd0
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Example:
$ rbd device map rbd-pool/image --show-cookie --try-netlink --device-type nbd
$ rbd device attach rbd-pool/image --device /dev/nbd0 \
--cookie 6f85d970-10b2-456b-8baf-676aa4d782e4 --device-type nbd
Older kernel versions can use --force to skip the cookie validation.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Allow user to specify cookie of choice at the time of map
$ rbd device attach rbd-pool/image --device /dev/nbd0 \
--cookie 6f85d970-10b2-456b-8baf-676aa4d782e4 --options try-netlink
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
While I was navigating to another tab, the issue got submitted without the PR link. Please help adjust the fields as needed; here is the tracker link: https://tracker.ceph.com/issues/53046#note-1
I have rebased this onto the latest master now. Thanks @trociny
jenkins test make check
jenkins test api |
Problem:
On remap/attach of a device (i.e. nodeplugin restart), there is no way for rbd-nbd to verify that the backend storage matches the initial backend storage.
Say an initial map request for backend "pool1/image1" got mapped to /dev/nbd0 and the userspace process is terminated (on nodeplugin restart). The next remap/attach (nodeplugin start) request within reattach-timeout is allowed to use /dev/nbd0 for a different backend "pool1/image2".
For example, an operation like the one below could be dangerous:
$ sudo rbd-nbd map --try-netlink rbd-pool/ext4-image
/dev/nbd0
$ sudo blkid /dev/nbd0
/dev/nbd0: UUID="bfc444b4-64b1-418f-8b36-6e0d170cfc04" TYPE="ext4"
$ sudo pkill -15 rbd-nbd <-- nodeplugin terminate
$ sudo rbd-nbd attach --try-netlink --device /dev/nbd0 rbd-pool/xfs-image
/dev/nbd0
$ sudo blkid /dev/nbd0
/dev/nbd0: UUID="d29bf343-6570-4069-a9ea-2fa156ced908" TYPE="xfs"
Solution:
rbd-nbd/kernel now provides a way to keep some metadata in sysfs that ties the device to its backend, so that when a remap/attach request is made, rbd-nbd can compare and avoid such dangerous operations.
With the provided solution, as part of the initial map request, the backend cookie (ceph-csi VOLID) can be stored in the per-device sysfs config, so that on a remap/attach request rbd-nbd will validate that the per-device cookie matches the one from the initial map.
At ceph-csi we use the VOLID as the device cookie, which is unique. We pass the VOLID as the cookie at map and use the same value at attach time; that way rbd-nbd can identify backends and their matching devices.
Requires:
ceph/ceph#41323
https://lkml.org/lkml/2021/4/29/274
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
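The comparison described in the solution can be sketched as a pre-attach check. This is only an illustration of the idea, assuming the cookie is exposed at /sys/block/<dev>/backend; `cookie_matches`, its return codes, and the optional sysfs-root argument are hypothetical and do not reproduce rbd-nbd's own code.

```shell
#!/usr/bin/env bash
# Hypothetical pre-attach check: succeed only when the cookie the caller
# presents equals the one recorded for the device in sysfs.
# $1: device (nbd0 or /dev/nbd0)
# $2: expected cookie (for ceph-csi this would be the VOLID)
# $3: sysfs block root (defaults to /sys/block; overridable for testing)
# Returns 0 on match, 1 on mismatch, 2 when no cookie is recorded.
cookie_matches() {
    local dev="${1##*/}" expected="$2" root="${3:-/sys/block}"
    local attr="$root/$dev/backend"
    local recorded
    # No attribute: nothing to compare against (for example an older kernel).
    [ -r "$attr" ] || return 2
    recorded=$(cat "$attr")
    [ -n "$expected" ] && [ "$recorded" = "$expected" ]
}
```

In the ext4/xfs scenario above, an attach for rbd-pool/xfs-image would present a different cookie than the one stored when rbd-pool/ext4-image was mapped, so a check of this shape would return 1 and the attach could be refused.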
Fixes: https://tracker.ceph.com/issues/53046
Based on the discussion at #40809, this PR adds below options for rbd-nbd cookie generation and management
Needs: https://lkml.org/lkml/2021/4/29/274 (Applied now and should be part of kernel release 5.14)
Credits: To @idryomov and @trociny for all the discussions around this topic.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>