Skip to content

rbd-nbd: use netlink interface by default#55234

Merged
idryomov merged 1 commit intoceph:mainfrom
ajarr:wip-64063
Jan 26, 2024
Merged

rbd-nbd: use netlink interface by default#55234
idryomov merged 1 commit intoceph:mainfrom
ajarr:wip-64063

Conversation

@ajarr
Copy link
Contributor

@ajarr ajarr commented Jan 18, 2024

Mapping rbd images to nbd devices using ioctl interface is not robust. It was discovered that the device size or the md5 checksum of the nbd device was incorrect immediately after mapping using ioctl method. When using the nbd netlink interface to map RBD images the issue was not encountered. Switch to using nbd netlink interface for mapping.

Fixes: https://tracker.ceph.com/issues/64063

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@ajarr ajarr requested a review from a team as a code owner January 18, 2024 15:01
@ajarr ajarr marked this pull request as draft January 18, 2024 15:01
@ajarr
Copy link
Contributor Author

ajarr commented Jan 18, 2024

After switching the default rbd-nbd mapping to use netlink interface, ran the rbd/nbd qa-suite twice,
https://pulpito.ceph.com/rraja-2024-01-18_00:51:14-rbd:nbd-wip-ajarr-64063-distro-default-smithi/
https://pulpito.ceph.com/rraja-2024-01-18_00:51:14-rbd:nbd-wip-ajarr-64063-distro-default-smithi/

workloads/rbd_nbd consistently fails at https://github.com/ceph/ceph/blob/v19.0.0/qa/workunits/rbd/rbd-nbd.sh#L214
workloads/rbd_fsx_nbd also fails
workloads/rbd_nbd_diff_continuous passes
Need to investigate the failures

@ajarr
Copy link
Contributor Author

ajarr commented Jan 18, 2024

Investigated the rbd/nbd workload failure.

After a RBD image is mapped to a NBD device using the netlink interface, resizing of the image doesn't update the number of blocks of the NBD device (as seen in /proc/partitions). Only after the device is un-mapped and the image is re-mapped using the netlink interface, the correct block device size of the NBD device is reported.

$ rbd --cluster=site-a -p data create img-use-netlink --size 64M
$ sudo ./bin/rbd --cluster=site-a device map -t nbd -o try-netlink data/img-use-netlink
/dev/nbd4
$ awk -v dev="nbd4" '$4 == dev {print $3}' /proc/partitions
65536
$ rbd --cluster=site-a resize data/img-use-netlink --size 128M
Resizing image: 100% complete...done.
$ awk -v dev="nbd4" '$4 == dev {print $3}' /proc/partitions
65536
$  sleep 60; awk -v dev="nbd4" '$4 == dev {print $3}' /proc/partitions
65536
$ sudo ./bin/rbd --cluster=site-a device unmap -t nbd /dev/nbd4
$ sudo ./bin/rbd --cluster=site-a device map -t nbd -o try-netlink data/img-use-netlink
$ # correct value reported after remapping
$ awk -v dev="nbd4" '$4 == dev {print $3}' /proc/partitions
131072

However, when an image was mapped to a NBD device using the ioctl interface, resizing the image also updated the NBD block device size.

$ rbd --cluster=site-a -p data create img-ioctl --size 64M
$ # Please note the default nbd mapping is using the ioctl interface, i.e., without this PR change
$ sudo ./bin/rbd --cluster=site-a device map -t nbd data/img-ioctl
/dev/nbd5
$ rbd --cluster=site-a resize data/img-ioctl --size 192M
Resizing image: 100% complete...done.
$ # Correct block size of device immediately reported without needing to remap the image
$ awk -v dev="nbd5" '$4 == dev {print $3}' /proc/partitions
 196608

@ajarr
Copy link
Contributor Author

ajarr commented Jan 18, 2024

Also ran the rbd/nbd workload after commenting out the resize test in rbd-nbd.sh, and used netlink interface as the default method of mapping to NBD devices.
main...ajarr:ceph:wip-ajarr-64063-comment-out-resize-jan-18-1

The workload passed,
https://pulpito.ceph.com/rraja-2024-01-18_20:02:15-rbd:nbd-wip-ajarr-64063-distro-default-smithi/7520747/
https://pulpito.ceph.com/rraja-2024-01-18_20:02:15-rbd:nbd-wip-ajarr-64063-distro-default-smithi/7520749/

It appears that only the resize test in rbd-nbd.sh fails, after switching the default mapping to use netlink interface.

@idryomov
Copy link
Contributor

After a RBD image is mapped to a NBD device using the netlink interface, resizing of the image doesn't update the number of blocks of the NBD device (as seen in /proc/partitions). Only after the device is un-mapped and the image is re-mapped using the netlink interface, the correct block device size of the NBD device is reported.

This bit in handle_notify_entry() is suspicious:

      if (use_netlink) {
        ret = netlink_resize(nbd_index, new_size);
      } else {
        ret = ioctl(fd, NBD_SET_SIZE, new_size);
        if (ret < 0) {
          derr << "resize failed: " << cpp_strerror(errno) << dendl;
        }
      }
      if (!ret) {
        size = new_size;
      }

Potential errors from netlink_resize() aren't logged with derr like errors from ioctl() are. Now netlink_resize() logs errors itself, but only with cerr, so those log messages might be going nowhere. The first step would be to double check whether netlink_resize() succeeds here.

@ajarr
Copy link
Contributor Author

ajarr commented Jan 22, 2024

Potential errors from netlink_resize() aren't logged with derr like errors from ioctl() are. Now netlink_resize() logs errors itself, but only with cerr, so those log messages might be going nowhere. The first step would be to double check whether netlink_resize() succeeds here.

netlink_resize() fails. Added dout/derr messages in netlink_resize().

   sock = netlink_init(&nl_id);
   if (!sock) {
+    derr << __func__ << " Netlink interface not supported." << dendl;
     cerr << "rbd-nbd: Netlink interface not supported." << std::endl;
     return 1;
   }
+  dout(10) << __func__ <<  "Netlink interface initialized" <<dendl;
 
   nl_socket_modify_cb(sock, NL_CB_VALID, NL_CB_CUSTOM, genl_handle_msg, NULL);
 
   msg = nlmsg_alloc();
   if (!msg) {
+    derr << __func__ << " Could not allocate netlink message." << dendl;
     cerr << "rbd-nbd: Could not allocate netlink message." << std::endl;
     goto free_sock;
   }
+  dout(10) << __func__ << " Netlink message allocated" <<dendl;
 
   if (!genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, nl_id, 0, 0,
                    NBD_CMD_RECONFIGURE, 0)) {
+    derr << __func__ << "Could not setup message." << dendl;
     cerr << "rbd-nbd: Could not setup message." << std::endl;
     goto free_msg;
   }
+  dout(10) << __func__ << " Netlink message setup" <<dendl;
 
   NLA_PUT_U32(msg, NBD_ATTR_INDEX, nbd_index);
   NLA_PUT_U64(msg, NBD_ATTR_SIZE_BYTES, size);
 
   ret = nl_send_sync(sock, msg);
   if (ret < 0) {
+    derr << " netlink resize failed: " << nl_geterror(ret) << dendl;
     cerr << "rbd-nbd: netlink resize failed: " << nl_geterror(ret) << std::endl;
     goto free_sock;
   }


The netlink_resize() fails when sending the NBD_CMD_RECONFIGURE message using nl_send_sync . nl_send_sync fails with Invalid input data or parameter.

2024-01-22T11:06:18.352-0500 7f755effd6c0  5 rbd-nbd: resize detected
2024-01-22T11:06:18.352-0500 7f755effd6c0 10 rbd-nbd: netlink_resizeNetlink interface initialized
2024-01-22T11:06:18.352-0500 7f755effd6c0 10 rbd-nbd: netlink_resize Netlink message allocated 
2024-01-22T11:06:18.352-0500 7f755effd6c0 10 rbd-nbd: netlink_resize Netlink message setup
2024-01-22T11:06:18.352-0500 7f755effd6c0 -1 rbd-nbd:  netlink resize failed: Invalid input data or parameter
2024-01-22T11:06:18.352-0500 7f755effd6c0 -1 rbd-nbd: netlink resize failed: (0) Success

@ajarr
Copy link
Contributor Author

ajarr commented Jan 23, 2024

Fixed the netlink_resize() issue here, #55287

The rbd/nbd qa suite show now pass with netlink set as default mapping interface

@ajarr ajarr marked this pull request as ready for review January 23, 2024 21:59
@github-actions github-actions bot added the tests label Jan 23, 2024
@idryomov idryomov changed the title rbd-nbd: map using netlink interface by default rbd-nbd: use netlink interface by default Jan 24, 2024
@ajarr ajarr requested a review from idryomov January 24, 2024 22:16
@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@rosinL
Copy link
Member

rosinL commented Jan 25, 2024

jenkins test make check arm64

Mapping rbd images to nbd devices using ioctl interface is not
robust. It was discovered that the device size or the md5 checksum
of the nbd device was incorrect immediately after mapping using
ioctl method. When using the nbd netlink interface to map RBD images
the issue was not encountered. Switch to using nbd netlink interface
for mapping.

Fixes: https://tracker.ceph.com/issues/64063
Signed-off-by: Ramana Raja <rraja@redhat.com>
@idryomov
Copy link
Contributor

jenkins test make check

@idryomov
Copy link
Contributor

jenkins test api

@idryomov idryomov merged commit 2b11aa3 into ceph:main Jan 26, 2024
@ajarr ajarr deleted the wip-64063 branch January 26, 2024 12:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants