Introduce vhost crate to support vhost-kernel and vhost-user backend drivers #2
Conversation
sys_util/src/eventfd.rs
Outdated
///
/// An eventfd is useful because it is sendable across processes and can be used for signaling in
/// and out of the KVM API. They can also be polled like any other file descriptor.
pub struct EventFd {
What's the status of the vmm-sys-utils crate? I'm asking because we cannot merge the PR as it is now, right? I don't think we want to pull sys_utils/... into the vhost crate.
Yes, that's a problem. This crate depends on the vmm-sys-utils and vm-memory crates.
/* Auto-generated by bindgen then manually modified for simplicity */

#![allow(non_upper_case_globals)]
This is being discussed on the ML, and this is not clear yet, but it seems you will have to put those bindings in a separate crate.
I have replied to the ML thread. For the vhost case, I do feel it's better to include the bindings in the crate with manual modifications.
@sameo @bonzini @andreeaflorescu @zachreizner need feedback here :)
From the coverage test perspective it is better to have the bindings in a different crate. In Firecracker we are excluding auto-generated files when computing the coverage. If we have them in the same crate, it means we will have to make changes to test_coverage.py so that it is more flexible and we can exclude individual files depending on the repository being tested. The context is the plan to have common integration tests in one repository and have all other repositories consume them.
But if we don't use most of the things that are auto-generated, then maybe we should just have a minimal vhost_bindings file inside the repository. I don't have a preference between these two.
Good point about the test_coverage.py. Could we solve this issue by adding more test cases?
@jiangliu @bonzini @andreeaflorescu what's the status here?
I'm refining the crate to better support reconnection and it's almost ready; I plan to send it out early next week. I have also decoupled the Unix domain socket client/server roles from the vhost-user master/slave roles, to enable the use case where the vhost-user master acts as the server.
@jiangliu the PR LGTM! Are we waiting for review from some folks or do you still have some work planned on this PR?
src/backend.rs
Outdated
///
/// Vhost-based virtio devices are different from regular virtio devices because the vhost
/// backend takes care of handling all the data transfer. The device itself only needs to deal with
/// setting up the backend driver and managing the control channel.
This comment is not generally applicable and relates to vhost-net only. It may cause confusion.
The comment is essentially saying that in vhost-net the rx/tx virtqueues are handled by vhost and the ctrl virtqueue is handled by the VMM.
In the general case the responsibility of which virtqueue is handled by vhost and which is handled by the VMM is decided when the first vhost implementation of a device is designed. Usually the dataplane is handled by vhost and anything else is handled by the VMM, but this is not a strict rule or enforced in any way.
I think a clearer way of explaining vhost is that a vhost device is a subset of a virtio device. Sometimes all virtqueues but often just those considered the dataplane. The purpose of vhost is to implement a subset of a virtio device's functionality outside the VMM process.
The comment has been updated to:
/// Vhost devices are a subset of virtio devices, which improve a virtio device's performance by
/// delegating data plane operations to dedicated IO service processes. Vhost devices use the
/// same virtqueue layout as virtio devices, which allows vhost devices to be mapped directly to
/// virtio devices.
/// The purpose of vhost is to implement a subset of a virtio device's functionality outside the
/// VMM process. Typically fast paths for IO operations are delegated to the dedicated IO service
/// processes, while slow paths for device configuration are still handled by the VMM process. It
/// may also be used to control access permissions of virtio backend devices.
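Since the VMM keeps the control channel described above, here is a minimal sketch of what such a control surface might look like. The trait and method names below are illustrative assumptions, not the crate's actual API.

```rust
// Illustrative sketch only: `VhostControl` and its methods are hypothetical
// names standing in for whatever control-channel API the crate exposes.
use std::os::unix::io::RawFd;

pub trait VhostControl {
    type Error;

    /// Negotiate the virtio feature bits with the backend.
    fn set_features(&mut self, features: u64) -> Result<(), Self::Error>;

    /// Tell the backend how many descriptors virtqueue `index` holds.
    fn set_vring_num(&mut self, index: usize, num: u16) -> Result<(), Self::Error>;

    /// Hand the backend an eventfd to signal when it wants to interrupt the guest.
    fn set_vring_call(&mut self, index: usize, fd: RawFd) -> Result<(), Self::Error>;
}
```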
src/lib.rs
Outdated
//!
//! Virtio devices use virtqueues to transport data efficiently. Virtqueue is a set of three
//! different single-producer, single-consumer ring structures designed to store generic
//! scatter-gather I/O.
This is outdated as of VIRTIO 1.1. The new Packed Virtqueue layout is a single ring.
This comment could be updated to:
Virtqueues are often implemented as VIRTIO 1.1 Split Virtqueues with a set of three ...
Eventually this crate should support Packed Virtqueues too for better performance.
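For reference, the "set of three" ring structures of a Split Virtqueue are the descriptor table, the available ring, and the used ring; a descriptor table entry has the layout sketched below, following the virtio specification (not code from this PR).

```rust
/// One split-virtqueue descriptor table entry, as laid out by the virtio
/// specification (sketch for illustration, not part of this PR).
#[repr(C)]
#[derive(Copy, Clone)]
pub struct VirtqDesc {
    /// Guest-physical address of the buffer.
    pub addr: u64,
    /// Length of the buffer in bytes.
    pub len: u32,
    /// Descriptor flags (e.g. NEXT for chaining, WRITE for device-writable buffers).
    pub flags: u16,
    /// Index of the next descriptor in the chain when the NEXT flag is set.
    pub next: u16,
}
```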
The comment has been updated to:
Virtio devices use virtqueues to transport data efficiently. The first generation of virtqueues
is a set of three different single-producer, single-consumer ring structures designed to store
generic scatter-gather I/O. The virtio specification 1.1 introduces an alternative compact
virtqueue layout named "Packed Virtqueue", which is friendlier to the memory cache system and to
hardware-implemented virtio devices. The packed virtqueue uses read-write memory, which means
the memory will be both read and written by the host and the guest. The new Packed Virtqueue is
preferred for performance.
This appears to be wrong. Here is what I've found:

In vhost-user "multiple virtqueues" and "multiqueue" are two separate concepts. Devices with multiple virtqueues have always been possible. From the beginning of vhost the vhost-net device had an rx virtqueue and a tx virtqueue. There was just no way for the master to ask the slave if more virtqueues are supported. Devices with fixed virtqueue layouts are NOT required to advertise VHOST_USER_PROTOCOL_F_MQ because the VMM already knows the number of virtqueues.

When multiqueue devices were implemented it was necessary to extend the protocol so that the slave could report how many virtqueues are available. Multiqueue devices have a variable number of virtqueues that depends on the capabilities of the device (i.e. the vhost-user slave).

I think that ideally all vhost-user device backends should implement VHOST_USER_PROTOCOL_F_MQ and VHOST_USER_GET_QUEUE_NUM so the master can fetch max_queue_num from the vhost device backend. However, according to the vhost-user specification and how existing devices are implemented, it is not possible to rely on this today. Here is how it works:

Therefore Sebastien's patch or something similar is necessary. The current master code conflates multiqueue with multiple virtqueues when they are actually separate concepts. I have sent QEMU patches to make libvhost-user always advertise VHOST_USER_PROTOCOL_F_MQ but please do not rely on this.
@stefanha There is one thing I would like to confirm with you: max_queue_num represents the number of queue pairs, rather than the total number of virtqueues, right?
max_queue_num is the number of virtqueues.
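To make the negotiation discussed above concrete, here is a sketch of how a master could pick the virtqueue count. `MasterCtrl` and the fallback parameter are hypothetical; only the VHOST_USER_PROTOCOL_F_MQ and VHOST_USER_GET_QUEUE_NUM names come from the discussion, and the exact bit position is an assumption.

```rust
use std::io;

/// Assumed bit position of VHOST_USER_PROTOCOL_F_MQ in the protocol feature set.
const VHOST_USER_PROTOCOL_F_MQ: u64 = 1 << 0;

/// Hypothetical stand-in for the crate's vhost-user master handle.
trait MasterCtrl {
    fn get_protocol_features(&mut self) -> io::Result<u64>;
    fn get_queue_num(&mut self) -> io::Result<u64>;
}

fn virtqueue_count<M: MasterCtrl>(master: &mut M, fixed_count: u64) -> io::Result<u64> {
    if master.get_protocol_features()? & VHOST_USER_PROTOCOL_F_MQ != 0 {
        // The slave advertises how many virtqueues it supports.
        master.get_queue_num()
    } else {
        // Slaves with a fixed virtqueue layout (e.g. current virtio-fs backends)
        // may not implement GET_QUEUE_NUM, so fall back to a count the VMM
        // already knows for this device type.
        Ok(fixed_count)
    }
}
```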
These interfaces are derived from the firecracker and crosvm projects. They may be extended when support for the vhost-user protocol is added. Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
I plan to integrate this workaround into vhost with comments stating that it's a temporary workaround.
The auto-generated vhost binding is small, so it is better to include it in the vhost crate instead of building another dedicated crate for it. Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
The implementation is derived from the crosvm project. Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Implement vhost_kern::Vsock to control the in-kernel vhost-vsock driver. The implementation is derived from the crosvm project. Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Only basic messages are defined, and the vhost-user spec is still under development, so feel free to add messages as needed. Signed-off-by: Liu Jiang <gerry@linux.alibaba.com> Signed-off-by: Daniel Prilik <daniel@prilik.com>
Implement a Unix domain socket based communication channel for vhost-user, which can pass file descriptors between processes. Signed-off-by: Liu Jiang <gerry@linux.alibaba.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel Prilik <daniel@prilik.com>
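For context, passing a file descriptor over such a channel relies on SCM_RIGHTS ancillary data. The sketch below uses the `nix` crate's wrappers purely for illustration (the crate choice and API vintage are assumptions, not the code in this PR).

```rust
// Sketch only: assumes an older `nix` API (sendmsg/IoVec/ControlMessage);
// the actual implementation may differ, e.g. calling libc directly.
use nix::sys::socket::{sendmsg, ControlMessage, MsgFlags};
use nix::sys::uio::IoVec;
use std::os::unix::io::RawFd;

fn send_with_fd(sock: RawFd, payload: &[u8], fd: RawFd) -> nix::Result<usize> {
    let iov = [IoVec::from_slice(payload)];
    let fds = [fd];
    // The file descriptor rides along with the message as SCM_RIGHTS
    // ancillary data, so the peer receives a duplicated, usable descriptor.
    let cmsg = [ControlMessage::ScmRights(&fds)];
    sendmsg(sock, &iov, &cmsg, MsgFlags::empty(), None)
}
```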
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com> Signed-off-by: Daniel Prilik <daniel@prilik.com>
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel Prilik <daniel@prilik.com>
Not all slave requests are implemented, only the subset of messages used by virtio-fs.

The SlaveReqHandler struct creates the slave-master socketpair and handles all incoming / outgoing message validation and handler routing. SlaveReqHandler::new accepts a struct that implements VhostUserSlaveReqHandler, overriding the default handlers for relevant slave requests. SlaveReqHandler::new also returns a UnixStream that must be sent to the slave via Master::set_slave_request_fd.

As both Master and Slave requests share the same wire format, it would be easy to mix up the two when handling requests, so this commit introduces PhantomData markers to Endpoints and VhostUserMsgHeader that specify whether they operate on MasterReqs or SlaveReqs.

Lastly, some library internals, notably the Listener and Endpoint structs, have been marked as pub(crate). This necessitates a small refactor of how Slaves are constructed, introducing a SlaveListener struct which returns a valid Slave when the socket accepts a connection.

Signed-off-by: Daniel Prilik <danielprilik@gmail.com> Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
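As a rough illustration of the PhantomData-marker idea described in that commit (the type and field names here are simplified assumptions, not the crate's exact definitions):

```rust
use std::marker::PhantomData;
use std::os::unix::net::UnixStream;

/// Marker types for the two request directions sharing one wire format.
pub struct MasterReq;
pub struct SlaveReq;

/// An endpoint specialized at the type level on the request direction, so an
/// Endpoint<MasterReq> can never be passed where an Endpoint<SlaveReq> is
/// expected, even though both carry the same message encoding.
pub struct Endpoint<R> {
    sock: UnixStream,
    _req: PhantomData<R>,
}

impl<R> Endpoint<R> {
    pub fn new(sock: UnixStream) -> Self {
        Endpoint { sock, _req: PhantomData }
    }

    pub fn as_stream(&self) -> &UnixStream {
        &self.sock
    }
}
```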
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We need the vhost-user protocol crate to be more generic to allow communication with a configurable number of virtqueues. This is a workaround for the virtio-fs backend. According to the vhost-user specification, the master should issue get_queue_num() to get the maximum number of queues supported by the slave. Current virtio-fs implementations need multiple virtqueues, but they don't support the get_queue_num() request and don't set the MQ feature bit either. So this is a workaround to hard-code max_queue_num on the master side. Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The latest vhost-user spec from the QEMU documentation only defines two members in VhostSetConfigType: master and live migration. These changes make rust-vmm compatible with vhost-user backends. Signed-off-by: Yang Zhong <yang.zhong@intel.com>
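For reference, a sketch of what those two members could look like on the Rust side; the names and discriminant values are assumed to mirror QEMU's VhostSetConfigType and should be checked against the spec.

```rust
/// Sketch of the config update types described above; discriminant values
/// are assumptions based on QEMU's VhostSetConfigType.
#[repr(u32)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum VhostSetConfigType {
    /// Regular configuration update issued by the master.
    Master = 0,
    /// Configuration update applied as part of live migration.
    Migration = 1,
}
```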
Use acked_protocol_features instead of acked_virtio_features in get_config()/set_config() for protocol features like CONFIG. This patch also fixes a wrong GET_CONFIG setting in set_config(). Signed-off-by: Yang Zhong <yang.zhong@intel.com>
The VhostUserConfig message carries a payload whose contents depend on the kind of device being emulated. With this change, we calculate the offset of the payload within the message, check that its size corresponds to the expected one, and pass it to the backend as a reference to a slice adjusted to the payload dimensions. The backend will be responsible for validating the payload, as it's the one aware of its expected contents. Signed-off-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
This patch replaces send_request_with_payload() with send_request_with_body() to make get_config work. From the vhost-user spec, a vhost-user message consists of 3 header fields and a payload:

+---------+-------+------+---------+
| request | flags | size | payload |
+---------+-------+------+---------+

but for config space, the payload itself is structured as:

Virtio device config space
^^^^^^^^^^^^^^^^^^^^^^^^^^
+--------+------+-------+---------+
| offset | size | flags | payload |
+--------+------+-------+---------+

Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
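To illustrate the two layouts quoted above, here is a sketch of the corresponding on-wire structures. Field widths follow the vhost-user specification; the struct names are assumptions, not necessarily those used in this PR.

```rust
/// Fixed vhost-user message header: request | flags | size, followed by
/// `size` bytes of payload on the wire (sketch for illustration).
#[repr(C, packed)]
#[derive(Copy, Clone, Default)]
pub struct VhostUserMsgHeader {
    /// Request code (e.g. GET_CONFIG / SET_CONFIG).
    pub request: u32,
    /// Version and reply flags.
    pub flags: u32,
    /// Size in bytes of the payload that follows this header.
    pub size: u32,
}

/// Payload header for config-space requests: offset | size | flags,
/// followed by the device config bytes themselves.
#[repr(C, packed)]
#[derive(Copy, Clone, Default)]
pub struct VhostUserConfigHdr {
    /// Offset into the virtio device configuration space.
    pub offset: u32,
    /// Number of config bytes carried after this header.
    pub size: u32,
    /// Config request flags (e.g. live-migration updates).
    pub flags: u32,
}
```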
This is superseded by PR #9
This PR migrates vhost_backend and vhost_gen code from the firecracker project into the vhost crate.
Support for the vhost-user protocol will be added later.