Conversation

@jiangliu
Member

This PR migrates the vhost_backend and vhost_gen code from the firecracker project into the vhost crate.
Support for the vhost-user protocol will be added later.

///
/// An eventfd is useful because it is sendable across processes and can be used for signaling in
/// and out of the KVM API. They can also be polled like any other file descriptor.
pub struct EventFd {
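
To illustrate the signaling pattern described in the doc comment above, here is a minimal sketch using libc's eventfd directly, rather than the wrapper type from this PR (whose API may differ):

```rust
use std::io;

fn eventfd_roundtrip() -> io::Result<()> {
    // Create an eventfd with an initial counter of 0.
    let fd = unsafe { libc::eventfd(0, 0) };
    if fd < 0 {
        return Err(io::Error::last_os_error());
    }

    // Signal: add 1 to the counter with an 8-byte write. Another process or a
    // poll()-based event loop holding this fd would see it become readable.
    let one: u64 = 1;
    if unsafe { libc::write(fd, &one as *const u64 as *const libc::c_void, 8) } != 8 {
        return Err(io::Error::last_os_error());
    }

    // Consume: read the counter back, which resets it to 0.
    let mut value: u64 = 0;
    if unsafe { libc::read(fd, &mut value as *mut u64 as *mut libc::c_void, 8) } != 8 {
        return Err(io::Error::last_os_error());
    }
    assert_eq!(value, 1);

    unsafe { libc::close(fd) };
    Ok(())
}
```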
Collaborator

What's the status of the vmm-sys-utils crate? I'm asking because we cannot merge the PR as it is now, right? I don't think we want to pull sys_utils/... into the vhost crate.

Member Author

Yes, that's a problem. This crate depends on the vmm-sys-utils and vm-memory crates.


/* Auto-generated by bindgen then manually modified for simplicity */

#![allow(non_upper_case_globals)]
Collaborator

This is being discussed on the ML, and this is not clear yet, but it seems you will have to put those bindings in a separate crate.

Member Author

I have replied to the ML thread. For the vhost case, I do feel it's better to include the bindings in the crate, with manual modifications.

Member

From the coverage test perspective it is better to have the bindings in a different crate. In Firecracker we are excluding auto-generated files when computing coverage. If we have them in the same crate, it means we will have to make test_coverage.py more flexible so that we can exclude individual files depending on the repository being tested. The context is the plan to have common integration tests in one repository and have all other repositories consume them.

But if we don't use most of the things that are auto-generated, then maybe we should just have a minimal vhost_bindings file inside the repository. I don't have a preference between these two.

Member Author

Good point about the test_coverage.py. Could we solve this issue by adding more test cases?

@sboeuf
Collaborator

sboeuf commented May 22, 2019

@jiangliu @bonzini @andreeaflorescu what's the status here?
I'm very interested in seeing this crate moving forward in order to base some vhost-user implementations off of it.

@jiangliu
Member Author

@jiangliu @bonzini @andreeaflorescu what's the status here?
I'm very interested in seeing this crate moving forward in order to base some vhost-user implementations off of it.

I'm refining the crate to better support reconnection; it's almost ready, and I plan to send it out early next week. I have also decoupled the Unix domain socket client/server roles from the vhost-user master/slave roles, to enable the use case where the vhost-user master acts as the server.

@sboeuf
Collaborator

sboeuf commented Jun 3, 2019

@jiangliu the PR LGTM! Are we waiting for review from some folks or do you still have some work planned on this PR?
/cc @andreeaflorescu @bonzini @sameo

@jiangliu
Member Author

jiangliu commented Jun 4, 2019

@jiangliu the PR LGTM! Are we waiting for review from some folks or do you still have some work planned on this PR?
/cc @andreeaflorescu @bonzini @sameo
Currently waiting for reviews; after that I need to refine the documentation and address other requirements of the rust-vmm project.

src/backend.rs Outdated
///
/// Vhost-based virtio devices are different from regular virtio devices because the vhost
/// backend takes care of handling all the data transfer. The device itself only needs to deal with
/// setting up the backend driver and managing the control channel.

@stefanha commented Jun 21, 2019


This comment is not generally applicable and relates to vhost-net only. It may cause confusion.

The comment is essentially saying that in vhost-net the rx/tx virtqueues are handled by vhost and the ctrl virtqueue is handled by the VMM.

In the general case the responsibility of which virtqueue is handled by vhost and which is handled by the VMM is decided when the first vhost implementation of a device is designed. Usually the dataplane is handled by vhost and anything else is handled by the VMM, but this is not a strict rule or enforced in any way.

I think a clearer way of explaining vhost is that a vhost device is a subset of a virtio device. Sometimes all virtqueues but often just those considered the dataplane. The purpose of vhost is to implement a subset of a virtio device's functionality outside the VMM process.

Member Author

The comment has been updated to:
/// Vhost devices are a subset of virtio devices, which improve virtio device performance by
/// delegating data plane operations to dedicated IO service processes. Vhost devices use the
/// same virtqueue layout as virtio devices, allowing vhost devices to be mapped directly to
/// virtio devices.
/// The purpose of vhost is to implement a subset of a virtio device's functionality outside the
/// VMM process. Typically, fast paths for IO operations are delegated to the dedicated IO service
/// processes, while slow paths for device configuration are still handled by the VMM process. It
/// may also be used to control access permissions of virtio backend devices.
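
To make the control-plane / data-plane split concrete, here is a hypothetical sketch of the kind of control channel the VMM drives at setup time; the trait and method names are illustrative only, not the API proposed in this PR:

```rust
use std::io;

/// Illustrative subset of the control operations a vhost backend exposes.
trait VhostControl {
    /// Claim exclusive ownership of the vhost device.
    fn set_owner(&mut self) -> io::Result<()>;
    /// Negotiate virtio feature bits with the backend.
    fn set_features(&mut self, features: u64) -> io::Result<()>;
    /// Describe guest memory so the backend can reach the virtqueues directly.
    fn set_mem_table(&mut self, regions: &[(u64, u64, u64)]) -> io::Result<()>;
    /// Point one virtqueue's descriptor/avail/used rings at guest addresses.
    fn set_vring_addr(&mut self, index: u32, desc: u64, avail: u64, used: u64) -> io::Result<()>;
}

/// The VMM runs this slow path once at device setup; the fast path (moving
/// buffers through the virtqueues) then happens entirely in the IO service
/// process, outside the VMM.
fn setup_device(ctrl: &mut dyn VhostControl) -> io::Result<()> {
    ctrl.set_owner()?;
    ctrl.set_features(0)?; // feature bits elided in this sketch
    ctrl.set_mem_table(&[])?; // memory regions elided in this sketch
    ctrl.set_vring_addr(0, 0x1000, 0x2000, 0x3000)?; // placeholder addresses
    Ok(())
}
```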

src/lib.rs Outdated
//!
//! Virtio devices use virtqueues to transport data efficiently. Virtqueue is a set of three
//! different single-producer, single-consumer ring structures designed to store generic
//! scatter-gather I/O.

This is outdated as of VIRTIO 1.1. The new Packed Virtqueue layout is a single ring.

This comment could be updated to:
Virtqueues are often implemented as VIRTIO 1.1 Split Virtqueues with a set of three ...

Eventually this crate should support Packed Virtqueues too for better performance.

Member Author

The comment has been updated to:
Virtio devices use virtqueues to transport data efficiently. The first generation of virtqueue
is a set of three different single-producer, single-consumer ring structures designed to store
generic scatter-gather I/O. The virtio specification 1.1 introduces an alternative compact
virtqueue layout named "Packed Virtqueue", which is more friendly to the memory cache system and
to hardware-implemented virtio devices. The packed virtqueue uses read-write memory, which means
the ring memory is both read and written by both the host and the guest. The new Packed Virtqueue
is preferred for performance.
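
For reference, the three single-producer, single-consumer structures of the split virtqueue mentioned above have the following layout in the VIRTIO specification (the Rust types here only illustrate the wire layout):

```rust
#[repr(C)]
struct VirtqDesc {
    addr: u64,  // guest-physical address of the buffer
    len: u32,   // length of the buffer in bytes
    flags: u16, // NEXT / WRITE / INDIRECT flags
    next: u16,  // index of the next descriptor when chained
}

#[repr(C)]
struct VirtqAvail {
    flags: u16,
    idx: u16,       // driver-side ring index
    ring: [u16; 0], // flexible array: one descriptor-head index per entry
}

#[repr(C)]
struct VirtqUsedElem {
    id: u32,  // head index of the completed descriptor chain
    len: u32, // total bytes written by the device into the chain
}

#[repr(C)]
struct VirtqUsed {
    flags: u16,
    idx: u16,                 // device-side ring index
    ring: [VirtqUsedElem; 0], // flexible array: one element per completed chain
}
```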

@stefanha

@sboeuf @jiangliu It seems that the max_queue_num item in node should not be configured by the VMM. max_queue_num is obtained from the slave by calling get_queue_num, to indicate the maximum number of queues the slave can support. It could be a gatekeeper to check whether the VMM sets up more queues than the slave can handle. I see the set_vring_* functions handle it like this. Thus, I thought the support added here should be removed. Correct me if I'm wrong :-P

This appears to be wrong. Here is what I've found:

In vhost-user "multiple virtqueues" and "multiqueue" are two separate concepts. Devices with multiple virtqueues have always been possible. From the beginning of vhost the vhost-net device had an rx virtqueue and a tx virtqueue. There was just no way for the master to ask the slave if more virtqueues are supported.

Devices with fixed virtqueue layouts are NOT required to advertise VHOST_USER_PROTOCOL_F_MQ because the VMM already knows the number of virtqueues.

When multiqueue devices were implemented it was necessary to extend the protocol so that the slave could report how many virtqueues are available. Multiqueue devices have a variable number of virtqueues that depends on the capabilities of the device (i.e. the vhost-user slave).

I think that ideally all vhost-user device backends should implement VHOST_USER_PROTOCOL_F_MQ and VHOST_USER_GET_QUEUE_NUM so the master can fetch max_queue_num from the vhost device backend.

However, according to the vhost-user specification and how existing devices are implemented, it is not possible to rely on this today. Here is how it works:

  1. The VMM configures max_queue_num based on the number of queues offered by its virtio device (NOT the vhost device).
  2. If the vhost-user device backend supports VHOST_USER_GET_QUEUE_NUM then max_queue_num can be fetched from the backend and its value updated.
  3. If the VMM user explicitly requested a specific multiqueue number (e.g. on the command-line or through a management API) then device creation fails if this number exceeds max_queue_num (e.g. "Cannot create virtio-net device with 1024 virtqueues, vhost-user slave only supports 64").

Therefore Sebastien's patch or something similar is necessary. The current master code conflates multiqueue with multiple virtqueues when they are actually separate concepts.

I have sent QEMU patches to make libvhost-user always advertise VHOST_USER_PROTOCOL_F_MQ but please do not rely on this.
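
A hedged sketch of the three-step flow above; the trait and function names are hypothetical, not this crate's API, and only the VHOST_USER_PROTOCOL_F_MQ bit position is taken from the vhost-user specification:

```rust
use std::io;

const VHOST_USER_PROTOCOL_F_MQ: u64 = 0; // protocol feature bit, per the spec

trait VhostUserMaster {
    fn get_protocol_features(&mut self) -> io::Result<u64>;
    fn get_queue_num(&mut self) -> io::Result<u64>;
}

fn resolve_max_queue_num(
    master: &mut dyn VhostUserMaster,
    vmm_queue_num: u64,    // step 1: queues offered by the VMM's virtio device
    requested_queues: u64, // step 3: queues explicitly requested by the user
) -> io::Result<u64> {
    let mut max_queue_num = vmm_queue_num;

    // Step 2: if the slave advertises VHOST_USER_PROTOCOL_F_MQ, fetch its limit.
    let proto = master.get_protocol_features()?;
    if proto & (1 << VHOST_USER_PROTOCOL_F_MQ) != 0 {
        max_queue_num = master.get_queue_num()?;
    }

    // Step 3: device creation fails if the user asked for more queues than the slave supports.
    if requested_queues > max_queue_num {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!(
                "cannot create device with {} virtqueues, vhost-user slave only supports {}",
                requested_queues, max_queue_num
            ),
        ));
    }
    Ok(max_queue_num)
}
```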

@bjzhjing
Contributor

@stefanha There is one thing I would like to confirm with you: max_queue_num represents the number of queue pairs, rather than the total number of virtqueues, right?

@stefanha

@stefanha There is one thing I would like to confirm with you: max_queue_num represents the number of queue pairs, rather than the total number of virtqueues, right?

max_queue_num is the number of virtqueues.

These interfaces are derived from the firecracker and crosvm projects.
They may be extended when enabling support for the vhost-user protocol.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
@jiangliu
Member Author

jiangliu commented Sep 2, 2019

Great, I'll let you know when this will be solved on the virtio-fs side of things.

I plan to integrate this workaround into vhost with comments stating that it's a temporary workaround.

@jiangliu changed the title from "Migrate vhost code from the firecracker project into the vhost crate" to "Introduce vhost crate to support vhost-kernel and vhost-user backend drivers" Sep 2, 2019
@jiangliu
Member Author

jiangliu commented Sep 2, 2019

In vhost-user "multiple virtqueues" and "multiqueue" are two separate concepts. Devices with multiple >virtqueues have always been possible. From the beginning of vhost the vhost-net device had an rx >virtqueue and a tx virtqueue. There was just no way for the master to ask the slave if more virtqueues are >supported.
Thanks for the explanation, it's easy to mix up these two concepts.
I will carry the patch from @sboeuf in next PR.

jiangliu and others added 8 commits September 2, 2019 14:22
The auto-generated vhost binding is small, it would be better to include
it into the vhost crate instead of building another dedicated crate for it.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
The implementation is derived from the crosvm project.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Implement vhost_kern::Vsock to control the in-kernel vhost-vsock driver.
The implementation is derived from the crosvm project.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Only basic messages are defined, and the vhost-user spec is also under
development. So feel free to add needed messages.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Daniel Prilik <daniel@prilik.com>
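
For context, a small illustrative subset of the master-to-slave request codes defined by the vhost-user specification looks like the following (values are worth double-checking against the spec before relying on them):

```rust
#[repr(u32)]
#[derive(Clone, Copy, Debug)]
enum MasterReq {
    GetFeatures = 1,
    SetFeatures = 2,
    SetOwner = 3,
    SetMemTable = 5,
    SetVringNum = 8,
    SetVringAddr = 9,
    SetVringKick = 12,
    SetVringCall = 13,
    GetProtocolFeatures = 15,
    SetProtocolFeatures = 16,
    GetQueueNum = 17,
}
```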
Implement a Unix domain socket based communication channel for vhost-user,
which can pass file descriptors between processes.

Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel Prilik <daniel@prilik.com>
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Daniel Prilik <daniel@prilik.com>
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel Prilik <daniel@prilik.com>
Not all slave requests are implemented, only the subset of messages used by
virtio-fs.

The SlaveReqHandler struct creates the slave-master socketpair, and handles
all incoming / outgoing message validation and handler routing.
SlaveReqHandler::new accepts a struct that implements VhostUserSlaveReqHandler,
overriding the default handlers for relevant slave requests.
SlaveReqHandler::new also returns a UnixStream that must be sent to the slave
via Master::set_slave_request_fd.

As both Master and Slave requests share the same wire-format, it could be easy
to mix up the two when handling requests, so this commit introduces PhantomData
markers to Endpoints and VhostUserMsgHeader that specify if they operate on
MasterReqs or SlaveReqs.

Lastly, some library internals, notably the Listener and Endpoint structs, have
been marked as pub(crate). This necessitates a small refactor of how Slaves
are constructed, introducing a SlaveListener struct which returns a valid Slave
when the socket accepts a connection.

Signed-off-by: Daniel Prilik <danielprilik@gmail.com>
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
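
The PhantomData-marker pattern described above can be sketched as follows; the names mirror the description, but the exact definitions in this PR may differ:

```rust
use std::marker::PhantomData;

/// Marker trait for a request-code family (master-initiated vs slave-initiated).
trait Req: Copy {}

#[derive(Clone, Copy)]
struct MasterReq(u32);
impl Req for MasterReq {}

#[derive(Clone, Copy)]
struct SlaveReq(u32);
impl Req for SlaveReq {}

/// Wire header shared by both directions. The zero-sized marker ties the header
/// to one request family at compile time without changing its layout.
#[repr(C)]
struct VhostUserMsgHeader<R: Req> {
    request: u32,
    flags: u32,
    size: u32,
    _marker: PhantomData<R>,
}

/// An endpoint parameterized the same way can only send and receive headers of
/// its own family, so master and slave requests cannot be mixed up by accident.
struct Endpoint<R: Req> {
    _marker: PhantomData<R>,
}
```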
jiangliu and others added 5 commits September 2, 2019 14:22
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We need the vhost-user protocol crate to be more generic to allow
communication with a configurable amount of virtqueues.

This is a workaround for the virtio-fs backend. According to the
vhost-user specification, the master should issue get_queue_num()
to get the maximum number of queues supported by the slave. Current
virtio-fs implementations need multiple virtqueues, but they do not
support the get_queue_num() request and do not set the MQ feature
bit either. So this is a workaround to hard-code max_queue_num on
the master side.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The latest vhost-user spec from the QEMU docs only defines two members in
VhostSetConfigType: master and live migration. These changes make
rust-vmm compatible with vhost-user backends.

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
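
A minimal sketch of the two members mentioned above, following the values used in QEMU's vhost-user documentation (worth double-checking against the spec):

```rust
#[repr(u32)]
#[derive(Clone, Copy, Debug)]
enum VhostSetConfigType {
    Master = 0,
    Migration = 1,
}
```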
Use acked_protocol_features to replace acked_virtio_features in
get_config()/set_config() for protocol features like CONFIG.

This patch also fixes the wrong GET_CONFIG setting for set_config().

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
The VhostUserConfig carries a message with a payload, the contents of
which depend on the kind of device being emulated.

With this change, we calculate the offset of the payload within the
message, check that its size corresponds to the expected one, and pass it
to the backend as a reference to a slice adjusted to the payload
dimensions.

The backend will be responsible for validating the payload, as it's the
one aware of its expected contents.

Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
This patch replaces send_request_with_payload() with
send_request_with_body() to make get_config() work.

From the vhost-user spec:
A vhost-user message consists of 3 header fields and a payload.

+---------+-------+------+---------+
| request | flags | size | payload |
+---------+-------+------+---------+

but for the config space, the payload includes:

Virtio device config space
^^^^^^^^^^^^^^^^^^^^^^^^^^
+--------+------+-------+---------+
| offset | size | flags | payload |
+--------+------+-------+---------+

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
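
Putting the last two commits together, here is a hedged sketch of how the config-space payload can be carved out of the message body; the struct and function names are illustrative, not necessarily those used in this PR:

```rust
use std::mem::size_of;

/// Leading fields of the config-space payload, matching the
/// offset | size | flags | payload layout quoted above.
#[repr(C)]
#[derive(Clone, Copy)]
struct VhostUserConfigHdr {
    offset: u32, // offset into the device config space
    size: u32,   // number of config bytes that follow
    flags: u32,
}

/// Validate the sizes and hand the backend only the config bytes; the backend
/// is responsible for interpreting their contents.
fn config_payload<'a>(body: &'a [u8], hdr: &VhostUserConfigHdr) -> Result<&'a [u8], &'static str> {
    let payload_offset = size_of::<VhostUserConfigHdr>();
    if body.len() != payload_offset + hdr.size as usize {
        return Err("config payload size does not match header");
    }
    Ok(&body[payload_offset..])
}
```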
@sameo

sameo commented Mar 17, 2020

This is superseded by PR #9

@sameo closed this Mar 17, 2020