Conversation

@bjzhjing
Contributor

@bjzhjing bjzhjing commented Jun 20, 2019

vhost-user-net can provide good performance for communication with userspace processes, thanks to its shared-memory mechanism. This PR enables that support.

The main purpose of adding this support is to enable DPDK. A design picture of how a Rust-based VMM with vhost-user-net works with DPDK + OVS is shown in the issue linked below.

This PR adds the following basic functionality:

  • Hugetlbfs handling
  • VMM as client, connecting to the dpdkvhostuser port provided by OVS + DPDK
  • VMM as server, accepting connections from the dpdkvhostuserclient port provided by OVS + DPDK
  • Multiple queue pairs support for the vhost-user-net device

Fixes #8
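For illustration, launching a guest against such a port would look roughly like this (a sketch only; paths and the MAC address are placeholders, and the option name/fields are the ones this PR converged on during review):

./cloud-hypervisor \
    --kernel vmlinux \
    --vhost-user-net "mac=52:54:00:02:d9:01,sock=/var/run/openvswitch/vhost-user1,num_queue_pairs=1,queue_size=256"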

@bjzhjing bjzhjing changed the title from "[WIP] Add vhost-user-net support" to "Add vhost-user-net support" on Jul 25, 2019
@rbradford
Member

Good start. Just a few little comments.

src/main.rs Outdated
)
.takes_value(true)
.min_values(1),
.min_values(0),
Member

Ditto

error!("No control queue info !\n");
}
} else if index == _kill_event {
info!("vhost device removed");
Member

Is this helpful?

Contributor Author

Do you mean the info! here? It's not strictly necessary, but I think it's better for debugging and for notifying the user.

Member

Yes. We don't use information messages anywhere else in the code.

Contributor Author

Removed!

vmm/Cargo.toml Outdated
libc = ">=0.2.39"
log = "*"
net_util = { path = "../net_util" }
nix = "0.14.1"
Member

We need to think carefully about adding a dependency.

Contributor Author

Yes, it was added to use unistd::mkstemp. I see nix is actively maintained at https://github.com/nix-rust/nix (1686 commits, 182 contributors, and 967 stars), so it seems safe to depend on. Any other concerns?

Member

Pulling in a big dependency for two functions and one constant seems bad.

Member
@rbradford rbradford Jul 26, 2019

Here is the implementation of nix::mkstemp. I think you can use the libc crate directly.

pub fn mkstemp<P: ?Sized + NixPath>(template: &P) -> Result<(RawFd, PathBuf)> {
    let mut path = try!(template.with_nix_path(|path| {path.to_bytes_with_nul().to_owned()}));
    let p = path.as_mut_ptr() as *mut _;
    let fd = unsafe { libc::mkstemp(p) };
    let last = path.pop(); // drop the trailing nul
    debug_assert!(last == Some(b'\0'));
    let pathname = OsString::from_vec(path);
    try!(Errno::result(fd));
    Ok((fd, PathBuf::from(pathname)))
}


Contributor Author

Yes, since this is a simple one, it can be replaced by calling libc directly. The change is done.
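For reference, the libc-direct version boils down to something like this (a minimal sketch; error handling is simplified and the template path is just an example):

let template = std::ffi::CString::new("/tmp/tmpfile_XXXXXX").unwrap();
let mut path = template.into_bytes_with_nul();
// mkstemp mutates the buffer in place, replacing the XXXXXX suffix.
let fd = unsafe { libc::mkstemp(path.as_mut_ptr() as *mut libc::c_char) };
if fd < 0 {
    // Surface the errno; the concrete error type depends on the caller.
    return Err(std::io::Error::last_os_error());
}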

vmm/src/vm.rs Outdated
let fs = format!("{}{}", file.display(), "/tmpfile_XXXXXX");
println!("fs is {}", fs);
let fp = Path::new(&fs);
let fd = match unistd::mkstemp(fp) {
Member

Can you look at an alternative that we might already be using? tempfile?

Contributor Author

Do you mean the following approach?

OpenOptions::new()
    .read(true)
    .write(true)
    .custom_flags(O_TMPFILE)
    .open(pmem_cfg.file)

O_TMPFILE requires support from the underlying filesystem; only a subset of Linux filesystems provide it, such as ext2, ext3, ext4, UDF, Minix, and shmem. It is not implemented for hugetlbfs, which is why I chose the other way.

Member

Can you access those functions through the libc crate directly?

Contributor Author

Ditto!

| 1 << VIRTIO_RING_F_EVENT_IDX
| 1 << VIRTIO_F_NOTIFY_ON_EMPTY
| 1 << VIRTIO_F_VERSION_1;

Contributor

Those avail_features should be obtained from the vhost-user-net backend; we do not need to set them here. The vhost-user device only checks the feature bits according to the command-line settings and decides which features will be supported.

Contributor Author

The vring features, together with the virtio device features, both need to be negotiated between the VMM and the vhost-user backend. Since the backend may support features the VMM does not, negotiation is the safe way to settle on a common set.

Contributor
@yangzhon yangzhon left a comment

Can we squash this patch into commit 6927dd7? Thanks.

Yang

f.set_len(region.1 as u64)
.map_err(Error::SharedFileSetLen)?;

mem_regions.push((region.0, region.1, Some(FileOffset::new(f, 0))));
Contributor

Can we squash this patch into commit 6927dd7? These are all part of the hugetlbfs handling feature changes, thanks.

Contributor
@yangzhon yangzhon left a comment

Cathy, thanks for the rebase; I will rebase my patches on your branch.

.map_err(Error::GuestMemory)?;
println!("ctrl virtqueue has queue_pairs: {}\n", _queue_pairs);
if (_queue_pairs < VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN)
|| (_queue_pairs > VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX)
Contributor

The current handler.rs is shared between blk and net; could you make VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN and VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX more general?

.read_obj::<u8>(avail_desc.addr)
.map_err(Error::GuestMemory)?;
match _class {
VIRTIO_NET_CTRL_MQ => {
Contributor

Same for VIRTIO_NET_CTRL_MQ


pub const VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN: u16 = 1;
pub const VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX: u16 = 0x8000;

Contributor

These are MSI-X table entry numbers, ranging from 0 to 0x7FF; we could make these definitions more general.

// found in the THIRD-PARTY file.
// Copyright 2019 Intel Corporation. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0 AND BSD-3-Clause

Contributor

Squash these changes into commit 3081c88, thanks.

Member

Yes please squash :)

@bjzhjing
Contributor Author

@rbradford
I ran the integration test locally and it passed, although without cloud-init to assign an IP to the guest. However, it failed in the CI environment with the same "PermissionDenied" error; do you know why this happens?

@bjzhjing
Contributor Author

@sameo @sboeuf @chao-p

Please help review the code for the vhost-user-net device implementation. There is a permission issue that prevents the integration test from running in the CI environment, but it should not affect the code review.

Here are the things not covered in this PR that will need review later:

  • Working on: implement a vhost-user-net backend with a Tap interface for cloud-hypervisor:
    Server/client start (workable)
    Control (workable)
    Polling for vhost-user message handling
    Data (under development)
    Polling for virtqueue data
    Virtqueue maintenance
    Virtqueue notification

  • TODO: vhost-user reconnection (high-level design done)
    Socket watch:
    Virtqueue status report from VMM to guest driver
    Vhost-user-net init and activate: virtqueue start/stop, save/restore, set_vring_base/set_vring_kick

@bjzhjing
Contributor Author

bjzhjing commented Aug 15, 2019

@rbradford @sameo

I just noticed something this time: there is a statement at the very beginning of the console log of the integration test.
Pull request #69 updated 16:31:22 Connecting to https://api.github.com using cloud-hypervisor-bot/****** (cloud-hypervisor-bot (username + password)) Connecting to https://api.github.com to check permissions of obtain list of bjzhjing for intel/cloud-hypervisor Loading trusted files from base branch master at 658c076eb28338f042c65c40a605079d4953db8f rather than 5d503db469b4746395127f4266f85dc563227267 Obtained Jenkinsfile from 658c076eb28338f042c65c40a605079d4953db8f
Jenkins does not actually pick up my branch 5d503db; it runs the master branch after a permission check, which is why the integration test added in my branch is not running at all. BTW, I went back to the previous log for the failed build, and yes, the same thing happened for the earlier test. Could you please help verify this?
https://cloud-hypervisor-jenkins.westus.cloudapp.azure.com/job/cloud-hypervisor/job/PR-69/33/console
https://cloud-hypervisor-jenkins.westus.cloudapp.azure.com/job/cloud-hypervisor/job/PR-69/31/consoleFull

Member
@sboeuf sboeuf left a comment

@bjzhjing Globally the PR looks good. I have a few comments regarding vhost-user-net specifics, but my main concern is to make sure this PR does not introduce duplication. We should rework/move the fs.rs code that could be shared with vhost-user-net, so that the generic vhost-user parts are abstracted away from each device implementation.

pub struct VhostUserNetConfig<'a> {
pub mac: MacAddr,
pub sock: &'a Path,
pub num_queue_pairs: usize,
Member

It'd be better to use num_queues and ensure the number of queues is a multiple of 2, instead of naming this num_queue_pairs.
This way, we could create a sub-structure VhostUserConfig that would be embedded in both VhostUserNetConfig and FsConfig. This structure would contain the common vhost-user fields:

pub struct VhostUserConfig<'a> {
    pub sock: &'a Path,
    pub num_queues: usize,
    pub queue_size: u16,
}

Contributor Author

Sounds good! I'd like to put the new struct into vm-virtio/src/vhost_user/mod.rs as a public struct of vhost_user; vmm/src/config can then refer to it. This change will reduce the number of parameters for a series of functions in config.rs, vm.rs, and the device implementation file. But I don't think we need to abstract the value-parsing process for those fields, because of the type changes and the other fields involved.

src/main.rs Outdated
.min_values(1),
)
.arg(
Arg::with_name("vunet")
Member

I'd prefer vu-net instead of vunet.

Contributor Author

Ok, change done.

src/main.rs Outdated
"Network parameters \"mac=<mac_addr>,\
sock=<socket_path>, num_queue_pairs=<number_of_queue_pairs>,\
queue_size=<size_of_each_queue>\"",
queue_size=<size_of_each_queue>, num_vectors=<number_of_vectors>\"",
Member

Why do we expect the user to provide the number of vectors when, as you explain in your commit message, you know how to compute it from the number of queues?

Contributor Author

Yes, it seems this option can be removed from the command line and config.rs; we only need to compute it in vm.rs.
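A sketch of that computation (assuming the usual virtio-pci layout of one MSI-X vector per virtqueue plus one for configuration changes; variable names are illustrative):

// Each queue pair is an rx/tx virtqueue, one vector each,
// plus one vector for device configuration interrupts.
let num_vectors = 2 * num_queue_pairs + 1;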

src/main.rs Outdated
sock=<socket_path>, num_queue_pairs=<number_of_queue_pairs>,\
queue_size=<size_of_each_queue>, num_vectors=<number_of_vectors>\"",
queue_size=<size_of_each_queue>, num_vectors=<number_of_vectors>,\
server=<is_server>\"",
Member

I'm unclear on what server means in this case. There are two terminologies for vhost-user: client/server and master/slave.
The way it usually works, the vhost-user daemon is the server and the slave, while the VMM is the client and the master.
Can you elaborate on the use case where the VMM is used as a server?

Contributor Author

Server here means the server in client/server. This option was added with two points in mind. On one hand, OVS has a vhostuserclient port, so server mode lets the VMM accept connections from that type of port. On the other hand, I'm considering socket reconnection support; it is not actually added here, but having the VMM in server mode seems to be an easy way to address reconnection.

Member

Ok, I see, but I consider this to be step 2 in supporting vhost-user-net. Let's start with the standard use case by removing this server option for now. Once this PR gets merged, you can submit another one focusing exclusively on enabling server mode and explaining the reasons behind it.
Let's not get too many things/features merged at once, as it makes it harder to keep track of what's being added.

vmm/Cargo.toml Outdated
libc = "0.2.61"
log = "0.4.8"
net_util = { path = "../net_util" }
nix = "0.14.1"
Member

Could you please squash it with the initial commit introducing it?

Contributor Author
@bjzhjing bjzhjing Aug 16, 2019

Squash done!

return Err(Error::VhostUserProtocolNotSupport);
}

let max_queue_number =
Member

Can you please squash this commit directly into the one introducing this code.

Contributor Author

Ditto!

// found in the THIRD-PARTY file.
// Copyright 2019 Intel Corporation. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0 AND BSD-3-Clause

Member

Yes please squash :)

pub vu_interrupt_list: Vec<EventFd>,
}

impl VhostUserEpollHandler {
Member

I'm not opposed to having a dedicated VhostUserEpollHandler, but please make sure it can be used from virtio-fs (fs.rs). I'd like us to come up with a clean factorization instead of duplicating existing code.


// Get features from backend, do negotiation to get a feature collection which
// both VMM and backend can support.
let backend_features = vhost_user_net.get_features().unwrap();
Member

The whole VIRTIO_FEATURES and PROTOCOL_FEATURES negotiation should be factorized across vhost-user implementations. Both virtio-fs and vhost-user-net should rely on the same function to perform feature negotiation.

Member

@sboeuf it seems that @bjzhjing's implementation is generic enough for virtio-fs to use it. Here vhost_user_net is a vhost-user master, which implements the VhostBackend trait, and that's where the generic feature negotiation implementation lives.
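The negotiation itself reduces to intersecting the two feature bitmaps (a sketch; avail_features stands for the VMM-side bitmap built earlier, and get_features/set_features come from the VhostBackend trait mentioned above):

let backend_features = vhost_user_net.get_features().unwrap();
// Keep only the bits both the VMM and the backend support.
let acked_features = avail_features & backend_features;
vhost_user_net.set_features(acked_features).unwrap();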

})
}

pub fn setup_vunet(
Member

Same here, this function should not be specific to vhost-user-net. Unless I'm missing something, it should be factorized at the module level so that all submodules can benefit from it.
This means fs.rs needs to be moved from vm-virtio/src/ to vm-virtio/src/vhost_user/ as a prerequisite.


VUBRIDGE="$WORKLOADS_DIR/vubridge"
VUBRIDGE_DIR="qemu_build"
if [ ! -f "$VUBRIDGE" ]; then
Member

There is no need to build qemu twice!

Contributor Author

Do you mean building qemu's vhost-user-bridge together with virtiofsd?

Member

@bjzhjing yes please make sure you reuse what's being done for virtio-fs.

src/main.rs Outdated
let mut daemon_child = Command::new(vubridge_path.as_str()).spawn().unwrap();

let mut cloud_child =
Command::new("/root/cloud-hypervisor/target/debug/cloud-hypervisor")
Member

This is why you're getting "permission denied" on the CI.

After fixing this, I see an error like the following:

rob@rbradford-test:~/cloud-hypervisor$ cargo test --features "integration_tests" -- test_vunet
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running target/debug/deps/cloud_hypervisor-f60d64920772e0c1

running 1 test
ud socket: /tmp/vubr.sock (server)
local:     127.0.0.1:4444
remote:    127.0.0.1:5555
Waiting for connections on UNIX socket /tmp/vubr.sock ...
Added sock 3 for watching. max_sock: 3
bind: Address already in use
Cloud Hypervisor Guest
	vCPUs: 4
	Memory: 512 MB
	Kernel: "/home/rob/workloads/hypervisor-fw"
	Kernel cmdline: 
	Disk(s): Some([DiskConfig { path: "/tmp/ch.3Sb9u4S7Nuan/osdisk.img" }, DiskConfig { path: "/tmp/ch.3Sb9u4S7Nuan/cloudinit" }])
Guest boot failed: Can not create a new virtual machine: DeviceManager(CreateVhostUserNet(VhostUserCreateMaster(VhostUserProtocol(SocketConnect(Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })))))
test tests::test_vunet ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 20 filtered out

But your test still passes?

src/main.rs Outdated
thread::sleep(std::time::Duration::new(20, 0));
let _ = daemon_child.kill();
let _ = daemon_child.wait();
thread::sleep(std::time::Duration::new(10, 0));
Member

I think you need to test the network itself.

Member

@bjzhjing Here you're only launching the VMM, then the vhost-user bridge, but you're not testing connectivity overall.
You should be able to at least ping the guest from the host once those 2 processes are running.

Contributor Author

Ok, I'd like to issue "ifconfig" via ssh_command to get the TX packet statistics.

Member

Agreed. Please make sure a simple ping works, which shows at least that the connection is functional.

Contributor Author
@bjzhjing bjzhjing Aug 28, 2019

Tested with the same command that test_multiple_network_interfaces uses, "ip -o link | wc -l", to ensure there are 3 interfaces: the default net interface, the vhost-user-net interface, and loopback.
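In test form, that check looks roughly like this (a sketch; ssh_command is the existing integration-test helper and the exact assertion style may differ):

// lo + default virtio-net + vhost-user-net = 3 interfaces.
let count = guest
    .ssh_command("ip -o link | wc -l")
    .trim()
    .parse::<u32>()
    .unwrap();
assert_eq!(count, 3);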

@bjzhjing
Contributor Author

bjzhjing commented Aug 22, 2019

@sboeuf
Commits 898cb65 and c5e33c4 could be moved to a separate PR for the vhost-user fs integration.

The other changes have been integrated into the original commits while rebasing.

There is only one point from your comments that I did not address: the queue pairs parameter on the command line. Since qemu has the same usage, and considering users' habits, I did not change it. But the new VhostUserConfig structure passed from the VMM to vm-virtio does have num_queues.

@bjzhjing
Contributor Author

@rbradford
I'm confident the integration test ran well this time, since data like the following is printed as expected; these are network packets exchanged while vhost-user-net transfers data with the qemu vhost-user-bridge backend.
TX:: 0120: 00 00 63 82 53 63 35 01 01 3d 13 ff 80 c0 db 71 ..c.Sc5..=.....q
TX:: 0130: 00 02 00 00 ab 11 7b 90 9b e5 0b 98 65 00 37 09 ......{.....e.7.
TX:: 0140: 01 03 0c 0f 06 1a 21 79 2a 39 02 02 40 0c 05 63 ......!y*9..@..c
TX:: 0150: 6c 6f 75 64 ff loud.

@sameo
Member

sameo commented Aug 26, 2019

There is only one point from your comments that I did not address: the queue pairs parameter on the command line. Since qemu has the same usage, and considering users' habits, I did not change it. But the new VhostUserConfig structure passed from the VMM to vm-virtio does have num_queues.

I think we need to be consistent with the virtio-fs command line parameters, which take a number of queues.

src/main.rs Outdated
}

#[test]
fn test_vunet() {
Member

test_vhost_user_net, please, to make it readable and clearer.

Contributor Author

Change done!


/// Set vring enable failed.
VhostUserSetVringEnable(VhostError),
/// Set features failed.
VhostUserNetSetFeatures(VhostError),
Member

Will that build when the vhost-user-master feature is disabled?

Contributor Author

No. If neither vhost-user-master nor vhost-user-slave is enabled, these two messages will not be built: they are wrappers of VhostUserProtocol(vhost_user::Error) in vhost_rs::Error, which carries the attribute #[cfg(any(feature = "vhost-user-master", feature = "vhost-user-slave"))].
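For context, the guard in question looks like this in vhost_rs (reproduced from the description above; the surrounding enum is omitted):

#[cfg(any(feature = "vhost-user-master", feature = "vhost-user-slave"))]
VhostUserProtocol(vhost_user::Error),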

Contributor Author
@bjzhjing bjzhjing Aug 27, 2019

Yes, it's a good question. I'm considering whether we should expose the vhost-user devices as cargo features of vm-virtio, and add feature guards for cases like the messages you mentioned above, similar to the vsock config in Firecracker. What's your suggestion?

Member

Seems like those error types should be part of vhost-user/mod.rs.

Contributor Author

The error types are vhost_rs::Error. But it doesn't matter; they have been removed from lib.rs, and a new function was added in vu_common_ctrl to handle that logic.

@@ -0,0 +1,259 @@
// Copyright 2019 Intel Corporation. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0 AND BSD-3-Clause
Member

Why is that BSD-3-Clause licensed? Does it contain some crosvm code? If so, we should keep their copyright as well.

Member

I think this comes from the Firecracker code, which includes the following copyrights:

// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Portions Copyright 2017 The Chromium OS Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the THIRD-PARTY file.

We should keep it, change the SPDX identifier and add our copyright.

Contributor Author

Yes, handler.rs originally comes from Firecracker; I've added their copyright.

}

impl VhostUserEpollHandler {
/// Construct a new, empty event handler for vhost-based devices.
Member

Suggested change
/// Construct a new, empty event handler for vhost-based devices.
/// Construct a new, empty event handler for vhost-user based devices.

Contributor Author

Comment update done!

/// * `interrupt_cdb` EventFd for signaling an interrupt that the guest
/// driver is listening to
/// * `kill_evt` - EventFd used to kill the vhost-user-net device
/// * `queues` - queues as sharing memory between master and slave
Member

None of the documented arguments match the actual code; please fix them.

Contributor Author

Ditto!

/// * `queues` - queues as sharing memory between master and slave
pub fn new(
vu_epoll_cfg: VhostUserEpollConfig,
cvq: Option<CtlVirtqueue>,
Member

Suggested change
cvq: Option<CtlVirtqueue>,
ctrl_virt_queue: Option<CtlVirtqueue>,

cvq makes the rest of the code harder to follow and maintain. Let's be more verbose about our variable names.

Contributor Author

Update done. Control queue handling has been removed from this PR and will be handled in a new PR.

pub fn new(
vu_epoll_cfg: VhostUserEpollConfig,
cvq: Option<CtlVirtqueue>,
mem: Option<Arc<RwLock<GuestMemoryMmap>>>,
Member

Suggested change
mem: Option<Arc<RwLock<GuestMemoryMmap>>>,
memory: Option<Arc<RwLock<GuestMemoryMmap>>>,

Contributor Author

Ditto!

pub vu_epoll_cfg: VhostUserEpollConfig,
pub cvq: Option<CtlVirtqueue>,
pub mem: Option<Arc<RwLock<GuestMemoryMmap>>>,
pub slave_req_handler: Option<MasterReqHandler<dyn VhostUserMasterReqHandler>>,
Member

Could you please squash that commit into your initial vhost-user-net commit?

Contributor Author

Ditto!

@sameo previously approved these changes Aug 27, 2019
pub use self::vu_common_ctrl::VhostUserConfig;

pub const VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN: u16 = 1;
pub const VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX: u16 = 0x8000;
Member

We don't need these as they're already defined in the virtio-bindings crate.

Member

Yes because this is a virtio feature, not a vhost-user one. This should be removed.

Contributor Author

Done: I use virtio-bindings instead, and moved this to the new PR.
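That is, something along these lines (a sketch; the exact module path inside virtio-bindings is an assumption):

use virtio_bindings::bindings::virtio_net::{
    VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX, VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MIN,
};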



warn!("Cannot acknowledge unknown features page: {}", page);
0u64
}
};
Member

I'm not asking for any change; this is just a general comment, unrelated to this PR: there's a lot of duplicated code between all the virtio devices for virtio feature handling. Essentially, all devices implement the features and ack_features methods in the same way.

Member

Yes that's right, I've just opened #214 to track this.

@sameo sameo dismissed their stale review August 27, 2019 01:03

meant to ask for changes instead...

Member
@sboeuf sboeuf left a comment

I have two main comments.

  • Let's keep it simple and merge a first version that works without all the fancy features. Both server mode and multiqueue can be submitted as follow-up PRs once this one is merged.
  • Thanks for doing the factorization so that virtio-fs can share the same code; I think this is going to work. The problem is that a number of things in the handler are not specific to vhost-user, and I feel we could go one step further and factorize at the virtio level too. But as @sameo mentioned, that is an orthogonal task we'll need to figure out later.



src/main.rs Outdated
.long("vu-net")
.help(
"Network parameters \"mac=<mac_addr>,\
sock=<socket_path>, num_queue_pairs=<number_of_queue_pairs>,\
Member

Please change num_queue_pairs into num_queues.


src/main.rs Outdated
.min_values(1),
)
.arg(
Arg::with_name("vu-net")
Member

@sameo what do you think about the naming here? I'd prefer vhost-user-net for clarity, but at the same time it's a bit long. WDYT?

Member

By far I prefer --vhost-user-net. It's long, but it's not something one would type all the time anyway; you usually script it.

Contributor Author

Change done! Using --vhost-user-net instead.


Ok(())
}

pub fn run(&mut self) -> Result<()> {
Member

@sameo @rbradford @bjzhjing
So if we talk about factorizing so that other vhost-user implementations like virtio-fs can reuse this, it will work. But looking at the global picture, 70% of this handler function will be identical across all virtio/vhost-user device implementations. I think we need to define some sort of generic handler that can be extended/modified in the vhost-user context. WDYT?

Member

I agree. We should open an issue about that and handle this factorization work separately.

Member

I updated #214 issue to include the epoll handler factorization too.


if let Some(avail_desc) = cvq.queue.iter(&mem).next() {
used_desc_heads[used_count] = (avail_desc.index, avail_desc.len);
used_count += 1;
let _class = mem
Member

Replace _class with class.

} else {
return Err(Error::NoQueuePairsNum);
};
let _queue_pairs = mem
Member

Replace _queue_pairs with queue_pairs.

let mut path = fs.as_bytes_with_nul().to_owned();
let path_ptr = path.as_mut_ptr() as *mut _;
let fd = unsafe { libc::mkstemp(path_ptr) };
unsafe { libc::unlink(path_ptr) };
Member

Are you sure this is the appropriate way to create temporary files with hugetlbfs?

Contributor Author

Yes, this follows what QEMU does. What's your concern?

Member

My concern was about the unsafe blocks this patch introduces.
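One common way to contain that unsafety is to hand the raw fd to an owned File as soon as it has been validated (a sketch):

use std::fs::File;
use std::os::unix::io::FromRawFd;

// Safe from here on: File owns the descriptor and closes it on drop.
let file = unsafe { File::from_raw_fd(fd) };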

@sameo
Member

sameo commented Aug 28, 2019

  • Let's keep it simple and merge a first version that works without all the fancy features. Both server mode and multiqueue can be submitted as follow-up PRs once this one is merged.

I agree with that. This would allow us to quickly merge that PR.

The current directory handling, which opens a tempfile via OpenOptions
with custom_flags(O_TMPFILE), works for tmpfs but not for hugetlbfs.
Add a new directory handling path which works for both tmpfs and
hugetlbfs.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
The vhost-user framework can provide good performance in data-intensive
scenarios thanks to its memory sharing mechanism. Implement a
vhost-user-net device so Rust-based VMMs get that benefit for networking.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Update the VM configuration and the device initialization process to add
vhost-user-net support.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
@bjzhjing
Contributor Author

@sameo @sboeuf @rbradford
According to the discussion above, this PR will only cover the basic logic of the vhost-user-net device implementation; features like multiqueue and server mode will be moved to new PRs.

Member
@sboeuf sboeuf left a comment

@bjzhjing thank you for the quick rework of this PR. Except for the test, which needs to be extended for proper coverage, the code is ready to be merged IMO.



src/main.rs Outdated

thread::sleep(std::time::Duration::new(5, 0));
let _ = daemon_child.kill();
let _ = daemon_child.wait();
Member

Why are you killing the daemon process before checking the network interfaces?
Also, could you add some end-to-end test here?
The idea is to validate that the network connection between the daemon and the guest is functional.

Contributor Author

While the guest is connected to the socket provided by the daemon process, it keeps printing the content of the transferred network packets. I really don't want to fill the integration console log with that, so once I've made sure it works normally, I just kill it.

Contributor Author

Do you mean adding more tests for the connection between the daemon process (vhost-user-bridge) and the guest? Hmm, from the implementation of vhost-user-bridge, my understanding is that it was created just to test the vhost-user connection. What type of functional tests would you like to see?

Member

@bjzhjing @sameo @rbradford
Ok, so I spent today going through the testing of vhost-user-net. I found several ways we can test end-to-end networking using your implementation plus the vubridge daemon. Please take a look at the document I wrote about this. The idea is to really verify that network packets flow properly through our vhost-user-net implementation.
I wanted to implement one of the two methods I explain in the document, but I realized we need netcat (the nc binary) to perform all this testing. Unfortunately, nc is not part of the CL cloud image. We need to install nc through swupd and then push the updated image to our cloud-hypervisor storage account on Azure.

I would suggest we merge the PR for now with this basic testing (making sure the number of network interfaces created is as expected).
In the meantime, @rbradford, could you take care of the image update? If you don't have time for this, please let me know.
Once the image is updated, I'll work on extending the test.

@bjzhjing one last thing: I would really prefer if you could kill the daemon_child after you have verified the number of network interfaces.
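One of the simplest end-to-end probes would look like this (a sketch only; addresses, port, and direction are placeholders, not necessarily what the document proposes):

# Host side: listen for UDP traffic arriving through the vubridge.
nc -u -l 12345
# Guest side (over ssh): send a probe out through the vhost-user-net interface.
echo ping | nc -u <host_ip> 12345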

@yangzhon
Contributor

  • Let's keep it simple and merge a first version that works without all the fancy features. Both server mode and multiqueue can be submitted as follow-up PRs once this one is merged.

I agree with that. This would allow us to quickly merge that PR.

@sameo, that's good. Since I am debugging a multi-queue issue in vhost-user blk, I will rebase my patches on this and send a normal PR (supporting one queue), thanks.

@sboeuf
Member

sboeuf commented Aug 29, 2019

@yangzhon

I will rebase my patches on this and send a normal PR (supporting one queue), thanks.

That's great; this way both vhost-user-net and vhost-user-blk will be merged into CH soon :)

VIRTIOFSD="$WORKLOADS_DIR/virtiofsd"
VUBRIDGE="$WORKLOADS_DIR/vubridge"
QEMU_DIR="qemu_build"
if [ ! -f "$VIRTIOFSD" ]; then
Member

Suggested change
if [ ! -f "$VIRTIOFSD" ]; then
if [ ! -f "$VIRTIOFSD" ] || [ ! -f "$VUBRIDGE" ]; then

We need to make sure to rebuild if either of the two is missing.


Use qemu/tests/vhost-user-bridge as the backend for the vhost-user-net
integration test.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Member
@sboeuf sboeuf left a comment

Based on my previous comment, I think this PR is ready to be merged!

@sboeuf
Member

sboeuf commented Aug 30, 2019

@rbradford please review the PR, and take a look at #69 (comment)

@bjzhjing
Contributor Author

@sboeuf Awesome! Thanks for providing this doc to help fully test the implementation! Have you tried the methods in a local environment?

@sameo
Member

sameo commented Aug 30, 2019

Based on my previous comment, I think this PR is ready to be merged!

I agree.
We have roughly one full week left before 0.2. @bjzhjing, I'd appreciate it if you could use that time to implement the improved integration test (Example #2 from @sboeuf's document would be enough).

@bjzhjing
Contributor Author

@sameo Sure, I will. I'd like to be doubly sure of the correctness :-)

@bjzhjing
Contributor Author

bjzhjing commented Aug 30, 2019

@sameo @sboeuf I ran the "communication between host and guest" test in my local environment today; the data sent from the host was received by the guest. I've updated the document with the details.

@rbradford
Member

@sboeuf @bjzhjing Use Ubuntu in the integration test, as it already has nc installed.

Member
@rbradford rbradford left a comment

Please write a document about how to use this with DPDK + OVS, etc.

@rbradford rbradford merged commit cc7a96e into cloud-hypervisor:master Aug 30, 2019
@bjzhjing bjzhjing deleted the vhost-user-net branch September 2, 2019 07:00
@bjzhjing
Contributor Author

Please write a document about how to use this with DPDK + OVS, etc.

@rbradford Actually, setting up a DPDK + OVS environment for the vhost-user-net test basically follows the instructions at https://wiki.qemu.org/Documentation/vhost-user-ovs-dpdk; just use cloud-hypervisor instead of qemu to start the VMs. I also created scripts to start the OVS service. So I assume it's not necessary to add a new page to cloud-hypervisor describing the steps again; let me know if you still think it would be better to do so.
I ran iperf for some simple performance tests of vhost-user-net in the DPDK + OVS environment and got a comparison against qemu. I'm doing some analysis on that and will post it to #369 later.

@zhangjaycee
Contributor

  • TODO: vhost-user reconnection (high-level design done)
    Socket watch:
    Virtqueue status report from VMM to guest driver
    Vhost-user-net init and activate: virtqueue start/stop, save/restore, set_vring_base/set_vring_kick

Hi, I wonder what the current status of socket and vhost-user reconnection is. Is it being developed?

@sboeuf
Member

sboeuf commented Feb 26, 2021

@zhangjaycee

Hi, I wonder what the current status of socket and vhost-user reconnection is. Is it being developed?

No, these features are not under development.

@zhangjaycee
Contributor

@zhangjaycee

Hi, I wonder what the current status of socket and vhost-user reconnection is. Is it being developed?

No, these features are not under development.

Thanks for replying! Is there a timetable for implementing it, or can you share the details of the high-level design?
