Skip to content
This repository was archived by the owner on May 12, 2021. It is now read-only.

Firecracker Add jailer support for firecracker#1649

Merged
GabyCT merged 2 commits intokata-containers:masterfrom
mcastelino:topic/jail
Jul 12, 2019
Merged

Firecracker Add jailer support for firecracker#1649
GabyCT merged 2 commits intokata-containers:masterfrom
mcastelino:topic/jail

Conversation

@mcastelino
Copy link
Copy Markdown
Contributor

@mcastelino mcastelino commented May 9, 2019

Firecracker provides a jailer to constrain the VMM. Use this
jailer to launch the firecracker VMM instead of launching it
directly from the kata-runtime.

The jailer will ensure that the firecracker VMM will run
in its own network and mount namespace. All assets required
by the VMM have to be present within these namespaces.
The assets need to be copied or bind mounted into the chroot
location setup by jailer in order for firecracker to access
these resouces. This includes files, device nodes and all
other assets.

Jailer automatically sets up the jail to have access to
kvm and vhost-vsock.

If a jailer is not available (i.e. not setup in the toml)
for a given hypervisor the runtime will act as the jailer.

Also enhance the hypervisor interface and unit tests to
include the network namespace. This allows the hypervisor
to choose how and where to lauch the VMM process, vs
virtcontainers directly launching the VMM process.

Fixes: #1129

Signed-off-by: Manohar Castelino manohar.r.castelino@intel.com

@mcastelino mcastelino requested a review from egernst May 9, 2019 02:20
@mcastelino mcastelino force-pushed the topic/jail branch 2 times, most recently from af428a3 to 0855fe4 Compare May 10, 2019 23:22
@mcastelino
Copy link
Copy Markdown
Contributor Author

Cannot be merged unless jailer bug is fixed firecracker-microvm/firecracker#1089

@mcastelino mcastelino force-pushed the topic/jail branch 2 times, most recently from eddafc1 to 4d165b4 Compare May 10, 2019 23:32
@mcastelino
Copy link
Copy Markdown
Contributor Author

We also need to fix kata network namespace logic to be able to merge this PR #1664

if fc.jailed {
args = []string{
"--id", fc.id,
"--node", "0", //FIXME: Comprehend NUMA topology or explicit ignore
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, don't set it unless the local allocator (kubelet etc.) tells us which NUMA node to use?

Copy link
Copy Markdown
Contributor Author

@mcastelino mcastelino May 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bergwolf agree. I need to send a PR to jailer to allow it to setup the jail w/o the NUMA node chosen. Today it makes it mandatory. If I am able to get the firecracker community to accept that we can remove this limitation. So for now we can always set the NUMA node to 0.

Also I need to work with the firecracker community to delegate the cgroup setup (or allow the cgroup parent) to be chosen by the caller.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Let's have it for now then.

@mcastelino
Copy link
Copy Markdown
Contributor Author

If firecracker accepts firecracker-microvm/firecracker#1093 we can support jailer in Kata.

@devimc @amshinde I had the change the bind mount setup in Kata from private to slave in order for things to work correctly. So we need to change our bindmount logic to choose the right type of bind mount on a case by case basis vs always choosing private.

See 8e2ce77
I will need to fix this more cleanly in kata or create my own bindmount routine that does right for jailer.

@mcastelino mcastelino changed the title [WIP][DNM]: Firecracker Add jailer support for firecracker (another try) Firecracker Add jailer support for firecracker May 14, 2019
@mcastelino mcastelino requested a review from amshinde May 14, 2019 02:06
@devimc
Copy link
Copy Markdown

devimc commented May 14, 2019

@mcastelino

@devimc @amshinde I had the change the bind mount setup in Kata from private to slave in order for things to work correctly. So we need to change our bindmount logic to choose the right type of bind mount on a case by case basis vs always choosing private.

Currently, that's not possible, because of 9p. When overlay is used as storage driver, kata runtime creates a new bind mount point to the merged directory, that way this directory can be shared with the VM through 9p. By default the mount propagation is shared, that means mount events are propagated, but umount events not, to deal with this problem and to avoid left mount points in the host once container finishes, the mount propagation of bind mounts should be set to private.

@mcastelino if you change the mount propagation to slave right now, you will re-introduce a bug. We can switch to slave, once we support virtio-fs and deprecate 9p

@mcastelino does it make sense?

cc @egernst @amshinde

@mcastelino mcastelino force-pushed the topic/jail branch 2 times, most recently from 3fa0258 to 73fee6c Compare May 14, 2019 14:51
@mcastelino
Copy link
Copy Markdown
Contributor Author

By default the mount propagation is shared, that means mount events are propagated, but umount events not, to deal with this problem and to avoid left mount points in the host once container finishes, the mount propagation of bind mounts should be set to private.

@devimc when you say unmount are not propagated, do you mean umount from within the pivot_root/chroot? In the case of firecracker all the mounts and umount happen from outside the pivot_root. So the setup and cleanup is done by kata.

When you say mounts are left being, is this the mounts done by docker as part of docker cp or the mounts created by Kata?

I am trying to understand why I am not seeing the issue when I use SLAVE.

@devimc
Copy link
Copy Markdown

devimc commented May 15, 2019

@mcastelino

@devimc when you say unmount are not propagated, do you mean umount from within the pivot_root/chroot?

no, when the mount propagation is slave/shared, the mount point created outside the pivot_root are propagated (visible within it), but umount event not, because we are creating a bind mount to the merged (overlay) directory. Once we support virtio-fs and deprecated 9p, won't be needed to create a bind mount point, since the fuse daemon will be able to monitor the merged directory directly.

When you say mounts are left being, is this the mounts done by docker as part of docker cp or the mounts created by Kata?

yes, for a strange reason that I still don't understand, docker cp remounts all the volumes, once it finishes, those mounts are umounted by docker, unfortunately for us, remounts are propagated within the bind mount that we create, but umount events are not propagated, that way the mounts are left behind causing bugs.

@mcastelino
Copy link
Copy Markdown
Contributor Author

/test

@raravena80
Copy link
Copy Markdown
Member

@mcastelino any updates? Thx!

@mcastelino
Copy link
Copy Markdown
Contributor Author

@mcastelino any updates? Thx!

@raravena80 this is actually ready for merge. Not sure what is happening here with the CI.

The change needed for firecracker has been merged upstream firecracker-microvm/firecracker#1093

However I do not think there is a release with this bugfix. Maybe that is why the CI is failing.

So I think we will need to first update firecracker to use a release with this bugfix and then merge this PR.

// do some bookkeeping:
// * evaluate all symlinks
// * ensure the source exists
// * recursively create the destination
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a note to specify that you are adding this to override the bind-mount behaviour in Kata to use a slave mount instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amshinde you are right. This is only for firecracker specific. Also it does not impact firecracker as it does not use 9p today.

@amshinde
Copy link
Copy Markdown
Member

amshinde commented Jun 4, 2019

Blocked on firecracker next release.

@caoruidong
Copy link
Copy Markdown
Member

ping @mcastelino

@mcastelino
Copy link
Copy Markdown
Contributor Author

/test

@mcastelino
Copy link
Copy Markdown
Contributor Author

/test

Copy link
Copy Markdown

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mcastelino.

lgtm

Copy link
Copy Markdown
Member

@jcvenegas jcvenegas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a few nits that we can fix in a follow-up PR.

# Firecracker binary name
FCCMD := firecracker
# Firecracker's jailer binary name
FCJAILERCMD := jailer
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This looks like a generic option ... probably could be part of the toplevel makefile?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that. But as of now we do not really have a real jailer for any other Hypervisor. The fc jailer if generalized may be able to support other hypervisors. But looking at the feature set I think we can even use runc as a jailer.

// Also jailer based on the id implicitly sets up cgroups under
// <cgroups_base>/<exec_file_name>/<id>/
hypervisorName := filepath.Base(hypervisorConfig.HypervisorPath)
fc.chrootBaseDir = "/var/lib/vc" //store.ConfigStoragePath //This mount has exec perms
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be used with the general global variable

const StoragePathSuffix = "vc"

Commented code?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix this.

// So we need to repopulate this at startSandbox where it is valid
fc.netNSPath = networkNS.NetNsPath

// Till we create lower privileged kata user run as root
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing issue to reference here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add an issue.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var networkNS NetworkNamespace
if err := fc.store.Load(store.Network, &networkNS); err == nil {
if networkNS.NetNsPath == "" {
fc.Logger().WithField("NETWORK NAMESPACE NULL", networkNS).Error()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return ? error or just warn?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

"--node", "0", //FIXME: Comprehend NUMA topology or explicit ignore
"--seccomp-level", "2",
"--exec-file", fc.config.HypervisorPath,
"--uid", "0", //hack: we need to setup user and group for kata
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a issue for this?

args = []string{
"--id", fc.id,
"--node", "0", //FIXME: Comprehend NUMA topology or explicit ignore
"--seccomp-level", "2",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

magic number? maybe a comment for future reference?

_, err := fc.client().Operations.PutGuestBootSource(bootSrcParams)
return err
_, err = fc.client().Operations.PutGuestBootSource(bootSrcParams)
if err != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nil: this still could be return err remove the return nil below

hostPath := filepath.Join(fc.jailerRoot, jailedPath)
err := syscall.Unmount(hostPath, syscall.MNT_DETACH)
if err != nil {
fc.Logger().WithField("umountResource failed", err).Info()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warn?

Copy link
Copy Markdown
Contributor Author

@mcastelino mcastelino Jul 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently is info as I know some of these will fail on second go around. There is a comment in the caller but not here. Let me add another comment here.

return nil
}

func (fc *firecracker) umountResource(jailedPath string) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about return an err ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do. The error can be ignored by caller.

}

fc.Logger().WithField("cleaningJail", fc.vmPath).Error()
if err := os.RemoveAll(fc.vmPath); err != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit :if vmPath contains fc.id just in case :P

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand a bit on this comment. Not sure I understood.

@mcastelino
Copy link
Copy Markdown
Contributor Author

/test

@codecov
Copy link
Copy Markdown

codecov bot commented Jul 9, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@5e67e04). Click here to learn what that means.
The diff coverage is 3.4%.

@@           Coverage Diff            @@
##             master   #1649   +/-   ##
========================================
  Coverage          ?   52.5%           
========================================
  Files             ?     108           
  Lines             ?   13951           
  Branches          ?       0           
========================================
  Hits              ?    7325           
  Misses            ?    5756           
  Partials          ?     870

@jcvenegas
Copy link
Copy Markdown
Member

rhel:
Job fail seems not related @grahamwhaley @GabyCT heads up... restarting

bats integration/entropy/entropy_time.bats
1..1
not ok 1 measured time for /dev/random
# (in test file integration/entropy/entropy_time.bats, line 29)
#   `[ "$measured_time" -le "$expected_time" ]' failed
# 
# 
make: *** [entropy] Error 1
Failed at 40: sudo -E PATH="$PATH" bash -c "make test"

@mcastelino
Copy link
Copy Markdown
Contributor Author

Also the centos job is failing due to vsock kernel module being absent

Stderr: docker: Error response from daemon: OCI runtime create failed: [PUT /actions][400] createSyncActionBadRequest  &{FaultMessage:Cannot create vsock device. VhostOpen(VhostOpen(Os { code: 19, kind: Other, message: No such device }))}: unknown.

@grahamwhaley
Copy link
Copy Markdown
Contributor

rhel:
Job fail seems not related @grahamwhaley @GabyCT heads up... restarting

bats integration/entropy/entropy_time.bats
1..1
not ok 1 measured time for /dev/random
# (in test file integration/entropy/entropy_time.bats, line 29)
#   `[ "$measured_time" -le "$expected_time" ]' failed
# 
# 
make: *** [entropy] Error 1
Failed at 40: sudo -E PATH="$PATH" bash -c "make test"

thanks @jcvenegas and, :irony: thanks :/irony: bats, for not giving us any useful actual details of the values of the comparison that failed :-(
iirc, I asked/noted at the time that it would be really helpful if that entropy time test could print out the values upon failure to aid debug - but - I'm not sure if that is something easily done with BATS? - can you look into that @GabyCT ?

@mcastelino
Copy link
Copy Markdown
Contributor Author

mcastelino commented Jul 10, 2019

@GabyCT the FC CI is passing on ubuntu for me by following the steps as below.

export KATA_DEV_MODE="false"
export KATA_HYPERVISOR="firecracker"
export CI="true"
export CI_JOB="FIRECRACKER"
export RUNTIME="kata-runtime"
export PATH=${GOPATH}/bin:${PATH}
DEBUG=true .ci/setup.sh
sudo -E PATH=$PATH make docker
root@kata:/etc/docker# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04 LTS"
root@kata:/etc/docker# cd
root@kata:~# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04 LTS"
root@kata:~# uname -r
4.15.0-54-generic

@GabyCT
Copy link
Copy Markdown
Contributor

GabyCT commented Jul 10, 2019

/test-fc

@mcastelino
Copy link
Copy Markdown
Contributor Author

The tests need update to allow the jailed version of firecracker to be detected

kata-containers/tests#1803

GabyCT added a commit to GabyCT/tests-1 that referenced this pull request Jul 11, 2019
Currently we have issues while trying to run the soak test on
firecracker (see kata-containers#1803).
We need to remove it so the change kata-containers/runtime#1649
can be merged.

Fixes kata-containers#1805

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Firecracker provides a jailer to constrain the VMM. Use this
jailer to launch the firecracker VMM instead of launching it
directly from the kata-runtime.

The jailer will ensure that the firecracker VMM will run
in its own network and mount namespace. All assets required
by the VMM have to be present within these namespaces.
The assets need to be copied or bind mounted into the chroot
location setup by jailer in order for firecracker to access
these resouces. This includes files, device nodes and all
other assets.

Jailer automatically sets up the jail to have access to
kvm and vhost-vsock.

If a jailer is not available (i.e. not setup in the toml)
for a given hypervisor the runtime will act as the jailer.

Also enhance the hypervisor interface and unit tests to
include the network namespace. This allows the hypervisor
to choose how and where to lauch the VMM process, vs
virtcontainers directly launching the VMM process.

Fixes: kata-containers#1129

Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
Add jailer support to configuration files.
Also enable jailer by default in Kata containers.

Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
@mcastelino
Copy link
Copy Markdown
Contributor Author

/test

@GabyCT
Copy link
Copy Markdown
Contributor

GabyCT commented Jul 12, 2019

@mcastelino so it seems that there is an issue with initrd

@GabyCT
Copy link
Copy Markdown
Contributor

GabyCT commented Jul 12, 2019

@mcastelino , the issue with initrd seems not related with this PR as I am seeing this failure in other jobs.

@GabyCT GabyCT merged commit bc15e44 into kata-containers:master Jul 12, 2019
@raravena80
Copy link
Copy Markdown
Member

🎉 🎈

GabyCT added a commit to GabyCT/tests-1 that referenced this pull request Jul 17, 2019
Currently we have issues while trying to run the soak test on
firecracker (see kata-containers#1803).
We need to remove it so the change kata-containers/runtime#1649
can be merged.

Fixes kata-containers#1805

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
GabyCT added a commit to GabyCT/tests-1 that referenced this pull request Jul 17, 2019
Currently we have issues while trying to run the soak test on
firecracker (see kata-containers#1803).
We need to remove it so the change kata-containers/runtime#1649
can be merged.

Fixes kata-containers#1805

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add jailer support for firecracker