Firecracker Add jailer support for firecracker#1649
Firecracker Add jailer support for firecracker#1649GabyCT merged 2 commits intokata-containers:masterfrom
Conversation
af428a3 to
0855fe4
Compare
|
Cannot be merged unless jailer bug is fixed firecracker-microvm/firecracker#1089 |
eddafc1 to
4d165b4
Compare
|
We also need to fix kata network namespace logic to be able to merge this PR #1664 |
| if fc.jailed { | ||
| args = []string{ | ||
| "--id", fc.id, | ||
| "--node", "0", //FIXME: Comprehend NUMA topology or explicit ignore |
There was a problem hiding this comment.
For now, don't set it unless the local allocator (kubelet etc.) tells us which NUMA node to use?
There was a problem hiding this comment.
@bergwolf agree. I need to send a PR to jailer to allow it to setup the jail w/o the NUMA node chosen. Today it makes it mandatory. If I am able to get the firecracker community to accept that we can remove this limitation. So for now we can always set the NUMA node to 0.
Also I need to work with the firecracker community to delegate the cgroup setup (or allow the cgroup parent) to be chosen by the caller.
There was a problem hiding this comment.
I see. Let's have it for now then.
|
If firecracker accepts firecracker-microvm/firecracker#1093 we can support jailer in Kata. @devimc @amshinde I had the change the bind mount setup in Kata from private to slave in order for things to work correctly. So we need to change our bindmount logic to choose the right type of bind mount on a case by case basis vs always choosing private. See 8e2ce77 |
Currently, that's not possible, because of 9p. When overlay is used as storage driver, kata runtime creates a new bind mount point to the merged directory, that way this directory can be shared with the VM through 9p. By default the mount propagation is shared, that means mount events are propagated, but umount events not, to deal with this problem and to avoid left mount points in the host once container finishes, the mount propagation of bind mounts should be set to private. @mcastelino if you change the mount propagation to slave right now, you will re-introduce a bug. We can switch to slave, once we support virtio-fs and deprecate 9p @mcastelino does it make sense? |
3fa0258 to
73fee6c
Compare
@devimc when you say unmount are not propagated, do you mean umount from within the pivot_root/chroot? In the case of firecracker all the mounts and umount happen from outside the pivot_root. So the setup and cleanup is done by kata. When you say mounts are left being, is this the mounts done by docker as part of docker cp or the mounts created by Kata? I am trying to understand why I am not seeing the issue when I use SLAVE. |
no, when the mount propagation is slave/shared, the mount point created outside the pivot_root are propagated (visible within it), but umount event not, because we are creating a bind mount to the merged (overlay) directory. Once we support virtio-fs and deprecated 9p, won't be needed to create a bind mount point, since the fuse daemon will be able to monitor the merged directory directly.
yes, for a strange reason that I still don't understand, |
|
/test |
|
@mcastelino any updates? Thx! |
@raravena80 this is actually ready for merge. Not sure what is happening here with the CI. The change needed for firecracker has been merged upstream firecracker-microvm/firecracker#1093 However I do not think there is a release with this bugfix. Maybe that is why the CI is failing. So I think we will need to first update firecracker to use a release with this bugfix and then merge this PR. |
| // do some bookkeeping: | ||
| // * evaluate all symlinks | ||
| // * ensure the source exists | ||
| // * recursively create the destination |
There was a problem hiding this comment.
Can you add a note to specify that you are adding this to override the bind-mount behaviour in Kata to use a slave mount instead?
There was a problem hiding this comment.
@amshinde you are right. This is only for firecracker specific. Also it does not impact firecracker as it does not use 9p today.
|
Blocked on firecracker next release. |
|
ping @mcastelino |
|
/test |
|
/test |
jcvenegas
left a comment
There was a problem hiding this comment.
LGTM, a few nits that we can fix in a follow-up PR.
| # Firecracker binary name | ||
| FCCMD := firecracker | ||
| # Firecracker's jailer binary name | ||
| FCJAILERCMD := jailer |
There was a problem hiding this comment.
Nit: This looks like a generic option ... probably could be part of the toplevel makefile?
There was a problem hiding this comment.
I was thinking about that. But as of now we do not really have a real jailer for any other Hypervisor. The fc jailer if generalized may be able to support other hypervisors. But looking at the feature set I think we can even use runc as a jailer.
virtcontainers/fc.go
Outdated
| // Also jailer based on the id implicitly sets up cgroups under | ||
| // <cgroups_base>/<exec_file_name>/<id>/ | ||
| hypervisorName := filepath.Base(hypervisorConfig.HypervisorPath) | ||
| fc.chrootBaseDir = "/var/lib/vc" //store.ConfigStoragePath //This mount has exec perms |
There was a problem hiding this comment.
This could be used with the general global variable
Commented code?
| // So we need to repopulate this at startSandbox where it is valid | ||
| fc.netNSPath = networkNS.NetNsPath | ||
|
|
||
| // Till we create lower privileged kata user run as root |
There was a problem hiding this comment.
Missing issue to reference here?
There was a problem hiding this comment.
Will add an issue.
virtcontainers/fc.go
Outdated
| var networkNS NetworkNamespace | ||
| if err := fc.store.Load(store.Network, &networkNS); err == nil { | ||
| if networkNS.NetNsPath == "" { | ||
| fc.Logger().WithField("NETWORK NAMESPACE NULL", networkNS).Error() |
virtcontainers/fc.go
Outdated
| "--node", "0", //FIXME: Comprehend NUMA topology or explicit ignore | ||
| "--seccomp-level", "2", | ||
| "--exec-file", fc.config.HypervisorPath, | ||
| "--uid", "0", //hack: we need to setup user and group for kata |
| args = []string{ | ||
| "--id", fc.id, | ||
| "--node", "0", //FIXME: Comprehend NUMA topology or explicit ignore | ||
| "--seccomp-level", "2", |
There was a problem hiding this comment.
magic number? maybe a comment for future reference?
virtcontainers/fc.go
Outdated
| _, err := fc.client().Operations.PutGuestBootSource(bootSrcParams) | ||
| return err | ||
| _, err = fc.client().Operations.PutGuestBootSource(bootSrcParams) | ||
| if err != nil { |
There was a problem hiding this comment.
Nil: this still could be return err remove the return nil below
| hostPath := filepath.Join(fc.jailerRoot, jailedPath) | ||
| err := syscall.Unmount(hostPath, syscall.MNT_DETACH) | ||
| if err != nil { | ||
| fc.Logger().WithField("umountResource failed", err).Info() |
There was a problem hiding this comment.
This currently is info as I know some of these will fail on second go around. There is a comment in the caller but not here. Let me add another comment here.
| return nil | ||
| } | ||
|
|
||
| func (fc *firecracker) umountResource(jailedPath string) { |
There was a problem hiding this comment.
Can do. The error can be ignored by caller.
| } | ||
|
|
||
| fc.Logger().WithField("cleaningJail", fc.vmPath).Error() | ||
| if err := os.RemoveAll(fc.vmPath); err != nil { |
There was a problem hiding this comment.
Nit :if vmPath contains fc.id just in case :P
There was a problem hiding this comment.
Can you expand a bit on this comment. Not sure I understood.
|
/test |
Codecov Report
@@ Coverage Diff @@
## master #1649 +/- ##
========================================
Coverage ? 52.5%
========================================
Files ? 108
Lines ? 13951
Branches ? 0
========================================
Hits ? 7325
Misses ? 5756
Partials ? 870 |
|
rhel: |
|
Also the centos job is failing due to vsock kernel module being absent |
thanks @jcvenegas and, :irony: thanks :/irony: bats, for not giving us any useful actual details of the values of the comparison that failed :-( |
|
@GabyCT the FC CI is passing on ubuntu for me by following the steps as below. |
|
/test-fc |
|
The tests need update to allow the jailed version of firecracker to be detected |
Currently we have issues while trying to run the soak test on firecracker (see kata-containers#1803). We need to remove it so the change kata-containers/runtime#1649 can be merged. Fixes kata-containers#1805 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Firecracker provides a jailer to constrain the VMM. Use this jailer to launch the firecracker VMM instead of launching it directly from the kata-runtime. The jailer will ensure that the firecracker VMM will run in its own network and mount namespace. All assets required by the VMM have to be present within these namespaces. The assets need to be copied or bind mounted into the chroot location setup by jailer in order for firecracker to access these resouces. This includes files, device nodes and all other assets. Jailer automatically sets up the jail to have access to kvm and vhost-vsock. If a jailer is not available (i.e. not setup in the toml) for a given hypervisor the runtime will act as the jailer. Also enhance the hypervisor interface and unit tests to include the network namespace. This allows the hypervisor to choose how and where to lauch the VMM process, vs virtcontainers directly launching the VMM process. Fixes: kata-containers#1129 Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
Add jailer support to configuration files. Also enable jailer by default in Kata containers. Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
|
/test |
|
@mcastelino so it seems that there is an issue with |
|
@mcastelino , the issue with |
|
🎉 🎈 |
Currently we have issues while trying to run the soak test on firecracker (see kata-containers#1803). We need to remove it so the change kata-containers/runtime#1649 can be merged. Fixes kata-containers#1805 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Currently we have issues while trying to run the soak test on firecracker (see kata-containers#1803). We need to remove it so the change kata-containers/runtime#1649 can be merged. Fixes kata-containers#1805 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Firecracker provides a jailer to constrain the VMM. Use this
jailer to launch the firecracker VMM instead of launching it
directly from the kata-runtime.
The jailer will ensure that the firecracker VMM will run
in its own network and mount namespace. All assets required
by the VMM have to be present within these namespaces.
The assets need to be copied or bind mounted into the chroot
location setup by jailer in order for firecracker to access
these resouces. This includes files, device nodes and all
other assets.
Jailer automatically sets up the jail to have access to
kvm and vhost-vsock.
If a jailer is not available (i.e. not setup in the toml)
for a given hypervisor the runtime will act as the jailer.
Also enhance the hypervisor interface and unit tests to
include the network namespace. This allows the hypervisor
to choose how and where to lauch the VMM process, vs
virtcontainers directly launching the VMM process.
Fixes: #1129
Signed-off-by: Manohar Castelino manohar.r.castelino@intel.com