[RFC] [Proposal] support cgroups in the host #1125
Description
Description of the problem
Unlike other runtimes, Kata starts at least 2 processes per pod: kata-shim and a hypervisor (plus kata-proxy when vsock is disabled). From the host's perspective, containers run in a guest (virtual machine) whose resources can be monitored by monitoring the hypervisor process and its threads. kata-shims are used to interact with the containers: sending signals, reading/writing data, etc. A pod can have any number of containers that run in the same guest. One of the major gaps in Kata Containers today is that cgroups are applied in the guest, but either not in the host or not in the right cgroup path (see #1021), which causes problems for some resource governance implementations, such as Service Fabric https://github.com/Microsoft/service-fabric/.
Proposal
Move kata-shim (pause workload) and the hypervisor (and kata-proxy if vsock is disabled) to the sandbox cgroup, and apply a cgroup constraint to the sandbox that is defined by the sum of all its containers' cgroups. Containers' cgroups are still created and applied, but only to the kata-shim that interacts with each container. Two examples:
A pod with 2 containers, container A with 1 cpu and container B without cpu constraints (1 vcpu is hotplugged).
- in the host: no cpu constraints for container B and sandbox, 1 cpu for container A.
- in the guest: no cpu constraints for container B, 1 cpu for container A.
A pod with 2 containers, container A with 1 cpu and container B with 2 cpus (3 vcpus are hotplugged).
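The rule used in the two examples (the sandbox constraint is the sum of its containers' constraints, and the sandbox stays unconstrained as soon as one container is unconstrained) could be sketched roughly as follows. This is only an illustration, not the runtime's code: the `sandboxCPUs` function and the `unconstrained` sentinel are hypothetical names made up for this sketch.

```go
package main

import "fmt"

// unconstrained is a hypothetical sentinel meaning "no CPU limit".
const unconstrained = -1.0

// sandboxCPUs returns the CPU limit to apply to the sandbox cgroup:
// the sum of the containers' limits, or unconstrained if at least
// one container has no limit.
func sandboxCPUs(containerCPUs []float64) float64 {
	total := 0.0
	for _, cpus := range containerCPUs {
		if cpus < 0 {
			// One unconstrained container makes the whole sandbox unconstrained.
			return unconstrained
		}
		total += cpus
	}
	return total
}

func main() {
	// First example: container A with 1 cpu, container B without constraints.
	fmt.Println(sandboxCPUs([]float64{1, unconstrained}))
	// Second example: container A with 1 cpu, container B with 2 cpus.
	fmt.Println(sandboxCPUs([]float64{1, 2}))
}
```

For the second example the sum is 3, which matches the 3 vcpus that are hotplugged.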
cgroup directory hierarchy
/sys/fs/cgroup/cpu,cpuacct
└── kubepods/burstable/pod644e04f3
    ├── crio-89e60b84  # Sandbox cgroupsPath=/kubepods/burstable/pod644e04f3/crio-89e60b84
    ├── crio-2a95e0f8  # Container A cgroupsPath=/kubepods/burstable/pod644e04f3/crio-2a95e0f8
    └── crio-d037bce7  # Container B cgroupsPath=/kubepods/burstable/pod644e04f3/crio-d037bce7
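
As a rough sketch of what moving a process into the sandbox cgroup could look like with this hierarchy: under cgroup v1, writing a PID to a cgroup's `cgroup.procs` file moves that process and all of its threads into the cgroup. The helper names `procsFile` and `addProcess` are hypothetical, and this is an assumption about one possible approach, not a description of how the runtime does it.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// procsFile builds the path of the cgroup.procs file for a given
// cgroupsPath under one controller mount point.
func procsFile(mount, cgroupsPath string) string {
	return filepath.Join(mount, cgroupsPath, "cgroup.procs")
}

// addProcess moves pid (with all its threads) into the cgroup by
// writing the PID to cgroup.procs. Needs sufficient privileges on a real host.
func addProcess(mount, cgroupsPath string, pid int) error {
	return os.WriteFile(procsFile(mount, cgroupsPath), []byte(strconv.Itoa(pid)), 0o644)
}

func main() {
	mount := "/sys/fs/cgroup/cpu,cpuacct"
	sandbox := "/kubepods/burstable/pod644e04f3/crio-89e60b84"

	// Only print the target file here; calling addProcess with the
	// hypervisor or kata-shim PID would perform the actual move.
	fmt.Println(procsFile(mount, sandbox))
}
```

Using `cgroup.procs` rather than `tasks` matters for the hypervisor, since the goal is to account for the process together with its threads.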
