Implement containerd shim v2 API for Kata Containers#572
Implement containerd shim v2 API for Kata Containers#572sboeuf merged 22 commits intokata-containers:masterfrom
Conversation
|
PSS Measurement: Memory inside container: |
|
Build succeeded (third-party-check pipeline).
|
|
PSS Measurement: Memory inside container: |
There are maybe more than one containers in a sandbox, thus it's needed to cleanup all of those containers' mount and id mapping files when do cleanup a sandbox. Fixes: kata-containers#572 Signed-off-by: fupan <lifupan@gmail.com>
|
Build succeeded (third-party-check pipeline).
|
|
PSS Measurement: Memory inside container: |
There are maybe more than one containers in a sandbox, thus it's needed to cleanup all of those containers' mount and id mapping files when do cleanup a sandbox. Fixes: kata-containers#572 Signed-off-by: fupan <lifupan@gmail.com>
|
PSS Measurement: Memory inside container: |
|
Build succeeded (third-party-check pipeline).
|
Removed some unused variables and retuned some if/else code style. Fixes: kata-containers#572 Signed-off-by: fupan <lifupan@gmail.com>
|
PSS Measurement: Memory inside container: |
|
Build succeeded (third-party-check pipeline).
|
|
Hi @lifupan - thanks for raising! This is clearly a very large PR. I'm not sure I'll be able to review it today but will review on Monday if not. |
|
@lifupan Thanks for submitting the implementation of Kata for containerd-shim-v2!
Could you please work with @chavafg and @GabyCT in order to introduce more testing to our CI, because I want to make sure we get no regression regarding your implementation once it gets merged. |
|
@sboeuf Sure, no problem. |
Removed some unused variables and retuned some if/else code style, at the same time, do not export those functions and variables which will not used out this module. Fixes: kata-containers#572 Signed-off-by: fupan <lifupan@gmail.com>
|
PSS Measurement: Memory inside container: |
Removed some unused variables and retuned some if/else code style, at the same time, do not export those functions and variables which will not used out this module. Fixes: kata-containers#572 Signed-off-by: fupan <lifupan@gmail.com>
|
PSS Measurement: Memory inside container: |
Removed some unused variables and retuned some if/else code style, at the same time, do not export those functions and variables which will not used out this module. Fixes: kata-containers#572 Signed-off-by: fupan <lifupan@gmail.com>
|
PSS Measurement: Memory inside container: |
Removed some unused variables and retuned some if/else code style, at the same time, do not export those functions and variables which will not used out this module. Fixes: kata-containers#572 Signed-off-by: fupan <lifupan@gmail.com>
|
Build failed (third-party-check pipeline) integration testing with
|
|
/test |
sboeuf
left a comment
There was a problem hiding this comment.
@lifupan last round of review :)
I left a bunch of comments, and I would appreciate if you could address them. It shouldn't take long.
Thanks again for the hard work, and I'm looking forward to get this PR merged by the end of the week!!
containerd-shim-v2/create.go
Outdated
| // Since there is a bug in kata for sharedPidNs, here to | ||
| // remove the pidns to disable the sharePidNs temporarily, | ||
| // once kata fixed this issue, we can remove this line. | ||
| removeNameSpace(&ociSpec, specs.PIDNamespace) |
There was a problem hiding this comment.
Have you opened an issue for that, or is there an issue already opened?
Please refer the issue link and number from the comment here.
Nit: s/removeNameSpace/removeNamespace
containerd-shim-v2/service.go
Outdated
| s.Lock() | ||
| defer s.Unlock() | ||
|
|
||
| ns, err := namespaces.NamespaceRequired(ctx) |
containerd-shim-v2/service.go
Outdated
| } | ||
| } | ||
|
|
||
| container, err := create(ctx, s, r, ns, s.config) |
There was a problem hiding this comment.
So you explained to me that ns was a concept brought by containerd and is completely different from regular Linux namespaces. Why are you reusing this namespace string to be the network namespace path?
There was a problem hiding this comment.
Sorry for the wrong explanation above. I had just dived into the containerd and found that here the "namespace" is the network namespace created by the containerd's CNI plugin, and that's why I passed it to kata's network ns path. Sorry for the confusion.
There was a problem hiding this comment.
Oh okay then that's nice because CNI is handled in a very straightforward way by doing so, I like it!
However, could you change the naming of the variable from ns to netns to make sure it's very obvious that we retrieved the network namespace from containerd.
There was a problem hiding this comment.
I had renamed it to netns and a line of comment.
| } | ||
|
|
||
| if n.Path == "" { | ||
| n.Path = netns |
There was a problem hiding this comment.
Why are we replacing the network namespace path with the containerd namespace name?
When the path is empty, this means the caller expect the network namespace to be created by the container runtime, and this is done later by CreateSandbox().
There was a problem hiding this comment.
The reason I had explained above for why replacing the network namespace with netns.
Since the "netns" is created by containerd's plugin, thus if containerd doesn't enabled any network plugin, then here the "netns" will be empty and yes it will be done later by CreateSandbox().
containerd-shim-v2/stream.go
Outdated
| var ( | ||
| bufPool = sync.Pool{ | ||
| New: func() interface{} { | ||
| buffer := make([]byte, 32<<10) |
There was a problem hiding this comment.
Please makes this a constant, and explain why it is needed.
containerd-shim-v2/service.go
Outdated
| return &taskAPI.StateResponse{ | ||
| ID: execs.id, | ||
| Bundle: c.bundle, | ||
| Pid: PID, |
|
|
||
| //wait for container | ||
| if r.ExecID == "" { | ||
| ret = <-c.exitCh |
There was a problem hiding this comment.
I think here it would be very nice to add s.sandbox.StopContainer(). We want to make sure the container is completely stopped inside the VM. The container process might terminate because it properly returned, and we want to ensure there's nothing left inside the VM.
There was a problem hiding this comment.
Actually it doesn't need. Once a container process exited, here it will public a TaskExitEventTopic to containerd, When containerd received this event, it will request a
Delete() service to this shimv2. The shimv2 will do the container stop and clean in this service.
There was a problem hiding this comment.
Ah yes that's cool! Compared to the madness that we have to handle with our OCI CLI implementation, this is a real improvement here.
I was just worried that if the call from containerd was not coming right away, we would want to stop our container as we know the process exited.
I guess if this is needed in the future, we could add it later, right?
There was a problem hiding this comment.
Yes, if it's needed in the future I'll add it.
| @@ -350,7 +350,46 @@ func (s *service) ResizePty(ctx context.Context, r *taskAPI.ResizePtyRequest) (* | |||
|
|
|||
| // State returns runtime state information for a process | |||
| func (s *service) State(ctx context.Context, r *taskAPI.StateRequest) (*taskAPI.StateResponse, error) { | |||
There was a problem hiding this comment.
I thinks this states is needed. First the containerd requested states isn't entirely match to kata's states. there must be a transfer between them and this states does it; Second, there is some states value kata doesn't keep thus shimv2 needs to store it in this states.
There was a problem hiding this comment.
Yes I realized that, and I saw that you tried to translate the Kata states when possible, but I understand containerd expects more states than what we have with Kata. I'm not asking to change anything, it's just sad that we have to redefine another layer of states again...
| } | ||
|
|
||
| if signum == syscall.SIGTERM { | ||
| err = s.sandbox.SignalProcess(c.id, processID, syscall.SIGKILL, r.All) |
There was a problem hiding this comment.
Why are we sending a SIGKILL here? The signal is a SIGTERM and has already been sent.
If there's a good explanation for this, please add a comment in the code explaining this.
There was a problem hiding this comment.
Since kubernetes will use SIGTERM signal to stop a container by default, but some container process such as "shell" will not response to this signal, thus container stop will always get timeout response, in order to terminate the container successfully, here send another SIGKILL to make sure the process will be terminated successfully. I'll add it to comments.
There was a problem hiding this comment.
Well I understand but I would expect the CRI layer (CRI-containerd or CRI-O) to actually send a SIGKILL themselves if they realize that SIGTERM didn't work. Is it something that we want to handle at this level?
/cc @bergwolf ^
There was a problem hiding this comment.
I think we should follow runc's practice and only send SIGTERM. If the container process chooses to ignore SIGTERM, the upper layer should send other signals after timeout. @lifupan do you see any test cases that runc passes but shimv2 fails due to SIGTERM timeout?
There was a problem hiding this comment.
@bergwolf exactly, the upper layer knows about the workload and should be the one that knows if SIGTERM is appropriate or not.
There was a problem hiding this comment.
@lifupan let's not duplicate some fix that we had to use with kata CLI since they might not be needed in this case. Do you have a concrete test case that will actually fail if you don't send a SIGKILL after a SIGTERM ?
There was a problem hiding this comment.
@sboeuf Yes, once I started a container running a cmd in shell, the stop container request always timeout and wait for a lone time to stop it.
There was a problem hiding this comment.
Yes, IIRC the interactive shell ignores SIGTERM, I think it would respond to the SIGTERM if you run without -it.
There was a problem hiding this comment.
Fair enough, it's not ideal, but we'll fix that later once we understand better the issue. I don't want to delay this PR because of this detail.
There was a problem hiding this comment.
Thanks @sboeuf @jodh-intel @WeiZhang555 @bergwolf @egernst. Thanks for your patient and careful help and review.
| ) | ||
|
|
||
| func marshalMetrics(s *service, containerID string) (*google_protobuf.Any, error) { | ||
| stats, err := s.sandbox.StatsContainer(containerID) |
There was a problem hiding this comment.
That's not the only reason why containerd-shim-v2 is very cool, but one of the main ones :)
Add the "Create" api support for creating a pod or container. Signed-off-by: fupan <lifupan@gmail.com>
Add the Start api support of start a pod or container created before. Signed-off-by: fupan <lifupan@gmail.com>
Add the Exec api support for exec an process in a running container. Signed-off-by: fupan <lifupan@gmail.com>
Add the Wait api to wait on a started container or exec process. Signed-off-by: fupan <lifupan@gmail.com>
Add the State api support to get a container or exec process's states. Signed-off-by: fupan <lifupan@gmail.com>
Add the Delete api support to delete a stopped container or pod. Signed-off-by: fupan <lifupan@gmail.com>
Add the Cleanup api support to cleanup the pod and containers running in it when the pod's corresponding shim died. Signed-off-by: fupan <lifupan@gmail.com>
Add the Pids api support to get the processes pids running in the pod. Signed-off-by: fupan <lifupan@gmail.com>
Add the CloseIO api support to close a process's input stream. Signed-off-by: fupan <lifupan@gmail.com>
Add the Connect api to get the shim's info. Signed-off-by: fupan <lifupan@gmail.com>
Add the Shutdown api support to shutdown the shim. Signed-off-by: fupan <lifupan@gmail.com>
Add the Update api support to update a running process's resouce. Signed-off-by: fupan <lifupan@gmail.com>
Add the ResizePty api support to resize the console. Signed-off-by: ZeroMagic <anthonyliu@zju.edu.cn>
Add the Pause api support to pause a container running in the pod. Signed-off-by: ZeroMagic <anthonyliu@zju.edu.cn>
Add the Resume api support to resume a paused container. Signed-off-by: ZeroMagic <anthonyliu@zju.edu.cn>
Add the Kill api support to send signal to a given container process. Signed-off-by: ZeroMagic <anthonyliu@zju.edu.cn> Signed-off-by: fupan.li <lifupan@gmail.com>
Add the Stats api support to get the container's resouces statistic. Signed-off-by: ZeroMagic <anthonyliu@zju.edu.cn>
Add unit test cases. Signed-off-by: fupan <lifupan@gmail.com>
Add the Makefile target of building shimv2. Fixes: kata-containers#485 Signed-off-by: fupan <lifupan@gmail.com>
|
/test |
|
Finally merged, Time to have a beer @lifupan |
This is the init code of containerd shim v2 for kata containers, and I had done the general
test with the containerd's client command ctr and cri's crictl, both of them worked well.
Fixes #485