Skip to content
This repository was archived by the owner on May 12, 2021. It is now read-only.

Implement containerd shim v2 API for Kata Containers#572

Merged
sboeuf merged 22 commits intokata-containers:masterfrom
hyperhq:shimv2
Nov 28, 2018
Merged

Implement containerd shim v2 API for Kata Containers#572
sboeuf merged 22 commits intokata-containers:masterfrom
hyperhq:shimv2

Conversation

@lifupan
Copy link
Member

@lifupan lifupan commented Aug 10, 2018

This is the init code of containerd shim v2 for kata containers, and I had done the general
test with the containerd's client command ctr and cri's crictl, both of them worked well.

Fixes #485

@katacontainersbot
Copy link
Contributor

PSS Measurement:
Qemu: 167881 KB
Proxy: 4183 KB
Shim: 8854 KB

Memory inside container:
Total Memory: 2043464 KB
Free Memory: 2004008 KB

@opendev-zuul
Copy link

opendev-zuul bot commented Aug 10, 2018

Build succeeded (third-party-check pipeline).

@katacontainersbot
Copy link
Contributor

PSS Measurement:
Qemu: 170027 KB
Proxy: 3984 KB
Shim: 8859 KB

Memory inside container:
Total Memory: 2043464 KB
Free Memory: 2003572 KB

lifupan added a commit to hyperhq/kata-runtime that referenced this pull request Aug 10, 2018
There are maybe more than one containers in a sandbox, thus
it's needed to cleanup all of those containers' mount and
id mapping files when do cleanup a sandbox.

Fixes: kata-containers#572

Signed-off-by: fupan <lifupan@gmail.com>
@opendev-zuul
Copy link

opendev-zuul bot commented Aug 10, 2018

Build succeeded (third-party-check pipeline).

@katacontainersbot
Copy link
Contributor

PSS Measurement:
Qemu: 167678 KB
Proxy: 4082 KB
Shim: 8758 KB

Memory inside container:
Total Memory: 2043464 KB
Free Memory: 2002944 KB

lifupan added a commit to hyperhq/kata-runtime that referenced this pull request Aug 10, 2018
There are maybe more than one containers in a sandbox, thus
it's needed to cleanup all of those containers' mount and
id mapping files when do cleanup a sandbox.

Fixes: kata-containers#572

Signed-off-by: fupan <lifupan@gmail.com>
@katacontainersbot
Copy link
Contributor

PSS Measurement:
Qemu: 170260 KB
Proxy: 4120 KB
Shim: 8906 KB

Memory inside container:
Total Memory: 2043464 KB
Free Memory: 2003992 KB

@opendev-zuul
Copy link

opendev-zuul bot commented Aug 10, 2018

Build succeeded (third-party-check pipeline).

lifupan added a commit to hyperhq/kata-runtime that referenced this pull request Aug 10, 2018
Removed some unused variables and retuned some
if/else code style.

Fixes: kata-containers#572

Signed-off-by: fupan <lifupan@gmail.com>
@katacontainersbot
Copy link
Contributor

PSS Measurement:
Qemu: 169873 KB
Proxy: 4094 KB
Shim: 8905 KB

Memory inside container:
Total Memory: 2043464 KB
Free Memory: 2003860 KB

@opendev-zuul
Copy link

opendev-zuul bot commented Aug 10, 2018

Build succeeded (third-party-check pipeline).

@jodh-intel
Copy link

Hi @lifupan - thanks for raising! This is clearly a very large PR. I'm not sure I'll be able to review it today but will review on Monday if not.

@sboeuf
Copy link

sboeuf commented Aug 10, 2018

@lifupan Thanks for submitting the implementation of Kata for containerd-shim-v2!
One thing that comes to mind, we need to introduce a bunch of integration tests for this.
You mentioned

I had done the general test with the containerd's client command ctr and cri's crictl, both of them worked well.

Could you please work with @chavafg and @GabyCT in order to introduce more testing to our CI, because I want to make sure we get no regression regarding your implementation once it gets merged.

@lifupan
Copy link
Member Author

lifupan commented Aug 11, 2018

@sboeuf Sure, no problem.

lifupan added a commit to hyperhq/kata-runtime that referenced this pull request Aug 14, 2018
Removed some unused variables and retuned some
if/else code style, at the same time, do not
export those functions and variables which will
not used out this module.

Fixes: kata-containers#572

Signed-off-by: fupan <lifupan@gmail.com>
@katacontainersbot
Copy link
Contributor

PSS Measurement:
Qemu: 167640 KB
Proxy: 4126 KB
Shim: 8784 KB

Memory inside container:
Total Memory: 2043464 KB
Free Memory: 2003620 KB

lifupan added a commit to hyperhq/kata-runtime that referenced this pull request Aug 14, 2018
Removed some unused variables and retuned some
if/else code style, at the same time, do not
export those functions and variables which will
not used out this module.

Fixes: kata-containers#572

Signed-off-by: fupan <lifupan@gmail.com>
@katacontainersbot
Copy link
Contributor

PSS Measurement:
Qemu: 167620 KB
Proxy: 4010 KB
Shim: 9027 KB

Memory inside container:
Total Memory: 2043464 KB
Free Memory: 2003612 KB

lifupan added a commit to hyperhq/kata-runtime that referenced this pull request Aug 14, 2018
Removed some unused variables and retuned some
if/else code style, at the same time, do not
export those functions and variables which will
not used out this module.

Fixes: kata-containers#572

Signed-off-by: fupan <lifupan@gmail.com>
@katacontainersbot
Copy link
Contributor

PSS Measurement:
Qemu: 167888 KB
Proxy: 4064 KB
Shim: 8840 KB

Memory inside container:
Total Memory: 2043464 KB
Free Memory: 2003876 KB

lifupan added a commit to hyperhq/kata-runtime that referenced this pull request Aug 14, 2018
Removed some unused variables and retuned some
if/else code style, at the same time, do not
export those functions and variables which will
not used out this module.

Fixes: kata-containers#572

Signed-off-by: fupan <lifupan@gmail.com>
@opendev-zuul
Copy link

opendev-zuul bot commented Aug 14, 2018

Build failed (third-party-check pipeline) integration testing with
OpenStack. For information on how to proceed, see
http://docs.openstack.org/infra/manual/developers.html#automated-testing

@lifupan
Copy link
Member Author

lifupan commented Nov 27, 2018

/test

Copy link

@sboeuf sboeuf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lifupan last round of review :)
I left a bunch of comments, and I would appreciate if you could address them. It shouldn't take long.
Thanks again for the hard work, and I'm looking forward to get this PR merged by the end of the week!!

// Since there is a bug in kata for sharedPidNs, here to
// remove the pidns to disable the sharePidNs temporarily,
// once kata fixed this issue, we can remove this line.
removeNameSpace(&ociSpec, specs.PIDNamespace)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you opened an issue for that, or is there an issue already opened?
Please refer the issue link and number from the comment here.

Nit: s/removeNameSpace/removeNamespace

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

s.Lock()
defer s.Unlock()

ns, err := namespaces.NamespaceRequired(ctx)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer!

}
}

container, err := create(ctx, s, r, ns, s.config)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you explained to me that ns was a concept brought by containerd and is completely different from regular Linux namespaces. Why are you reusing this namespace string to be the network namespace path?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the wrong explanation above. I had just dived into the containerd and found that here the "namespace" is the network namespace created by the containerd's CNI plugin, and that's why I passed it to kata's network ns path. Sorry for the confusion.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh okay then that's nice because CNI is handled in a very straightforward way by doing so, I like it!
However, could you change the naming of the variable from ns to netns to make sure it's very obvious that we retrieved the network namespace from containerd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, NP.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had renamed it to netns and a line of comment.

}

if n.Path == "" {
n.Path = netns
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we replacing the network namespace path with the containerd namespace name?
When the path is empty, this means the caller expect the network namespace to be created by the container runtime, and this is done later by CreateSandbox().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I had explained above for why replacing the network namespace with netns.

Since the "netns" is created by containerd's plugin, thus if containerd doesn't enabled any network plugin, then here the "netns" will be empty and yes it will be done later by CreateSandbox().

var (
bufPool = sync.Pool{
New: func() interface{} {
buffer := make([]byte, 32<<10)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please makes this a constant, and explain why it is needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return &taskAPI.StateResponse{
ID: execs.id,
Bundle: c.bundle,
Pid: PID,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


//wait for container
if r.ExecID == "" {
ret = <-c.exitCh
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here it would be very nice to add s.sandbox.StopContainer(). We want to make sure the container is completely stopped inside the VM. The container process might terminate because it properly returned, and we want to ensure there's nothing left inside the VM.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it doesn't need. Once a container process exited, here it will public a TaskExitEventTopic to containerd, When containerd received this event, it will request a
Delete() service to this shimv2. The shimv2 will do the container stop and clean in this service.

Copy link

@sboeuf sboeuf Nov 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes that's cool! Compared to the madness that we have to handle with our OCI CLI implementation, this is a real improvement here.
I was just worried that if the call from containerd was not coming right away, we would want to stop our container as we know the process exited.
I guess if this is needed in the future, we could add it later, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if it's needed in the future I'll add it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thanks.

@@ -350,7 +350,46 @@ func (s *service) ResizePty(ctx context.Context, r *taskAPI.ResizePtyRequest) (*

// State returns runtime state information for a process
func (s *service) State(ctx context.Context, r *taskAPI.StateRequest) (*taskAPI.StateResponse, error) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticing the way this function is implemented brings duplication additionally to the states of the Kata API. Instead, it maintains its own states.
@bergwolf @lifupan WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thinks this states is needed. First the containerd requested states isn't entirely match to kata's states. there must be a transfer between them and this states does it; Second, there is some states value kata doesn't keep thus shimv2 needs to store it in this states.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I realized that, and I saw that you tried to translate the Kata states when possible, but I understand containerd expects more states than what we have with Kata. I'm not asking to change anything, it's just sad that we have to redefine another layer of states again...

}

if signum == syscall.SIGTERM {
err = s.sandbox.SignalProcess(c.id, processID, syscall.SIGKILL, r.All)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we sending a SIGKILL here? The signal is a SIGTERM and has already been sent.
If there's a good explanation for this, please add a comment in the code explaining this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since kubernetes will use SIGTERM signal to stop a container by default, but some container process such as "shell" will not response to this signal, thus container stop will always get timeout response, in order to terminate the container successfully, here send another SIGKILL to make sure the process will be terminated successfully. I'll add it to comments.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I understand but I would expect the CRI layer (CRI-containerd or CRI-O) to actually send a SIGKILL themselves if they realize that SIGTERM didn't work. Is it something that we want to handle at this level?
/cc @bergwolf ^

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should follow runc's practice and only send SIGTERM. If the container process chooses to ignore SIGTERM, the upper layer should send other signals after timeout. @lifupan do you see any test cases that runc passes but shimv2 fails due to SIGTERM timeout?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bergwolf exactly, the upper layer knows about the workload and should be the one that knows if SIGTERM is appropriate or not.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lifupan let's not duplicate some fix that we had to use with kata CLI since they might not be needed in this case. Do you have a concrete test case that will actually fail if you don't send a SIGKILL after a SIGTERM ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sboeuf Yes, once I started a container running a cmd in shell, the stop container request always timeout and wait for a lone time to stop it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, IIRC the interactive shell ignores SIGTERM, I think it would respond to the SIGTERM if you run without -it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, it's not ideal, but we'll fix that later once we understand better the issue. I don't want to delay this PR because of this detail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sboeuf @jodh-intel @WeiZhang555 @bergwolf @egernst. Thanks for your patient and careful help and review.

)

func marshalMetrics(s *service, containerID string) (*google_protobuf.Any, error) {
stats, err := s.sandbox.StatsContainer(containerID)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not the only reason why containerd-shim-v2 is very cool, but one of the main ones :)

lifupan and others added 19 commits November 28, 2018 14:29
Add the "Create" api support for creating a pod
or container.

Signed-off-by: fupan <lifupan@gmail.com>
Add the Start api support of start a pod or
container created before.

Signed-off-by: fupan <lifupan@gmail.com>
Add the Exec api support for exec an process in
a running container.

Signed-off-by: fupan <lifupan@gmail.com>
Add the Wait api to wait on a started container
or exec process.

Signed-off-by: fupan <lifupan@gmail.com>
Add the State api support to get a container
or exec process's states.

Signed-off-by: fupan <lifupan@gmail.com>
Add the Delete api support to delete a stopped
container or pod.

Signed-off-by: fupan <lifupan@gmail.com>
Add the Cleanup api support to cleanup the pod and
containers running in it when the pod's corresponding
shim died.

Signed-off-by: fupan <lifupan@gmail.com>
Add the Pids api support to get the processes
pids running in the pod.

Signed-off-by: fupan <lifupan@gmail.com>
Add the CloseIO api support to close a process's
input stream.

Signed-off-by: fupan <lifupan@gmail.com>
Add the Connect api to get the shim's info.

Signed-off-by: fupan <lifupan@gmail.com>
Add the Shutdown api support to shutdown the shim.

Signed-off-by: fupan <lifupan@gmail.com>
Add the Update api support to update a running
process's resouce.

Signed-off-by: fupan <lifupan@gmail.com>
Add the ResizePty api support to resize the console.

Signed-off-by: ZeroMagic <anthonyliu@zju.edu.cn>
Add the Pause api support to pause a container running
in the pod.

Signed-off-by: ZeroMagic <anthonyliu@zju.edu.cn>
Add the Resume api support to resume a paused container.

Signed-off-by: ZeroMagic <anthonyliu@zju.edu.cn>
Add the Kill api support to send signal to a given
container process.

Signed-off-by: ZeroMagic <anthonyliu@zju.edu.cn>
Signed-off-by: fupan.li <lifupan@gmail.com>
Add the Stats api support to get the container's
resouces statistic.

Signed-off-by: ZeroMagic <anthonyliu@zju.edu.cn>
Add unit test cases.

Signed-off-by: fupan <lifupan@gmail.com>
Add the Makefile target of building shimv2.

Fixes: kata-containers#485

Signed-off-by: fupan <lifupan@gmail.com>
@lifupan
Copy link
Member Author

lifupan commented Nov 28, 2018

/test

Copy link

@sboeuf sboeuf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we go!!! Let merge this PR :)

@gnawux
Copy link
Member

gnawux commented Nov 28, 2018

Finally merged,
Thanks all the reviewers.

Time to have a beer @lifupan

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

feature New functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.