This repository was archived by the owner on Apr 3, 2018. It is now read-only.

Shims related to the same container should be spawned into the same PID namespace #613

@sboeuf

Description

I was playing with docker recently, starting a container from a first shell:

docker run --runtime=runc --rm -it busybox

and exec'ing an extra process with the following command:

docker exec -it $CONTAINER_ID sh

when I noticed that roughly 50% of the time the exec'ed process did not return after the container process exited, and the shell stayed completely stuck until I restarted the docker service.

I investigated by comparing the logs of a working run with those of a failing run, and found that the failure described above happens whenever the shim corresponding to the container process exits before the shim related to the exec'ed process (basically a child process).

The root cause seems logical to me: because docker expects those processes to run inside a PID namespace, it expects the container process to be the last one to exit. Indeed, the container process is the "init" process of the PID namespace, so when it terminates for any reason, the kernel sends SIGKILL to all the other processes inside the PID namespace, and the "init" process then returns.

So why do we get issues in our case? When our container process inside the VM terminates, the guest kernel kills the processes inside the same PID namespace, and they all return 137 as their exit code. But all of this happens inside the VM: we still have to pass each exit code back to the corresponding shim on the host (through a proxy in some cases), which means we have no control over which shim (there is one per process) will exit first.

I confirmed this analysis by making the test pass every time: I added a 5s sleep() here, applied only when the process is the container process. This forces the container process to wait 5s before returning its exit code to the shim, leaving time for the other shims to return first.

I suggest solving this issue in 2 steps:

  • The agent has to ensure that the container process waits for every child process started with exec, so that no process is left behind after the container process returns its exit code.
  • Virtcontainers should start every shim related to the same container in the same PID namespace. This way, when the container process terminates and its shim exits, the host kernel kills the shim processes related to exec'ed processes, which all return 137 as their exit code; alternatively, a shim gets that same exit code from the agent if it received it before being killed.
