Proposal - Allow the container to share the PID namespace with the host #9339
rhatdan wants to merge 1 commit into moby:master
|
Flagging as proposal, thanks for the "docs first" approach! |
|
I'd feel more comfortable if there was a default |
|
I don't understand the point |
|
Why use a container if you don't want to be contained? |
|
it allows the use of container images as a distribution mechanism - ie, will make the and you get a magical way to uninstall, or up/down grade a host's daemons |
|
I would not go as far as saying this makes RPM obsolete, since you still need a packaging format for building your image. But yes, the point is to use docker images as a packaging format. Turning off all namespaces except the mnt namespace is the goal. BTW, CoreOS is doing this now, but they just use the docker format and then run systemd-nspawn on the exploded image. I would prefer to use native docker commands. |
|
@rhatdan one thing that |
|
This feature would allow the entire OS image to be modified on the fly without modifying running applications. If applications were smart enough to handle a restart in this situation, the base images and most of the applications could be restarted leaving key processes behind. In one use case, we want to run OpenStack Nova and Libvirt in an upgrade, but keep all of the KVM processes between container restarts. This would allow libvirt to reconnect to the existing KVM processes, even after what would appear like a full container restart ;) |
|
@sdake idk, i would take the opposite approach and think that docker should inspect the cgroup and kill off any tasks when the container stops or else ( if you do not understand what this means ) you will have processes that live on after docker reports that your container has stopped. idk, what do you think? |
|
@crosbymichael The reason to share PID namespaces between containers and between executions of the container tech is to allow the rest of the OS to upgrade seamlessly while critical applications launched from container daemons remain untouched. I get it is not very intuitive, which is why I think Dan suggested a special flag to get Docker to execute in this way. |
|
I would argue you want both. I want all of the processes running in the container to exit when the container stops, but I could also see allowing certain pids to be moved out of the container cgroup and live on until killed. Specifically, we are again looking at libvirt running within the container. We might want to reboot the libvirtd container but allow the virtual machines to continue running. |
|
@rhatdan That WFM, would that be a different feature, or had you intended this feature to handle the "both" model? |
|
@sdake I wanted this feature to handle both. |
|
The problem I am seeing is that if I run
docker run -ti --pid=host fedora /bin/sh
sleep 6000 &
^d
the process just hangs and the container never exits. What I would like to do is realize that the primary pid has died and then send kill signals to all of the processes in the cgroup. I think the problem is that docker does not watch for pid1 to die; it watches for the cgroup to be removed when all of the processes exit. Is there a place where Go gets a SIGCHLD from PID 1 of the container? |
|
Actually, cmd.Wait() will return after PID 1 dies. We will then have to use the cgroup to freeze, send a SIGKILL to all remaining processes, unfreeze, and wait for them to die. |
|
Do you think nuking the cgroup should be conditional, or is it safe to do it every time? Do you think that setting a parent death signal of SIGKILL would be enough if we are using the host's PID namespace? |
|
I found the place and updated the pull request. It now kills all processes if you are using the host PID namespace. |
|
The patch I have is currently racy, in that a fork bomb could leave a process running. |
|
I'm updating docker with the latest changes from libcontainer with this change so we can move forward. |
|
@crosbymichael updated |
daemon/execdriver/driver.go
Maybe this should just be HostPid to match Ipc above?
|
@icecrime can you review this one also? |
|
This is looking good so far.
root@f9b5d1691ec2:/go/src/github.com/docker/docker# docker run --rm busybox ps aux
PID USER COMMAND
1 root ps aux
root@f9b5d1691ec2:/go/src/github.com/docker/docker# docker run --rm --pid host busybox ps aux
PID USER COMMAND
1 root bash
496 root docker -d -s vfs
600 root docker run --rm --pid host busybox ps aux
615 root ps aux
root@f9b5d1691ec2:/go/src/github.com/docker/docker# |
|
LGTM It would be nice to remove the After being created on 2014-04-08 14:16:33, we can finally run |
|
sed s/pidns/pid done |
|
LGTM for carry @crosbymichael |
docs/sources/reference/run.md
|
Minor Docs comments but otherwise LGTM. |
|
Fixed docs. |
docs/sources/reference/run.md
s/allows processes ids/allows process ids/ ?
We want to be able to use containers without the PID namespace. We basically want containers that can manage the host OS, which I call Super Privileged Containers. We would eventually like to get to the point where the only namespace we use is the MNT namespace, to bring the app's userspace with it. By eliminating the PID namespace we can get better communication between the host and the clients, and tools like strace and gdb potentially become easier to use. We also see tools like libvirtd running within a container and telling systemd to place a VM in a particular cgroup; for that we need to be able to communicate the PID. I don't see us needing to share PID namespaces between containers, since this is really what docker exec does. So currently I see us just needing docker run --pid=host.
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
|
Closing in favor of #10080 |