Proposal - Allow the container to share the PID namespace with the host by rhatdan · Pull Request #9339 · moby/moby

rhatdan · 2014-11-25T20:18:03Z

We want to be able to use containers without the PID namespace. We basically
want containers that can manage the host OS, which I call Super Privileged
Containers. We eventually would like to get to the point where the only
namespace we use is the MNT namespace, to bring the app's userspace with it.

By eliminating the PID namespace we can get better communication between the
host and the clients, and tools like strace and gdb potentially become easier
to use. We also see tools like libvirtd running within a container telling
systemd to place a VM in a particular cgroup; for that, we need to be able to communicate the PID.

I don't see us needing to share PID namespaces between containers, since this
is really what docker exec does.

So currently I see us just needing docker run --pid=host

Docker-DCO-1.1-Signed-off-by: Dan Walsh dwalsh@redhat.com (github: rhatdan)

icecrime · 2014-11-26T16:32:57Z

Flagging as proposal, thanks for the "docs first" approach!

SvenDowideit · 2014-11-27T03:00:33Z

I'd feel more comfortable if there was a default --pid=namespaced (or something), making it a little more obvious that this flag changes the norm.

jessfraz · 2014-11-27T09:07:49Z

I don't understand the point

jessfraz · 2014-11-27T09:09:16Z

Why use a container if you don't want to be contained?

SvenDowideit · 2014-11-28T00:07:35Z

it allows the use of container images as a distribution mechanism - ie, will make the rpm format obsolete 🎉 :)

and you get a magical way to uninstall, or up/down grade a host's daemons

rhatdan · 2014-11-30T12:05:56Z

I would not go as far as saying it makes RPM obsolete, as you still need a packaging format for building your image. But yes, the point is to use docker images as a packaging format.

Turning off all namespaces except the mnt namespace is the goal. BTW CoreOS is doing this now but they just use the docker format and then run systemd-nspawn on the exploded image.

I would prefer to use native docker commands.

SvenDowideit · 2014-12-01T02:59:59Z

@rhatdan one thing that --pid=container:someother might be useful for is to compose a set of services that might be managed by some kind of supervisor process which might have been coded container-unaware (but with configurable spawn commands). But hey, I also want to have --fs=container:someother to let me compose a server with dynamic plugins.
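To make the flag values discussed so far concrete, here is a minimal Go sketch of how the three forms (no value, `host`, and `container:<name>`) could be told apart. The `pidMode` type and `parsePidMode` function are hypothetical names for illustration and are not Docker's actual implementation. The proposal itself only asks for `--pid=host`; the `container:` form is the suggestion made in the comment above.

```go
package main

import (
	"fmt"
	"strings"
)

// pidMode is a hypothetical representation of a --pid flag value.
// It is an illustration only, not the type Docker uses.
type pidMode struct {
	host      bool   // true for --pid=host
	container string // non-empty for --pid=container:<name>
}

// parsePidMode accepts "", "host", or "container:<name>" and rejects
// everything else. The empty string means "keep a private PID namespace".
func parsePidMode(v string) (pidMode, error) {
	switch v {
	case "":
		return pidMode{}, nil
	case "host":
		return pidMode{host: true}, nil
	}
	kind, name, ok := strings.Cut(v, ":")
	if ok && kind == "container" && name != "" {
		return pidMode{container: name}, nil
	}
	return pidMode{}, fmt.Errorf("invalid --pid value %q", v)
}
```

Rejecting unknown values up front keeps a typo such as `--pid=hots` from silently falling back to the default namespaced behaviour.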

sdake · 2014-12-01T15:09:28Z

This feature would allow the entire OS image to be modified on the fly without modifying running applications. If applications were smart enough to handle a restart in this situation, the base images and most of the applications could be restarted leaving key processes behind.

In one use case, we want to run OpenStack Nova and Libvirt in an upgrade, but keep all of the KVM processes between container restarts. This would allow libvirt to reconnect to the existing KVM processes, even after what would appear like a full container restart ;)

crosbymichael · 2014-12-01T19:36:41Z

@sdake idk, i would take the opposite approach and think that docker should inspect the cgroup and kill off any tasks when the container stops or else ( if you do not understand what this means ) you will have processes that live on after docker reports that your container has stopped.

idk, what do you think?

sdake · 2014-12-01T19:51:56Z

@crosbymichael The reason to share PID namespaces between containers and between executions of the container tech is to allow the rest of the OS to upgrade seamlessly while critical applications launched from container daemons remain untouched. I get it is not very intuitive, which is why I think Dan suggested a special flag to get Docker to execute in this way.

rhatdan · 2014-12-01T19:58:46Z

I would argue you want both. I want all of the processes running in the container to exit on stopping the container, but I also could see allowing moving certain pids out of the container cgroup and letting them live on until killed.

Specifically we are looking at libvirt again running within the container. We might want to reboot the libvirtd container but allow the virtual machines to continue running.

sdake · 2014-12-01T20:34:55Z

@rhatdan That WFM, would that be a different feature, or had you intended this feature to handle the "both" model?

rhatdan · 2015-01-07T20:10:35Z

@sdake I wanted this feature to handle both.

rhatdan · 2015-01-07T20:21:48Z

@crosbymichael

The problem I am seeing is if I do

    docker run -ti --pid=host fedora /bin/sh
    sleep 6000 &
    ^d

The process just hangs, and the container never exits. What I would like to do is realize that the primary pid has died and then send kill signals to all of the processes in the cgroup.

I think the problem is docker does not watch for a pid1 to die, it watches for the cgroup to get removed when all of the processes exit.

    oomKillNotification, err := libcontainer.NotifyOnOOM(state)

Is there a place where Go gets a SIGCHLD from PID 1 of the container?

crosbymichael · 2015-01-07T21:00:42Z

Actually the cmd.Wait() will return after PID 1 dies. We will then have to use the cgroup to freeze, send a SIGKILL to all remaining processes, unfreeze and wait for them to die.

crosbymichael · 2015-01-07T21:02:02Z

Do you think it should be conditional that we nuke the cgroup or is it safe to do it every time?

Do you think that setting a parent death signal to SIGKILL would be enough if we are using the host's PID namespace?

rhatdan · 2015-01-07T21:23:34Z

I found the place and updated the pull request, it now kills all processes if you are using the Host PID Namespace.

rhatdan · 2015-01-07T21:28:30Z

The patch I have is currently racy in that a ForkBomb could leave a process running.
I am not sure if the cgroup freeze will fix the problem, but I guess it is better than what I have now.

crosbymichael · 2015-01-12T18:50:42Z

I'm updating docker with the latest changes from libcontainer with this change so we can move forward.

#10049

rhatdan · 2015-01-12T19:54:58Z

@crosbymichael updated

crosbymichael · 2015-01-12T20:52:16Z

daemon/execdriver/driver.go

Maybe this should just be HostPid to match Ipc above?

crosbymichael · 2015-01-12T20:54:35Z

@icecrime can you review this one also?

crosbymichael · 2015-01-12T20:57:14Z

This is looking good so far.

root@f9b5d1691ec2:/go/src/github.com/docker/docker# docker run --rm busybox ps aux
PID   USER     COMMAND
    1 root     ps aux
root@f9b5d1691ec2:/go/src/github.com/docker/docker# docker run --rm --pid host busybox ps aux
PID   USER     COMMAND
    1 root     bash
  496 root     docker -d -s vfs
  600 root     docker run --rm --pid host busybox ps aux
  615 root     ps aux
root@f9b5d1691ec2:/go/src/github.com/docker/docker#

crosbymichael · 2015-01-12T21:04:36Z

LGTM

It would be nice to remove the NS suffix from the var names to make it more consistent with Ipc.

crosbymichael/htop was created on 2014-04-08 14:16:33, and with this PR we can finally run it.

rhatdan · 2015-01-12T22:02:14Z

sed s/pidns/pid done

unclejack · 2015-01-12T22:40:13Z

LGTM for carry @crosbymichael
I don't know if the docs are OK from @fredlf's, @SvenDowideit's and @jamtur01's point of view.

jamtur01 · 2015-01-12T23:01:19Z

docs/sources/reference/run.md

Wrap line to 80?

jamtur01 · 2015-01-12T23:03:12Z

Minor Docs comments but otherwise LGTM.

rhatdan · 2015-01-13T14:22:19Z

Fixed docs.

ncdc · 2015-01-13T20:24:21Z

docs/sources/reference/run.md

s/allows processes ids/allows process ids/ ?


crosbymichael · 2015-01-14T00:40:26Z

Closing in favor of #10080

icecrime changed the title ~~Allow the container to share the PID namespace with the host~~ Proposal - Allow the container to share the PID namespace with the host Nov 26, 2014

icecrime added the Proposal label Nov 26, 2014

SvenDowideit added the /project/doc label Nov 27, 2014

rhatdan force-pushed the pid branch 3 times, most recently from e2d7d45 to 097905f Compare December 2, 2014 19:34

jessfraz mentioned this pull request Dec 2, 2014

Allow users to share pid ns with host or other containers docker-archive/libcontainer#283

Closed

rhatdan force-pushed the pid branch from 097905f to 1acb789 Compare December 13, 2014 11:42

rhatdan force-pushed the pid branch 2 times, most recently from 7a93089 to 784b6aa Compare January 5, 2015 21:23

rhatdan force-pushed the pid branch from 784b6aa to 4b062cd Compare January 7, 2015 21:22

rhatdan force-pushed the pid branch from 4b062cd to 5a4d7ac Compare January 7, 2015 21:25

rhatdan force-pushed the pid branch 3 times, most recently from 4996709 to 091b271 Compare January 12, 2015 19:54


rhatdan force-pushed the pid branch 2 times, most recently from 3212d9f to 9330d0a Compare January 12, 2015 22:01


rhatdan force-pushed the pid branch from 9330d0a to 51d8e0d Compare January 13, 2015 14:21


rhatdan force-pushed the pid branch from 51d8e0d to 83a06f3 Compare January 13, 2015 20:31

crosbymichael mentioned this pull request Jan 14, 2015

Add --pid flag for staying in the host's pid namespace #10080

Merged

crosbymichael closed this Jan 14, 2015

thaJeztah added this to the 1.5.0 milestone Jul 11, 2016

thaJeztah mentioned this pull request Nov 1, 2022

api/types/container: refactor to use strings.Cut, DRY, move tests and fix validation #44379

Merged


11 participants