c8d: Don't remount if the container is already running#45383
c8d: Don't remount if the container is already running#45383rumpl wants to merge 1 commit intomoby:masterfrom
Conversation
vvoland
left a comment
There was a problem hiding this comment.
Looks good to me, I don't have a better idea for now, and this is already an improvement.
I was wondering if starting stopped container or stopping started container during the cp would cause any problems, but it seems that the container stop/start doesn't happen until the cp actually finishes.
This change fixes double-mounting when running an exec or a cp on a running container with the containerd integration enabled. Sharing layers with overlayfs results in an undefined state and things start to randomly break with a "No such file or directory" error. Graph drivers use reference counting and only mount things once, we can't do this with containerd because we don't know where it mounted the container rootfs. Instead of reference counting we return the `/proc/<pid>/root` directory for exec and/or cp to read from. Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
|
What if the container pid1 dies while the copy is in progress? 😬 (Thinking something like a |
The container is locked so it won't exit until |
| if err := mount.All(mounts, root); err != nil { | ||
| return fmt.Errorf("failed to mount %s: %w", root, err) | ||
| root := "" | ||
| if container.State != nil && !container.State.Running && container.Pid == 0 { |
There was a problem hiding this comment.
Just a (theoretical?) exercise; looking at what this condition looks like for the "else" branch;
if container.State == nil || (container.State != nil && container.State.Running) || container.Pid != 0 {
root = fmt.Sprintf("/proc/%d/root", container.Pid)
}which effectively can be reduce to;
if container.Pid != 0 {
root = fmt.Sprintf("/proc/%d/root", container.Pid)
}But (see first example); would also be running if container.State == nil, and container.Pid has "any value"; can those conditions happen? Should we handle those?
There was a problem hiding this comment.
^^ actually;
container.Pidiscontainer.State.Pid, so currently it would panic ifcontainer.Stateisnil- Given that
RunningandPausedare not mutually exclusive; which branch shouldPausedfall into?
There was a problem hiding this comment.
So perhaps we need an early return (error) at least if container.State is nil
| if err := mount.UnmountAll(root, 0); err != nil { | ||
| return fmt.Errorf("failed to unmount %s: %w", root, err) | ||
| } | ||
| if strings.HasSuffix(root, rootfsMountSuffix) { |
There was a problem hiding this comment.
In this case we don't unmount, but still reset container.BaseFS; should this be an error condition? (Should it be logged?)
| done := make(chan error) | ||
| err = unshare.Go(unix.CLONE_NEWNS, | ||
| func() error { | ||
| if container.State != nil && container.State.Running && container.Pid != 0 { |
There was a problem hiding this comment.
Same condition as mentioned earlier (just checking if ok)
|
Verified this on 45f6f0e (from / # ( set -eux; time dd if=/dev/urandom of=/tmp/big bs=1G count=10; ls -lh /tmp/big; docker run -dit --name foo bash time sleep 1; date -Is; time docker cp /tmp/big foo:/tmp/; date -Is; docker logs -f foo; docker container inspect --format '{{ .State.StartedAt }}{{ "\n" }}{{ .State.FinishedAt }}' foo; docker start foo; docker exec foo ls -lh /tmp/big )
+ time dd 'if=/dev/urandom' 'of=/tmp/big' 'bs=1G' 'count=10'
10+0 records in
10+0 records out
real 0m 29.70s
user 0m 0.00s
sys 0m 29.67s
+ ls -lh /tmp/big
-rw-r--r-- 1 root root 10.0G May 3 19:00 /tmp/big
+ docker run -dit --name foo bash time sleep 1
417c4d15f4e4472f07fcf0d74fc93cd95ecd27cc42d3e9594b5582b860a2c4f2
+ date -Is
2023-05-03T19:00:03+00:00
+ time docker cp /tmp/big foo:/tmp/
Successfully copied 10.7GB to foo:/tmp/
real 0m 11.05s
user 0m 3.34s
sys 0m 6.41s
+ date -Is
2023-05-03T19:00:14+00:00
+ docker logs -f foo
real 0m 1.00s
user 0m 0.00s
sys 0m 0.00s
+ docker container inspect --format '{{ .State.StartedAt }}{{ "\n" }}{{ .State.FinishedAt }}' foo
2023-05-03T19:00:03.246792584Z
2023-05-03T19:00:04.249160906Z
+ docker start foo
foo
+ docker exec foo ls -lh /tmp/big
-rw-r--r-- 1 root root 10.0G May 3 19:00 /tmp/big |
|
Hey, this fixes an issue I'm seeing on my tests where I start a container, cp some stuff to it, then exec in and try to do things. |
|
Maybe we can have a mount service that ref-counts all this. |
|
s/mount service/rootfs service/ |
|
We do know where the container rootfs is mounted because dockerd is doing the mounting, even when
The daemon certainly could use reference counting and only mount things once in a snapshotter world, same as in a graphdriver world. The only difference is that
The finalizer is only set on the |
- What I did
Opening this PR to start a discussion more than wanting this code to be merged. The code works but maybe someone has a better idea for this case. @AkihiroSuda @thaJeztah @vvoland
The
openContainerFSwould need some more work, I'm not sure we need to unmount volumes here: https://github.com/moby/moby/blob/master/daemon/containerfs_linux.go#L69, they are unmounted in the finalizer of the structThis change fixes double-mounting when running an exec or a cp on a running container with the containerd integration enabled. Sharing layers with overlayfs results in an undefined state and things start to randomly break with a "No such file or directory" error.
Graph drivers use reference counting and only mount things once, we can't do this with containerd because we don't know where it mounted the container rootfs. Instead of reference counting we return the
/proc/<pid>/rootdirectory for exec and/or cp to read from.- How I did it
Checking the container state and setting the container basefs to
/proc/<pid>/rootif the container is running.- How to verify it
You can use this script:
With current
masterbranch the output is:Notice the line
touch: cannot touch '/etc/default/test-file': No such file or directory, the rootfs is in an undefined state...With this change the error goes away:
You can also look at
sudo dmesg -win parallel, there shouldn't be any overlayfs warnings after this change.- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)