Description
We've noticed a condition under which a runc run ... process can hang indefinitely.
This can occur when the runc:[2:INIT] process gets killed for some reason before it is able to open the exec fifo.
This situation can be simulated by adding an os.Exit(1) right before opening the exec fifo here:
runc/libcontainer/standard_init_linux.go, line 172 in ab4a819:
    fd, err := unix.Open(fmt.Sprintf("/proc/self/fd/%d", l.fifoFd), unix.O_WRONLY|unix.O_CLOEXEC, 0)
and then running runc run .... You will see the process hang and a <defunct> init process in the process tree:
root 7509 Sl+ 14:38 0:00 | \_ runc run my-container
root 7517 Zs 14:38 0:00 | \_ [runc:[2:INIT]] <defunct>
For us, this manifests as "container init still running" errors on concurrent attempts to create/destroy containers that share a PID namespace.
We're looking at how we can work around this but wondered if it might make sense to fix the underlying issue in runc?
One potential solution we've been thinking about involves getting the runc run ... process to wait on the child and then explicitly exit.
The call to wait would need to occur as soon as the child process has been spawned, so probably somewhere after the p.execSetns() call here:
runc/libcontainer/process_linux.go, lines 300 to 303 in ab4a819:
    if err := p.execSetns(); err != nil {
        return newSystemErrorWithCause(err, "running exec setns process for init")
    }
We added the following hack right after the p.execSetns() while testing this out and it seemed to solve the issue:
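    go func() {
        // Wait for the init child to exit, then terminate runc run
        // so it cannot stay blocked on the exec fifo forever.
        p.wait()
        os.Exit(0)
    }()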
I'm sure there's probably a neater way of doing this, but it at least demonstrates a possible fix.