runc run can hang indefinitely #1697

@teddyking

We've noticed a condition under which a runc run ... process can hang indefinitely.
This can occur when the runc:[2:INIT] process gets killed for some reason before it is able to open the exec fifo.

This situation can be simulated by adding an os.Exit(1) right before opening the exec fifo here:

fd, err := unix.Open(fmt.Sprintf("/proc/self/fd/%d", l.fifoFd), unix.O_WRONLY|unix.O_CLOEXEC, 0)

and then running runc run .... The process hangs, and a <defunct> init process appears in the process tree:

root      7509 Sl+  14:38   0:00  | \_ runc run my-container
root      7517 Zs   14:38   0:00  |     \_ [runc:[2:INIT]] <defunct>

For us, this manifests as "container init still running" errors on concurrent attempts to create/destroy containers that share a pid ns.

We're looking at how we can work around this but wondered if it might make sense to fix the underlying issue in runc?

One potential solution we've been thinking about involves getting the runc run ... process to wait on the child and then explicitly exit.

The call to wait would need to occur as soon as the child process has been spawned, so probably somewhere after the p.execSetns() call here:

if err := p.execSetns(); err != nil {
	return newSystemErrorWithCause(err, "running exec setns process for init")
}

We added the following hack right after the p.execSetns() call while testing this out and it seemed to solve the issue:

	go func() {
		p.wait()
		os.Exit(0)
	}()

I'm sure there's probably a neater way of doing this, but it at least demonstrates a possible fix.
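One possibly neater shape, sketched here as a standalone program rather than a patch: instead of calling os.Exit from a goroutine, race a "ready" signal against the child's exit and return an error if the child dies first. The names waitReadyOrExit and errChildExited are hypothetical, the ready channel stands in for "the fifo was opened", and the false command stands in for an init that exits early. This is not runc's API.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// errChildExited is a hypothetical sentinel error for this sketch.
var errChildExited = errors.New("child exited before signalling readiness")

// waitReadyOrExit returns nil if ready fires first, or errChildExited if the
// child's exit is observed first. If both are already pending, select picks
// one of them at random.
func waitReadyOrExit(ready <-chan struct{}, exited <-chan error) error {
	select {
	case <-ready:
		return nil
	case <-exited:
		return errChildExited
	}
}

func main() {
	// "false" exits immediately with a non-zero status, standing in for an
	// init process that is killed before it opens the fifo.
	cmd := exec.Command("false")
	if err := cmd.Start(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}

	// Reap the child in the background and report its exit.
	exited := make(chan error, 1)
	go func() { exited <- cmd.Wait() }()

	// Never closed in this demo: the "fifo" is never opened.
	ready := make(chan struct{})

	if err := waitReadyOrExit(ready, exited); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
	}
}
```

Compared with the os.Exit(0) hack, this lets the caller surface a real error and run its normal cleanup, and it also reaps the defunct child through Wait.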
