Exec process may cause shim hang #2709

@Ace-Tang

Description

When dozens of exec processes run simultaneously in one shim, the shim hangs.

Steps to reproduce the issue:

  1. A simple shell script reproduces the problem:
#!/bin/bash

for ((i=0;i<10;i++));do
	(sudo ctr t exec --exec-id "t1-1-$i" c1 echo 1) &
	(sudo ctr t exec --exec-id "t1-2-$i" c1 echo 1) &
	(sudo ctr t exec --exec-id "t1-3-$i" c1 echo 1) &
	(sudo ctr t exec --exec-id "t1-4-$i" c1 echo 1) &
	(sudo ctr t exec --exec-id "t1-5-$i" c1 echo 1) &
	(sudo ctr t exec --exec-id "t1-6-$i" c1 echo 1) &
	(sudo ctr t exec --exec-id "t1-7-$i" c1 echo 1) &
done
  2. The shim hangs. We can confirm this with ctr t ls, which also hangs: shim.State never gets the shim lock, because the lock holder is blocked in reaper.Wait.
$ sudo ctr t ls
^C

After checking the shim code, I found that the last running exec process is blocked in Monitor.Wait. This blocks runtime.Exec(), so the whole call chain is blocked: shimService.Start() -> execCreatedState.Start() -> execProcess.Start().

// Wait blocks until a process is signal as dead.
// User should rely on the value of the exit status to determine if the
// command was successful or not.
func (m *Monitor) Wait(c *exec.Cmd, ec chan runc.Exit) (int, error) {
    for e := range ec { // <-- blocks here
        if e.Pid == c.Process.Pid {
            // make sure we flush all IO
            c.Wait()
            m.Unsubscribe(ec)
            return e.Status, nil
        }
    }
    // return no such process if the ec channel is closed and no more exit
    // events will be sent
    return -1, ErrNoSuchProcess
}

Enlarging the buffer size in runtime/v1/shim/reaper.go makes the problem go away, but it does not fix the root cause.
I suspect the lock implementation in the shim is what causes this. I will dig further into the code.

@fuweid also reproduced this problem and collected more detailed information about the goroutines. Please help post it here.

Describe the results you received:

Describe the results you expected:

Output of containerd --version:

$ containerd -v
containerd github.com/containerd/containerd v1.2.0-rc.1 0c5f8f63c3368856c320ae8a1c125e703b73b51d
