Description
When dozens of exec processes run simultaneously in a shim, the shim hangs.
Steps to reproduce the issue:
- A simple shell script reproduces the problem:
#!/bin/bash
for ((i=0;i<10;i++));do
(sudo ctr t exec --exec-id "t1-1-$i" c1 echo 1) &
(sudo ctr t exec --exec-id "t1-2-$i" c1 echo 1) &
(sudo ctr t exec --exec-id "t1-3-$i" c1 echo 1) &
(sudo ctr t exec --exec-id "t1-4-$i" c1 echo 1) &
(sudo ctr t exec --exec-id "t1-5-$i" c1 echo 1) &
(sudo ctr t exec --exec-id "t1-6-$i" c1 echo 1) &
(sudo ctr t exec --exec-id "t1-7-$i" c1 echo 1) &
done
- The shim hangs. We can check this with ctr t ls:
The ctr t ls command hangs as well, since shim.State never gets the shim lock; the lock is held by a call that is blocked at reaper.Wait.
$ sudo ctr t ls
^C
After checking the shim code, I found that the last running exec process is blocked at Monitor.Wait. This causes runtime.Exec() to block, so the whole call chain is blocked: shimService.Start() -> execCreatedState.Start() -> execProcess.Start().
// Wait blocks until a process is signal as dead.
// User should rely on the value of the exit status to determine if the
// command was successful or not.
func (m *Monitor) Wait(c *exec.Cmd, ec chan runc.Exit) (int, error) {
	for e := range ec { // <-- blocks here
		if e.Pid == c.Process.Pid {
			// make sure we flush all IO
			c.Wait()
			m.Unsubscribe(ec)
			return e.Status, nil
		}
	}
	// return no such process if the ec channel is closed and no more exit
	// events will be sent
	return -1, ErrNoSuchProcess
}
Enlarging the buffer size in runtime/v1/shim/reaper.go makes the problem go away, but it does not fix the root cause.
I think the lock implementation in the shim may be what causes this problem. I will check the code further.
@fuweid also reproduced this problem and collected more detailed information about the goroutines; please post that information here.
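To illustrate why a larger buffer would only hide this kind of hang, here is a minimal, self-contained sketch. It is not the containerd code: the monitor type, its methods, and the buffer sizes 32 and 128 are hypothetical, and it assumes (unconfirmed) that exit events are broadcast to every subscriber channel while a mutex is held. The count of 70 exits matches the 10 x 7 execs in the reproduction script.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// exit is a stand-in for runc.Exit (hypothetical, simplified).
type exit struct{ pid, status int }

// monitor is a hypothetical broadcaster, NOT the real reaper.Monitor.
type monitor struct {
	mu   sync.Mutex
	subs map[chan exit]struct{}
}

func newMonitor() *monitor { return &monitor{subs: map[chan exit]struct{}{}} }

// subscribe registers a buffered channel of the given size.
func (m *monitor) subscribe(size int) chan exit {
	c := make(chan exit, size)
	m.mu.Lock()
	m.subs[c] = struct{}{}
	m.mu.Unlock()
	return c
}

// unsubscribe needs the same lock that broadcast holds.
func (m *monitor) unsubscribe(c chan exit) {
	m.mu.Lock()
	delete(m.subs, c)
	m.mu.Unlock()
}

// broadcast sends e to every subscriber while holding the lock
// (the assumed problematic pattern): a full channel blocks the send
// and the lock is never released.
func (m *monitor) broadcast(e exit) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for c := range m.subs {
		c <- e
	}
}

// blockedWithin reports whether f fails to return within d.
func blockedWithin(d time.Duration, f func()) bool {
	done := make(chan struct{})
	go func() { f(); close(done) }()
	select {
	case <-done:
		return false
	case <-time.After(d):
		return true
	}
}

// demo broadcasts n exits to one subscriber that never drains its
// channel, then tries to unsubscribe it.
func demo(bufSize, n int) (broadcastStuck, unsubscribeStuck bool) {
	m := newMonitor()
	slow := m.subscribe(bufSize)
	broadcastStuck = blockedWithin(100*time.Millisecond, func() {
		for i := 0; i < n; i++ {
			m.broadcast(exit{pid: i})
		}
	})
	unsubscribeStuck = blockedWithin(100*time.Millisecond, func() {
		m.unsubscribe(slow)
	})
	return
}

func main() {
	for _, c := range []struct{ buf, exits int }{{32, 70}, {128, 70}, {128, 129}} {
		b, u := demo(c.buf, c.exits)
		fmt.Printf("buffer=%d exits=%d broadcastStuck=%v unsubscribeStuck=%v\n",
			c.buf, c.exits, b, u)
	}
}
```

In this sketch, 70 exits overflow a buffer of 32 and both the broadcaster and the unsubscribe call get stuck. A buffer of 128 absorbs 70 exits, but 129 exits stall it again, so the larger buffer only raises the threshold. If the shim follows this pattern, the more fundamental fix would be to avoid blocking sends while holding the lock.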
Describe the results you received:
Describe the results you expected:
Output of containerd --version:
$ containerd -v
containerd github.com/containerd/containerd v1.2.0-rc.1 0c5f8f63c3368856c320ae8a1c125e703b73b51d