Skip to content

Fixes: shim service event blocked when waiting for IO finished#2826

Merged
estesp merged 1 commit intocontainerd:masterfrom
lifubang:statemachineforpid
Nov 27, 2018
Merged

Fixes: shim service event blocked when waiting for IO finished#2826
estesp merged 1 commit intocontainerd:masterfrom
lifubang:statemachineforpid

Conversation

@lifubang
Copy link
Contributor

@lifubang lifubang commented Nov 23, 2018

@crosbymichael As I mentioned in #2823 , there will be a process locking state when waiting for one exec process IO finished.
I'm so sorry to say that I think this is because of your PR #2803 , you modify the location of process lock in delete func(#2624 ).

The correct location: https://github.com/Ace-Tang/containerd/blob/079292e3fc20b7319b6e7c35dcda6864bb128f1c/runtime/v1/linux/proc/exec_state.go#L170-L177

After your modify: https://github.com/containerd/containerd/blob/release/1.2/runtime/v1/linux/proc/exec.go#L103-L108

So, when exec proc A is waiting for it's IO finished, it will hold A's process lock.
When at this time, exec proc B is finished, shim service will trigger checkProcesses(e runc.Exit), it will call A.Pid(), but A.Pid() need A's process lock, so it is stucked, and every event in shim service is blocked, I think it's very dangerous if the event buffer is full. Please see: https://github.com/crosbymichael/containerd/blob/4c72befe097fb5d9e99ede3536c884608d0af474/runtime/v1/shim/service.go#L523-L532

But the PR #2624 is not perfect, because execCreatedState should lock the whole func, or it will cause race situation.

So, I think we should fix this problem.
There are two solutions:

  1. exec lock optimize when waiting for io close if no terminal #2823 , let waiting IO completed parallelly.
  2. this PR, use state machine management. Just lock in execCreatedState and execRunningState when call exec.Pid.

Signed-off-by: Lifubang lifubang@acmcoder.com

Signed-off-by: Lifubang <lifubang@acmcoder.com>
@codecov-io
Copy link

Codecov Report

Merging #2826 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2826   +/-   ##
=======================================
  Coverage   43.79%   43.79%           
=======================================
  Files         100      100           
  Lines       10741    10741           
=======================================
  Hits         4704     4704           
  Misses       5307     5307           
  Partials      730      730
Flag Coverage Δ
#linux 47.45% <ø> (ø) ⬆️
#windows 40.96% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 32aa0cd...bbc2a99. Read the comment docs.

@fuweid
Copy link
Member

fuweid commented Nov 23, 2018

For the lock change, I think it is good. 👍

@crosbymichael
Copy link
Member

LGTM

Copy link
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants