Skip to content

fix: modify lock location of exec delete avoid exec hang#2624

Merged
estesp merged 1 commit intocontainerd:masterfrom
Ace-Tang:fix_delete_lock
Sep 11, 2018
Merged

fix: modify lock location of exec delete avoid exec hang#2624
estesp merged 1 commit intocontainerd:masterfrom
Ace-Tang:fix_delete_lock

Conversation

@Ace-Tang
Copy link
Contributor

func (e *execProcess) delete(ctx context.Context) error {
e.wg.Wait()
...
}
delete exec process will wait for io copy finish, if wait here,
other process can not get lock of shim service.

  1. add lock before and after p.Delete.
  2. put lock after wait io copy in exec Delete.

Signed-off-by: Ace-Tang aceapril@126.com

how to produce the problem

  1. run a container sudo ctr run -d docker.io/library/busybox:latest b1 top
  2. create start.sh
$ sudo ctr t exec --exec-id e1 -t b1 sh
/ # vi start.sh
/ # cat start.sh 
sleep 100 &

echo "ok"
/ # chmod +x start.sh 
/ # exit
  1. run start.sh which will wait for io .
$ sudo ctr t exec --exec-id e1 b1 sh -c '/start.sh'
ok

real	1m40.102s
user	0m0.024s
sys	0m0.040s

  1. run an other exec process will wait for e1 finish.
$ time sudo ctr t exec --exec-id e1 b1 echo 1
1

real	1m30.652s
user	0m0.028s
sys	0m0.004s

also ctr t ls will also hang, since it also need acquire shim lock to get task status.

@codecov-io
Copy link

codecov-io commented Sep 10, 2018

Codecov Report

Merging #2624 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2624   +/-   ##
=======================================
  Coverage   44.05%   44.05%           
=======================================
  Files          94       94           
  Lines       10238    10238           
=======================================
  Hits         4510     4510           
  Misses       5008     5008           
  Partials      720      720
Flag Coverage Δ
#linux 47.77% <ø> (+0.02%) ⬆️
#windows 40.72% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1597270...079292e. Read the comment docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we removed it from here and just applied the lock around the s.transition() calls in the Delete methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, s.transition("deleted") need apply lock, but I just wonder whether delete need a lock. And update as you said.

func (e *execProcess) delete(ctx context.Context) error {
    e.wg.Wait()
...
}
delete exec process will wait for io copy finish, if wait here,
other process can not get lock of shim service.

1. apply lock around s.transition() calls in the Delete methods.
2. put lock after wait io copy in exec Delete.

Signed-off-by: Ace-Tang <aceapril@126.com>
@crosbymichael
Copy link
Member

LGTM

Copy link
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@estesp estesp merged commit ed2bf6d into containerd:master Sep 11, 2018
@Ace-Tang Ace-Tang deleted the fix_delete_lock branch September 11, 2018 14:41
@Random-Liu
Copy link
Member

Random-Liu commented Oct 2, 2018

I think this can help fix #2624. There might still be process/goroutine leakage, but at least won't make containerd-shim hang.

Should we cherry-pick this into 1.1?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants