Skip to content

fix pipe in broken may cause shim lock forever#2807

Merged
estesp merged 2 commits intocontainerd:masterfrom
lifubang:shimlockwhenstdinclose
Nov 20, 2018
Merged

fix pipe in broken may cause shim lock forever#2807
estesp merged 2 commits intocontainerd:masterfrom
lifubang:shimlockwhenstdinclose

Conversation

@lifubang
Copy link
Contributor

@lifubang lifubang commented Nov 18, 2018

Signed-off-by: Lifubang lifubang@acmcoder.com

There is an issue in moby: moby/moby#38064
It's hard to reproduce, but if you are lucky, you'll reproduce it successfully with probability about 0.1. Wish you success. If someone can write a go program with libcontainerd to reproduce it, it will help for us to debug it.

For container in created state, if pipe in broken, moby will call DeleteTask, and it will cause a wait in:

Because in

, if pipe out broken, it will set wg done.
But in
io.CopyBuffer(epollConsole, in, *bp)
, when pipe in broken, pipe out broken can't be detected, so it will not set wg done, it will cause shim lock.

@fuweid I know your team are working on shim lock improvement, PTAL.

@thaJeztah
Copy link
Member

Looks like there's a possible flaky test (https://ci.appveyor.com/project/mlaventure/containerd-3g73f/builds/20380056);

    --- FAIL: TestContainersCreateUpdateDelete/DeleteLabel (0.00s)
1012        containers_test.go:733: timestamp for updatedat not after createdat: 2018-11-18 01:28:10.0417508 +0000 UTC <= 2018-11-18 01:28:10.0417508 +0000 UTC

@lifubang lifubang force-pushed the shimlockwhenstdinclose branch from f3329f8 to 4b00f07 Compare November 18, 2018 11:39
@codecov-io
Copy link

codecov-io commented Nov 18, 2018

Codecov Report

Merging #2807 into master will decrease coverage by 3.67%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2807      +/-   ##
==========================================
- Coverage   47.41%   43.73%   -3.68%     
==========================================
  Files          91      100       +9     
  Lines        8411    10730    +2319     
==========================================
+ Hits         3988     4693     +705     
- Misses       3700     5307    +1607     
- Partials      723      730       +7
Flag Coverage Δ
#linux 47.41% <ø> (ø) ⬆️
#windows 40.89% <ø> (?)
Impacted Files Coverage Δ
snapshots/native/native.go 43.3% <0%> (-10%) ⬇️
metadata/snapshot.go 45.8% <0%> (-8.96%) ⬇️
archive/tar.go 43.13% <0%> (-6.87%) ⬇️
metadata/containers.go 47.97% <0%> (-6.62%) ⬇️
content/local/writer.go 57.84% <0%> (-6.36%) ⬇️
content/local/store.go 48.51% <0%> (-5.03%) ⬇️
remotes/docker/resolver.go 58.36% <0%> (-4.99%) ⬇️
archive/tar_opts.go 28.57% <0%> (-4.77%) ⬇️
archive/compression/compression.go 58.69% <0%> (-4.7%) ⬇️
metadata/images.go 58.46% <0%> (-4.7%) ⬇️
... and 56 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e1383c3...e76a887. Read the comment docs.

@lifubang
Copy link
Contributor Author

lifubang commented Nov 18, 2018

@thaJeztah It looks like a random error. Squash all commits make it success.

@thaJeztah
Copy link
Member

Yes, it looked like a flaky test, so your new push triggered CI again, and now it passes.

I'm not a maintainer of this repository, but it may be cleaner to keep the V1 and V2 changes as separate commits though, so that (if needed) the V1 change can be backported separately to older release branches 😅

@lifubang lifubang force-pushed the shimlockwhenstdinclose branch from 4b00f07 to b3438f7 Compare November 19, 2018 01:01
Signed-off-by: Lifubang <lifubang@acmcoder.com>
@lifubang lifubang force-pushed the shimlockwhenstdinclose branch from 9851472 to e76a887 Compare November 19, 2018 01:24
Signed-off-by: Lifubang <lifubang@acmcoder.com>
@crosbymichael
Copy link
Member

LGTM

@@ -49,9 +49,11 @@ func (p *linuxPlatform) CopyConsole(ctx context.Context, console console.Console
cwg.Add(1)
Copy link
Member

@fuweid fuweid Nov 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stdin fifo will be open twice in shim. stdin copy goroutine exit, when fifo is broken or closed by manual.

So, I prefer to remove the cwg.Done() and cwg.Add(1) from here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I can't reproduce it in my local 😂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the wrong comment.

@fuweid
Copy link
Member

fuweid commented Nov 20, 2018

The exit event handler will shutdown the console socket.

But if the Delete API holds lock before the exit event handler, the p.wg.Wait() will be dead end.

It doesn't like non-termnial that stdout pipe will be closed after process exits. It seem that there is no side effect to close console in stdin pipe side. So LGTM.

goroutine 21 [semacquire, 2 minutes]:
sync.runtime_SemacquireMutex(0xc00016c024, 0x0)
\t/usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc00016c020)
\t/usr/local/go/src/sync/mutex.go:134 +0xff
github.com/containerd/containerd/runtime/v1/linux/proc.(*Init).SetExited(0xc00016c000, 0x2)
\t/tmp/tmp.CH061j8I2a/src/github.com/containerd/containerd/runtime/v1/linux/proc/init.go:257 +0x3a
github.com/containerd/containerd/runtime/v1/shim.(*Service).checkProcesses(0xc00009e0c0, 0xbef4ff5f0a7606bc, 0x147bb13c, 0x97da20, 0x207a, 0x2)
\t/tmp/tmp.CH061j8I2a/src/github.com/containerd/containerd/runtime/v1/shim/service.go:525 +0x152
github.com/containerd/containerd/runtime/v1/shim.(*Service).processExits(0xc00009e0c0)
\t/tmp/tmp.CH061j8I2a/src/github.com/containerd/containerd/runtime/v1/shim/service.go:492 +0xbe
created by github.com/containerd/containerd/runtime/v1/shim.NewService
\t/tmp/tmp.CH061j8I2a/src/github.com/containerd/containerd/runtime/v1/shim/service.go:91 +0x3bf

goroutine 40 [semacquire, 2 minutes]:
sync.runtime_Semacquire(0xc00016c008)
\t/usr/local/go/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc00016c000)
\t/usr/local/go/src/sync/waitgroup.go:130 +0x64
github.com/containerd/containerd/runtime/v1/linux/proc.(*Init).delete(0xc00016c000, 0x764a80, 0xc00004e100, 0x67d557, 0xc0000b5c40)
\t/tmp/tmp.CH061j8I2a/src/github.com/containerd/containerd/runtime/v1/linux/proc/init.go:279 +0x42
github.com/containerd/containerd/runtime/v1/linux/proc.(*createdState).Delete(0xc00000c030, 0x764a80, 0xc00004e100, 0xc0000b5cd8, 0x631bb1)
\t/tmp/tmp.CH061j8I2a/src/github.com/containerd/containerd/runtime/v1/linux/proc/init_state.go:95 +0x46
github.com/containerd/containerd/runtime/v1/linux/proc.(*Init).Delete(0xc00016c000, 0x764a80, 0xc00004e100, 0x0, 0x0)
\t/tmp/tmp.CH061j8I2a/src/github.com/containerd/containerd/runtime/v1/linux/proc/init.go:275 +0x93
github.com/containerd/containerd/runtime/v1/shim.(*Service).Delete(0xc00009e0c0, 0x764a80, 0xc00004e100, 0x99a008, 0xc0001b5d38, 0x40bc88, 0x20)
\t/tmp/tmp.CH061j8I2a/src/github.com/containerd/containerd/runtime/v1/shim/service.go:210 +0x77
github.com/containerd/containerd/runtime/v1/shim/v1.RegisterShimService.func4(0x764a80, 0xc00004e100, 0xc00026dd60, 0xc000269c58, 0x6, 0xc000068800, 0x0)
\t/tmp/tmp.CH061j8I2a/src/github.com/containerd/containerd/runtime/v1/shim/v1/shim.pb.go:1628 +0xc5
github.com/containerd/containerd/vendor/github.com/containerd/ttrpc.(*serviceSet).dispatch(0xc000076060, 0x764a80, 0xc00004e100, 0xc00006e720, 0x25, 0xc000269c58, 0x6, 0x0, 0x0, 0x0, ...)
\t/tmp/tmp.CH061j8I2a/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/services.go:87 +0x109
github.com/containerd/containerd/vendor/github.com/containerd/ttrpc.(*serviceSet).call(0xc000076060, 0x764a80, 0xc00004e100, 0xc00006e720, 0x25, 0xc000269c58, 0x6, 0x0, 0x0, 0x0, ...)
\t/tmp/tmp.CH061j8I2a/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/services.go:60 +0xb5
github.com/containerd/containerd/vendor/github.com/containerd/ttrpc.(*serverConn).run.func2(0xc0001100f0, 0x764a80, 0xc00004e100, 0xb, 0xc000279d80, 0xc0000140c0, 0xc000014180, 0xc00000000b)
\t/tmp/tmp.CH061j8I2a/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/server.go:417 +0xa7
created by github.com/containerd/containerd/vendor/github.com/containerd/ttrpc.(*serverConn).run
\t/tmp/tmp.CH061j8I2a/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/server.go:416 +0x6fc

@Ace-Tang
Copy link
Contributor

I think I meet this problem, container can not be delete, and it wait in p.wg.Wait() when I get the golang stack, not know why this happend.

Copy link
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants