Skip to content

[release/1.2 backport] Fix container pid race condition#4025

Merged
fuweid merged 1 commit intocontainerd:release/1.2from
thaJeztah:1.2_backport_fix_container_pid
Feb 18, 2020
Merged

[release/1.2 backport] Fix container pid race condition#4025
fuweid merged 1 commit intocontainerd:release/1.2from
thaJeztah:1.2_backport_fix_container_pid

Conversation

@thaJeztah
Copy link
Copy Markdown
Member

@thaJeztah thaJeztah commented Feb 14, 2020

alternative for #4024 ([1.2] Revert #3366)
closes #4024 ([1.2] Revert #3366)

backport of:

++>>>>>>> a6b6097c9... Fix container pid.
diff --cc runtime/v1/linux/proc/utils.go
index 08fefdb3e,3c4661770..000000000
--- a/runtime/v1/linux/proc/utils.go
+++ b/runtime/v1/linux/proc/utils.go
@@@ -34,6 -36,13 +34,16 @@@ import
        "golang.org/x/sys/unix"
  )

++<<<<<<< HEAD:runtime/v1/linux/proc/utils.go
++=======
+ const (
+       // RuncRoot is the path to the root runc state directory
+       RuncRoot = "/run/containerd/runc"
+       // InitPidFile name of the file that contains the init pid
+       InitPidFile = "init.pid"
+ )
+
++>>>>>>> a6b6097c9... Fix container pid.:pkg/process/utils.go
  // safePid is a thread safe wrapper for pid.
  type safePid struct {
        sync.Mutex

Resolved by removing the StoppedPID const from runtime/v1/linux/proc/process.go instead of pkg/process/utils.go

Also had to make a small modification to the test, because #3417 is not in the 1.2 branch;

diff --git a/container_test.go b/container_test.go
index 2a0828f4c..aa04a31ad 100644
--- a/container_test.go
+++ b/container_test.go
@@ -1540,7 +1540,7 @@ func TestShortRunningTaskPid(t *testing.T) {

        var (
                image       Image
-               ctx, cancel = testContext(t)
+               ctx, cancel = testContext()
                id          = t.Name()
        )
        defer cancel()

@thaJeztah thaJeztah mentioned this pull request Feb 14, 2020
@zhsj
Copy link
Copy Markdown
Contributor

zhsj commented Feb 14, 2020

Probably you mean to say "Closes: #4023 ([v1.2] regression in v1.2.12 leaves container/shim hanging)" in the PR description?

@thaJeztah
Copy link
Copy Markdown
Member Author

That line is further down below in the description (fixes #4023 ...)
I added closes #4024, so that the PR for which this is an alternative would close if we pick this one 😄

@thaJeztah
Copy link
Copy Markdown
Member Author

thaJeztah commented Feb 14, 2020

Hm.. I was afraid that would happen;

container_test.go:1543:29:warning: too many arguments (compile) (staticcheck)
container_test.go:1543:29:warning: too many arguments (compile) (staticcheck)
container_test.go:1543:29:warning: unused variable or constant too many arguments (varcheck)

I suspect we need #3417 to fix that, or make some changes to the test

Updating the test looks trivial, so let me go for that approach:

diff --git a/container_test.go b/container_test.go
index 2a0828f4c..aa04a31ad 100644
--- a/container_test.go
+++ b/container_test.go
@@ -1540,7 +1540,7 @@ func TestShortRunningTaskPid(t *testing.T) {

        var (
                image       Image
-               ctx, cancel = testContext(t)
+               ctx, cancel = testContext()
                id          = t.Name()
        )
        defer cancel()

@thaJeztah thaJeztah force-pushed the 1.2_backport_fix_container_pid branch from 622b91e to 710007a Compare February 14, 2020 10:32
@thaJeztah
Copy link
Copy Markdown
Member Author

Looks like there's failures with CGO_ENABLED=1. Trying to find what the failure is (CI output is hard to dig through)

@hakman
Copy link
Copy Markdown
Contributor

hakman commented Feb 14, 2020

@thaJeztah I think I have the same failures in #4015, so at least seems unrelated to the fix.

@thaJeztah
Copy link
Copy Markdown
Member Author

Thanks! Wasn't sure if it's related

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep safePid to prevent pid's data race and just remove StoppedPID and update kill logic. WDYT?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a backport, so if we want to change that, it's better to make those changes on master, then backport to release branches

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@fuweid
Copy link
Copy Markdown
Member

fuweid commented Feb 16, 2020

CI failure will be fixed by kubernetes-sigs/cri-tools#574

@zhsj
Copy link
Copy Markdown
Contributor

zhsj commented Feb 16, 2020

CI failure will be fixed by kubernetes-sigs/cri-tools#574

But cri-tools is pinned at v1.12.0

CRITEST_COMMIT=v1.12.0
go get -d github.com/kubernetes-incubator/cri-tools/...
cd $GOPATH/src/github.com/kubernetes-incubator/cri-tools
git checkout $CRITEST_COMMIT

Not sure why the test result differs over time.

@fuweid
Copy link
Copy Markdown
Member

fuweid commented Feb 17, 2020

CI failure will be fixed by kubernetes-sigs/cri-tools#574

But cri-tools is pinned at v1.12.0

CRITEST_COMMIT=v1.12.0
go get -d github.com/kubernetes-incubator/cri-tools/...
cd $GOPATH/src/github.com/kubernetes-incubator/cri-tools
git checkout $CRITEST_COMMIT

Not sure why the test result differs over time.

@zhsj yes. I think the kubernetes-sigs/cri-tools package is always pulled in the machine.

# Makefile from github.com/kubernetes-incubator/cri-tools
GO ?= go
PROJECT := github.com/kubernetes-sigs/cri-tools
BINDIR := /usr/local/bin

That is why that commit change doesn't work and it still using master.
I think we should change it from github.com/kubernetes-incubator/cri-tools to github.com/ kubernetes-sigs/cri-tools.

cc @mikebrow @Random-Liu

@thaJeztah thaJeztah closed this Feb 17, 2020
@thaJeztah thaJeztah reopened this Feb 17, 2020
@thaJeztah
Copy link
Copy Markdown
Member Author

AppVeyor is failing now; could be a Windows issue (we've seen issues with latest Windows patch releases in moby/moby)

exit status 1
FAIL	github.com/containerd/containerd	100.813s
Makefile:158: recipe for target 'integration' failed
mingw32-make: *** [integration] Error 1
Command exited with code 2

@thaJeztah thaJeztah closed this Feb 17, 2020
@thaJeztah thaJeztah reopened this Feb 17, 2020
@thaJeztah thaJeztah changed the title [release/1.2 backport] Fix container pid. [release/1.2 backport] Fix container pid race condition Feb 17, 2020
@thaJeztah thaJeztah force-pushed the 1.2_backport_fix_container_pid branch from 710007a to 88e633f Compare February 17, 2020 15:08
Signed-off-by: Lantao Liu <lantaol@google.com>
(cherry picked from commit a6b6097)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah thaJeztah force-pushed the 1.2_backport_fix_container_pid branch from 88e633f to b970987 Compare February 17, 2020 15:34
@codecov-io
Copy link
Copy Markdown

Codecov Report

Merging #4025 into release/1.2 will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##           release/1.2    #4025   +/-   ##
============================================
  Coverage        44.19%   44.19%           
============================================
  Files              100      100           
  Lines            10847    10847           
============================================
  Hits              4794     4794           
  Misses            5313     5313           
  Partials           740      740
Flag Coverage Δ
#linux 47.87% <ø> (ø) ⬆️
#windows 41% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 89c46ed...b970987. Read the comment docs.

@thaJeztah
Copy link
Copy Markdown
Member Author

Yay, green now!

Copy link
Copy Markdown
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
confirmed cherry-pick of #3857 to 1.2

Copy link
Copy Markdown
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fuweid fuweid merged commit b9b4fa7 into containerd:release/1.2 Feb 18, 2020
@thaJeztah thaJeztah deleted the 1.2_backport_fix_container_pid branch February 18, 2020 07:34
thaJeztah added a commit to thaJeztah/containerd that referenced this pull request Feb 18, 2020
The thirteenth patch release for `containerd` 1.2 fixes a regression introduced
in v1.2.12 that caused container/shim to hang on single core machines, fixes an
issue with blkio, and updates the Golang runtime to 1.12.17.

Notable Updates
----------------------------------

* Fix container pid race condition [containerd#4025](containerd#4025)
* Update containerd/cgroups dependency to address blkio issue [containerd#4001](containerd#4001)
* Set octet-stream content-type on PUT request [containerd#4028](containerd#4028)
* Pin to libseccomp 2.3.3 to preserve compatibility with hosts that do not have libseccomp 2.4 or higher installed [containerd#4015](containerd#4015)
* Update Golang runtime to 1.12.17, which includes a fix to the runtime [containerd#4031](containerd#4031)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants