Skip to content

Fix race conditions in libcontainerd process handling#35809

Merged
thaJeztah merged 2 commits intomoby:masterfrom
cpuguy83:fix_container_zombies
Dec 16, 2017
Merged

Fix race conditions in libcontainerd process handling#35809
thaJeztah merged 2 commits intomoby:masterfrom
cpuguy83:fix_container_zombies

Conversation

@cpuguy83
Copy link
Member

@cpuguy83 cpuguy83 commented Dec 15, 2017

  • Fix some missing synchronization in libcontainerd
  • Fix error handling for kill/process not found

With the contianerd 1.0 migration we now have strongly typed errors that
we can check for process not found.
We also had some bad error checks looking for ESRCH which would only
be returned from unix.Kill and never from containerd even though we
were checking containerd responses for it.

Fixes some race conditions around process handling and our error checks
that could lead to errors that propagate up to the user that should not.

Related to #35594

With the contianerd 1.0 migration we now have strongly typed errors that
we can check for process not found.
We also had some bad error checks looking for `ESRCH` which would only
be returned from `unix.Kill` and never from containerd even though we
were checking containerd responses for it.

Fixes some race conditions around process handling and our error checks
that could lead to errors that propagate up to the user that should not.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
@crosbymichael
Copy link
Contributor

LGTM

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
@vieux
Copy link
Contributor

vieux commented Dec 16, 2017

LGTM

@thaJeztah
Copy link
Member

z failure looks unrelated;

01:53:24 ----------------------------------------------------------------------
01:53:24 FAIL: check_test.go:366: DockerSwarmSuite.TearDownTest
01:53:24 
01:53:24 check_test.go:371:
01:53:24     d.Stop(c)
01:53:24 daemon/daemon.go:395:
01:53:24     t.Fatalf("Error while stopping the daemon %s : %v", d.id, err)
01:53:24 ... Error: Error while stopping the daemon db005154c4e62 : exit status 130
01:53:24 
01:53:24 
01:53:24 ----------------------------------------------------------------------
01:53:24 PANIC: docker_api_swarm_service_test.go:201: DockerSwarmSuite.TestAPISwarmServicesUpdateStartFirst
01:53:24 
01:53:24 [db005154c4e62] waiting for daemon to start
01:53:24 [db005154c4e62] daemon started
01:53:24 
01:53:24 [db005154c4e62] daemon started
01:53:24 Attempt #2: daemon is still running with pid 12408
01:53:24 Attempt #3: daemon is still running with pid 12408
01:53:24 Attempt #4: daemon is still running with pid 12408
01:53:24 [db005154c4e62] exiting daemon
01:53:24 ... Panic: Fixture has panicked (see related PANIC)
01:53:24 
01:53:24 ----------------------------------------------------------------------

restarting for good measure

@jchauncey
Copy link

So this doesnt seem to have fixed my problem =\

@jchauncey
Copy link

nevermind this has definitely solved the problem i was having with jenkins and docker.

@thaJeztah
Copy link
Member

Thanks for testing @jchauncey 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants