Skip to content

Fix process locking and state management#2773

Merged
crosbymichael merged 1 commit intocontainerd:masterfrom
crosbymichael:state-locking
Nov 16, 2018
Merged

Fix process locking and state management#2773
crosbymichael merged 1 commit intocontainerd:masterfrom
crosbymichael:state-locking

Conversation

@crosbymichael
Copy link
Member

There were races with the way process states. This displayed in ways,
especially around pausing the container for atomic operations. Users
would get errors like, cannnot delete container in paused state and
such.

This can be eaisly reproduced with docker and the following command:

> (for i in `seq 1 25`; do id=$(docker create  alpine usleep 50000);docker start $id;docker commit $id;docker wait $id;docker rm $id; done)

This two issues that this fixes are:

  • locks must be held by the owning process, not the state operations.
  • If a container ends up being paused but before the operation
    completes, the process exists, make sure we resume the container before
    setting the the process as exited.

Signed-off-by: Michael Crosby crosbymichael@gmail.com

@crosbymichael
Copy link
Member Author

This was still an implicit issue in 1.x branches, however, it recovered because of the killAll changes when deleting container. That was cleaned and tightened up in 1.2 and became a visible issue.

@crosbymichael crosbymichael added this to the 1.2.1 milestone Nov 9, 2018
There were races with the way process states.  This displayed in ways,
especially around pausing the container for atomic operations.  Users
would get errors like, cannnot delete container in paused state and
such.

This can be eaisly reproduced with `docker` and the following command:

```bash
> (for i in `seq 1 25`; do id=$(docker create  alpine usleep 50000);docker start $id;docker commit $id;docker wait $id;docker rm $id; done)
```

This two issues that this fixes are:

* locks must be held by the owning process, not the state operations.
* If a container ends up being paused but before the operation
completes, the process exists, make sure we resume the container before
setting the the process as exited.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
@codecov-io
Copy link

Codecov Report

Merging #2773 into master will decrease coverage by 2.82%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2773      +/-   ##
==========================================
- Coverage   43.72%   40.89%   -2.83%     
==========================================
  Files         100       69      -31     
  Lines       10734     9388    -1346     
==========================================
- Hits         4693     3839     -854     
+ Misses       5311     4992     -319     
+ Partials      730      557     -173
Flag Coverage Δ
#linux ?
#windows 40.89% <ø> (ø) ⬆️
Impacted Files Coverage Δ
snapshots/native/native.go 1.78% <0%> (-41.52%) ⬇️
archive/tar.go 16.99% <0%> (-26.15%) ⬇️
metadata/snapshot.go 21.53% <0%> (-24.28%) ⬇️
content/local/writer.go 56.86% <0%> (-0.99%) ⬇️
gc/scheduler/scheduler.go 66.34% <0%> (-0.97%) ⬇️
oci/spec_opts.go 20.49% <0%> (-0.28%) ⬇️
mount/temp_unix.go
sys/reaper_linux.go
services/server/server_linux.go
sys/env.go
... and 27 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 130d07e...831a41b. Read the comment docs.

if err := s.p.delete(ctx); err != nil {
return err
}
s.p.mu.Lock()
Copy link
Member

@Random-Liu Random-Liu Nov 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just point out that the lock location is changed in this PR. The lock was moved once in #2757.

But I think the move in this PR is fine, because grabbing process lock only potentially blocks operations to the process. As long as it doesn't block the whole shim, it is fine.

Actually the other fix in #2757 is more important.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let me think about it some

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the change made by this PR is fine. Not grabbing shim lock is more important. :)

Copy link
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ehazlett
Copy link
Member

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants