Skip to content

Update Container Wait#32237

Merged
thaJeztah merged 2 commits intomoby:masterfrom
jlhawn:update_container_wait
May 17, 2017
Merged

Update Container Wait#32237
thaJeztah merged 2 commits intomoby:masterfrom
jlhawn:update_container_wait

Conversation

@jlhawn
Copy link
Contributor

@jlhawn jlhawn commented Mar 30, 2017

Summary

The goal for this PR is to be able to use the POST /containers/{name}/wait API in order to no longer rely on the Events API when doing docker run ... and docker start in attached mode. Here is the set of behavioral changes in the API and CLI:

  • The wait API now has on optional query parameter, condition, which specifies the state of the container which the client would like to wait for:

    • not-running

      This is the default value which maintains the existing behavior of the Wait API. It blocks if the container is in a state which is considered to be running, i.e., any of running, paused, or restarting. If it is already not in any of those states, then it would not block at all and return immediately with the current exit code.

    • next-exit

      This condition allows the client to wait for the container's state to change to a not-running state or to be removed. This is different from the not-running option because it will block if the container is in the created, exited, dead, or removing state, blocking through a transition to one of the running states and back to one of exited states or until the container is removed, whichever comes first. It is similar to waiting for a die or destroy event. This option is very useful to use before issuing a container start API request.

    • removed

      This condition allows the client to wait for the container to be removed. It is similar to waiting for a destroy event.

  • The Wait API will now return HTTP response headers immediately. This serves an an acknowledgement that a wait has been registered. The JSON response body will not be returned until the wait condition is met.

  • The Wait API's request context is extended to wait for a http.CloseNotifier event in the case that the client closes the connection while the server is blocked on the wait condition. This prevents leaked goroutines but should only be necessary until we switch to using Go 1.8's request context which cancels automatically if the client disconnects.

  • To handle a manual detach (when the user enters the detach keys sequence when attached in TTY mode), the CLI now detects this key sequence, stops sending stdin, and causes the CLI to exit immediately. This is done as an alternative to waiting for a detach event. The rug has been pulled out from under this bullet point. This will have to be done in the github.com/docker/cli repo along with the other CLI changes.

Follow-up to #31794

Fixes #32242

@jlhawn jlhawn force-pushed the update_container_wait branch from 119fb19 to b4532ac Compare March 30, 2017 21:31
@tonistiigi tonistiigi changed the title Update Container Wait [wip] Update Container Wait Mar 30, 2017
@tonistiigi
Copy link
Member

@WeiZhang555 @cpuguy83 any concerns for the approach?

@jlhawn jlhawn force-pushed the update_container_wait branch from b4532ac to f344448 Compare March 30, 2017 23:40
@WeiZhang555
Copy link
Contributor

The design looks good, I like it :-)
#31744 seems need to reimplement based on this, it could be simplified a lot with this PR.

@cpuguy83
Copy link
Member

@WeiZhang555 Indeed, however we still need to support old daemons.

@jlhawn jlhawn force-pushed the update_container_wait branch from f344448 to 4d72b20 Compare March 31, 2017 02:17
@WeiZhang555
Copy link
Contributor

@cpuguy83 Yep, you're right, then we have to keep both waitExitOrRemoved and this, backward compatibility makes this so complicated 😢

@jlhawn
Copy link
Contributor Author

jlhawn commented Mar 31, 2017

Well, at least janky seems happy with this first commit. I'll have to look into the failures with windowsRS1 and z tests later. I'll push the second commit now.

@jlhawn jlhawn force-pushed the update_container_wait branch 3 times, most recently from bf5dbce to 0a3032b Compare March 31, 2017 16:20
@jlhawn jlhawn force-pushed the update_container_wait branch 3 times, most recently from 0348904 to c8cf3fa Compare March 31, 2017 21:10
@jlhawn jlhawn changed the title [wip] Update Container Wait Update Container Wait Mar 31, 2017
@jlhawn
Copy link
Contributor Author

jlhawn commented Mar 31, 2017

I've removed the [WIP] label from this PR and made the necessary changes to the API client and docs. Please take a look. I've also updated the waitExitOrRemoved() function in a way that maintains compatibility with older versions.

While all of the integration tests pass locally for me, I have seen some flakiness in a few tests that Jenkins runs. They mostly get this error:

Cannot kill container {ContainerID}: Container {ContinerID} is not running

But I can't tell if it's related to my changes or not since I tried not to modify container state/lifecycle in any way.

Copy link
Contributor Author

@jlhawn jlhawn Mar 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I forgot to do statusC <- 125 here.

@jlhawn jlhawn force-pushed the update_container_wait branch 2 times, most recently from 64f354d to a43fd26 Compare April 1, 2017 01:04
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be using b.clientCtx instead of creating a new context?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same question.

Why does it return a container.ContainerWaitOKBody channel instead of just blocking?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of duplicating the blocking on channel and returning lines I think you could put this block in an if !legacyBehavior {...}

@jlhawn
Copy link
Contributor Author

jlhawn commented May 15, 2017

I'm currently rebasing and addressing @dnephin's comments. I want to expand on one of them since it has been asked already multiple times:

Why does [the ContainerWait client method] return a container.ContainerWaitOKBody channel instead of just blocking?

This is done because it seemed like to quickest way to meet the new requirements we need for the Container Wait API which is to enable this flow:

  1. Create a container in the case of docker run ... or have the container already exist in an exited state in the case of docker start -a ....
  2. Begin waiting for the container's next exit code (i.e., after it starts) or for it to be removed (in the case of docker run --rm ....
  3. Get acknowledgement from the daemon that it has begun waiting for that condition.
  4. Start the container.
  5. Continue to block on that wait condition.
  6. Some time later, the container exits.
  7. Receive the exit code.
  • Point 2 is the reason why there is a next-exit wait condition.
  • Point 3 is the reason why the API handler now returns response headers immediately.
  • Being able to sync point 4 is the reason why the Client method returns a channel for the wait result - to be able to propagate the ack from point 3 and start the container without a race condition: If we didn't have a way of ensuring the order of 3 and 4 then it would be possible that the container is started and exits before the wait actually begins on the backend in which case 5 would block indefinitely.

I'm willing to change it to whatever we can all agree on. An alternative I have thought of is to go back to returning (exitCode int, err error) but to get the acknowledgement I mentioned we would need to have another argument to the method which is a channel which is maybe closed when the headers come back. That seems like an even more confusing interface to me though. Please make other suggestions if you have any ideas.

@jlhawn jlhawn force-pushed the update_container_wait branch 2 times, most recently from a52c13c to ec867d4 Compare May 15, 2017 19:48
Copy link
Member

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

A bit light on tests, but I guess we can address that later.

Are all 12 commits still relevant? Do you think some of them could be squashed now that this is ready to merge?

Josh Hawn added 2 commits May 16, 2017 15:09
This patch consolidates the two WaitStop and WaitWithContext methods
on the container.State type. Now there is a single method, Wait, which
takes a context and a bool specifying whether to wait for not just a
container exit but also removal.

The behavior has been changed slightly so that a wait call during a
Created state will not return immediately but instead wait for the
container to be started and then exited.

The interface has been changed to no longer block, but instead returns
a channel on which the caller can receive a *StateStatus value which
indicates the ExitCode or an error if there was one (like a context
timeout or state transition error).

These changes have been propagated through the rest of the deamon to
preserve all other existing behavior.

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
This patch adds the untilRemoved option to the ContainerWait API which
allows the client to wait until the container is not only exited but
also removed.

This patch also adds some more CLI integration tests for waiting for a
created container and waiting with the new --until-removed flag.

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)

Handle detach sequence in CLI

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)

Update Container Wait Conditions

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)

Apply container wait changes to API 1.30

The set of changes to the containerWait API missed the cut for the
Docker 17.05 release (API version 1.29). This patch bumps the version
checks to use 1.30 instead.

This patch also makes a minor update to a testfile which was added to
the builder/dockerfile package.

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)

Remove wait changes from CLI

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)

Address minor nits on wait changes

- Changed the name of the tty Proxy wrapper to `escapeProxy`
- Removed the unnecessary Error() method on container.State
- Fixes a typo in comment (repeated word)

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)

Use router.WithCancel in the containerWait handler

This handler previously added this functionality manually but now uses
the existing wrapper which does it for us.

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)

Add WaitCondition constants to api/types/container

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)

Address more ContainerWait review comments

- Update ContainerWait backend interface to not return pointer values
  for container.StateStatus type.
- Updated container state's Wait() method comments to clarify that a
  context MUST be used for cancelling the request, setting timeouts,
  and to avoid goroutine leaks.
- Removed unnecessary buffering when making channels in the client's
  ContainerWait methods.
- Renamed result and error channels in client's ContainerWait methods
  to clarify that only a single result or error value would be sent
  on the channel.

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)

Move container.WaitCondition type to separate file

... to avoid conflict with swagger-generated code for API response

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)

Address more ContainerWait review comments

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
@jlhawn jlhawn force-pushed the update_container_wait branch from ec867d4 to 4921171 Compare May 16, 2017 22:13
@jlhawn
Copy link
Contributor Author

jlhawn commented May 16, 2017

Rebased and squashed, it's now just 2 commits.

@thaJeztah
Copy link
Member

Thanks!

@thaJeztah
Copy link
Member

All 💚

@jlhawn
Copy link
Contributor Author

jlhawn commented May 17, 2017

The CLI changes are now in a PR against the docker/cli repo here: docker/cli#101

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

docker run can't rely on the Events API.

9 participants