Skip to content

Conversation

@aaronlehmann
Copy link

This PR adds support for a "rollback" failure action. When this is selected, the orchestrator will automatically roll back to the previous version of a service if an update hits the failure threshold.

It adds the following options to control the behavior of rollbacks. They are kept separate from the existing --update-* options, since there are situations where the behavior should be different for rollbacks. For example, an administrator may wish to roll out a new version of a service very slowly, but if failure thresholds are exceeded, switch back to the old version quickly. Arguably, we may be able to live with hardcoded defaults for some of these options, but it seemed better to stay consistent with the --update-* options than to introduce arbitrary differences.

  • --rollback-delay
  • --rollback-failure-action
  • --rollback-max-failure-ratio
  • --rollback-monitor
  • --rollback-parallelism

I'm open to reducing the set of options if desired, but at a minimum we should keep --rollback-delay and --rollback-parallelism.

This PR also switches the manually invoked service update --rollback to work on the server side, instead of being a purely client side operation. This allows manually-initiated rollbacks to respect the new rollback parameters, and it also has the benefits of avoiding some nasty hacks we previously used. The client is version-aware, so it will still use the old method against an older daemon.

Note that --rollback has been changed so that it can't be used in combination with other service update flags. This is a compromise to avoid adding further hacks. While it's unfortunate to be restricting this functionality, it does make the implementation much cleaner.

cc @aluzzardi @dongluochen

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add "|rollback"?
📝

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks

@aaronlehmann aaronlehmann force-pushed the automatic-rollback branch 2 times, most recently from 4d498a6 to 9b1bbf9 Compare February 17, 2017 18:08
@aaronlehmann
Copy link
Author

Rebased

api/swagger.yaml Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=> Amount of time between rollback iterations,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the value from GRPC is unrecognized, there is nothing we can really do here. We could return an error, but I think that would be too extreme. This is not changing the old behavior, just reorganizing the code and adding a new case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this PR is mixed with rollback change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was causing problems testing the change. I can break it out to a separate PR if you want.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine if this will be merged soon. This PR might be cherry-picked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should not block the shutdown because a failure in the deactivate service binding, unless user is given a command to fix it. I was just looking at it, while I came to know from @dongluochen this was taken care in this PR as a side change.

Given this PR may take some time to go through, I am also thinking it would be good to have a separate PR which can be quickly merged.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled this part out into #31279

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(pause|continue|rollback) => (pause|continue)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, this was a rebase issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no rollback on rollback failure.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, this was a rebase issue.

@aaronlehmann
Copy link
Author

@dongluochen: Addressed your comments, thanks.

@dongluochen
Copy link
Contributor

LGTM

@aluzzardi
Copy link
Member

LGTM

@aaronlehmann
Copy link
Author

Rebased

Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🐯

@yongtang
Copy link
Member

yongtang commented Mar 3, 2017

Docs change LGTM.

@yongtang
Copy link
Member

yongtang commented Mar 3, 2017

bash and zsh completion could be done in the follow up PRs.

@yongtang
Copy link
Member

yongtang commented Mar 3, 2017

I think it might be good to add a simple example section in the docs, and docs/api/version-history.md may need to be updated to reflect the changes in API.

Also, the PR needs a rebase.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Aaron Lehmann added 2 commits March 3, 2017 16:33
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Server-side rollback can take advantage of the rollback-specific update
parameters, instead of being treated as a normal update that happens to
go back to a previous version of the spec.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@aaronlehmann
Copy link
Author

aaronlehmann commented Mar 4, 2017

Thanks @yongtang. I've updated the API version history and added some content to service_update.md.

@yongtang
Copy link
Member

yongtang commented Mar 4, 2017

LGTM. Thanks!

@yongtang yongtang merged commit 3385658 into moby:master Mar 4, 2017
@GordonTheTurtle GordonTheTurtle added this to the 17.04.0 milestone Mar 4, 2017
@yongtang
Copy link
Member

yongtang commented Mar 4, 2017

/cc @albers @sdurrheimer for bash/zsh completion.

@aaronlehmann aaronlehmann deleted the automatic-rollback branch March 4, 2017 02:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants