
[WIP] Respect restart policy during service reconciliation#2290

Closed
aaronlehmann wants to merge 4 commits into moby:master from
aaronlehmann:robust-restart-policy

Conversation

@aaronlehmann
Collaborator

@aaronlehmann aaronlehmann commented Jun 28, 2017

Currently, service reconciliation (updating a service, leader
reelection, etc) will cause a service's tasks to be restarted even if
the restart policy says they shouldn't be. This is because the service
definition says there should be, for example, 5 replicas, and the
control loop tries to make this happen.

This experiments with the idea of keeping tasks that we don't want to
restart in the desired state "running" instead of "shutdown". This can
be a starting point for evaluating the tradeoffs and deciding if this is
a good approach.

Known caveats so far:

  • Draining a node has to set the desired state of tasks on that node to
    "shutdown", regardless of the restart policy. But that means the
    orchestrator may eventually restart those tasks. So draining triggers
    the same issue we had before.
  • When rolling back a partial service update, tasks with the original
    service definition that happen to have failed won't be restarted. I'm
    not sure this is the correct behavior, because I would expect a service
    rollback to leave the service in a converged state.
    -- Fixed by setting all failed tasks' desired state to Shutdown on a
    rollback.
  • Global services have tasks shut down on nodes that go down. This is
    necessary to prevent rolling updates from getting stuck on those
    nodes. But it means that a node going down and coming back up will
    cause a restart of its global service tasks, even if that does not
    match the restart policy.
    -- Fixed by excluding tasks on down nodes from the list passed to the
    updater instead of actually shutting them down.
  • This breaks the constraint enforcer, which tries to shut down tasks by
    setting their observed state to Rejected. It expects the orchestrator to
    set the desired state to Shutdown, which will trigger the actual
    shutdown. It modifies observed state instead of desired state to avoid
    fighting with the orchestrator over the DesiredState field.

Possible solution for #932

cc @aluzzardi
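The core idea can be sketched roughly as follows. This is a minimal sketch with hypothetical, simplified types (`Task`, `RestartPolicy`, `onTaskFailure`, `shouldRestart` here are stand-ins, not swarmkit's actual API): when the restart policy says a failed task should not be restarted, its desired state is left at "running" instead of being moved to "shutdown", so reconciliation does not see a slot that needs a replacement.

```go
package main

import "fmt"

// Hypothetical, simplified task model; the real api.Task has many more fields.
type TaskState int

const (
	StateRunning TaskState = iota
	StateFailed
	StateShutdown
)

type Task struct {
	DesiredState  TaskState
	ObservedState TaskState
}

// RestartPolicy is a stand-in for the service's restart policy.
type RestartPolicy struct {
	Condition string // "none", "on-failure", or "any"
}

// shouldRestart stands in for the restart supervisor's policy check.
func shouldRestart(t Task, p RestartPolicy) bool {
	switch p.Condition {
	case "any":
		return true
	case "on-failure":
		return t.ObservedState == StateFailed
	default:
		return false
	}
}

// onTaskFailure models the change explored in this PR: when the policy says
// not to restart, the desired state stays at Running instead of being set to
// Shutdown, so service reconciliation does not create a replacement task.
func onTaskFailure(t *Task, p RestartPolicy) {
	t.ObservedState = StateFailed
	if shouldRestart(*t, p) {
		t.DesiredState = StateShutdown // restart supervisor starts a replacement
	}
	// else: DesiredState stays Running; no replacement is created
}

func main() {
	t := Task{DesiredState: StateRunning, ObservedState: StateRunning}
	onTaskFailure(&t, RestartPolicy{Condition: "none"})
	fmt.Println(t.DesiredState == StateRunning) // task is not rescheduled
}
```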

@codecov

codecov bot commented Jun 28, 2017

Codecov Report

Merging #2290 into master will decrease coverage by 0.01%.
The diff coverage is 42.55%.

@@            Coverage Diff             @@
##           master    #2290      +/-   ##
==========================================
- Coverage   60.47%   60.45%   -0.02%     
==========================================
  Files         128      128              
  Lines       26002    26037      +35     
==========================================
+ Hits        15724    15740      +16     
- Misses       8884     8912      +28     
+ Partials     1394     1385       -9

@thaJeztah
Member

ping @aluzzardi PTAL!

@cyli
Contributor

cyli commented Jul 19, 2017

For the first (draining) and third case (global services, node goes down) - would having a separate state like "shutdownNoRestart" (that's a terrible name I know) help? This distinguishes it from a shutdown state that can be replaced.

I agree on rolling back - it seems like that's a service update, which seems like it should reset the restart policy history? #2304 resets the restart history on spec update - can we ignore any tasks with the wrong service spec version?

@aaronlehmann
Collaborator Author

For the first (draining) and third case (global services, node goes down) - would having a separate state like "shutdownNoRestart" (that's a terrible name I know) help? This distinguishes it from a shutdown state that can be replaced.

Yes, I think having a separate state would work. It does feel a little hacky so I wonder if there's another way, but I admit I can't think of anything better right now.

I agree on rolling back - it seems like that's a service update, which seems like it should reset the restart policy history? #2304 resets the restart history on spec update - can we ignore any tasks with the wrong service spec version?

I don't think spec versioning helps us here because rolling back goes back to the old spec version (see #2309). The issue is that the orchestrator might not restart tasks that failed prior to updating and rolling back. In a way, this is the right decision because the task is set up not to restart, and it matches the current (rolled-back) version of the service. But this is a bit of a special case, because we'd expect rolling back to reset everything.

@cyli
Contributor

cyli commented Jul 19, 2017

I don't think spec versioning helps us here because rolling back goes back to the old spec version (see #2309). The issue is that the orchestrator might not restart tasks that failed prior to updating and rolling back.

Sorry, I meant if we roll back, can we delete tasks with that previous version in the history (leaving only the failed or succeeded tasks with the new task version)? That way, there is nothing to restart per se. Definitely is slightly hacky, and it wipes out a chunk of existing history, so I definitely understand why it may not be desirable.

@aaronlehmann
Collaborator Author

Sorry, I meant if we roll back, can we delete tasks with that previous version in the history (leaving only the failed or succeeded tasks with the new task version)? That way, there is nothing to restart per se. Definitely is slightly hacky, and it wipes out a chunk of existing history, so I definitely understand why it may not be desirable.

I would not want to wipe history as part of this, but one possibility might be to set the desired state of these leftover tasks to "shutdown" as part of a rollback. Then the reconciliation process would cause them to get restarted, just as they do without this PR.

It's a bit of a hack, but I think it's not as bad as deleting tasks.
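The rollback fix being discussed could look roughly like this (a sketch with hypothetical types and a hypothetical `prepareRollback` helper, not swarmkit's actual code): before reconciliation runs, any leftover failed task still at desired state "running" is moved to "shutdown" so the orchestrator replaces it, matching pre-PR behavior on rollback.

```go
package main

import "fmt"

type TaskState int

const (
	Running TaskState = iota
	Failed
	Shutdown
)

type Task struct {
	ID            string
	DesiredState  TaskState
	ObservedState TaskState
}

// prepareRollback marks leftover failed tasks as desired=Shutdown so that the
// subsequent service reconciliation starts replacements for them.
func prepareRollback(tasks []*Task) {
	for _, t := range tasks {
		if t.ObservedState == Failed && t.DesiredState == Running {
			t.DesiredState = Shutdown
		}
	}
}

func main() {
	tasks := []*Task{
		{ID: "a", DesiredState: Running, ObservedState: Failed},
		{ID: "b", DesiredState: Running, ObservedState: Running},
	}
	prepareRollback(tasks)
	// the failed task is now desired=Shutdown; the healthy one is untouched
	fmt.Println(tasks[0].DesiredState == Shutdown, tasks[1].DesiredState == Running)
}
```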

@cyli
Contributor

cyli commented Jul 19, 2017

@aaronlehmann That seems like it would work, too.

@aaronlehmann
Collaborator Author

It's a bit harder than I first thought. This pass over old tasks could race with a restart of one of them (it might have just failed, and the orchestrator is about to call Restart on it to change the desired state). I'm reluctant to add a second place that's shutting down tasks. We used to have some problems due to this, but now that Restart is the only place that sets DesiredState to Shutdown, it seems to work well.

Technically there wouldn't be a risk of the updater interfering with Restart if it only does this in cases where Restart would not change the desired state (i.e. shouldRestart returns false). But that involves coordination with the restart supervisor anyway - shouldRestart is relatively complex, and relies on internal state from the restart supervisor.

@aaronlehmann aaronlehmann force-pushed the robust-restart-policy branch from c40945f to 2b51a16 Compare July 19, 2017 23:02
switch v.Node.Status.State {
// NodeStatus_DISCONNECTED is a transient state, no need to make any change
case api.NodeStatus_DOWN:
// NodeStatus_DISCONNECTED is a transient state, no need to remove tasks
Contributor

Non-blocking nit: NodeStatus_DOWN in this comment (I guess the name got changed at some point)

Collaborator Author

The comment is correct - it explains why NodeStatus_DISCONNECTED isn't handled specially.

Contributor

Oh oops, thank you!

@aaronlehmann aaronlehmann force-pushed the robust-restart-policy branch from 2b51a16 to caab791 Compare July 21, 2017 00:11
@aaronlehmann
Collaborator Author

Brought this up to date with recent changes underneath orchestrator.

This ensures that tasks that failed before the update started won't be
skipped by the rollback. The rollback should lead to a converged state,
instead of leaving failed tasks in a failed state (which might otherwise
happen if the restart policy says not to restart the tasks).

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@aaronlehmann aaronlehmann force-pushed the robust-restart-policy branch from f0ce829 to 1741154 Compare July 21, 2017 00:48
@aaronlehmann
Collaborator Author

It's a bit harder than I first thought. This pass over old tasks could race with a restart of one of them (it might have just failed, and the orchestrator is about to call Restart on it to change the desired state). I'm reluctant to add a second place that's shutting down tasks. We used to have some problems due to this, but now that Restart is the only place that sets DesiredState to Shutdown, it seems to work well.

After thinking about this a little more, I actually think it's not an issue. This would happen immediately before triggering a service reconciliation, so even if we did race with a restart in some way, the effect would be extremely temporary.

I've added a commit that implements this.

@aaronlehmann
Collaborator Author

Global services have tasks shut down on nodes that go down. This is necessary to prevent rolling updates from getting stuck on those nodes. But it means that a node going down and coming back up will cause a restart of its global service tasks, even if that does not match the restart policy.

Fixed by not actually shutting down tasks when the node goes down, but instead excluding them from the list of tasks passed to the updater. I think this is cleaner anyway.
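The filtering described above could be sketched like this (hypothetical types and a hypothetical `tasksForUpdater` helper; the real updater and node model in swarmkit are more involved): tasks on down nodes are simply excluded from the list handed to the updater, so no shutdown/restart cycle is forced, and node reconciliation corrects things when the node returns.

```go
package main

import "fmt"

type NodeState int

const (
	NodeReady NodeState = iota
	NodeDown
)

type Task struct {
	ID     string
	NodeID string
}

// tasksForUpdater excludes tasks on down nodes from the updater's input
// instead of shutting them down, so rolling updates don't get stuck on those
// nodes and restarts aren't forced when the node comes back.
func tasksForUpdater(tasks []Task, nodes map[string]NodeState) []Task {
	var out []Task
	for _, t := range tasks {
		if nodes[t.NodeID] == NodeDown {
			continue // skip; do not force a shutdown/restart cycle
		}
		out = append(out, t)
	}
	return out
}

func main() {
	nodes := map[string]NodeState{"n1": NodeReady, "n2": NodeDown}
	tasks := []Task{{ID: "a", NodeID: "n1"}, {ID: "b", NodeID: "n2"}}
	// only the task on the ready node remains in the updater's list
	fmt.Println(len(tasksForUpdater(tasks, nodes)))
}
```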

Aaron Lehmann added 2 commits July 20, 2017 18:16
Just skip them from the list of tasks passed to the updater. When the
node becomes available again, it will trigger node reconciliation, which
will correct whatever was skipped.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
…dState set to Shutdown

Instead, let the restart supervisor decide whether to change the desired
state or leave it at Running.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@aaronlehmann aaronlehmann force-pushed the robust-restart-policy branch from adbde76 to 2a8729e Compare July 21, 2017 01:16
@aaronlehmann
Collaborator Author

Adding another caveat:

  • This breaks the constraint enforcer, which tries to shut down tasks by setting their observed state to Rejected. It expects the orchestrator to set the desired state to Shutdown, which will trigger the actual shutdown. It modifies observed state instead of desired state to avoid fighting with the orchestrator over the DesiredState field.

If we had a fix for this, it would open up a pretty elegant fix for the other outstanding caveat (node draining). The constraint enforcer could handle draining nodes as just another case of a node not meeting the "constraints". This isn't really so radical - in the early swarmkit prototype we had a "drainer" component that worked separately from the orchestrator. I'm experimenting locally with moving draining to the constraint enforcer, and it makes the code very nice. It removes the extra logic I added in the first commit of this PR around forcing a shutdown in the restart manager in some cases. Unfortunately, I went off on this path before I realized that the constraint enforcer doesn't actually work correctly under this change to begin with...

I'd love to find a complete solution to move this forward and finally fix #932. I'll try to continue looking at this.
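The "draining as just another constraint" idea could be sketched like this (hypothetical types and a hypothetical `meetsPlacement` check, not swarmkit's constraint enforcer): a draining node is handled on the same code path as a node that fails placement constraints, removing the need for special drain logic in the orchestrator.

```go
package main

import "fmt"

type NodeAvailability int

const (
	Active NodeAvailability = iota
	Drain
)

type Node struct {
	Availability NodeAvailability
	Labels       map[string]string
}

// meetsPlacement treats drain as just another way for a node to stop
// satisfying placement, so the constraint enforcer's existing shutdown path
// would cover draining too.
func meetsPlacement(n Node, constraints map[string]string) bool {
	if n.Availability == Drain {
		return false // draining node is no longer eligible, same as a failed constraint
	}
	for k, v := range constraints {
		if n.Labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	n := Node{Availability: Drain, Labels: map[string]string{"region": "us"}}
	// even though the labels match, a draining node fails placement
	fmt.Println(meetsPlacement(n, map[string]string{"region": "us"}))
}
```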

@aaronlehmann
Collaborator Author

What about fundamentally reconsidering the approach in this PR, and instead of keeping tasks that we don't want to restart in the desired state "running" instead of "shutdown", we add a flag to Task that indicates the task shouldn't be restarted? The flag would be set by the restart supervisor when it shuts down a task and decides not to start a replacement. When the orchestrator does service-level reconciliation, it would not start new tasks to replace any tasks where DesiredState == Shutdown && DoNotRestart == true. This seems a lot simpler than all the changes in this PR, and I think it would address all the corner cases that have been showing up.
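The flag-based alternative could be sketched like this (hypothetical types and a hypothetical `slotsNeedingReplacement` helper; `DoNotRestart` is the proposed flag, not an existing field): during service-level reconciliation, a shut-down task whose flag is set does not count as a slot needing a new task.

```go
package main

import "fmt"

type TaskState int

const (
	Running TaskState = iota
	Shutdown
)

type Task struct {
	Slot         int
	DesiredState TaskState
	DoNotRestart bool // proposed flag, set by the restart supervisor
}

// slotsNeedingReplacement returns slots whose tasks were shut down and should
// be replaced; slots flagged DoNotRestart are deliberately left empty.
func slotsNeedingReplacement(tasks []Task) []int {
	var slots []int
	for _, t := range tasks {
		if t.DesiredState == Shutdown && t.DoNotRestart {
			continue // restart policy already decided: leave this slot empty
		}
		if t.DesiredState == Shutdown {
			slots = append(slots, t.Slot)
		}
	}
	return slots
}

func main() {
	tasks := []Task{
		{Slot: 1, DesiredState: Shutdown, DoNotRestart: true},
		{Slot: 2, DesiredState: Shutdown},
	}
	// only slot 2 needs a replacement task
	fmt.Println(slotsNeedingReplacement(tasks))
}
```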

@aaronlehmann aaronlehmann changed the title Respect restart policy during service reconciliation [WIP] Respect restart policy during service reconciliation Jul 21, 2017
@aaronlehmann
Collaborator Author

What about fundamentally reconsidering the approach in this PR, and instead of keeping tasks that we don't want to restart in the desired state "running" instead of "shutdown", we add a flag to Task that indicates the task shouldn't be restarted?

This kind of goes against my own policy of not depending on task history. Maybe it's a bad idea.

I suppose we could modify the task reaper so that it will never delete the last dead task from a slot if it has the DontRestart flag set. Maybe not so horrible?

@aluzzardi
Member

This kind of goes against my own policy of not depending on task history. Maybe it's a bad idea.

I agree - I think it's too "special cased" and a bit complex.

A bit naive, but does it matter in reconciliation how many are in desired=running? Can't we just make sure that we have enough slots and handle restarts per slot?

@aaronlehmann
Collaborator Author

I agree - I think it's too "special cased" and a bit complex.

Special cased, perhaps, but I think it's a lot less complex than what's being explored here. The PR that implements the flag in the task is #2327.

A bit naive, but does it matter in reconciliation how many are in desired=running? Can't we just make sure that we have enough slots and handle restarts per slot?

Sorry, I don't understand this part.

@aaronlehmann
Collaborator Author

Superseded by #2332

