
[WIP] Respect restart policy during service reconciliation#2290

Closed
aaronlehmann wants to merge 4 commits into moby:master from
aaronlehmann:robust-restart-policy

Conversation

@aaronlehmann
Collaborator

@aaronlehmann aaronlehmann commented Jun 28, 2017

Currently, service reconciliation (updating a service, leader
reelection, etc) will cause a service's tasks to be restarted even if
the restart policy says they shouldn't be. This is because the service
definition says there should be, for example, 5 replicas, and the
control loop tries to make this happen.

This experiments with the idea of keeping tasks that we don't want to
restart in the desired state "running" instead of "shutdown". This can
be a starting point for evaluating the tradeoffs and deciding if this is
a good approach.

Known caveats so far:

  • Draining a node has to set the desired state of tasks on that node to
    "shutdown", regardless of the restart policy. But that means the
    orchestrator may eventually restart those tasks. So draining triggers
    the same issue we had before.
  • When rolling back a partial service update, tasks with the original
    service definition that happen to have failed won't be restarted. I'm
    not sure this is the correct behavior, because I would expect a service
    rollback to leave the service in a converged state.
    -- Fixed by setting all failed tasks' desired state to Shutdown on a
    rollback.
  • Global services have tasks shut down on nodes that go down. This is
    necessary to prevent rolling updates from getting stuck on those
    nodes. But it means that a node going down and coming back up will
    cause a restart of its global service tasks, even if that does not
    match the restart policy.
    -- Fixed by excluding tasks on down nodes from the list passed to the
    updater instead of actually shutting them down.
  • This breaks the constraint enforcer, which tries to shut down tasks by
    setting their observed state to Rejected. It expects the orchestrator to
    set the desired state to Shutdown, which will trigger the actual
    shutdown. It modifies observed state instead of desired state to avoid
    fighting with the orchestrator over the DesiredState field.

Possible solution for #932

cc @aluzzardi
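The core idea can be sketched roughly as follows. This is a minimal sketch with hypothetical, simplified types (`Task`, `RestartPolicy`, `onTaskFailure`, `shouldRestart` here are stand-ins, not swarmkit's actual API): when the restart policy says a failed task should not be restarted, its desired state is left at "running" instead of being moved to "shutdown", so reconciliation does not see a slot that needs a replacement.

```go
package main

import "fmt"

// Hypothetical, simplified task model; the real api.Task has many more fields.
type TaskState int

const (
	StateRunning TaskState = iota
	StateFailed
	StateShutdown
)

type Task struct {
	DesiredState  TaskState
	ObservedState TaskState
}

// RestartPolicy is a stand-in for the service's restart policy.
type RestartPolicy struct {
	Condition string // "none", "on-failure", or "any"
}

// shouldRestart stands in for the restart supervisor's policy check.
func shouldRestart(t Task, p RestartPolicy) bool {
	switch p.Condition {
	case "any":
		return true
	case "on-failure":
		return t.ObservedState == StateFailed
	default:
		return false
	}
}

// onTaskFailure models the change explored in this PR: when the policy says
// not to restart, the desired state stays at Running instead of being set to
// Shutdown, so service reconciliation does not create a replacement task.
func onTaskFailure(t *Task, p RestartPolicy) {
	t.ObservedState = StateFailed
	if shouldRestart(*t, p) {
		t.DesiredState = StateShutdown // restart supervisor starts a replacement
	}
	// else: DesiredState stays Running; no replacement is created
}

func main() {
	t := Task{DesiredState: StateRunning, ObservedState: StateRunning}
	onTaskFailure(&t, RestartPolicy{Condition: "none"})
	fmt.Println(t.DesiredState == StateRunning) // task is not rescheduled
}
```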

@codecov

codecov bot commented Jun 28, 2017

Codecov Report

Merging #2290 into master will decrease coverage by 0.01%.
The diff coverage is 42.55%.

@@            Coverage Diff             @@
##           master    #2290      +/-   ##
==========================================
- Coverage   60.47%   60.45%   -0.02%     
==========================================
  Files         128      128              
  Lines       26002    26037      +35     
==========================================
+ Hits        15724    15740      +16     
- Misses       8884     8912      +28     
+ Partials     1394     1385       -9

@thaJeztah
Member

ping @aluzzardi PTAL!

@cyli
Contributor

cyli commented Jul 19, 2017

For the first (draining) and third case (global services, node goes down) - would having a separate state like "shutdownNoRestart" (that's a terrible name I know) help? This distinguishes it from a shutdown state that can be replaced.

I agree on rolling back - it seems like that's a service update, which seems like it should reset the restart policy history? #2304 resets the restart history on spec update - can we ignore any tasks with the wrong service spec version?

@aaronlehmann
Collaborator Author

For the first (draining) and third case (global services, node goes down) - would having a separate state like "shutdownNoRestart" (that's a terrible name I know) help? This distinguishes it from a shutdown state that can be replaced.

Yes, I think having a separate state would work. It does feel a little hacky so I wonder if there's another way, but I admit I can't think of anything better right now.

I agree on rolling back - it seems like that's a service update, which seems like it should reset the restart policy history? #2304 resets the restart history on spec update - can we ignore any tasks with the wrong service spec version?

I don't think spec versioning helps us here because rolling back goes back to the old spec version (see #2309). The issue is that the orchestrator might not restart tasks that failed prior to updating and rolling back. In a way, this is the right decision because the task is set up not to restart, and it matches the current (rolled-back) version of the service. But this is a bit of a special case, because we'd expect rolling back to reset everything.

@cyli
Contributor

cyli commented Jul 19, 2017

I don't think spec versioning helps us here because rolling back goes back to the old spec version (see #2309). The issue is that the orchestrator might not restart tasks that failed prior to updating and rolling back.

Sorry, I meant if we roll back, can we delete tasks with that previous version in the history (leaving only the failed or succeeded tasks with the new task version)? That way, there is nothing to restart per se. Definitely is slightly hacky, and it wipes out a chunk of existing history, so I definitely understand why it may not be desirable.

@aaronlehmann
Collaborator Author

Sorry, I meant if we roll back, can we delete tasks with that previous version in the history (leaving only the failed or succeeded tasks with the new task version)? That way, there is nothing to restart per se. Definitely is slightly hacky, and it wipes out a chunk of existing history, so I definitely understand why it may not be desirable.

I would not want to wipe history as part of this, but one possibility might be to set the desired state of these leftover tasks to "shutdown" as part of a rollback. Then the reconciliation process would cause them to get restarted, just as they do without this PR.

It's a bit of a hack, but I think it's not as bad as deleting tasks.
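The rollback fix being discussed could look roughly like this (a sketch with hypothetical types and a hypothetical `prepareRollback` helper, not swarmkit's actual code): before reconciliation runs, any leftover failed task still at desired state "running" is moved to "shutdown" so the orchestrator replaces it, matching pre-PR behavior on rollback.

```go
package main

import "fmt"

type TaskState int

const (
	Running TaskState = iota
	Failed
	Shutdown
)

type Task struct {
	ID            string
	DesiredState  TaskState
	ObservedState TaskState
}

// prepareRollback marks leftover failed tasks as desired=Shutdown so that the
// subsequent service reconciliation starts replacements for them.
func prepareRollback(tasks []*Task) {
	for _, t := range tasks {
		if t.ObservedState == Failed && t.DesiredState == Running {
			t.DesiredState = Shutdown
		}
	}
}

func main() {
	tasks := []*Task{
		{ID: "a", DesiredState: Running, ObservedState: Failed},
		{ID: "b", DesiredState: Running, ObservedState: Running},
	}
	prepareRollback(tasks)
	// the failed task is now desired=Shutdown; the healthy one is untouched
	fmt.Println(tasks[0].DesiredState == Shutdown, tasks[1].DesiredState == Running)
}
```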

@cyli
Contributor

cyli commented Jul 19, 2017

@aaronlehmann That seems like it would work, too.

@aaronlehmann
Collaborator Author

It's a bit harder than I first thought. This pass over old tasks could race with a restart of one of them (it might have just failed, and the orchestrator is about to call Restart on it to change the desired state). I'm reluctant to add a second place that's shutting down tasks. We used to have some problems due to this, but now that Restart is the only place that sets DesiredState to Shutdown, it seems to work well.

Technically there wouldn't be a risk of the updater interfering with Restart if it only does this in cases where Restart would not change the desired state (i.e. shouldRestart returns false). But that involves coordination with the restart supervisor anyway - shouldRestart is relatively complex, and relies on internal state from the restart supervisor.

@aaronlehmann aaronlehmann force-pushed the robust-restart-policy branch from c40945f to 2b51a16 Compare July 19, 2017 23:02
switch v.Node.Status.State {
// NodeStatus_DISCONNECTED is a transient state, no need to make any change
case api.NodeStatus_DOWN:
// NodeStatus_DISCONNECTED is a transient state, no need to remove tasks
Contributor

Non-blocking nit: NodeStatus_DOWN in this comment (I guess the name got changed at some point)

Collaborator Author

The comment is correct - it explains why NodeStatus_DISCONNECTED isn't handled specially.

Contributor

Oh oops, thank you!

@aaronlehmann aaronlehmann force-pushed the robust-restart-policy branch from 2b51a16 to caab791 Compare July 21, 2017 00:11
@aaronlehmann
Collaborator Author

Brought this up to date with recent changes underneath orchestrator.

This ensures that tasks that failed before the update started won't be
skipped by the rollback. The rollback should lead to a converged state,
instead of leaving failed tasks in a failed state (which might otherwise
happen if the restart policy says not to restart the tasks).

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@aaronlehmann aaronlehmann force-pushed the robust-restart-policy branch from f0ce829 to 1741154 Compare July 21, 2017 00:48
@aaronlehmann
Collaborator Author

It's a bit harder than I first thought. This pass over old tasks could race with a restart of one of them (it might have just failed, and the orchestrator is about to call Restart on it to change the desired state). I'm reluctant to add a second place that's shutting down tasks. We used to have some problems due to this, but now that Restart is the only place that sets DesiredState to Shutdown, it seems to work well.

After thinking about this a little more, I actually think it's not an issue. This would happen immediately before triggering a service reconciliation, so even if we did race with a restart in some way, the effect would be extremely temporary.

I've added a commit that implements this.

@aaronlehmann
Collaborator Author

Global services have tasks shut down on nodes that go down. This is necessary to prevent rolling updates from getting stuck on those nodes. But it means that a node going down and coming back up will cause a restart of its global service tasks, even if that does not match the restart policy.

Fixed by not actually shutting down tasks when the node goes down, but instead excluding them from the list of tasks passed to the updater. I think this is cleaner anyway.
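The filtering described above could be sketched like this (hypothetical types and a hypothetical `tasksForUpdater` helper; the real updater and node model in swarmkit are more involved): tasks on down nodes are simply excluded from the list handed to the updater, so no shutdown/restart cycle is forced, and node reconciliation corrects things when the node returns.

```go
package main

import "fmt"

type NodeState int

const (
	NodeReady NodeState = iota
	NodeDown
)

type Task struct {
	ID     string
	NodeID string
}

// tasksForUpdater excludes tasks on down nodes from the updater's input
// instead of shutting them down, so rolling updates don't get stuck on those
// nodes and restarts aren't forced when the node comes back.
func tasksForUpdater(tasks []Task, nodes map[string]NodeState) []Task {
	var out []Task
	for _, t := range tasks {
		if nodes[t.NodeID] == NodeDown {
			continue // skip; do not force a shutdown/restart cycle
		}
		out = append(out, t)
	}
	return out
}

func main() {
	nodes := map[string]NodeState{"n1": NodeReady, "n2": NodeDown}
	tasks := []Task{{ID: "a", NodeID: "n1"}, {ID: "b", NodeID: "n2"}}
	// only the task on the ready node remains in the updater's list
	fmt.Println(len(tasksForUpdater(tasks, nodes)))
}
```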

Aaron Lehmann added 2 commits July 20, 2017 18:16
Just skip them from the list of tasks passed to the updater. When the
node becomes available again, it will trigger node reconciliation, which
will correct whatever was skipped.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
…dState set to Shutdown

Instead, let the restart supervisor decide whether to change the desired
state or leave it at Running.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
@aaronlehmann aaronlehmann force-pushed the robust-restart-policy branch from adbde76 to 2a8729e Compare July 21, 2017 01:16
@aaronlehmann
Collaborator Author

Adding another caveat:

  • This breaks the constraint enforcer, which tries to shut down tasks by setting their observed state to Rejected. It expects the orchestrator to set the desired state to Shutdown, which will trigger the actual shutdown. It modifies observed state instead of desired state to avoid fighting with the orchestrator over the DesiredState field.

If we had a fix for this, it would open up a pretty elegant fix for the other outstanding caveat (node draining). The constraint enforcer could handle draining nodes as just another case of a node not meeting the "constraints". This isn't really so radical - in the early swarmkit prototype we had a "drainer" component that worked separately from the orchestrator. I'm experimenting locally with moving draining to the constraint enforcer, and it makes the code very nice. It removes the extra logic I added in the first commit of this PR around forcing a shutdown in the restart manager in some cases. Unfortunately, I went off on this path before I realized that the constraint enforcer doesn't actually work correctly under this change to begin with...

I'd love to find a complete solution to move this forward and finally fix #932. I'll try to continue looking at this.
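The "draining as just another constraint" idea could be sketched like this (hypothetical types and a hypothetical `meetsPlacement` check, not swarmkit's constraint enforcer): a draining node is handled on the same code path as a node that fails placement constraints, removing the need for special drain logic in the orchestrator.

```go
package main

import "fmt"

type NodeAvailability int

const (
	Active NodeAvailability = iota
	Drain
)

type Node struct {
	Availability NodeAvailability
	Labels       map[string]string
}

// meetsPlacement treats drain as just another way for a node to stop
// satisfying placement, so the constraint enforcer's existing shutdown path
// would cover draining too.
func meetsPlacement(n Node, constraints map[string]string) bool {
	if n.Availability == Drain {
		return false // draining node is no longer eligible, same as a failed constraint
	}
	for k, v := range constraints {
		if n.Labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	n := Node{Availability: Drain, Labels: map[string]string{"region": "us"}}
	// even though the labels match, a draining node fails placement
	fmt.Println(meetsPlacement(n, map[string]string{"region": "us"}))
}
```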

@aaronlehmann
Collaborator Author

What about fundamentally reconsidering the approach in this PR, and instead of keeping tasks that we don't want to restart in the desired state "running" instead of "shutdown", we add a flag to Task that indicates the task shouldn't be restarted? The flag would be set by the restart supervisor when it shuts down a task and decides not to start a replacement. When the orchestrator does service-level reconciliation, it would not start new tasks to replace any tasks where DesiredState == Shutdown && DoNotRestart == true. This seems a lot simpler than all the changes in this PR, and I think it would address all the corner cases that have been showing up.
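The flag-based alternative could be sketched like this (hypothetical types and a hypothetical `slotsNeedingReplacement` helper; `DoNotRestart` is the proposed flag, not an existing field): during service-level reconciliation, a shut-down task whose flag is set does not count as a slot needing a new task.

```go
package main

import "fmt"

type TaskState int

const (
	Running TaskState = iota
	Shutdown
)

type Task struct {
	Slot         int
	DesiredState TaskState
	DoNotRestart bool // proposed flag, set by the restart supervisor
}

// slotsNeedingReplacement returns slots whose tasks were shut down and should
// be replaced; slots flagged DoNotRestart are deliberately left empty.
func slotsNeedingReplacement(tasks []Task) []int {
	var slots []int
	for _, t := range tasks {
		if t.DesiredState == Shutdown && t.DoNotRestart {
			continue // restart policy already decided: leave this slot empty
		}
		if t.DesiredState == Shutdown {
			slots = append(slots, t.Slot)
		}
	}
	return slots
}

func main() {
	tasks := []Task{
		{Slot: 1, DesiredState: Shutdown, DoNotRestart: true},
		{Slot: 2, DesiredState: Shutdown},
	}
	// only slot 2 needs a replacement task
	fmt.Println(slotsNeedingReplacement(tasks))
}
```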

@aaronlehmann aaronlehmann changed the title Respect restart policy during service reconciliation [WIP] Respect restart policy during service reconciliation Jul 21, 2017
@aaronlehmann
Collaborator Author

What about fundamentally reconsidering the approach in this PR, and instead of keeping tasks that we don't want to restart in the desired state "running" instead of "shutdown", we add a flag to Task that indicates the task shouldn't be restarted?

This kind of goes against my own policy of not depending on task history. Maybe it's a bad idea.

I suppose we could modify the task reaper so that it will never delete the last dead task from a slot if it has the DontRestart flag set. Maybe not so horrible?

@aluzzardi
Member

This kind of goes against my own policy of not depending on task history. Maybe it's a bad idea.

I agree - I think it's too "special cased" and a bit complex.

A bit naive, but does it matter in reconciliation how many are in desired=running? Can't we just make sure that we have enough slots and handle restarts per slot?

@aaronlehmann
Collaborator Author

I agree - I think it's too "special cased" and a bit complex.

Special cased, perhaps, but I think it's a lot less complex than what's being explored here. The PR that implements the flag in the task is #2327.

A bit naive, but does it matter in reconciliation how many are in desired=running? Can't we just make sure that we have enough slots and handle restarts per slot?

Sorry, I don't understand this part.

@aaronlehmann
Collaborator Author

Superseded by #2332

