[WIP] Respect restart policy during service reconciliation #2290

aaronlehmann wants to merge 4 commits into moby:master
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##           master    #2290      +/-   ##
==========================================
- Coverage   60.47%   60.45%   -0.02%
==========================================
  Files         128      128
  Lines       26002    26037      +35
==========================================
+ Hits        15724    15740      +16
- Misses       8884     8912      +28
+ Partials     1394     1385       -9
```
Force-pushed a86b69f to c40945f
ping @aluzzardi PTAL!
For the first (draining) and third (global services, node goes down) cases, would having a separate state like "shutdownNoRestart" (that's a terrible name, I know) help? This distinguishes it from a shutdown state that can be replaced. I agree on rolling back - it seems like that's a service update, which seems like it should reset the restart policy history? #2304 resets the restart history on spec update - can we ignore any tasks with the wrong service spec version?
Yes, I think having a separate state would work. It does feel a little hacky so I wonder if there's another way, but I admit I can't think of anything better right now.
I don't think spec versioning helps us here because rolling back goes back to the old spec version (see #2309). The issue is that the orchestrator might not restart tasks that failed prior to updating and rolling back. In a way, this is the right decision because the task is set up not to restart, and it matches the current (rolled-back) version of the service. But this is a bit of a special case, because we'd expect rolling back to reset everything.
Sorry, I meant if we roll back, can we delete tasks with that previous version in the history (leaving only the failed or succeeded tasks with the new task version)? That way, there is nothing to restart per se. Definitely is slightly hacky, and it wipes out a chunk of existing history, so I definitely understand why it may not be desirable.
I would not want to wipe history as part of this, but one possibility might be to set the desired state of these leftover tasks to "shutdown" as part of a rollback. Then the reconciliation process would cause them to get restarted, just as they do without this PR. It's a bit of a hack, but I think it's not as bad as deleting tasks.
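To make the idea concrete, here is a minimal Go sketch of marking leftover failed tasks from the rolled-back version as desired-state "shutdown" so reconciliation replaces them. The `Task` struct, state constants, and `onRollback` are simplified, hypothetical stand-ins, not swarmkit's real types (which live in the api package):

```go
package main

import "fmt"

// TaskState is a simplified stand-in for swarmkit's task states.
type TaskState int

const (
	StateRunning TaskState = iota
	StateFailed
	StateShutdown
)

// Task is a hypothetical, stripped-down task record.
type Task struct {
	ID           string
	DesiredState TaskState
	ActualState  TaskState
	SpecVersion  int
}

// onRollback marks failed leftover tasks from the rolled-back spec version
// as desired-state shutdown, so the reconciliation loop replaces them
// instead of leaving them in a failed state (and without deleting history).
func onRollback(tasks []*Task, rolledBackVersion int) {
	for _, t := range tasks {
		if t.SpecVersion == rolledBackVersion && t.ActualState == StateFailed {
			t.DesiredState = StateShutdown
		}
	}
}

func main() {
	tasks := []*Task{
		{ID: "a", DesiredState: StateRunning, ActualState: StateFailed, SpecVersion: 2},
		{ID: "b", DesiredState: StateRunning, ActualState: StateRunning, SpecVersion: 2},
	}
	onRollback(tasks, 2)
	// Only the failed task gets flipped to shutdown; the healthy one is untouched.
	fmt.Println(tasks[0].DesiredState == StateShutdown, tasks[1].DesiredState == StateRunning)
}
```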
@aaronlehmann That seems like it would work, too.
It's a bit harder than I first thought. This pass over old tasks could race with a restart of one of them (it might have just failed, and the orchestrator is about to call …). Technically there wouldn't be a risk of the updater interfering with …
Force-pushed c40945f to 2b51a16
```go
switch v.Node.Status.State {
// NodeStatus_DISCONNECTED is a transient state, no need to make any change
case api.NodeStatus_DOWN:
	// NodeStatus_DISCONNECTED is a transient state, no need to remove tasks
```
Non-blocking nit: NodeStatus_DOWN in this comment (I guess the name got changed at some point)
The comment is correct - it explains why NodeStatus_DISCONNECTED isn't handled specially.
Currently, service reconciliation (updating a service, leader reelection, etc) will cause a service's tasks to be restarted even if the restart policy says they shouldn't be. This is because the service definition says there should be, for example, 5 replicas, and the control loop tries to make this happen.

This experiments with the idea of keeping tasks that we don't want to restart in the desired state "running" instead of "shutdown". This can be a starting point for evaluating the tradeoffs and deciding if this is a good approach.

Known caveats so far:

- Draining a node has to set the desired state of tasks on that node to "shutdown", regardless of the restart policy. But that means the orchestrator may eventually restart those tasks. So draining triggers the same issue we had before.
- When rolling back a partial service update, tasks with the original service definition that happen to have failed won't be restarted. I'm not sure this is the correct behavior, because I would expect a service rollback to leave the service in a converged state.
- Global services have tasks shut down on nodes that go down. This is necessary to prevent rolling updates from getting stuck on those nodes. But it means that a node going down and coming back up will cause a restart of its global service tasks, even if that does not match the restart policy.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
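The restart-policy decision the commit message refers to can be pictured with a small Go sketch. `RestartCondition` and `shouldRestart` here are simplified, hypothetical stand-ins for the conditions swarmkit's restart supervisor evaluates (none, on-failure, any), not its real API:

```go
package main

import "fmt"

// RestartCondition is a simplified stand-in for swarmkit's restart
// conditions: never restart, restart only on failure, or always restart.
type RestartCondition int

const (
	RestartOnNone RestartCondition = iota
	RestartOnFailure
	RestartOnAny
)

// shouldRestart decides whether a finished task should be replaced. Under
// this PR's approach, when it returns false the task keeps desired state
// "running" instead of being marked "shutdown", so reconciliation does not
// spin up a replacement.
func shouldRestart(cond RestartCondition, failed bool) bool {
	switch cond {
	case RestartOnAny:
		return true
	case RestartOnFailure:
		return failed
	default: // RestartOnNone
		return false
	}
}

func main() {
	// A failed task restarts under on-failure, but never under none.
	fmt.Println(shouldRestart(RestartOnFailure, true), shouldRestart(RestartOnNone, true))
}
```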
Force-pushed 2b51a16 to caab791
Brought this up to date with recent changes underneath.
This ensures that tasks that failed before the update started won't be skipped by the rollback. The rollback should lead to a converged state, instead of leaving failed tasks in a failed state (which might otherwise happen if the restart policy says not to restart the tasks).

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Force-pushed f0ce829 to 1741154
After thinking about this a little more, I actually think it's not an issue. This would happen immediately before triggering a service reconciliation, so even if we did race with a restart in some way, the effect would be extremely temporary. I've added a commit that implements this.
Fixed by not actually shutting down tasks when the node goes down, but instead excluding them from the list of tasks passed to the updater. I think this is cleaner anyway.
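The filtering approach described here can be sketched in a few lines of Go. The types and the `tasksForUpdater` helper are hypothetical simplifications of swarmkit's node and task records, just to show the shape of the fix:

```go
package main

import "fmt"

// NodeState is a simplified stand-in for swarmkit's node status.
type NodeState int

const (
	NodeReady NodeState = iota
	NodeDown
)

// Task is a hypothetical, stripped-down task record.
type Task struct {
	ID     string
	NodeID string
}

// tasksForUpdater returns only the tasks on nodes that are up, so the
// updater never acts on tasks stranded on a down node. The tasks are
// skipped rather than shut down; when the node comes back, node
// reconciliation corrects whatever was skipped.
func tasksForUpdater(tasks []Task, nodeState map[string]NodeState) []Task {
	var out []Task
	for _, t := range tasks {
		if nodeState[t.NodeID] == NodeDown {
			continue // skip, but do not shut down
		}
		out = append(out, t)
	}
	return out
}

func main() {
	tasks := []Task{{ID: "t1", NodeID: "n1"}, {ID: "t2", NodeID: "n2"}}
	states := map[string]NodeState{"n1": NodeReady, "n2": NodeDown}
	for _, t := range tasksForUpdater(tasks, states) {
		fmt.Println(t.ID) // only t1 survives the filter
	}
}
```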
Just skip them from the list of tasks passed to the updater. When the node becomes available again, it will trigger node reconciliation, which will correct whatever was skipped.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
…dState set to Shutdown

Instead, let the restart supervisor decide whether to change the desired state or leave it at Running.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Force-pushed adbde76 to 2a8729e
Adding another caveat:

If we had a fix for this, it would open up a pretty elegant fix for the other outstanding caveat (node draining). The constraint enforcer could handle draining nodes as just another case of a node not meeting the "constraints". This isn't really so radical - in the early swarmkit prototype we had a "drainer" component that worked separately from the orchestrator. I'm experimenting locally with moving draining to the constraint enforcer, and it makes the code very nice. It removes the extra logic I added in the first commit of this PR around forcing a shutdown in the restart manager in some cases. Unfortunately, I went off on this path before I realized that the constraint enforcer doesn't actually work correctly under this change to begin with... I'd love to find a complete solution to move this forward and finally fix #932. I'll try to continue looking at this.
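Treating draining as "just another failed constraint" can be illustrated with a hedged Go sketch. Real swarmkit constraints are expressions like "node.labels.zone==east"; here a constraint is reduced to a required label key/value, and `meetsConstraints` is a hypothetical helper, purely for illustration:

```go
package main

import "fmt"

// Node is a hypothetical, stripped-down node record.
type Node struct {
	ID       string
	Draining bool
	Labels   map[string]string
}

// meetsConstraints treats a draining node as failing every constraint,
// which folds node draining into the constraint enforcer as suggested
// above: drained nodes simply stop satisfying placement.
func meetsConstraints(n Node, constraints map[string]string) bool {
	if n.Draining {
		return false // drain == node no longer satisfies placement
	}
	for k, v := range constraints {
		if n.Labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	n := Node{ID: "n1", Draining: true, Labels: map[string]string{"zone": "east"}}
	// Even though the labels match, draining makes the node ineligible.
	fmt.Println(meetsConstraints(n, map[string]string{"zone": "east"}))
}
```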
What about fundamentally reconsidering the approach in this PR, and instead of keeping tasks that we don't want to restart in the desired state "running" instead of "shutdown", we add a flag to the task?

This kind of goes against my own policy of not depending on task history. Maybe it's a bad idea. I suppose we could modify the task reaper so that it will never delete the last dead task from a slot if it has the flag set.
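A rough Go sketch of the task-reaper tweak being proposed. The `NoRestart` flag and `reapable` helper are hypothetical names for the idea under discussion (the real flag landed separately in #2327):

```go
package main

import "fmt"

// Task is a hypothetical, stripped-down task record. NoRestart stands in
// for the proposed don't-restart flag on the task.
type Task struct {
	ID        string
	Slot      int
	Dead      bool
	NoRestart bool
}

// reapable returns the IDs of dead tasks the reaper may delete, but never
// the last dead task in a slot when it carries the no-restart flag: that
// task is the record telling reconciliation the slot should stay down.
func reapable(tasks []Task) []string {
	// Find the last dead task per slot (tasks assumed in history order).
	lastDead := map[int]string{}
	for _, t := range tasks {
		if t.Dead {
			lastDead[t.Slot] = t.ID
		}
	}
	var out []string
	for _, t := range tasks {
		if !t.Dead {
			continue
		}
		if t.NoRestart && lastDead[t.Slot] == t.ID {
			continue // keep the marker task
		}
		out = append(out, t.ID)
	}
	return out
}

func main() {
	tasks := []Task{
		{ID: "old", Slot: 1, Dead: true, NoRestart: true},
		{ID: "new", Slot: 1, Dead: true, NoRestart: true},
	}
	// Only "old" is reapable; "new" is the slot's surviving marker.
	fmt.Println(reapable(tasks))
}
```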
I agree - I think it's too "special cased" and a bit complex. A bit naive, but does it matter in reconciliation how many are in desired=running? Can't we just make sure that we have enough slots and handle restarts per slot?
Special cased, perhaps, but I think it's a lot less complex than what's being explored here. The PR that implements the flag in the task is #2327.

Sorry, I don't understand this part.
Superseded by #2332
Currently, service reconciliation (updating a service, leader reelection, etc) will cause a service's tasks to be restarted even if the restart policy says they shouldn't be. This is because the service definition says there should be, for example, 5 replicas, and the control loop tries to make this happen.

This experiments with the idea of keeping tasks that we don't want to restart in the desired state "running" instead of "shutdown". This can be a starting point for evaluating the tradeoffs and deciding if this is a good approach.

Known caveats so far:

- Draining a node has to set the desired state of tasks on that node to "shutdown", regardless of the restart policy. But that means the orchestrator may eventually restart those tasks. So draining triggers the same issue we had before.
- When rolling back a partial service update, tasks with the original service definition that happen to have failed won't be restarted. I'm not sure this is the correct behavior, because I would expect a service rollback to leave the service in a converged state. -- Fixed by setting all failed tasks' desired state to Shutdown on a rollback.
- Global services have tasks shut down on nodes that go down. This is necessary to prevent rolling updates from getting stuck on those nodes. But it means that a node going down and coming back up will cause a restart of its global service tasks, even if that does not match the restart policy. -- Fixed by excluding tasks on down nodes from the list passed to the updater instead of actually shutting them down.
- The constraint enforcer shuts down tasks that no longer satisfy constraints by setting their observed state to Rejected. It expects the orchestrator to set the desired state to Shutdown, which will trigger the actual shutdown. It modifies observed state instead of desired state to avoid fighting with the orchestrator over the DesiredState field.

Possible solution for #932

cc @aluzzardi