scheduler: Version-aware failure tracking #2253
Conversation
This addresses a longstanding TODO which I forgot about. Currently, the scheduler tracks potential failure loops by service ID. However, if the service is updated, it won't distinguish the new version of the service from the old one. This means even if the operator fixes the underlying problem, the new tasks from the updated service will still avoid the node where the problem was detected. Change the failure tracking to be sensitive to SpecVersion as well. If SpecVersion changes, we'll treat the task as a different kind of task for failure tracking purposes. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
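The keying change described above can be sketched as follows. This is a minimal illustration with hypothetical names (`versionedService`, `recentFailures`); the real swarmkit types differ:

```go
package main

import "fmt"

// versionedService is a hypothetical key type: failures are tracked per
// (service ID, spec version) pair instead of per service ID alone, so
// tasks from an updated service start with a clean failure history.
type versionedService struct {
	serviceID   string
	specVersion uint64 // bumped on every service update
}

func main() {
	recentFailures := map[versionedService]int{}

	// The old version of the service fails repeatedly on some node.
	old := versionedService{serviceID: "svc1", specVersion: 3}
	recentFailures[old] = 5

	// After the operator updates the service, SpecVersion changes, so
	// its tasks map to a fresh key with no recorded failures.
	updated := versionedService{serviceID: "svc1", specVersion: 4}
	fmt.Println(recentFailures[old], recentFailures[updated]) // prints: 5 0
}
```

With service-ID-only keys, both versions would share the failure count of 5 and the updated service would keep avoiding the node.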
This map could technically grow without bound. When adding a new failure record, if the cleanup hasn't happened recently, loop over all the recentFailures entries and remove any that only consist of expired records. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
This function previously could take an uninitialized NodeInfo structure and fill in whatever was missing. This is very error-prone, so remove this logic and change the only caller that relies on it to always pass in a properly initialized NodeInfo. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
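The contract change can be illustrated with a toy version (all names here are hypothetical stand-ins, not the real `NodeInfo` API): the constructor initializes everything up front, and the mutating function assumes initialized state instead of backfilling it.

```go
package main

import "fmt"

// nodeInfo is a hypothetical stand-in for the scheduler's NodeInfo.
type nodeInfo struct {
	tasks map[string]string // task ID -> service ID
}

// newNodeInfo returns a fully initialized nodeInfo, mirroring the
// change above: the caller constructs complete state up front.
func newNodeInfo() nodeInfo {
	return nodeInfo{tasks: make(map[string]string)}
}

// addTask assumes ni is already initialized; it no longer repairs a
// nil map, which previously hid bugs in callers.
func addTask(ni nodeInfo, taskID, serviceID string) {
	ni.tasks[taskID] = serviceID
}

func main() {
	ni := newNodeInfo()
	addTask(ni, "task1", "svc1")
	fmt.Println(len(ni.tasks)) // prints: 1
}
```

The benefit is fail-fast behavior: a caller that forgets to initialize the struct panics immediately instead of silently operating on partially repaired state.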
Codecov Report
@@            Coverage Diff             @@
##           master    #2253      +/- ##
==========================================
- Coverage   66.63%   60.38%    -6.26%
==========================================
  Files          92      124      +32
  Lines       13480    20278    +6798
==========================================
+ Hits         8983    12245    +3262
- Misses       3550     6648    +3098
- Partials      947     1385     +438
cyli left a comment
LGTM with the caveat I'm not super familiar with the scheduler code.
Non-blocking: I'm not sure how feasible these are, especially given that we probably don't inject a clock into the scheduler, but would it be possible to add tests that check that:
- When a node starts up, failed tasks aren't immediately cleaned up
- Tasks are not cleared before the monitoring period is up?
I don't understand this one. Does this mean a node becoming READY? The scheduler itself starting up?
As you mentioned, we don't inject a clock. I could augment the existing tests to check that
Er sorry, again, not super familiar with the scheduler life cycle. Whenever
Oops, I realized I meant. But none of these may be possible without the clock injection, hence non-blocking. :)
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Added a commit that makes sure the values of
LGTM, thanks for adding the assertions.
- moby/swarmkit#2266 (support for templating Node.Hostname in docker executor)
- moby/swarmkit#2281 (change restore action on objects to be update, not delete/create)
- moby/swarmkit#2285 (extend watch queue with timeout and size limit)
- moby/swarmkit#2253 (version-aware failure tracking in the scheduler)
- moby/swarmkit#2275 (update containerd and port executor to container client library)
- moby/swarmkit#2292 (rename some generic resources)
- moby/swarmkit#2300 (limit the size of the external CA response)
- moby/swarmkit#2301 (delete global tasks when the node running them is deleted)

Minor cleanups, dependency bumps, and vendoring:

- moby/swarmkit#2271
- moby/swarmkit#2279
- moby/swarmkit#2283
- moby/swarmkit#2282
- moby/swarmkit#2274
- moby/swarmkit#2296 (dependency bump of etcd, go-winio)

Signed-off-by: Ying Li <ying.li@docker.com>
Upstream-commit: 4509a00
Component: engine
Change the failure tracking to be sensitive to `SpecVersion` as well. If `SpecVersion` changes, we'll treat the task as a different kind of task for failure tracking purposes. Also, add a periodic cleanup pass to make sure `recentFailures` doesn't grow without bound, and clean up `addOrUpdateNode` so it doesn't try to fix up uninitialized `NodeInfo` structs (which is fragile).

cc @aluzzardi