Description
The orchestrator manages scale by counting how many tasks have a desired state of RUNNING.
On the other hand, the restart manager sets the desired state of crashed tasks to SHUTDOWN and creates replacements set to RUNNING if the restart policy says so.
This means that when a task crashes, we set its desired state to SHUTDOWN, which leads the orchestrator to create a new one, even when the restart policy says it shouldn't be replaced.
A workaround is to keep the desired state of crashed tasks at RUNNING. We should only change it when we're ready to create a replacement.
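To make the interaction concrete, here is a minimal sketch in Python. It is not the project's actual code: `Task`, `tasks_to_create`, `on_crash_eager`, and `on_crash_deferred` are hypothetical names, and the model assumes the orchestrator does nothing more than compare the replica count with the number of tasks whose desired state is RUNNING.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    RUNNING = "running"
    SHUTDOWN = "shutdown"


@dataclass
class Task:
    id: str
    desired_state: State


def tasks_to_create(tasks, replicas):
    """Hypothetical orchestrator check: count tasks with desired state RUNNING."""
    running = sum(1 for t in tasks if t.desired_state is State.RUNNING)
    return max(0, replicas - running)


def on_crash_eager(task):
    """Behavior described above: always mark the crashed task SHUTDOWN."""
    task.desired_state = State.SHUTDOWN


def on_crash_deferred(task, restart_allowed):
    """Workaround: only flip to SHUTDOWN when a replacement is created."""
    if not restart_allowed:
        return None  # desired state stays RUNNING, so the count is unchanged
    task.desired_state = State.SHUTDOWN
    return Task(id=task.id + "-replacement", desired_state=State.RUNNING)


# One replica, restart policy forbids restarts, and the task crashes.
eager = [Task("a", State.RUNNING)]
on_crash_eager(eager[0])
extra_eager = tasks_to_create(eager, replicas=1)  # orchestrator wants a new task

deferred = [Task("a", State.RUNNING)]
on_crash_deferred(deferred[0], restart_allowed=False)
extra_deferred = tasks_to_create(deferred, replicas=1)  # nothing to create
```

In this toy model the eager path makes the orchestrator ask for one extra task even though restarts are not allowed, while the deferred path leaves the count alone.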
I think the restart manager and the orchestrator are too tightly coupled. The orchestrator should only manage scale (e.g. the number of slots) without caring about desired state.
The restart manager should be independent from the orchestrator.
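One possible shape for that separation, again as a hedged Python sketch rather than a proposal for the real code: the orchestrator looks only at which slots are occupied, and a separate restart function decides whether a crashed task gets a replacement in the same slot. The names `Task`, `slots_to_add`, and `restart`, as well as numbering slots from 1, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    slot: int
    desired_state: str  # "running" or "shutdown"


def slots_to_add(tasks, replicas):
    """Orchestrator side: return the unoccupied slots, ignoring desired state."""
    occupied = {t.slot for t in tasks}
    return [s for s in range(1, replicas + 1) if s not in occupied]


def restart(tasks, crashed, policy_allows_restart):
    """Restart-manager side: replace a crashed task within its own slot."""
    if not policy_allows_restart:
        return None  # the slot stays occupied by the crashed task
    crashed.desired_state = "shutdown"
    replacement = Task(slot=crashed.slot, desired_state="running")
    tasks.append(replacement)
    return replacement


tasks = [Task(1, "running"), Task(2, "running")]

# Task in slot 2 crashes and the policy forbids a restart: no new slot opens up.
restart(tasks, tasks[1], policy_allows_restart=False)
missing_after_crash = slots_to_add(tasks, replicas=2)

# Scaling up to 3 replicas is the only thing that makes the orchestrator act.
missing_after_scale_up = slots_to_add(tasks, replicas=3)
```

Because `slots_to_add` never reads `desired_state`, the restart logic can change a crashed task's state freely without triggering the scale logic.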