Design issue: Desired state, restart policy and orchestrator #932

The orchestrator manages scale by counting how many tasks have a desired state of RUNNING.

On the other hand, the restart manager sets the desired state of crashed tasks to SHUTDOWN and creates replacements set to RUNNING **if** the restart policy says so.

This means that when a task crashes, we set its desired state to SHUTDOWN, which drops the orchestrator's count of RUNNING tasks below the target and leads it to create a new one, even if the restart policy says it shouldn't.
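To make the failure mode concrete, here is a minimal sketch of the interaction. Everything in it (`State`, `Task`, `restartCrashed`, `reconcile`, `countRunning`) is a hypothetical simplification made up for illustration, not the project's actual types or functions.

```go
package main

import "fmt"

// State is a hypothetical, simplified desired state.
type State int

const (
	Running State = iota
	Shutdown
)

// Task is a hypothetical, simplified task record.
type Task struct {
	ID           int
	DesiredState State
	Crashed      bool
}

// restartCrashed mimics the restart manager: every crashed task is marked
// SHUTDOWN, and a replacement is created only if the policy allows it.
func restartCrashed(tasks []Task, restartAllowed bool, nextID *int) []Task {
	out := make([]Task, 0, len(tasks))
	for _, t := range tasks {
		if t.Crashed && t.DesiredState == Running {
			t.DesiredState = Shutdown
			out = append(out, t)
			if restartAllowed {
				out = append(out, Task{ID: *nextID, DesiredState: Running})
				*nextID++
			}
			continue
		}
		out = append(out, t)
	}
	return out
}

// countRunning returns how many tasks have a desired state of RUNNING.
func countRunning(tasks []Task) int {
	n := 0
	for _, t := range tasks {
		if t.DesiredState == Running {
			n++
		}
	}
	return n
}

// reconcile mimics the orchestrator: it creates tasks until the number of
// tasks with desired state RUNNING reaches the replica target.
func reconcile(tasks []Task, replicas int, nextID *int) []Task {
	for running := countRunning(tasks); running < replicas; running++ {
		tasks = append(tasks, Task{ID: *nextID, DesiredState: Running})
		*nextID++
	}
	return tasks
}

func main() {
	nextID := 3
	tasks := []Task{
		{ID: 1, DesiredState: Running},
		{ID: 2, DesiredState: Running, Crashed: true},
	}
	// The restart policy says "do not restart", so no replacement is made here.
	tasks = restartCrashed(tasks, false, &nextID)
	// The orchestrator now sees only one RUNNING task and creates task 3 anyway.
	tasks = reconcile(tasks, 2, &nextID)
	fmt.Println("total tasks:", len(tasks), "desired RUNNING:", countRunning(tasks))
}
```

In this sketch the orchestrator ends up creating a third task even though the restart policy forbade a restart, which is the behavior described above.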

The workaround is to keep the desired state of crashed tasks at RUNNING. We should only change it once we're ready to create a replacement.

I think the restart manager and the orchestrator are too tightly coupled. The orchestrator should only manage scale (e.g. the number of slots) without caring about desired state.

The restart manager should be independent from the orchestrator.
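One way the separation could look, as a rough sketch only: the orchestrator reasons purely about slots, and the restart manager alone decides what happens inside a slot. The `Slot` field and the `ensureSlots` and `restartCrashed` functions are assumptions made up for illustration, not an existing API.

```go
package main

import "fmt"

// State is a hypothetical, simplified desired state.
type State int

const (
	Running State = iota
	Shutdown
)

// Task is a hypothetical task record that belongs to a numbered slot.
type Task struct {
	Slot         int
	DesiredState State
	Crashed      bool
}

// ensureSlots is the orchestrator's only job in this sketch: make sure every
// slot from 1 to replicas has at least one task. It never reads DesiredState.
func ensureSlots(tasks []Task, replicas int) []Task {
	occupied := make(map[int]bool)
	for _, t := range tasks {
		occupied[t.Slot] = true
	}
	for slot := 1; slot <= replicas; slot++ {
		if !occupied[slot] {
			tasks = append(tasks, Task{Slot: slot, DesiredState: Running})
		}
	}
	return tasks
}

// restartCrashed is the restart manager: it marks crashed tasks SHUTDOWN and,
// only if the policy allows it, adds a replacement in the same slot.
func restartCrashed(tasks []Task, restartAllowed bool) []Task {
	out := make([]Task, 0, len(tasks))
	for _, t := range tasks {
		if t.Crashed && t.DesiredState == Running {
			t.DesiredState = Shutdown
			out = append(out, t)
			if restartAllowed {
				out = append(out, Task{Slot: t.Slot, DesiredState: Running})
			}
			continue
		}
		out = append(out, t)
	}
	return out
}

func main() {
	tasks := []Task{
		{Slot: 1, DesiredState: Running},
		{Slot: 2, DesiredState: Running, Crashed: true},
	}
	// Restarts are disabled: the crashed task is shut down, nothing replaces it.
	tasks = restartCrashed(tasks, false)
	// Slot 2 is still occupied, so the orchestrator creates nothing.
	tasks = ensureSlots(tasks, 2)
	fmt.Println("tasks after crash with restarts disabled:", len(tasks))
}
```

Because the orchestrator in this sketch never looks at desired state, the restart manager can set a crashed task to SHUTDOWN without triggering an unwanted replacement. How a slot would eventually be freed (for example on scale-down) is left out here.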

/cc @aaronlehmann @dongluochen 

