
Design issue: Desired state, restart policy and orchestrator #932

@aluzzardi


The orchestrator manages scale by counting how many tasks have a desired state of RUNNING.
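The scale check described above can be sketched as a simple count. This is a minimal illustration, not the project's real code: `Task`, `DesiredState`, `countDesiredRunning` and `tasksToCreate` are hypothetical names, and only the two desired states discussed in this issue are modeled.

```go
package main

import "fmt"

// DesiredState is a hypothetical stand-in for the real task state enum;
// only the two values discussed in this issue are modeled.
type DesiredState int

const (
	Running DesiredState = iota
	Shutdown
)

// Task is a hypothetical, stripped-down task record.
type Task struct {
	ID           string
	DesiredState DesiredState
}

// countDesiredRunning counts tasks whose desired state is RUNNING, which is
// how the orchestrator measures current scale.
func countDesiredRunning(tasks []Task) int {
	n := 0
	for _, t := range tasks {
		if t.DesiredState == Running {
			n++
		}
	}
	return n
}

// tasksToCreate returns how many new tasks the orchestrator would create to
// reach the requested number of replicas (never negative; scale-down is not
// modeled).
func tasksToCreate(tasks []Task, replicas int) int {
	missing := replicas - countDesiredRunning(tasks)
	if missing < 0 {
		return 0
	}
	return missing
}

func main() {
	tasks := []Task{
		{ID: "t1", DesiredState: Running},
		{ID: "t2", DesiredState: Shutdown},
		{ID: "t3", DesiredState: Running},
	}
	fmt.Println(tasksToCreate(tasks, 3)) // prints 1
}
```

Note that a task counts toward scale purely by its desired state, regardless of whether it is actually running.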

On the other hand, the restart manager sets the desired state of crashed tasks to SHUTDOWN and, if the restart policy allows it, creates replacement tasks with a desired state of RUNNING.

This means that when a task crashes, we set its desired state to SHUTDOWN, which leads the orchestrator to create a new task even when it shouldn't (for example, when the restart policy says not to restart).
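The interaction can be reproduced with a small simulation. This is a hedged sketch under assumed, hypothetical names (`RestartCondition`, `restartManagerOnCrash`, `orchestratorReconcile`); it models only the behaviour described in this issue, with one replica whose task crashes under a "never restart" policy.

```go
package main

import "fmt"

// Hypothetical stand-ins for the real task types; only what this issue
// discusses is modeled.
type DesiredState int

const (
	Running DesiredState = iota
	Shutdown
)

type Task struct {
	ID           string
	DesiredState DesiredState
}

// RestartCondition is a hypothetical two-valued restart policy.
type RestartCondition int

const (
	RestartNever RestartCondition = iota
	RestartAlways
)

// countDesiredRunning is the orchestrator's measure of current scale.
func countDesiredRunning(tasks []Task) int {
	n := 0
	for _, t := range tasks {
		if t.DesiredState == Running {
			n++
		}
	}
	return n
}

// restartManagerOnCrash models the current behaviour: the crashed task is
// always moved to SHUTDOWN, and a replacement is added only when the
// policy allows a restart. It mutates the crashed entry in place.
func restartManagerOnCrash(tasks []Task, crashed int, policy RestartCondition) []Task {
	tasks[crashed].DesiredState = Shutdown
	if policy == RestartAlways {
		tasks = append(tasks, Task{ID: tasks[crashed].ID + "-r", DesiredState: Running})
	}
	return tasks
}

// orchestratorReconcile creates tasks until `replicas` of them are desired
// RUNNING; it has no knowledge of the restart policy.
func orchestratorReconcile(tasks []Task, replicas int) []Task {
	for n := countDesiredRunning(tasks); n < replicas; n++ {
		tasks = append(tasks, Task{ID: fmt.Sprintf("new-%d", n), DesiredState: Running})
	}
	return tasks
}

func main() {
	// One replica; its task crashes and the policy says never restart.
	tasks := []Task{{ID: "t1", DesiredState: Running}}
	tasks = restartManagerOnCrash(tasks, 0, RestartNever)
	tasks = orchestratorReconcile(tasks, 1)
	// The orchestrator still created a replacement: 2 tasks, 1 desired RUNNING.
	fmt.Println(len(tasks), countDesiredRunning(tasks)) // prints "2 1"
}
```

The restart policy is honoured by the restart manager but silently overridden by the orchestrator, because the orchestrator only sees the drop in the desired-RUNNING count.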

The workaround is to keep the desired state of crashed tasks at RUNNING. We should only change it once we're ready to create a replacement.
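A minimal sketch of that workaround, reusing the same hypothetical types as a self-contained example: the restart manager leaves a crashed task at RUNNING unless it is about to create a replacement, so the orchestrator's count does not drop. All names here are illustrative assumptions.

```go
package main

import "fmt"

// Hypothetical stand-ins for the real task types.
type DesiredState int

const (
	Running DesiredState = iota
	Shutdown
)

type Task struct {
	ID           string
	DesiredState DesiredState
}

type RestartCondition int

const (
	RestartNever RestartCondition = iota
	RestartAlways
)

func countDesiredRunning(tasks []Task) int {
	n := 0
	for _, t := range tasks {
		if t.DesiredState == Running {
			n++
		}
	}
	return n
}

// restartManagerOnCrash with the workaround applied.
func restartManagerOnCrash(tasks []Task, crashed int, policy RestartCondition) []Task {
	if policy != RestartAlways {
		// No replacement is coming, so leave the desired state at RUNNING;
		// the orchestrator keeps counting this task toward scale.
		return tasks
	}
	// A replacement is ready: only now retire the crashed task.
	tasks[crashed].DesiredState = Shutdown
	return append(tasks, Task{ID: tasks[crashed].ID + "-r", DesiredState: Running})
}

// orchestratorReconcile is unchanged: it tops up desired-RUNNING tasks.
func orchestratorReconcile(tasks []Task, replicas int) []Task {
	for n := countDesiredRunning(tasks); n < replicas; n++ {
		tasks = append(tasks, Task{ID: fmt.Sprintf("new-%d", n), DesiredState: Running})
	}
	return tasks
}

func main() {
	tasks := []Task{{ID: "t1", DesiredState: Running}}
	tasks = restartManagerOnCrash(tasks, 0, RestartNever)
	tasks = orchestratorReconcile(tasks, 1)
	fmt.Println(len(tasks), countDesiredRunning(tasks)) // prints "1 1"
}
```

The cost of this workaround is that a crashed task's desired state no longer reflects intent: it stays RUNNING only so the orchestrator's count stays correct.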

I think the restart manager and the orchestrator are too tightly coupled. The orchestrator should only manage scale (i.e. the number of slots) without caring about desired state.

The restart manager should be independent from the orchestrator.
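One way to picture the proposed split is a slot model. In the hedged sketch below, `Slot`, `orchestratorReconcile` and `restartOnCrash` are hypothetical names invented for illustration. The orchestrator only ensures the right number of slots exist, and the restart manager alone decides, per slot, whether a crashed task gets a replacement.

```go
package main

import "fmt"

// Hypothetical stand-ins for the real task types.
type DesiredState int

const (
	Running DesiredState = iota
	Shutdown
)

type Task struct {
	ID           string
	DesiredState DesiredState
}

type RestartCondition int

const (
	RestartNever RestartCondition = iota
	RestartAlways
)

// Slot is a hypothetical replica slot: a stable index plus the tasks that
// have occupied it, oldest first.
type Slot struct {
	Index int
	Tasks []Task
}

// orchestratorReconcile only ensures there are `replicas` slots; it never
// inspects desired state. Scale-down is not modeled here.
func orchestratorReconcile(slots []Slot, replicas int) []Slot {
	for len(slots) < replicas {
		idx := len(slots)
		first := Task{ID: fmt.Sprintf("slot%d-task0", idx), DesiredState: Running}
		slots = append(slots, Slot{Index: idx, Tasks: []Task{first}})
	}
	return slots
}

// restartOnCrash belongs to the restart manager alone: it retires the
// slot's current task and adds a replacement only if the policy allows.
func restartOnCrash(slot *Slot, policy RestartCondition) {
	slot.Tasks[len(slot.Tasks)-1].DesiredState = Shutdown
	if policy == RestartAlways {
		next := Task{ID: fmt.Sprintf("slot%d-task%d", slot.Index, len(slot.Tasks)), DesiredState: Running}
		slot.Tasks = append(slot.Tasks, next)
	}
}

func main() {
	slots := orchestratorReconcile(nil, 1)
	restartOnCrash(&slots[0], RestartNever)
	slots = orchestratorReconcile(slots, 1)
	// Still one slot holding one (shut down) task: no unwanted replacement.
	fmt.Println(len(slots), len(slots[0].Tasks)) // prints "1 1"
}
```

Because the orchestrator counts slots rather than desired-RUNNING tasks, setting a crashed task to SHUTDOWN no longer triggers a replacement behind the restart manager's back.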

/cc @aaronlehmann @dongluochen
