[DSIP-92][Master] Refactor workflow serial strategy #17703

@ruanwenjun

Search before asking

  • I have searched the existing DSIPs and found no similar DSIP.

Motivation

DS supports serialized workflow execution policies, which are used to control the concurrency of workflow instances.
When multiple instances of the same workflow are triggered at the same time, the system can be configured to run them in parallel, run them sequentially, or run only the earliest one while discarding the others.

Currently, when creating new instances, DS checks the database to determine whether another instance of the same workflow is already running. However, because multiple Masters may initiate instances of the same workflow concurrently, this check is not reliable.
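The unreliable check is a check-then-act race. A minimal sketch, assuming an in-memory stand-in for the instance table (`InstanceTable`, `count_running`, and `insert` are hypothetical names for illustration, not DS APIs), with the interleaving forced so that every Master checks before any Master inserts:

```python
class InstanceTable:
    """In-memory stand-in for the workflow instance table (hypothetical)."""

    def __init__(self):
        self.running = []

    def count_running(self, workflow_id):
        # Number of instances of this workflow currently recorded as running.
        return sum(1 for w in self.running if w == workflow_id)

    def insert(self, workflow_id):
        self.running.append(workflow_id)


def interleaved_check_then_create(table, workflow_id, masters):
    # Step 1: every master performs its check before any of them inserts.
    observed = {m: table.count_running(workflow_id) for m in masters}
    # Step 2: each master acts on its own, now stale, observation.
    for m in masters:
        if observed[m] == 0:
            table.insert(workflow_id)
    return table.count_running(workflow_id)


if __name__ == "__main__":
    count = interleaved_check_then_create(InstanceTable(), 7, ["master-a", "master-b"])
    print(count)
```

Both Masters observe zero running instances, so both create one, and the serial policy is violated even though each individual check was correct at the time it ran.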

As a result, the current implementation enforces the policy inaccurately and contains bugs, so the feature is removed in 3.3.0. This DSIP aims to fix the serial strategy.

Design Detail

To implement a serial execution policy, the key challenge is accurately determining the current concurrency level of a workflow when creating a new instance. If we cannot reliably ensure that only a single Master creates instances of the same workflow, then it becomes difficult to make this determination without introducing additional resource locks, which would add complexity and could create new problems.

We chose to determine the actual concurrency level of serial workflows at the MasterCoordinator. This approach ensures accuracy, as there is only one MasterCoordinator in the entire system, which eliminates the possibility of concurrent or conflicting concurrency checks.

If all workflow instances that use the serial strategy were launched exclusively by the MasterCoordinator, the concurrency check would be reliable. However, this would place significant strain on the MasterCoordinator.

We want workflow instances in the cluster to remain load-balanced across different Masters. To achieve this while supporting serial execution policies, we introduced the concept of a SerialCommand for serial workflow instances. When a workflow uses a serial policy, it is first triggered as a SerialCommand. If the instance is allowed to run, the SerialCommand is then converted into a real Command. SerialCommands can only be processed by the MasterCoordinator.
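The SerialCommand flow described above can be sketched as follows. This is an illustrative model only, not the proposed implementation: the class, field, and method names (`Coordinator`, `submit`, `on_finished`, and so on) are assumptions, and the real design would persist SerialCommands in a database table rather than in memory.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class Strategy(Enum):
    SERIAL_WAIT = "wait"        # run queued instances one after another
    SERIAL_DISCARD = "discard"  # keep the earliest instance, drop the rest


@dataclass
class SerialCommand:
    workflow_id: int
    strategy: Strategy


@dataclass
class Coordinator:
    """Single decision point, standing in for the MasterCoordinator (hypothetical)."""

    running: set = field(default_factory=set)      # workflows with an active instance
    waiting: dict = field(default_factory=dict)    # workflow_id -> deque of SerialCommand
    commands: list = field(default_factory=list)   # real commands any Master may pick up
    discarded: list = field(default_factory=list)  # SerialCommands dropped by policy

    def submit(self, cmd):
        if cmd.workflow_id not in self.running:
            self._convert(cmd)
        elif cmd.strategy is Strategy.SERIAL_WAIT:
            self.waiting.setdefault(cmd.workflow_id, deque()).append(cmd)
        else:
            self.discarded.append(cmd)

    def on_finished(self, workflow_id):
        # An instance finished: release the slot and promote the next waiter, if any.
        self.running.discard(workflow_id)
        queue = self.waiting.get(workflow_id)
        if queue:
            self._convert(queue.popleft())

    def _convert(self, cmd):
        # Convert the SerialCommand into a real command for load-balanced execution.
        self.running.add(cmd.workflow_id)
        self.commands.append(cmd.workflow_id)


if __name__ == "__main__":
    c = Coordinator()
    c.submit(SerialCommand(1, Strategy.SERIAL_WAIT))
    c.submit(SerialCommand(1, Strategy.SERIAL_WAIT))
    print(c.commands)  # only the first instance has been released so far
    c.on_finished(1)
    print(c.commands)  # the waiting instance is released after the first finishes
```

Because only one component decides when a SerialCommand becomes a real command, the concurrency check needs no extra lock, while the resulting commands can still be executed by any Master.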

Image

Compatibility, Deprecation, and Migration Plan

A new MySQL table, t_ds_serial_command, needs to be created.

Test Plan

No response
