Implement a pool feature similar to Airflow's to limit the number of "task instances" executed simultaneously #268

@Baoqi

Description

Airflow's documentation on pools is at: https://airflow.apache.org/concepts.html#pools

The main motivation is that some tasks are time-consuming, so a pool is needed to limit how many of them run at the same time.

In addition, SQL tasks generally need a limit on the number of simultaneous database connections.

Preliminary ideas:
  1. Add a Pool configuration page. (This is a different concept from "Worker grouping".)
  2. The user can create a new Pool and specify the maximum number of "task instances" it can execute at the same time, for example: Pool: GPU, task limit 1.
  3. For every Task node, add a "Pool" selector after the "Worker grouping" setting. The default could be an unlimited Default Pool; the GPU Pool created in step 2 could be selected here instead. Note: to avoid problems such as deadlocks, each Task node can be bound to at most one Pool!
  4. When a Worker takes a task from the Queue, it checks whether the task has a Pool and whether that Pool has reached its limit.

The Pool management interface also needs to be fleshed out. For example, for each Pool it should be possible to find all Processes/Tasks that reference it, and the interface should show the current real-time Pool usage.
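
To make the idea concrete, here is a minimal in-memory sketch of the check in step 4 and the usage view. It is only an illustration under assumptions: `Pool`, `PoolRegistry`, `try_acquire`, `release`, and `usage` are hypothetical names, not existing APIs of this project or of Airflow, and a real implementation would have to keep this state somewhere shared by all Workers rather than in one process.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical pool: a named cap on concurrently running task instances."""
    name: str
    slots: int  # maximum number of simultaneous task instances
    running: set = field(default_factory=set)  # ids of task instances holding a slot


class PoolRegistry:
    """Hypothetical registry; a lock guards the slot bookkeeping."""

    def __init__(self):
        self._pools = {}
        self._lock = threading.Lock()

    def create_pool(self, name, slots):
        with self._lock:
            self._pools[name] = Pool(name, slots)

    def try_acquire(self, pool_name, task_id):
        """Return True if the task may start now.

        pool_name=None stands in for the unlimited Default Pool.
        """
        if pool_name is None:
            return True
        with self._lock:
            pool = self._pools[pool_name]
            if len(pool.running) >= pool.slots:
                return False  # pool is full; the task should stay queued
            pool.running.add(task_id)
            return True

    def release(self, pool_name, task_id):
        """Free the slot when the task instance finishes or fails."""
        if pool_name is None:
            return
        with self._lock:
            self._pools[pool_name].running.discard(task_id)

    def usage(self, pool_name):
        """Return (slots in use, total slots) for the management page."""
        with self._lock:
            pool = self._pools[pool_name]
            return len(pool.running), pool.slots


registry = PoolRegistry()
registry.create_pool("GPU", 1)
started_a = registry.try_acquire("GPU", "task-a")  # takes the only slot
started_b = registry.try_acquire("GPU", "task-b")  # refused while task-a runs
registry.release("GPU", "task-a")
started_b_retry = registry.try_acquire("GPU", "task-b")  # slot is free again
```

Because each Task is bound to at most one Pool, a task never waits on one pool while holding a slot in another, which is what keeps this simple check free of the deadlock concern raised in step 3.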

