[Data] [LLM] Stage of engine support flexible compute.size #55480

@DriverSong

Description

The engine stage uses a fixed config.concurrency and sets both compute.min_size and compute.max_size to config.concurrency.

https://github.com/ray-project/ray/blob/master/python/ray/llm/_internal/batch/processor/vllm_engine_proc.py#L90-L97

    if isinstance(config.concurrency, int):
        # For CPU-only stages, we leverage auto-scaling to recycle resources.
        processor_concurrency = (1, config.concurrency)
    else:
        raise ValueError(
            "``concurrency`` is expected to be set as an integer,"
            f" but got: {config.concurrency}."
        )

https://github.com/ray-project/ray/blob/master/python/ray/llm/_internal/batch/processor/vllm_engine_proc.py#L159-L166

                compute=ray.data.ActorPoolStrategy(
                    # vLLM start up time is significant, so if user give fixed
                    # concurrency, start all instances without auto-scaling.
                    min_size=config.concurrency,
                    max_size=config.concurrency,
                    max_tasks_in_flight_per_actor=config.experimental.get(
                        "max_tasks_in_flight_per_actor", DEFAULT_MAX_TASKS_IN_FLIGHT
                    ),
                ),

Use case

In my opinion, config.concurrency should be Tuple[int, int] rather than int, or should at least optionally accept Tuple[int, int], because data.llm is used mostly for offline batch processing. Compared with online tasks, offline tasks generally have lower priority (and are often preempted by online tasks) and are less sensitive to latency. In such scenarios, dynamic resource allocation matters more than the overhead of engine startup and shutdown. If the number of GPUs, the most sensitive resource, is fixed, the usability of data.llm in offline scenarios is greatly reduced.
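
To make the request concrete, here is a minimal sketch of how an int-or-tuple concurrency value might be normalized into a (min_size, max_size) pair. The helper name `resolve_engine_pool_size` and the `Concurrency` alias are hypothetical and do not exist in Ray; the validation rules are assumptions, not the project's actual behavior.

```python
from typing import Tuple, Union

# Hypothetical type alias: either a fixed size or a (min, max) range.
Concurrency = Union[int, Tuple[int, int]]


def resolve_engine_pool_size(concurrency: Concurrency) -> Tuple[int, int]:
    """Hypothetical helper: map ``concurrency`` to (min_size, max_size).

    An int keeps today's behavior (fixed pool, no auto-scaling).
    A (min, max) tuple would let the actor pool scale within that range.
    """
    if isinstance(concurrency, int) and not isinstance(concurrency, bool):
        if concurrency < 1:
            raise ValueError(f"concurrency must be >= 1, got: {concurrency}.")
        return (concurrency, concurrency)

    if (
        isinstance(concurrency, tuple)
        and len(concurrency) == 2
        and all(isinstance(n, int) and not isinstance(n, bool) for n in concurrency)
    ):
        min_size, max_size = concurrency
        if 1 <= min_size <= max_size:
            return (min_size, max_size)

    raise ValueError(
        "``concurrency`` is expected to be an int or a (min, max) tuple of ints "
        f"with 1 <= min <= max, but got: {concurrency}."
    )
```

Under this assumption, the engine stage could pass the result as `min_size` and `max_size` to `ray.data.ActorPoolStrategy` instead of using `config.concurrency` for both, so an int preserves the current fixed-size behavior.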
