[Data] [LLM] Stage of engine support flexible compute.size #55480
Description
The engine stage uses a fixed `config.concurrency` and sets both `compute.min_size` and `compute.max_size` to `config.concurrency`:
```python
if isinstance(config.concurrency, int):
    # For CPU-only stages, we leverage auto-scaling to recycle resources.
    processor_concurrency = (1, config.concurrency)
else:
    raise ValueError(
        "``concurrency`` is expected to be set as an integer,"
        f" but got: {config.concurrency}."
    )
```

```python
compute=ray.data.ActorPoolStrategy(
    # vLLM start up time is significant, so if user give fixed
    # concurrency, start all instances without auto-scaling.
    min_size=config.concurrency,
    max_size=config.concurrency,
    max_tasks_in_flight_per_actor=config.experimental.get(
        "max_tasks_in_flight_per_actor", DEFAULT_MAX_TASKS_IN_FLIGHT
    ),
),
```

Use case
In my opinion, `config.concurrency` should be `Tuple[int, int]` rather than `int`, or at least accept `Tuple[int, int]` as an option. The application scenarios of `data.llm` lean toward offline batch processing. Compared with online tasks, offline tasks generally have lower priority (they are often preempted by online tasks) and are less sensitive to latency. In such scenarios, dynamic resource allocation matters more than the overhead of engine startup and shutdown. If the quantity of the most contended resource, GPUs, is fixed, the usability of `data.llm` in offline scenarios is greatly reduced.
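To make the proposal concrete, here is a minimal sketch of how the stage could normalize either form of `concurrency` into the `(min_size, max_size)` pair that `ActorPoolStrategy` expects. The helper name `resolve_concurrency` is hypothetical, not part of the Ray Data API; this only illustrates the requested behavior.

```python
from typing import Tuple, Union


def resolve_concurrency(
    concurrency: Union[int, Tuple[int, int]]
) -> Tuple[int, int]:
    """Hypothetical helper: map ``concurrency`` to (min_size, max_size)."""
    if isinstance(concurrency, int):
        # Current behavior for GPU stages: fixed pool, no auto-scaling.
        return (concurrency, concurrency)
    if (
        isinstance(concurrency, tuple)
        and len(concurrency) == 2
        and all(isinstance(n, int) for n in concurrency)
        and 0 < concurrency[0] <= concurrency[1]
    ):
        # Proposed behavior: elastic pool that Ray Data can scale
        # between min and max as resources are granted or preempted.
        return concurrency
    raise ValueError(
        "``concurrency`` must be an int or a (min, max) tuple of ints,"
        f" but got: {concurrency!r}."
    )
```

The resolved pair would then be passed as `min_size`/`max_size` to `ray.data.ActorPoolStrategy`, so `concurrency=8` keeps today's fixed-pool semantics while `concurrency=(1, 8)` lets the pool shrink under preemption and grow back when GPUs free up.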