[Data][LLM] Support multi-node setup for ray.data.llm #55491

@lk-chen

Description

At the moment, vLLMEngineProcessorConfig from ray.data.llm requires all GPUs to be located on the same node (see details below). This feature request aims to remove that restriction so that cross-node deployment is supported.

with

        "tensor_parallel_size": 2,
        "pipeline_parallel_size": 3,
        "distributed_executor_backend": "ray"

=> {'GPU': 1.0, 'CPU': 1.0} * 6 (STRICT_PACK)

with

        "tensor_parallel_size": 2,
        "pipeline_parallel_size": 3,
        # "distributed_executor_backend": "ray"

=> [{GPU: 6, CPU: 1}]

with

    engine_kwargs={
        ...
        "tensor_parallel_size": 2,
        "pipeline_parallel_size": 3,
        # "distributed_executor_backend": "ray"
    },
    resources_per_bundle={'num_gpus': 1},

=> [{CPU: 1, num_gpus: 6}]

How does it happen

Currently, vllm_engine_stage.py aggregates all TP and PP requirements and generates ray_remote_args through _ray_scheduling_strategy_fn, which forces the STRICT_PACK strategy. STRICT_PACK places all bundles on the same node.
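The flattening described above can be sketched roughly as follows. This is a simplified illustration, not the actual code in vllm_engine_stage.py: the function name current_placement is hypothetical, and the bundle shape is copied from the Ray-backend example earlier in this issue.

```python
# Hypothetical sketch of the current behavior; not the real
# _ray_scheduling_strategy_fn implementation.

def current_placement(tp_size: int, pp_size: int) -> dict:
    """Flatten TP and PP into tp_size * pp_size single-GPU bundles."""
    num_bundles = tp_size * pp_size
    bundles = [{"GPU": 1.0, "CPU": 1.0} for _ in range(num_bundles)]
    # STRICT_PACK requires every bundle to land on the same node,
    # so the node must have at least num_bundles GPUs.
    return {"bundles": bundles, "strategy": "STRICT_PACK"}


plan = current_placement(tp_size=2, pp_size=3)
gpus_needed_on_one_node = sum(b["GPU"] for b in plan["bundles"])
print(plan["strategy"], len(plan["bundles"]), gpus_needed_on_one_node)
```

With tensor_parallel_size=2 and pipeline_parallel_size=3, a single node has to provide all 6 GPUs, which is the restriction this issue asks to lift.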

Request

Ideally, each TP group should be placed on a single node, while PP stages should be allowed to run on different nodes.

Use case

  1. If TP*PP is too large, there might be no single node that satisfies the requirement (a node usually has at most 8 GPUs).
  2. If the user only has access to small nodes (with 1 or 2 GPUs), they should still be able to configure PP.

Potential solution

  1. Update _ray_scheduling_strategy_fn to generate something similar to [num_gpus: tp_size] * pp_size with strategy='PACK'. TP usually requires high bandwidth to minimize latency, so each TP group stays within a single bundle.
  2. (Be cautious) If the above is not enough, introduce resources in addition to resources_per_bundle to allow users to provide detailed resource requirements.
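A minimal sketch of option 1, under stated assumptions: the helper names proposed_placement and fits_on_nodes are hypothetical, one CPU per bundle is an assumption, and the dict returned here only mirrors the bundles/strategy arguments that a Ray placement group takes rather than creating one.

```python
# Hypothetical sketch of the proposed layout: one bundle per PP stage,
# each bundle holding the GPUs of one TP group.

def proposed_placement(tp_size: int, pp_size: int, cpus_per_bundle: int = 1) -> dict:
    if tp_size < 1 or pp_size < 1:
        raise ValueError("tp_size and pp_size must be >= 1")
    # A bundle is always scheduled on a single node, so the tp_size GPUs
    # of one TP group stay together.
    bundles = [{"GPU": tp_size, "CPU": cpus_per_bundle} for _ in range(pp_size)]
    # PACK is best effort: bundles share a node when possible but may
    # spread across nodes, unlike STRICT_PACK.
    return {"bundles": bundles, "strategy": "PACK"}


def fits_on_nodes(plan: dict, gpus_per_node: int) -> bool:
    """Necessary condition only: every bundle must fit on one node."""
    return all(b["GPU"] <= gpus_per_node for b in plan["bundles"])


plan = proposed_placement(tp_size=2, pp_size=3)
print(plan)
print(fits_on_nodes(plan, gpus_per_node=2))  # 2-GPU nodes can host each TP group
```

Under this layout the per-node requirement drops from tp_size * pp_size GPUs to tp_size GPUs, which covers both use cases listed above. The check is only a necessary condition, since the cluster still needs enough nodes for all bundles.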

internal jira https://anyscale1.atlassian.net/browse/CI-1255

Metadata

Labels

enhancement (Request for new feature and/or capability), help-wanted, llm, triage (Needs triage, e.g. priority, bug/not-bug, and owning component)