Closed
Labels: enhancement, help-wanted, llm, triage
Description
At the moment, vLLMEngineProcessorConfig from ray.data.llm requires all GPUs to be located on the same node (see details below). This feature request aims to remove that restriction and support cross-node deployment.
In each of the three configurations below, all six GPUs (TP 2 × PP 3) are requested in a way that places them on a single node:
Case 1: with
    "tensor_parallel_size": 2,
    "pipeline_parallel_size": 3,
    "distributed_executor_backend": "ray"
  => {'GPU': 1.0, 'CPU': 1.0} * 6 (STRICT_PACK)
Case 2: with
    "tensor_parallel_size": 2,
    "pipeline_parallel_size": 3,
    # "distributed_executor_backend": "ray"
  => [{GPU: 6, CPU: 1}]
Case 3: with
    engine_kwargs={
        ...
        "tensor_parallel_size": 2,
        "pipeline_parallel_size": 3,
        # "distributed_executor_backend": "ray"
    },
    resources_per_bundle={'num_gpus': 1},
  => [{CPU: 1, num_gpus: 6}]
How does it happen
Currently, vllm_engine_stage.py aggregates all TP and PP requirements and generates ray_remote_args through _ray_scheduling_strategy_fn, which forces the STRICT_PACK strategy. STRICT_PACK puts all bundles on the same node, so all TP × PP GPUs must fit on a single node.
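A minimal sketch of the current constraint, assuming the `distributed_executor_backend="ray"` configuration from the description (one `{'GPU': 1.0, 'CPU': 1.0}` bundle per worker under STRICT_PACK). The helpers `strict_pack_request` and `fits_strict_pack` are hypothetical and only model the arithmetic; they are not the code in vllm_engine_stage.py.

```python
from typing import Dict, List, Tuple


def strict_pack_request(tp: int, pp: int) -> Tuple[List[Dict[str, float]], str]:
    """Hypothetical model: one 1-GPU bundle per worker (TP * PP workers), all STRICT_PACK."""
    bundles = [{"GPU": 1.0, "CPU": 1.0} for _ in range(tp * pp)]
    return bundles, "STRICT_PACK"


def fits_strict_pack(tp: int, pp: int, free_gpus_per_node: List[int]) -> bool:
    """STRICT_PACK needs every bundle on one node, so one node must hold all TP * PP GPUs."""
    bundles, _ = strict_pack_request(tp, pp)
    needed = sum(b["GPU"] for b in bundles)
    return any(free >= needed for free in free_gpus_per_node)


if __name__ == "__main__":
    bundles, strategy = strict_pack_request(2, 3)
    print(len(bundles), strategy)              # 6 STRICT_PACK
    print(fits_strict_pack(2, 3, [8]))         # True: one 8-GPU node
    print(fits_strict_pack(2, 3, [2, 2, 2]))   # False: three 2-GPU nodes
    print(fits_strict_pack(4, 4, [8, 8]))      # False: 16 GPUs > 8 per node
```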
Request
Ideally, each TP group should stay on a single node, while the restriction is lifted so that PP stages can be placed on different nodes.
Use cases
- If TP × PP is large, there may be no single node that satisfies the requirement (nodes usually have at most 8 GPUs).
- If the user only has access to small nodes (with 1 or 2 GPUs), they should still be able to configure PP.
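To illustrate these use cases, the following hypothetical check (not part of Ray) asks whether PP stages of TP GPUs each can be placed when every TP group must stay on one node but different stages may use different nodes. It is a sketch that only counts free GPUs per node and ignores CPUs and other resources.

```python
from typing import List


def can_place_stages(tp: int, pp: int, free_gpus_per_node: List[int]) -> bool:
    """Hypothetical check: can PP stages of TP GPUs each be placed, one node per stage?"""
    if tp <= 0 or pp <= 0:
        raise ValueError("tp and pp must be positive")
    # A TP group cannot be split, so each node hosts floor(free / tp) whole stages.
    # All stages are the same size, so this sum is the most stages the cluster can host.
    capacity = sum(free // tp for free in free_gpus_per_node)
    return capacity >= pp


if __name__ == "__main__":
    print(can_place_stages(2, 3, [2, 2, 2]))  # True: one stage per 2-GPU node
    print(can_place_stages(1, 3, [1, 1, 1]))  # True: PP over 1-GPU nodes
    print(can_place_stages(2, 3, [1] * 6))    # False: a TP group would span nodes
    print(can_place_stages(4, 4, [8, 8]))     # True: two stages per 8-GPU node
```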
Potential solution
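- Update _ray_scheduling_strategy_fn to generate something similar to [num_gpus: tp_size] * pp_size with strategy='PACK'. Because each bundle is reserved on a single node, this keeps every TP group on one node (TP usually requires high bandwidth to minimize latency), while PACK still lets different PP stages land on different nodes when one node is not enough.
- (Be cautious) If the above is not enough, introduce resources in addition to resources_per_bundle to allow users to provide detailed resource requirements.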
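A rough sketch of the first proposal, under stated assumptions: `proposed_request` builds one bundle of `tp_size` GPUs per PP stage, and `simulate_pack` is a toy greedy stand-in for Ray's PACK strategy (fill nodes already in use before opening new ones). Both names are hypothetical, and Ray's actual scheduler is more involved.

```python
from typing import Dict, List, Optional, Tuple


def proposed_request(tp: int, pp: int) -> Tuple[List[Dict[str, float]], str]:
    """Hypothetical: one bundle per PP stage, each holding that stage's TP GPUs."""
    return [{"GPU": float(tp)} for _ in range(pp)], "PACK"


def simulate_pack(bundles: List[Dict[str, float]],
                  free_gpus_per_node: List[int]) -> Optional[List[int]]:
    """Toy PACK model (not Ray's scheduler): prefer nodes already used by earlier bundles.

    Returns the node index chosen for each bundle, or None if some bundle does not
    fit on any single node (a bundle is never split across nodes).
    """
    free = list(free_gpus_per_node)
    placement: List[int] = []
    for bundle in bundles:
        need = bundle["GPU"]
        used = list(dict.fromkeys(placement))  # nodes in use, in first-use order
        candidates = used + [i for i in range(len(free)) if i not in used]
        for i in candidates:
            if free[i] >= need:
                free[i] -= need
                placement.append(i)
                break
        else:
            return None  # no node can hold this whole TP group
    return placement


if __name__ == "__main__":
    bundles, strategy = proposed_request(2, 3)
    print(strategy)                            # PACK
    print(simulate_pack(bundles, [8]))         # [0, 0, 0]: all stages on one node
    print(simulate_pack(bundles, [2, 2, 2]))   # [0, 1, 2]: one stage per node
    print(simulate_pack(bundles, [1] * 6))     # None: TP group cannot be split
```

In Ray itself, a bundle list like this could be passed to `ray.util.placement_group(bundles, strategy="PACK")`. Because a single bundle cannot span nodes, a TP group that fits on no single node would leave the placement group pending rather than being split.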
Internal Jira: https://anyscale1.atlassian.net/browse/CI-1255