
[Data][LLM] Support multi-node TP/PP for ray.data.llm#56779

Merged
kouroshHakha merged 9 commits into ray-project:master from jeffreyjeffreywang:multi-node-pp-1
Oct 2, 2025

Conversation

@jeffreyjeffreywang
Contributor

@jeffreyjeffreywang jeffreyjeffreywang commented Sep 22, 2025

Why are these changes needed?

Currently, vLLMEngineProcessorConfig in ray.data.llm enforces that all GPUs must be colocated on a single node, preventing users from scaling tensor/pipeline parallelism across multiple nodes.

This PR removes that limitation by dropping the STRICT_PACK placement group constraint, thereby enabling multi-node deployments. Moreover, this PR introduces a new benchmarking module for evaluating ray.data.llm processors.
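Concretely, the multi-node placement described above can be sketched as a bundles-plus-strategy dict. This is a sketch only: the `placement_group_config` field and its `bundles`/`strategy` keys follow this PR's description, and the exact schema of the merged API may differ.

```python
# Sketch only: field and key names follow this PR's description, not a frozen API.
def required_gpus(tensor_parallel_size: int, pipeline_parallel_size: int) -> int:
    """One vLLM engine replica needs TP * PP GPUs in total."""
    return tensor_parallel_size * pipeline_parallel_size

# One single-GPU bundle per worker, as this PR requires for vLLM's Ray
# distributed executor backend.
placement_group_config = {
    "bundles": [{"CPU": 1, "GPU": 1} for _ in range(required_gpus(4, 2))],
    "strategy": "PACK",  # new default; STRICT_PACK would pin all bundles to one node
}

print(len(placement_group_config["bundles"]))  # 8
```

With `PACK`, Ray tries to colocate bundles but can still spill across nodes, which is what makes TP=4, PP=2 schedulable on two 4-GPU g5.12xlarge instances.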

Benchmark

To validate that Ray Data LLM does not add unreasonable overhead to multi-node TP/PP batch inference, we ran the following benchmark on A10G GPUs (g5.12xlarge instances) with batch_size=64, drawing 10,000 samples from https://huggingface.co/datasets/Crystalcareai/Code-feedback-sharegpt-renamed. Results show that throughput drops when the model is distributed across multiple nodes. The degradation is most pronounced when TP spans multiple nodes, while configurations that keep TP within a single node show only slight declines. Single-node configurations perform best, confirming that cross-node communication introduces significant overhead that worsens with higher tensor parallelism.

| Node(s) | TP | PP | Backend | Throughput (req/s) |
|---|---|---|---|---|
| g5.12xlarge | 1 | 1 | mp | 30.74 |
| g5.12xlarge | 2 | 2 | ray | 30.29 |
| 2× g5.12xlarge | 4 | 2 | ray | 28.32 |
| 2× g5.12xlarge | 2 | 4 | ray | 29.26 |
| 2× g5.12xlarge | 8 | 1 | ray | 12.04 |

To reproduce, execute the following command:

```shell
python3 benchmark_processor.py \
    --mode vllm_engine \
    --batch-size 64 \
    --concurrency 1 \
    --num-prompts 10000 \
    --model facebook/opt-1.3b \
    --tensor-parallel-size 4 \
    --pipeline-parallel-size 2 \
    --distributed-executor-backend ray
```

Design writeup: https://docs.google.com/document/d/1Db9TDwagXaW5lOWncmkbqq4g769ajDP-MVvURdicZBk/edit?usp=sharing

Related issue number

Closes #55491

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Note

Adds placement-group-based scheduling to vLLM processor (enabling multi-node TP/PP) and introduces a new Ray Data LLM batch benchmarking module.

  • LLM Data Processing (vLLM):
    • Add placement_group_config to vLLMEngineProcessorConfig and plumb through to vLLMEngineStage for custom PG scheduling; default strategy now PACK.
    • Remove use of resources in map_batches and compute GPUs/CPUs from PG config (or fallback), enabling multi-node TP/PP with distributed_executor_backend=ray.
    • Deprecate ProcessorConfig.resources_per_bundle.
  • SGLang Processor:
    • Stop passing resources; set num_gpus via Ray remote args.
  • Benchmarks:
    • Add python/ray/llm/_internal/batch/benchmark/ with a CLI to benchmark vLLM/Serve processors and a ShareGPT-backed dataset loader.
  • Tests/Release:
    • New/updated unit tests for vLLM PG configs and backends; Ray cleanup/compile-cache fixtures.
    • Add multi-node vLLM batch release test and cluster config.
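The "compute GPUs/CPUs from PG config" bullet above can be illustrated with a small helper. This is illustrative only; the actual logic lives in vLLMEngineStage, and the function name here is hypothetical.

```python
# Illustrative sketch of summing engine resources from placement group bundles;
# the real implementation in vLLMEngineStage may differ.
from typing import Any, Dict, List

def totals_from_bundles(bundles: List[Dict[str, Any]]) -> Dict[str, float]:
    """Sum CPU/GPU over all bundles, i.e. the total an engine replica requests
    when it is not delegating per-worker scheduling to Ray."""
    return {
        "CPU": sum(b.get("CPU", 0) for b in bundles),
        "GPU": sum(b.get("GPU", 0) for b in bundles),
    }

bundles = [{"CPU": 1, "GPU": 1}] * 4
print(totals_from_bundles(bundles))  # {'CPU': 4, 'GPU': 4}
```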

Written by Cursor Bugbot for commit d3bb6cb.

@jeffreyjeffreywang
Contributor Author

Hey @kouroshHakha, just posting this draft to share results and to get early feedback from you as I rebase onto master. There are a couple of things I’d appreciate your thoughts on.

  1. Benchmark: As I rebase onto master, I will add Serve deployment benchmark scripts to the benchmark_processor.py module I introduced. If we'd like more comprehensive benchmarks similar to vLLM's benchmark suite, e.g. validating against more datasets, engine configurations, and map_batches settings, we could put together an RFC to sort out the details.
  2. Release tests: I'd love your help in making sure that the new release test in release/release_tests.yaml runs as expected. I believe that the validation process requires access to Anyscale's internal infra. I added a multi-node test that’s scheduled on two g6.12xlarge instances, so it’ll need a multi-node cluster to run.

…data.llm; introduce benchmarks

Signed-off-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
@jeffreyjeffreywang jeffreyjeffreywang marked this pull request as ready for review September 24, 2025 05:30
@jeffreyjeffreywang jeffreyjeffreywang requested a review from a team as a code owner September 24, 2025 05:30

@ray-gardener ray-gardener bot added data Ray Data-related issues release-test release test llm community-contribution Contributed by the community labels Sep 24, 2025
Contributor

@kouroshHakha kouroshHakha left a comment


I feel like we should deprecate resources_per_bundle with this, as using a pg definition per Ray actor in the pool is more expressive than just expressing the bundle or resources per bundle.

Comment on lines +61 to +70
# Placement group config for TP/PP.
placement_group_config: Optional[Dict[str, Any]] = Field(
    default=None,
    description=(
        "Custom Ray placement group configuration for scheduling vLLM engine workers. "
        "Each bundle should define its resource requirements, such as 'CPU' and 'GPU'. "
        "Note: When using vLLM's Ray distributed executor backend, each bundle must be restricted "
        "to a single GPU. This configuration is only applicable when the Ray distributed executor backend is enabled."
    ),
)
Contributor


We need to document better what this structure represent and how it should be constructed. Is it only the bundles? Is it gonna be bundles + strategy? Is it a placement group object?

Contributor

@kouroshHakha kouroshHakha Sep 24, 2025


I'd create a schema data class for placement_group_config here:


class BundleSchema(BaseModel):
    model_config = ConfigDict(extra="allow")  # allow arbitrary custom resources
    CPU: int = 1
    GPU: int = 1

class PlacementGroupSchema(BaseModel):
    bundles: List[BundleSchema]
    strategy: Literal["PACK", "STRICT_PACK", "SPREAD", "STRICT_SPREAD"]

and then I'd cast the passed dict to this data type. We can then later do placement_group(**pg.dict()) or something similar to construct the real placement group. This allows you to skip all the validations you are doing further down the line.
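For readers without Pydantic at hand, the schema suggested above can be approximated with plain dataclasses. This is a sketch: the class names follow the suggestion, the `extra` field stands in for arbitrary custom resources, and the validation is illustrative rather than the merged implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

_STRATEGIES = {"PACK", "STRICT_PACK", "SPREAD", "STRICT_SPREAD"}

@dataclass
class BundleSchema:
    CPU: int = 1
    GPU: int = 1
    # Arbitrary custom resources; stands in for "allow arbitrary kwargs".
    extra: Dict[str, float] = field(default_factory=dict)

@dataclass
class PlacementGroupSchema:
    bundles: List[BundleSchema]
    strategy: str = "PACK"

    def __post_init__(self) -> None:
        # Reject unknown strategies up front instead of deep in the stage code.
        if self.strategy not in _STRATEGIES:
            raise ValueError(f"Invalid placement strategy: {self.strategy!r}")

pg = PlacementGroupSchema(bundles=[BundleSchema(), BundleSchema()], strategy="SPREAD")
```

Validating at config-construction time is what lets the stage skip the scattered downstream checks the comment mentions.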

Contributor Author


@kouroshHakha let me know if the latest explanation is clearer to you. Introduced the schema classes as suggested to perform validations and apply default values.

Contributor


+1. Why is this change here?

Related to #56490

Discussed offline with Jeff

Contributor Author


+1. Why is this change here?

Addressed separately in #57061. Removed this change from the latest revision to keep this PR focusing on TP/PP changes.

Contributor

@kouroshHakha kouroshHakha left a comment


Several things:

  • I think we should deprecate resources_per_bundle in favor of pg (just remove the old API)
  • move benchmark to its right place
  • Follow up PR: Document both sharing the stages, and benchmarking.

@kouroshHakha
Contributor

/gemini review

@kouroshHakha
Contributor

@cursoragent review the benchmark code very closely. See what can be improved for the first version of it.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This PR is a great addition, enabling multi-node TP/PP for ray.data.llm, which is a significant feature for scaling LLM inference. The changes are well-implemented, especially the introduction of placement_group_config and relaxing the placement strategy to PACK. The inclusion of a comprehensive benchmark suite and thorough unit/integration tests (including multi-node and failure cases) is excellent and greatly increases confidence in the changes. I have a few suggestions for minor improvements, mostly around code duplication in the new benchmark script and small code style enhancements.

@kouroshHakha
Contributor

@nrghosh could you work with @jeffreyjeffreywang to setup the new release test and hook it up with the CI?

nrghosh added a commit to nrghosh/ray that referenced this pull request Sep 26, 2025
- Default placement strategy --> PACK
- Schema validation with BundleSchema and PlacementGroupSchema
- Deprecate resources_per_bundle --> placement_group_config
- Update tests, remove single GPU per bundle restriction
- Deprecation warnings + ensure backwards compatibility

Based on feedback from kouroshHakha in ray-project#56779

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>

@jeffreyjeffreywang
Contributor Author

jeffreyjeffreywang commented Sep 28, 2025

Hey @nrghosh, could you help validate that test_batch_multi_node_vllm.py runs successfully in CI? I'm reaching out to you on Slack to coordinate.

Contributor

@nrghosh nrghosh left a comment


Quick comments prior to setting up CI / release testing

  • Looks like llm gpu unit test test_vllm_engine_processor_placement_group is unhappy:
E       AssertionError: assert {'accelerator...gpus': 1, ...} == {'accelerator..._batch': True}
E         Omitting 3 identical items, use -vv to show
E         Left contains 2 more items:
E         {'num_cpus': 1, 'num_gpus': 1}
E         Right contains 1 more item:
E         {'resources': {'CPU': 1, 'GPU': 1}}
E         Full diff:
E           {...
E         ...Full output truncated (8 lines hidden), use '-vv' to show

python/ray/llm/tests/batch/gpu/processor/test_vllm_engine_proc.py:86: AssertionError

  • lint / pre_commit checker also unhappy (./ci/lint/lint.sh pre_commit )

@nrghosh
Contributor

nrghosh commented Sep 29, 2025

Hey @nrghosh, could you help validate that test_batch_multi_node_vllm.py runs successfully in CI? I'm reaching out to you on Slack to coordinate.

https://buildkite.com/ray-project/release/builds/60549/steps/table?sid=019996a9-6cbb-4701-81ce-9ddb536ce6e6


- ray data requires separate num_cpus / num_gpus keywords but was
  receiving resources dict
- unit test was expecting resources in format that contradicted with ray
  data requirements
- python docstrings were not complete so linter complained

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>

@nrghosh
Contributor

nrghosh commented Sep 29, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enables multi-node tensor and pipeline parallelism for ray.data.llm by introducing a configurable placement_group_config for the vLLMEngineProcessorConfig and changing the default placement strategy to PACK. This is a great enhancement for scaling LLM batch inference. The changes are well-implemented, and the addition of a new benchmarking module and corresponding tests is very valuable.

I have a couple of points of feedback. First, there's an unused parameter in build_serve_deployment_processor which could be misleading. Second, one of the new test cases for placement groups seems to have an incorrect configuration that will likely lead to test failures. Addressing these points will improve the clarity and robustness of the code.

@github-actions github-actions bot disabled auto-merge October 2, 2025 04:24
@aslonnie aslonnie enabled auto-merge (squash) October 2, 2025 04:26

# Keep only non-CPU/GPU custom resources, if any.
if resource_counter:
ray_remote_args["resources"] = dict(resource_counter)
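The snippet above keeps only non-CPU/GPU custom resources in `ray_remote_args["resources"]`, since Ray Data expects CPU and GPU as the separate `num_cpus`/`num_gpus` keywords (the bug nrghosh's fix commit addresses). A self-contained sketch of that mapping, with a hypothetical helper name:

```python
# Illustrative sketch of mapping a resource dict to Ray remote args;
# split_remote_args is a hypothetical name, not a Ray API.
from collections import Counter
from typing import Dict

def split_remote_args(resources: Dict[str, float]) -> Dict[str, object]:
    """CPU/GPU become num_cpus/num_gpus; everything else stays under 'resources'."""
    counter = Counter(resources)
    args: Dict[str, object] = {
        "num_cpus": counter.pop("CPU", 0),
        "num_gpus": counter.pop("GPU", 0),
    }
    # Keep only non-CPU/GPU custom resources, if any.
    if counter:
        args["resources"] = dict(counter)
    return args

print(split_remote_args({"CPU": 1, "GPU": 1, "accelerator_type:A10G": 0.001}))
# {'num_cpus': 1, 'num_gpus': 1, 'resources': {'accelerator_type:A10G': 0.001}}
```

Passing a bare `{"resources": {"CPU": 1, "GPU": 1}}` instead is exactly the shape the failing unit test above complained about.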


Bug: Empty Placement Group Config Causes Resource Allocation Issues

An empty placement_group_config results in an empty bundles list after validation. For the Ray backend, ray.util.placement_group fails. For other backends, it allocates zero CPU/GPU, preventing the vLLM engine from starting. Additionally, non-Ray backends sum resources across all bundles, which may not align with the intent for distinct bundle allocations.

Additional Locations (1)


@kouroshHakha kouroshHakha disabled auto-merge October 2, 2025 04:31
@kouroshHakha kouroshHakha enabled auto-merge (squash) October 2, 2025 04:32
@kouroshHakha kouroshHakha merged commit d82d069 into ray-project:master Oct 2, 2025
8 checks passed
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025

Signed-off-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
Signed-off-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
liulehui pushed a commit to liulehui/ray that referenced this pull request Oct 9, 2025

Signed-off-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
joshkodi pushed a commit to joshkodi/ray that referenced this pull request Oct 13, 2025

Signed-off-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
Signed-off-by: Josh Kodi <joshkodi@gmail.com>
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025

Signed-off-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025

Signed-off-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025

Signed-off-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025

Signed-off-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com>
Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>

Labels

community-contribution (Contributed by the community), data (Ray Data-related issues), go (add ONLY when ready to merge, run all tests), llm, release-test (release test)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Data][LLM] Support multi-node setup for ray.data.llm

5 participants