[Data][LLM] Support multi-node TP/PP for ray.data.llm #56779
kouroshHakha merged 9 commits into ray-project:master
Conversation
Hey @kouroshHakha, just posting this draft to share results and to get early feedback from you as I rebase onto master. There are a couple of things I’d appreciate your thoughts on.
…data.llm; introduce benchmarks

Signed-off-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com>
Force-pushed from 4dbea04 to 7a8f0b6
kouroshHakha left a comment
I feel like we should deprecate `resources_per_bundle` with this change, since using a placement group definition per Ray actor in the pool is more expressive than just specifying the bundle or resources per bundle.
python/ray/llm/_internal/batch/processor/serve_deployment_proc.py
Outdated
```python
# Placement group config for TP/PP.
placement_group_config: Optional[Dict[str, Any]] = Field(
    default=None,
    description=(
        "Custom Ray placement group configuration for scheduling vLLM engine workers. "
        "Each bundle should define its resource requirements, such as 'CPU' and 'GPU'. "
        "Note: When using vLLM's Ray distributed executor backend, each bundle must be restricted "
        "to a single GPU. This configuration is only applicable when the Ray distributed executor backend is enabled."
    ),
)
```
We need to document better what this structure represents and how it should be constructed. Is it only the bundles? Is it bundles + strategy? Is it a placement group object?
I'd create a schema data class for placement_group_config here:
```python
from typing import List, Literal
from pydantic import BaseModel, ConfigDict

class BundleSchema(BaseModel):
    model_config = ConfigDict(extra="allow")  # allow arbitrary custom resources
    CPU: int = 1
    GPU: int = 1

class PlacementGroupSchema(BaseModel):
    bundles: List[BundleSchema]
    strategy: Literal["PACK", "STRICT_PACK", "SPREAD", "STRICT_SPREAD"] = "PACK"
```
and then I'd cast the passed dict to this data type. We can then later do `placement_group(**pg.dict())` or something similar to construct the real placement group. This allows you to skip all those validations you are doing further down the line.
@kouroshHakha let me know if the latest explanation is clearer to you. Introduced the schema classes as suggested to perform validations and apply default values.
+1. Why is this change here?
Addressed separately in #57061. Removed this change from the latest revision to keep this PR focused on the TP/PP changes.
/gemini review
@cursoragent review the benchmark code very closely. See what can be improved for the first version of it.
Code Review
This PR is a great addition, enabling multi-node TP/PP for ray.data.llm, which is a significant feature for scaling LLM inference. The changes are well-implemented, especially the introduction of placement_group_config and relaxing the placement strategy to PACK. The inclusion of a comprehensive benchmark suite and thorough unit/integration tests (including multi-node and failure cases) is excellent and greatly increases confidence in the changes. I have a few suggestions for minor improvements, mostly around code duplication in the new benchmark script and small code style enhancements.
@nrghosh could you work with @jeffreyjeffreywang to set up the new release test and hook it up with the CI?
- Default placement strategy → PACK
- Schema validation with BundleSchema and PlacementGroupSchema
- Deprecate resources_per_bundle → placement_group_config
- Update tests, remove single-GPU-per-bundle restriction
- Deprecation warnings + ensure backwards compatibility

Based on feedback from kouroshHakha in ray-project#56779

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
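The deprecation path described in this commit could look roughly like the following sketch. The function name and exact translation are assumptions for illustration, not the PR's actual implementation:

```python
import warnings
from typing import Optional

def resolve_placement_group_config(
    placement_group_config: Optional[dict],
    resources_per_bundle: Optional[dict],
    num_bundles: int,
) -> Optional[dict]:
    """Back-compat shim: translate the deprecated field into the new one."""
    if resources_per_bundle is not None:
        warnings.warn(
            "resources_per_bundle is deprecated; use placement_group_config instead.",
            DeprecationWarning,
        )
        # Only fall back to the legacy field when the new one is unset.
        if placement_group_config is None:
            placement_group_config = {
                "bundles": [dict(resources_per_bundle) for _ in range(num_bundles)],
                "strategy": "PACK",
            }
    return placement_group_config
```

The key property is that existing `resources_per_bundle` users keep working (with a warning), while the new field always wins when both are provided.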
Hey @nrghosh, could you help validate that
Quick comments prior to setting up CI / release testing
- Looks like the llm gpu unit test `test_vllm_engine_processor_placement_group` is unhappy:
```
E AssertionError: assert {'accelerator...gpus': 1, ...} == {'accelerator..._batch': True}
[2025-09-28T08:52:18Z] E Omitting 3 identical items, use -vv to show
[2025-09-28T08:52:18Z] E Left contains 2 more items:
[2025-09-28T08:52:18Z] E {'num_cpus': 1, 'num_gpus': 1}
[2025-09-28T08:52:18Z] E Right contains 1 more item:
[2025-09-28T08:52:18Z] E {'resources': {'CPU': 1, 'GPU': 1}}
[2025-09-28T08:52:18Z] E Full diff:
[2025-09-28T08:52:18Z] E {...
[2025-09-28T08:52:18Z] E ...Full output truncated (8 lines hidden), use '-vv' to show
[2025-09-28T08:52:18Z] python/ray/llm/tests/batch/gpu/processor/test_vllm_engine_proc.py:86: AssertionError
```
- lint / pre_commit checker is also unhappy (`./ci/lint/lint.sh pre_commit`)
Force-pushed from 67fe896 to 5992c0d
- Ray Data requires separate `num_cpus`/`num_gpus` keywords but was receiving a `resources` dict
- The unit test expected resources in a format that contradicted Ray Data's requirements
- Python docstrings were incomplete, so the linter complained

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Force-pushed from 5992c0d to 254ddd8
/gemini review
Code Review
This pull request enables multi-node tensor and pipeline parallelism for ray.data.llm by introducing a configurable placement_group_config for the vLLMEngineProcessorConfig and changing the default placement strategy to PACK. This is a great enhancement for scaling LLM batch inference. The changes are well-implemented, and the addition of a new benchmarking module and corresponding tests is very valuable.
I have a couple of points of feedback. First, there's an unused parameter in build_serve_deployment_processor which could be misleading. Second, one of the new test cases for placement groups seems to have an incorrect configuration that will likely lead to test failures. Addressing these points will improve the clarity and robustness of the code.
python/ray/llm/_internal/batch/processor/serve_deployment_proc.py
Outdated
```python
# Keep only non-CPU/GPU custom resources, if any.
if resource_counter:
    ray_remote_args["resources"] = dict(resource_counter)
```
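As the fix-up commit in this thread notes, Ray Data expects separate `num_cpus`/`num_gpus` keywords rather than CPU/GPU entries inside a `resources` dict. A standalone sketch of that split (the function name is hypothetical; only the custom-resource handling mirrors the snippet above):

```python
from collections import Counter

def bundle_to_remote_args(bundle: dict) -> dict:
    """Split a PG bundle into Ray remote args: CPU/GPU become
    num_cpus/num_gpus; anything else stays a custom resource."""
    resource_counter = Counter(bundle)
    ray_remote_args = {}
    if "CPU" in resource_counter:
        ray_remote_args["num_cpus"] = resource_counter.pop("CPU")
    if "GPU" in resource_counter:
        ray_remote_args["num_gpus"] = resource_counter.pop("GPU")
    # Keep only non-CPU/GPU custom resources, if any.
    if resource_counter:
        ray_remote_args["resources"] = dict(resource_counter)
    return ray_remote_args
```

This is exactly the shape mismatch behind the `test_vllm_engine_processor_placement_group` assertion failure quoted earlier: `{'num_cpus': 1, 'num_gpus': 1}` versus `{'resources': {'CPU': 1, 'GPU': 1}}`.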
Bug: Empty Placement Group Config Causes Resource Allocation Issues
An empty `placement_group_config` results in an empty `bundles` list after validation. For the Ray backend, `ray.util.placement_group` fails. For other backends, it allocates zero CPUs/GPUs, preventing the vLLM engine from starting. Additionally, non-Ray backends sum resources across all bundles, which may not align with the intent of distinct bundle allocations.
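One way to guard against the empty-config case described above is a small validation step; this is a sketch only (in the PR the check may instead live in the schema classes), with the function name assumed:

```python
def validate_placement_group_config(pg_config: dict) -> dict:
    """Reject configs that would yield zero bundles or resourceless bundles."""
    bundles = pg_config.get("bundles", [])
    if not bundles:
        raise ValueError("placement_group_config must define at least one bundle.")
    for i, bundle in enumerate(bundles):
        if bundle.get("CPU", 0) <= 0 and bundle.get("GPU", 0) <= 0:
            raise ValueError(f"Bundle {i} requests neither CPU nor GPU.")
    return pg_config
```

Failing fast here avoids both the `ray.util.placement_group` error on the Ray backend and the silent zero-resource allocation on the others.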
Signed-off-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com> Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com> Co-authored-by: jeffreyjeffreywang <jeffjeffreywang@gmail.com> Co-authored-by: Nikhil Ghosh <nikhil@anyscale.com> Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>

Why are these changes needed?
Currently, `vLLMEngineProcessorConfig` in `ray.data.llm` enforces that all GPUs must be colocated on a single node, preventing users from scaling tensor/pipeline parallelism across multiple nodes. This PR removes that limitation by dropping the `STRICT_PACK` placement group constraint, thereby enabling multi-node deployments. Moreover, this PR introduces a new benchmarking module for evaluating `ray.data.llm` processors.
Benchmark
To validate that Ray Data LLM does not introduce unreasonable overhead to multi-node TP/PP batch inference, we ran the following benchmark on A10G GPUs (`g5.12xlarge` instances) with `batch_size=64`, drawing 10,000 samples from https://huggingface.co/datasets/Crystalcareai/Code-feedback-sharegpt-renamed. Results show that throughput drops when the model is distributed across multiple nodes. The degradation is most pronounced when TP spans multiple nodes, while configurations that keep TP within a single node show only slight declines. Optimal performance is achieved with single-node configurations, confirming that cross-node communication introduces significant overhead that grows with higher tensor parallelism. To reproduce, execute the following command:
Design writeup: https://docs.google.com/document/d/1Db9TDwagXaW5lOWncmkbqq4g769ajDP-MVvURdicZBk/edit?usp=sharing
Related issue number
Closes #55491
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
Note
Adds placement-group-based scheduling to the vLLM processor (enabling multi-node TP/PP) and introduces a new Ray Data LLM batch benchmarking module.
- Add `placement_group_config` to `vLLMEngineProcessorConfig` and plumb it through to `vLLMEngineStage` for custom PG scheduling; the default strategy is now `PACK`.
- Remove `resources` in `map_batches` and compute GPUs/CPUs from the PG config (or a fallback), enabling multi-node TP/PP with `distributed_executor_backend="ray"`.
- Deprecate `ProcessorConfig.resources_per_bundle`; set `num_gpus` via Ray remote args instead of `resources`.
- Add `python/ray/llm/_internal/batch/benchmark/` with a CLI to benchmark vLLM/Serve processors and a ShareGPT-backed dataset loader.

Written by Cursor Bugbot for commit d3bb6cb. This will update automatically on new commits.