[Data/LLM] Fixing runai_streamer for vLLM 0.10.2 integration (and Deepseek support)#56906
Conversation
Code Review
This pull request aims to fix an integration issue with runai_streamer in vLLM, particularly for models like Deepseek that have specific file requirements. The changes correctly identify the need to adjust model downloading and how model paths are passed to the vLLM engine. My review highlights a critical bug where a proposed fix was not correctly implemented, potentially leaving the original issue unresolved. I've also included a couple of medium-severity suggestions to improve logging practices and code comment clarity for better maintainability. Overall, the direction is correct, but the implementation needs a small correction to be effective.
```python
logger.info("=" * 20)
logger.info("Initializing vLLM engine stage with model: %s", model)
logger.info("=" * 20)
```
hao-aaron left a comment:
Mainly just the unused variable bug. One question: if we are using runai streamer/tensorizer, do we need to download anything from s3 on the data.llm side? My understanding is that AsyncEngineArgs will create ModelConfig, which will call maybe_pull_model_tokenizer_for_runai and pull all non-model-weight files.
```python
logger = get_logger(__name__)

EXCLUDE_TENSORIZER_MODES = ["runai_streamer", "tensorizer"]
```
I think maybe `EXCLUDE_SAFETENSORS_MODES` makes more sense as a name.
It seems that before we hit AsyncEngineArgs the code is already failing: it tries to instantiate the model config after the download_model_files function is invoked in vllm_engine_proc.py. So I think the easiest fix would be this PR:
```python
model_path = download_model_files(
    model_id=config.model_source,
    mirror_config=None,
    download_model=download_model_mode,
    download_extra_files=False,
)
hf_config = transformers.AutoConfig.from_pretrained(
    model_path,
    trust_remote_code=config.engine_kwargs.get("trust_remote_code", False),
)
```
@ahao-anyscale Are you able to review the changes?
nrghosh left a comment:
hi @jiangwu300 thanks for this
Please run the pre-commit linter etc. to unblock the preliminary unit tests, thanks. e.g.

```shell
./ci/lint/lint.sh pre_commit
./ci/lint/lint.sh code_format
```
```python
# need to still go to the model passed in if we need to exclude safetensors
# because the model could be a cloud storage that contains the safetensors files that we skipped.
```
A clearer comment:

> If we are using streaming load formats, we need to pass in `self.model`, which is a remote cloud storage path.
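The path-selection logic this comment describes could be sketched as follows. The constant and function names here are illustrative assumptions, not the PR's actual identifiers:

```python
# Hypothetical sketch of the model-path selection described above.
# STREAMING_LOAD_FORMATS and resolve_engine_model_path are illustrative
# names, not the exact code in this PR.
STREAMING_LOAD_FORMATS = ("runai_streamer", "tensorizer")


def resolve_engine_model_path(load_format, remote_model, local_model_path):
    """Pick the model path to hand to the vLLM engine.

    Streaming load formats read weights directly from cloud storage, so
    the remote path (self.model in the PR) must be passed through;
    otherwise the locally downloaded copy is used.
    """
    if load_format in STREAMING_LOAD_FORMATS:
        return remote_model
    return local_model_path
```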
```python
EXCLUDE_SAFETENSORS_MODES = [
    "runai_streamer",
    "tensorizer",
]
exclude_safetensors = self.engine_kwargs.get("load_format") in EXCLUDE_SAFETENSORS_MODES
```
I would rename `EXCLUDE_SAFETENSORS_MODES` to `STREAMING_LOAD_FORMATS`.
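For illustration, the flag computed above could gate a safetensors exclusion during download roughly like this. The helper name and patterns are hypothetical, not the repo's actual implementation:

```python
# Hypothetical helper: build download ignore patterns from the flag.
def build_ignore_patterns(exclude_safetensors):
    patterns = []
    if exclude_safetensors:
        # Skip weight shards and their index when the engine will stream
        # the weights directly from cloud storage instead.
        patterns += ["*.safetensors", "*.safetensors.index.json"]
    return patterns
```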
…ad if load_format is specified as one that needs to skip Signed-off-by: Jiang Wu <jwu@cclgroup.com>
…ode formatting/lint Signed-off-by: Jiang Wu <jwu@cclgroup.com>
3d7a259 to 03c1c90
Added cleaner comments/global variable name, signed off all commits, and ran the code linter. All good from my end!
…epseek support) (#56906) Signed-off-by: Jiang Wu <JWu@cclgroup.com> Signed-off-by: Jiang Wu <jwu@cclgroup.com> Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
Why are these changes needed?
The original PR #55662 aimed to fix runai_streamer to skip tensorizer files. However, the integration with vLLM was broken because the local model path ended up being passed into the vLLM engine, and that location has no tensorizer files since we skipped them. Furthermore, we missed the first occurrence of model downloading in the vllm_engine_proc.py file. This meant the PR worked when testing with normal models like Qwen 3, but failed for Deepseek, which depends on a .py configuration file.
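The overall fix described above can be sketched like this. `download_model_files` is stubbed and all names are illustrative assumptions based on the discussion, not the PR's actual code:

```python
# Illustrative sketch: with a streaming load format, skip the weight
# files during the local download (config/tokenizer/.py files are still
# fetched, which Deepseek needs), and pass the remote path to the engine
# so it can stream the weights itself. All names are hypothetical.
STREAMING_LOAD_FORMATS = ("runai_streamer", "tensorizer")


def download_model_files(model_id, exclude_safetensors):
    # Stub standing in for the real download helper; pretend it fetched
    # the model (minus weights when exclude_safetensors is True).
    return "/tmp/models/" + model_id.rstrip("/").split("/")[-1]


def prepare_model(model_source, load_format):
    streaming = load_format in STREAMING_LOAD_FORMATS
    local_path = download_model_files(
        model_id=model_source,
        exclude_safetensors=streaming,
    )
    # The HF config (including any custom .py config files) is read from
    # local_path, but the engine gets the remote path when streaming.
    engine_model = model_source if streaming else local_path
    return local_path, engine_model
```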
Related issue number
Closes #56905
Checks

- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.