[Data/LLM] Fixing runai_streamer for vLLM 0.10.2 integration (and Deepseek support) #56906

Merged
kouroshHakha merged 7 commits into ray-project:master from jiangwu300:vllm-integration
Oct 14, 2025
Conversation

@jiangwu300 (Contributor)

Why are these changes needed?

The original PR #55662 aimed to fix runai_streamer by skipping tensorizer files during download. However, the integration with vLLM was broken: the local model path still ended up being passed into the vLLM engine, and that location has no tensorizer files precisely because we skipped them. Furthermore, we missed the first occurrence of model downloading in vllm_engine_proc.py. As a result, the earlier PR worked when tested with ordinary models like Qwen 3, but failed for Deepseek, which depends on a .py configuration file.
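The shape of the fix can be sketched as follows. This is a minimal standalone illustration, not the actual Ray code; the function name `select_vllm_model_path` and its parameters are hypothetical:

```python
# Load formats that stream weights directly from remote storage.
STREAMING_LOAD_FORMATS = ["runai_streamer", "tensorizer"]


def select_vllm_model_path(load_format: str, model_source: str, local_path: str) -> str:
    """Pick the path handed to the vLLM engine.

    Streaming load formats read weights straight from remote storage, and the
    local download deliberately skipped the weight files, so the remote source
    (not the incomplete local copy) must be passed to the engine in that case.
    """
    if load_format in STREAMING_LOAD_FORMATS:
        return model_source  # e.g. an s3:// URI that still holds the weights
    return local_path
```

The bug described above amounts to always returning `local_path`, even when a streaming load format was configured.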

Related issue number

Closes #56905

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@jiangwu300 jiangwu300 requested a review from a team as a code owner September 24, 2025 22:31

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix an integration issue with runai_streamer in vLLM, particularly for models like Deepseek that have specific file requirements. The changes correctly identify the need to adjust model downloading and how model paths are passed to the vLLM engine. My review highlights a critical bug where a proposed fix was not correctly implemented, potentially leaving the original issue unresolved. I've also included a couple of medium-severity suggestions to improve logging practices and code comment clarity for better maintainability. Overall, the direction is correct, but the implementation needs a small correction to be effective.

Comment on lines +460 to +462
logger.info("=" * 20)
logger.info("Initializing vLLM engine stage with model: %s", model)
logger.info("=" * 20)

medium

These decorative log lines (here and on lines 492-494) are helpful for debugging, but using logger.info makes them quite verbose for standard runs. Please consider changing the log level to DEBUG or removing them if they were for temporary debugging purposes.
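The suggested change is just a log-level demotion. As a standalone sketch (not the actual Ray module), DEBUG-level records are suppressed when the logger's threshold is the default-style INFO:

```python
import io
import logging

# Capture log output in a string buffer so we can observe what gets emitted.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
logger = logging.getLogger("vllm_engine_demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Demoted decorative banners: invisible in standard runs, visible under DEBUG.
logger.debug("=" * 20)
logger.debug("Initializing vLLM engine stage with model: %s", "some-model")
logger.debug("=" * 20)
```

With the threshold at INFO, none of the banner lines reach the handler; switching `setLevel` to `logging.DEBUG` would restore them for debugging sessions.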

cursor[bot]

This comment was marked as outdated.


@hao-aaron hao-aaron left a comment


Mainly just the unused variable bug. I do have a question, though: if we are using runai streamer/tensorizer, do we need to download anything from S3 on the data.llm side? My understanding is that AsyncEngineArgs will create ModelConfig, which will call maybe_pull_model_tokenizer_for_runai and pull all non-model-weight files.


logger = get_logger(__name__)

EXCLUDE_TENSORIZER_MODES = ["runai_streamer", "tensorizer"]

I think maybe EXCLUDE_SAFETENSORS_MODES makes more sense as a name


@jiangwu300 jiangwu300 Sep 25, 2025


It seems like before we hit AsyncEngineArgs the code is already failing, because it tries to instantiate the model config right after the download_model_files function is invoked in vllm_engine_proc.py. So I think the easiest fix is what this PR does:

model_path = download_model_files(
    model_id=config.model_source,
    mirror_config=None,
    download_model=download_model_mode,
    download_extra_files=False,
)
hf_config = transformers.AutoConfig.from_pretrained(
    model_path,
    trust_remote_code=config.engine_kwargs.get("trust_remote_code", False),
)

@ray-gardener ray-gardener bot added the serve (Ray Serve Related Issue), data (Ray Data-related issues), llm, and community-contribution (Contributed by the community) labels Sep 25, 2025
@gvspraveen gvspraveen removed the data Ray Data-related issues label Sep 25, 2025
@jiangwu300 (Contributor Author)

@ahao-anyscale Are you able to review the changes?


@hao-aaron hao-aaron left a comment


LGTM, thanks!


@nrghosh nrghosh left a comment


hi @jiangwu300 thanks for this

Please run the pre-commit linter etc to unblock the preliminary unit tests, thanks

e.g.,

./ci/lint/lint.sh pre_commit
./ci/lint/lint.sh code_format

@nrghosh nrghosh added the go add ONLY when ready to merge, run all tests label Oct 6, 2025
Comment on lines +491 to +492
# need to still go to the model passed in if we need to exclude safetensors
# because the model could be a cloud storage that contains the safetensors files that we skipped.

A clearer comment:

If we are using streaming load formats, we need to pass in self.model which is a remote cloud storage path. 

"runai_streamer",
"tensorizer",
]
exclude_safetensors = self.engine_kwargs.get("load_format") in EXCLUDE_SAFETENSORS_MODES

I would rename this EXCLUDE_SAFETENSORS_MODES to STREAMING_LOAD_FORMATS
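As a sketch, the suggested rename might look like the following; the helper function name is hypothetical, and it assumes `engine_kwargs` is a plain dict:

```python
# Streaming load formats read weights directly from remote storage, so the
# local safetensors files are deliberately not downloaded for them.
STREAMING_LOAD_FORMATS = [
    "runai_streamer",
    "tensorizer",
]


def uses_streaming_load_format(engine_kwargs: dict) -> bool:
    """Return True when the configured load_format streams weights remotely."""
    return engine_kwargs.get("load_format") in STREAMING_LOAD_FORMATS
```

The name change makes the constant describe what the formats are (streaming loaders) rather than a side effect of using them (excluding safetensors).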

jiangwu300 and others added 6 commits October 6, 2025 22:12
Signed-off-by: Jiang Wu <JWu@cclgroup.com>
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
…ad if load_format is specified as one that needs to skip

Signed-off-by: Jiang Wu <JWu@cclgroup.com>
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
Signed-off-by: Jiang Wu <JWu@cclgroup.com>
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
Signed-off-by: Jiang Wu <JWu@cclgroup.com>
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
Signed-off-by: Jiang Wu <JWu@cclgroup.com>
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
…ode formatting/lint

Signed-off-by: Jiang Wu <jwu@cclgroup.com>
cursor[bot]

This comment was marked as outdated.

@jiangwu300 (Contributor Author)

Added cleaner comments/global variable name, signed off all commits, and ran code linter. All good from my end!


@nrghosh nrghosh left a comment


@kouroshHakha kouroshHakha merged commit a18653a into ray-project:master Oct 14, 2025
6 checks passed
harshit-anyscale pushed a commit that referenced this pull request Oct 15, 2025
…epseek support) (#56906)

Signed-off-by: Jiang Wu <JWu@cclgroup.com>
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
…epseek support) (ray-project#56906)

Signed-off-by: Jiang Wu <JWu@cclgroup.com>
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 22, 2025
…epseek support) (ray-project#56906)

Signed-off-by: Jiang Wu <JWu@cclgroup.com>
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: xgui <xgui@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Oct 23, 2025
…epseek support) (#56906)

Signed-off-by: Jiang Wu <JWu@cclgroup.com>
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…epseek support) (ray-project#56906)

Signed-off-by: Jiang Wu <JWu@cclgroup.com>
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…epseek support) (ray-project#56906)

Signed-off-by: Jiang Wu <JWu@cclgroup.com>
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
…epseek support) (ray-project#56906)

Signed-off-by: Jiang Wu <JWu@cclgroup.com>
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>

Labels

community-contribution (Contributed by the community), go (add ONLY when ready to merge, run all tests), llm, serve (Ray Serve Related Issue)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Data/LLM] Runai Streamer integration error with vLLM 0.10.2

5 participants