[data.llm] Skip safetensor file downloads for runai streamer mode#55662

Merged
kouroshHakha merged 4 commits into ray-project:master from jiangwu300:runai_fix_pr
Aug 24, 2025
Conversation

@jiangwu300
Contributor

@jiangwu300 jiangwu300 commented Aug 15, 2025

Why are these changes needed?

Ray LLM downloads models to disk even when the RunAI streamer is turned on, which causes significant startup overhead and network cost when loading models at scale. This change first checks "load_format" in the engine kwargs and skips downloading the safetensors files when the RunAI streamer is enabled.

NOTE: vLLM currently has a bug when loading models with the RunAI streamer through the LLM and AsyncLLMEngine classes, because it skips downloading the tokenizer and config.json. This PR needs to land in tandem with the vLLM fix for the functionality to work as expected.
vLLM Issue: vllm-project/vllm#22843
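
The behavior described above can be sketched as follows. This is a minimal illustration, not Ray's actual internals; the function names and the plain-dict `engine_kwargs` are assumptions for the example:

```python
# Sketch: skip staging safetensor files on local disk when the RunAI
# streamer will stream them directly from object storage instead.
# (`_should_skip_safetensors` and `files_to_download` are illustrative names.)

def _should_skip_safetensors(engine_kwargs: dict) -> bool:
    # The RunAI streamer reads the weights itself, so pre-downloading
    # the .safetensors files would be wasted time and bandwidth.
    return engine_kwargs.get("load_format") == "runai_streamer"

def files_to_download(all_files: list, engine_kwargs: dict) -> list:
    if _should_skip_safetensors(engine_kwargs):
        return [f for f in all_files if not f.endswith(".safetensors")]
    return list(all_files)

files = ["config.json", "tokenizer.json", "model-00001-of-00002.safetensors"]
print(files_to_download(files, {"load_format": "runai_streamer"}))
```

Everything except the weight shards (config, tokenizer, index files) is still downloaded, which matters for models whose loaders expect those files on disk.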

Related issue number

Closes #55574

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@jiangwu300 jiangwu300 requested a review from a team as a code owner August 15, 2025 18:52
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a mechanism to skip downloading .safetensors files when using the runai_streamer mode. The changes are well-implemented by adding a runai_streamer flag that is propagated through the model downloading functions. A new suffixes_to_exclude parameter is added to download_files to filter out files based on their suffix, which is a clean way to implement the core logic. My feedback includes a minor suggestion to improve maintainability.

["tokenizer", "config.json"] if tokenizer_only else []
)

safetensors_to_exclude = [".safetensors"] if runai_streamer else None
Contributor


medium

To improve maintainability and avoid using a magic string, consider defining ".safetensors" as a constant at the module level (e.g., SAFETENSORS_SUFFIX = ".safetensors") and referencing it here. This makes the code clearer and easier to update if the suffix ever needs to change.
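
An illustrative version of the suggested refactor (the variable names mirror the diff above, but this is a sketch, not the merged code):

```python
# Hoist the magic string into a module-level constant so the suffix is
# defined in one place and referenced by name.
SAFETENSORS_SUFFIX = ".safetensors"

def build_exclusions(runai_streamer: bool):
    # Suffixes to exclude from the download, or None when nothing
    # should be filtered out.
    return [SAFETENSORS_SUFFIX] if runai_streamer else None

print(build_exclusions(True))   # ['.safetensors']
print(build_exclusions(False))  # None
```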

@ray-gardener ray-gardener bot added the serve (Ray Serve Related Issue), llm, and community-contribution (Contributed by the community) labels Aug 15, 2025
@kouroshHakha kouroshHakha changed the title Skip safetensor file downloads for runai streamer mode [data.llm] Skip safetensor file downloads for runai streamer mode Aug 18, 2025
Contributor

@kouroshHakha kouroshHakha left a comment


Hi @jiangwu300 ,

Thanks for your contribution. Left some feedback.

destination_path: Path where the model will be stored
bucket_uri: URI of the cloud directory containing the model
tokenizer_only: If True, only download tokenizer-related files
runai_streamer: If True, skip download of safetensor files
Contributor


Shouldn't this use case already be covered by using tokenizer_only=True?

Contributor Author


I noticed that models like DeepSeek R1 have files that aren't covered by the tokenizer-only path, which means we still need to download everything else except the safetensors:
[screenshot: DeepSeek R1 model directory listing]

if self.max_pending_requests > 0:
logger.info("Max pending requests is set to %d", self.max_pending_requests)

runai_streamer = self.engine_kwargs.get("load_format") == "runai_streamer"
Contributor


We should simply do download_model=....TokenizerOnly when load_format is one of the formats that need model downloading to be skipped. There are also other load_formats that should be treated the same way, like Tensorizer. Is there a better condition to use here?

Contributor Author


I don't think using tokenizer only would suffice (please correct me if I'm wrong) because certain models require specific files to exist in the model directory like Deepseek R1 (attached screenshot in comment above).

Contributor

@kouroshHakha kouroshHakha Aug 19, 2025


Gotcha, OK. So would it make sense to define a new download_model enum called EXCLUDE_SAFETENSOR that will basically exclude *.safetensors? We should make the API more generic than runai_streamer=True / False.
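
A hypothetical sketch of what such an enum-based API could look like (all names here are illustrative; the actual enum in Ray may differ):

```python
import enum

class DownloadModel(enum.Enum):
    FULL = "full"                               # download every file
    TOKENIZER_ONLY = "tokenizer_only"           # tokenizer + config files only
    EXCLUDE_SAFETENSOR = "exclude_safetensor"   # everything except *.safetensors

def allowed(filename: str, mode: DownloadModel) -> bool:
    # Decide per-file whether it should be downloaded under the given mode.
    if mode is DownloadModel.EXCLUDE_SAFETENSOR:
        return not filename.endswith(".safetensors")
    if mode is DownloadModel.TOKENIZER_ONLY:
        return filename == "config.json" or filename.startswith("tokenizer")
    return True

print(allowed("model-00001.safetensors", DownloadModel.EXCLUDE_SAFETENSOR))  # False
print(allowed("config.json", DownloadModel.EXCLUDE_SAFETENSOR))              # True
```

An enum keeps the download policy extensible: a future streaming loader can reuse EXCLUDE_SAFETENSOR instead of threading another boolean through the call chain.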

Contributor Author


That makes sense. I made this change.

Contributor

@nrghosh nrghosh left a comment


Thanks for the contribution!

Quick fixes for CI

  1. Please sign off all commits (e.g., git commit -s -m "foobar") (will fix the DCO check failure)
  2. Please lint for code format (see ./ci/lint/lint.sh code_format and ./ci/lint/lint.sh pre_commit) (will fix the microcheck failure)

@jiangwu300
Contributor Author

@nrghosh @kouroshHakha Is either of you able to escalate the vLLM issue (vllm-project/vllm#22843 (comment)) to the vLLM team? This fix still requires a change on the vLLM side.

@jiangwu300
Contributor Author

runai_log.txt
Tested and works as expected ^

jiangwu300 and others added 3 commits August 20, 2025 13:30
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
Signed-off-by: Jiang Wu <JWu@cclgroup.com>
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
Signed-off-by: Jiang Wu <jwu@cclgroup.com>
@kouroshHakha kouroshHakha added the go add ONLY when ready to merge, run all tests label Aug 21, 2025
Contributor

@kouroshHakha kouroshHakha left a comment


Running tests and release tests. Just one quick comment to address in parallel:

if self.max_pending_requests > 0:
logger.info("Max pending requests is set to %d", self.max_pending_requests)

exclude_safetensors = self.engine_kwargs.get("load_format") == "runai_streamer"
Contributor


This should be something like .get("load_format", "auto") not in ["safetensors", "auto"]?

Contributor Author


I'm actually not sure which load formats should exclude safetensor downloads except that runai_streamer needs to skip it for sure.

Contributor


I think we can revise the list. Let's make it inclusive of runai_streamer and tensorizer (which is similar to runai_streamer). I know about these two for sure.

Contributor Author


Added tensorizer
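
A sketch of the resulting condition, per the review discussion (the plain-dict `engine_kwargs` and the helper name are illustrative, not the merged code):

```python
# Both runai_streamer and tensorizer stream the weights themselves,
# so neither needs the safetensor files staged on local disk.
STREAMING_LOAD_FORMATS = ("runai_streamer", "tensorizer")

def exclude_safetensors(engine_kwargs: dict) -> bool:
    return engine_kwargs.get("load_format") in STREAMING_LOAD_FORMATS

print(exclude_safetensors({"load_format": "runai_streamer"}))  # True
print(exclude_safetensors({"load_format": "tensorizer"}))      # True
print(exclude_safetensors({"load_format": "auto"}))            # False
```

Keeping the list inclusive (rather than excluding known-bad formats) avoids accidentally skipping downloads for formats that do need the files on disk.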

Contributor

@kouroshHakha kouroshHakha left a comment


LGTM. Thanks

@kouroshHakha kouroshHakha merged commit 1818e21 into ray-project:master Aug 24, 2025
5 checks passed
jugalshah291 pushed a commit to jugalshah291/ray_fork that referenced this pull request Sep 11, 2025
…y-project#55662)

Signed-off-by: Jiang Wu <jwu@cclgroup.com>
Signed-off-by: Jiang Wu <JWu@cclgroup.com>
Signed-off-by: jugalshah291 <shah.jugal291@gmail.com>
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
…5662)

Signed-off-by: Jiang Wu <jwu@cclgroup.com>
Signed-off-by: Jiang Wu <JWu@cclgroup.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
…y-project#55662)

Signed-off-by: Jiang Wu <jwu@cclgroup.com>
Signed-off-by: Jiang Wu <JWu@cclgroup.com>

Labels

community-contribution (Contributed by the community), go (add ONLY when ready to merge, run all tests), llm, serve (Ray Serve Related Issue)


Development

Successfully merging this pull request may close these issues.

[Data/LLM] vLLM model files are downloaded to disk even when "load_format": "runai_streamer" is specified in engine_kwargs

3 participants