[Data/LLM] Fixing runai_streamer for vLLM 0.10.2 integration (and Deepseek support)#56906
Conversation
Code Review
This pull request aims to fix an integration issue with runai_streamer in vLLM, particularly for models like Deepseek that have specific file requirements. The changes correctly identify the need to adjust model downloading and how model paths are passed to the vLLM engine. My review highlights a critical bug where a proposed fix was not correctly implemented, potentially leaving the original issue unresolved. I've also included a couple of medium-severity suggestions to improve logging practices and code comment clarity for better maintainability. Overall, the direction is correct, but the implementation needs a small correction to be effective.
```python
logger.info("=" * 20)
logger.info("Initializing vLLM engine stage with model: %s", model)
logger.info("=" * 20)
```
hao-aaron left a comment:
Mainly just the unused variable bug. One question: if we are using runai streamer/tensorizer, do we need to download anything from s3 on the data.llm side? My understanding is that AsyncEngineArgs will create ModelConfig, which will call maybe_pull_model_tokenizer_for_runai and pull all non-model-weight files.
```python
logger = get_logger(__name__)

EXCLUDE_TENSORIZER_MODES = ["runai_streamer", "tensorizer"]
```
I think maybe `EXCLUDE_SAFETENSORS_MODES` makes more sense as a name.
It seems that before we hit AsyncEngineArgs the code is already failing: it tries to instantiate the model config after the download_model_files function is invoked in vllm_engine_proc.py. So I think the easiest fix would be this PR:
```python
model_path = download_model_files(
    model_id=config.model_source,
    mirror_config=None,
    download_model=download_model_mode,
    download_extra_files=False,
)
hf_config = transformers.AutoConfig.from_pretrained(
    model_path,
    trust_remote_code=config.engine_kwargs.get("trust_remote_code", False),
)
```
@ahao-anyscale Are you able to review the changes?
nrghosh left a comment:
hi @jiangwu300 thanks for this
Please run the pre-commit linter etc. to unblock the preliminary unit tests, thanks. e.g.

```shell
./ci/lint/lint.sh pre_commit
./ci/lint/lint.sh code_format
```
```python
# need to still go to the model passed in if we need to exclude safetensors
# because the model could be a cloud storage that contains the safetensors files that we skipped.
```
A clearer comment:

> If we are using streaming load formats, we need to pass in `self.model`, which is a remote cloud storage path.
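The path-selection logic this comment describes could be sketched as follows. The constant and function names here are illustrative assumptions, not the PR's actual identifiers:

```python
# Hypothetical sketch of the model-path selection described above.
# STREAMING_LOAD_FORMATS and resolve_engine_model_path are illustrative
# names, not the exact code in this PR.
STREAMING_LOAD_FORMATS = ("runai_streamer", "tensorizer")


def resolve_engine_model_path(load_format, remote_model, local_model_path):
    """Pick the model path to hand to the vLLM engine.

    Streaming load formats read weights directly from cloud storage, so
    the remote path (self.model in the PR) must be passed through;
    otherwise the locally downloaded copy is used.
    """
    if load_format in STREAMING_LOAD_FORMATS:
        return remote_model
    return local_model_path
```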
```python
EXCLUDE_SAFETENSORS_MODES = [
    "runai_streamer",
    "tensorizer",
]
exclude_safetensors = self.engine_kwargs.get("load_format") in EXCLUDE_SAFETENSORS_MODES
```
I would rename `EXCLUDE_SAFETENSORS_MODES` to `STREAMING_LOAD_FORMATS`.
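For illustration, the flag computed above could gate a safetensors exclusion during download roughly like this. The helper name and patterns are hypothetical, not the repo's actual implementation:

```python
# Hypothetical helper: build download ignore patterns from the flag.
def build_ignore_patterns(exclude_safetensors):
    patterns = []
    if exclude_safetensors:
        # Skip weight shards and their index when the engine will stream
        # the weights directly from cloud storage instead.
        patterns += ["*.safetensors", "*.safetensors.index.json"]
    return patterns
```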
…ad if load_format is specified as one that needs to skip Signed-off-by: Jiang Wu <jwu@cclgroup.com>
…ode formatting/lint Signed-off-by: Jiang Wu <jwu@cclgroup.com>
3d7a259 to 03c1c90
Added cleaner comments/global variable name, signed off all commits, and ran the code linter. All good from my end!
…epseek support) (#56906) Signed-off-by: Jiang Wu <JWu@cclgroup.com> Signed-off-by: Jiang Wu <jwu@cclgroup.com> Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com>
Why are these changes needed?
The original PR #55662 aimed to fix runai_streamer to skip tensorizer files. However, the integration with vLLM was broken because the local model path ended up being passed into the vLLM engine, and that location has no tensorizer files since we skipped them. Furthermore, we missed the first occurrence of model downloading in the vllm_engine_proc.py file. This meant the PR worked when testing with normal models like Qwen 3, but failed for Deepseek, which depends on a .py configuration file.
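The overall fix described above can be sketched like this. `download_model_files` is stubbed and all names are illustrative assumptions based on the discussion, not the PR's actual code:

```python
# Illustrative sketch: with a streaming load format, skip the weight
# files during the local download (config/tokenizer/.py files are still
# fetched, which Deepseek needs), and pass the remote path to the engine
# so it can stream the weights itself. All names are hypothetical.
STREAMING_LOAD_FORMATS = ("runai_streamer", "tensorizer")


def download_model_files(model_id, exclude_safetensors):
    # Stub standing in for the real download helper; pretend it fetched
    # the model (minus weights when exclude_safetensors is True).
    return "/tmp/models/" + model_id.rstrip("/").split("/")[-1]


def prepare_model(model_source, load_format):
    streaming = load_format in STREAMING_LOAD_FORMATS
    local_path = download_model_files(
        model_id=model_source,
        exclude_safetensors=streaming,
    )
    # The HF config (including any custom .py config files) is read from
    # local_path, but the engine gets the remote path when streaming.
    engine_model = model_source if streaming else local_path
    return local_path, engine_model
```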
Related issue number
Closes #56905
Checks

- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.