[serve][llm] Disable model downloading for RunAI streamer, introduce optimized download function #57854
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Code Review
This pull request introduces two main improvements: disabling model downloads for streaming loaders like runai_streamer and optimizing cloud downloads by parallelizing them. The logic to conditionally disable downloads based on load_format is well-implemented and tested. The new parallel download function download_files_parallel correctly uses pyarrow for efficient transfers. However, I've found a critical typo that would cause runtime failures and a high-severity issue in error handling that could lead to silent partial downloads. I've also suggested a minor clarification to a docstring. Once these points are addressed, the PR will be in great shape.
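The partial-download concern raised in this review can be illustrated with a minimal sketch. It is not the function added in this PR: `download_all_or_fail` and `fetch_one` are hypothetical names, and a plain thread pool stands in for the pyarrow-based transfer. The point it shows is that every per-file failure is collected and re-raised, so a partial download cannot pass silently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List


def download_all_or_fail(
    paths: Iterable[str],
    fetch_one: Callable[[str], None],
    max_workers: int = 8,
) -> List[str]:
    """Fetch every path in parallel and raise if any single fetch fails.

    Hypothetical helper for illustration only; `fetch_one` is an assumed
    callable that downloads one file and raises on error.
    """
    failures: Dict[str, BaseException] = {}
    done: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its path so errors can be attributed.
        futures = {pool.submit(fetch_one, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                fut.result()  # re-raises any exception from the worker
                done.append(path)
            except Exception as exc:
                failures[path] = exc
    if failures:
        # Fail loudly instead of returning a partially populated directory.
        raise RuntimeError(
            f"{len(failures)} file(s) failed to download: {sorted(failures)}"
        )
    return sorted(done)
```

Collecting all failures before raising, rather than stopping at the first one, lets the error message name every missing file in one pass.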
7f73524 to 1405e2a
…ions Signed-off-by: ahao-anyscale <ahao@anyscale.com>
The lmcache failure is a bug unrelated to this PR; see LMCache/LMCache#1768.
ruisearch42
left a comment
Some readability & cleanness suggestions
ruisearch42
left a comment
Thanks for addressing the comments
Hehe, why did no one press the go label? :D
…optimized download function (ray-project#57854) Signed-off-by: ahao-anyscale <ahao@anyscale.com>
…optimized download function (ray-project#57854) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
…optimized download function (ray-project#57854) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Future-Outlier <eric901201@gmail.com>
…optimized download function (ray-project#57854) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: peterxcli <peterxcli@gmail.com>
Description
When running Serve LLM with the RunAI streamer, the current code path unnecessarily downloads the model first. Also, the current model download function is not parallelized.
Changes:
- Disable worker_node_download_model depending on load_format in LLMConfig.engine_kwargs
- Introduce download_model_parallel function, which is used by the CloudDownloader callback
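The load_format check described above might look roughly like the following sketch. The helper name `should_download_model` and the constant `STREAMING_LOAD_FORMATS` are assumptions made for illustration, and only `runai_streamer` is taken from this PR; the actual set of streaming formats and the place where the check lives may differ.

```python
from typing import Any, Dict

# Assumed set of streaming load formats. Only "runai_streamer" is named in
# this PR; any other entries would be a guess, so none are listed.
STREAMING_LOAD_FORMATS = frozenset({"runai_streamer"})


def should_download_model(engine_kwargs: Dict[str, Any]) -> bool:
    """Return False when the engine streams weights itself.

    Hypothetical helper: it inspects the `load_format` key of the engine
    kwargs and skips the up-front download for streaming loaders.
    """
    load_format = engine_kwargs.get("load_format")
    # A missing or non-string load_format falls back to downloading.
    if not isinstance(load_format, str):
        return True
    return load_format not in STREAMING_LOAD_FORMATS
```

With this shape, a caller would guard the download step with `if should_download_model(engine_kwargs): ...`, so that the default behavior is unchanged when no load_format is set.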