Added Ray-Serve Config For LLMs #3517
Conversation
kouroshHakha
left a comment
The config looks good to me (though I haven't run it myself).
Should I also add config for autoscaling?

Chatted with @Blaze-DSP offline.

What is the plan? @kevin85421

Add a doc in the Ray repo and make this example simpler (e.g. remove LoRA).

I have updated the Ray Serve LLM config and added a doc for it in the Ray repo. PR for the doc: ray-serve llm doc
Going to give this a shot on my setup in the next week-ish. |
Worked great on my setup. Thanks for the PR @Blaze-DSP!
@eicherseiji could you push to this branch directly to fix CI issues so that I can merge this PR? Thanks!
Signed-off-by: DPatel_7 <dpatel@gocommotion.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: DPatel_7 <dpatel@gocommotion.com>
Co-authored-by: Seiji Eicher <seiji@anyscale.com>
```yaml
limits:
  cpu: 32
  memory: 32Gi
  nvidia.com/gpu: "4"
```
I know it will depend on the GPU type, but does Qwen/Qwen2.5-7B-Instruct really need 4 GPUs? What GPUs did you test with?
Ah, I noticed that tensor parallelism is not set, so each replica must only be using 1 GPU. I suggest updating this example to request only 1 GPU per worker.
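To illustrate the point above: a replica's GPU request should match the engine's tensor parallelism, since `tensor_parallel_size` determines how many GPUs each replica actually uses. A minimal sketch of the Serve LLM application config (the `llm-app` name and served `model_id` are hypothetical; verify the exact field names against your installed Ray version):

```yaml
# Sketch only, assuming the ray.serve.llm OpenAI-app builder.
applications:
- name: llm-app                       # hypothetical application name
  route_prefix: /
  import_path: ray.serve.llm:build_openai_app
  args:
    llm_configs:
    - model_loading_config:
        model_id: qwen2.5-7b-instruct # hypothetical served model id
        model_source: Qwen/Qwen2.5-7B-Instruct
      engine_kwargs:
        # 1 GPU per replica; if you raise this, raise the container's
        # GPU request to match.
        tensor_parallel_size: 1
```

With `tensor_parallel_size: 1`, the matching Kubernetes container limit would be `nvidia.com/gpu: "1"`; requesting 4 GPUs would leave 3 of them idle.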
Added Example Config For Ray-Serve LLM