[Model Runner V2] Feature: Support ElasticEPScalingExecutor for MRv2#43915
yewentao256 wants to merge 7 commits into
Signed-off-by: yewentao256 <zhyanwentao@126.com>
fyi @njhill @itayalroy
Elastic EP already touches too many V1 model-runner internals, which is what led to this issue with V2. Instead of now also touching a lot of V2 internals, I think we are better off moving this logic into the model runners themselves. Perhaps they can expose a context manager? Also, it seems like for V2,
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
@itayalroy Makes sense, I've already moved this logic into the model runner. This PR doesn't cover warmup_kernels: it is aimed at fixing a current CI issue and should land soon. We can do a follow-up PR if there turns out to be an issue (perhaps not, since warmup_kernels doesn't overwrite meaningful data).
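For readers following the context-manager suggestion above, here is a minimal sketch of the general pattern, assuming the concern is that a scaling-time pass can clobber input state the runner owns. `ToyModelRunner`, `input_buffers`, `preserve_input_state`, and `dummy_run` are hypothetical names chosen for illustration; they are not vLLM's actual classes or the code in this PR.

```python
import copy
from contextlib import contextmanager


class ToyModelRunner:
    """Hypothetical stand-in for a model runner; not vLLM's real class."""

    def __init__(self):
        # Assumed layout: named persistent input buffers owned by the runner.
        self.input_buffers = {"input_ids": [1, 2, 3], "positions": [0, 1, 2]}

    @contextmanager
    def preserve_input_state(self):
        """Snapshot the runner's input buffers and restore them on exit."""
        snapshot = copy.deepcopy(self.input_buffers)
        try:
            yield self
        finally:
            # Restore in place so outside references to the buffers stay valid.
            for name, saved in snapshot.items():
                self.input_buffers[name][:] = saved

    def dummy_run(self):
        """Simulate a pass that overwrites every input buffer with zeros."""
        for buf in self.input_buffers.values():
            for i in range(len(buf)):
                buf[i] = 0


runner = ToyModelRunner()
with runner.preserve_input_state():
    runner.dummy_run()  # buffers are zeroed inside the block
# After the block exits, the original buffer contents are back.
```

The point of the design is that the caller (here, the elastic EP executor) never needs to know how the runner stores its inputs; each runner version can implement the save and restore against its own storage.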
tlrmchlsmth left a comment
Can we add a test for this? Otherwise looks good to me
yewentao256 left a comment
@tlrmchlsmth Thanks for the review! I've added a test.
Purpose
Running
VLLM_USE_V2_MODEL_RUNNER=1 pytest tests/distributed/test_elastic_ep.py -k test_elastic_ep_scaling -xvs
raises an error.
This is because the storage of the input batch changed in V2; this PR fixes the issue.
Note that the Ray environment is hard to use and causes a lot of trouble, so I wrote this simple test to reproduce the issue quickly:
Originally
Now
================================= 2 passed, 17 warnings in 1.22s ==================================