[Core] Async scheduling + structured outputs compatibility#26866

Merged

njhill merged 47 commits into vllm-project:main from njhill:async-sched-struct-output

Nov 1, 2025

Member

njhill commented Oct 15, 2025 •

edited by github-actions bot

This follows a similar approach to #23391.

Throughput benchmarks using the same JSON schema as #23224:

vllm serve Qwen/Qwen3-1.7B --uvicorn-log-level=error  --no-enable-prefix-caching

python3 benchmarks/benchmark_serving_structured_output.py --backend vllm --model Qwen/Qwen3-1.7B --structured-output-ratio $ratio --request-rate 200 --max-concurrency 800 --num-prompts 4000 --json-schema-path ./test3.json  --output-len 128

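Columns give the fraction of structured-output requests (`$ratio` in the command above):

| Test | Executor | 0.0 | 0.2 | 0.8 | 1.0 |
|---|---|---|---|---|---|
| main | uniproc | 103.16 | 92.57 | 70.68 | 69.36 |
| This PR | uniproc | 103.19 | 99.67 | 87.90 | 85.28 |
| This PR + `--async-scheduling` | uniproc | 132.72 | 106.08 | 93.59 | 90.34 |
| This PR + `--async-scheduling` | multiproc | 133.31 | 114.67 | 96.08 | 93.42 |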
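
To make the table easier to compare, the relative change of each row against `main` can be computed directly from the reported numbers. The snippet below is only a convenience sketch: the figures are copied from the table above, and the helper name `percent_gain` is made up for illustration. At a structured-output ratio of 1.0, for example, it shows the PR alone at roughly 23% above `main`.

```python
# Throughput figures copied verbatim from the benchmark table above.
RATIOS = [0.0, 0.2, 0.8, 1.0]
RESULTS = {
    ("main", "uniproc"): [103.16, 92.57, 70.68, 69.36],
    ("This PR", "uniproc"): [103.19, 99.67, 87.90, 85.28],
    ("This PR + --async-scheduling", "uniproc"): [132.72, 106.08, 93.59, 90.34],
    ("This PR + --async-scheduling", "multiproc"): [133.31, 114.67, 96.08, 93.42],
}


def percent_gain(baseline, candidate):
    """Return the per-column percent change of candidate relative to baseline."""
    return [100.0 * (c - b) / b for b, c in zip(baseline, candidate)]


if __name__ == "__main__":
    baseline = RESULTS[("main", "uniproc")]
    for (test, executor), row in RESULTS.items():
        gains = percent_gain(baseline, row)
        cells = "  ".join(f"{r:.1f}: {g:+6.1f}%" for r, g in zip(RATIOS, gains))
        print(f"{test:30s} {executor:10s} {cells}")
```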

This is a breaking change for the model runner and scheduler interfaces.
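
For out-of-tree runners, the shape of the change is a split of one step into two calls, as the downstream PR titles in this thread describe ("Separate execute_model and sample_tokens"). The following is a toy sketch of that two-phase shape only, not vLLM's actual classes or signatures: `ToyRunner`, `ToyGrammarOutput`, and the use of plain logit lists are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ToyGrammarOutput:
    """Hypothetical stand-in for a grammar bitmask: allowed token ids per request."""
    allowed: Dict[str, Set[int]]


class ToyRunner:
    """Toy two-phase runner (illustrative only, not vLLM's model runner)."""

    def __init__(self) -> None:
        self._pending: Optional[Dict[str, List[float]]] = None

    def execute_model(self, logits_by_request: Dict[str, List[float]]) -> None:
        # Phase 1: stands in for the forward pass. The logits are supplied
        # directly here and stashed until sampling is requested.
        self._pending = logits_by_request

    def sample_tokens(self, grammar: Optional[ToyGrammarOutput]) -> Dict[str, int]:
        # Phase 2: apply the grammar mask (if any), then pick the best token.
        if self._pending is None:
            raise RuntimeError("execute_model() must be called before sample_tokens()")
        logits_by_request, self._pending = self._pending, None
        sampled: Dict[str, int] = {}
        for req_id, logits in logits_by_request.items():
            allowed = grammar.allowed.get(req_id) if grammar else None
            candidates = range(len(logits)) if allowed is None else sorted(allowed)
            sampled[req_id] = max(candidates, key=lambda tok: logits[tok])
        return sampled


if __name__ == "__main__":
    runner = ToyRunner()
    runner.execute_model({"r1": [0.1, 0.9, 0.5]})
    # Token 1 has the highest logit but is masked out, so token 2 is chosen.
    print(runner.sample_tokens(ToyGrammarOutput({"r1": {0, 2}})))
```

Splitting the step this way means the mask does not have to exist when the forward pass is launched, which is what lets the two overlap.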

mergify bot added structured-output v1 tpu kv-connector labels

github-project-automation bot added this to Structured Output

njhill added the suppress-bc-linter label

mergify bot commented Oct 17, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @njhill.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label

njhill added 5 commits

October 16, 2025 18:40


          [Core] Async scheduling + structured outputs compatibility

d5d7924

Signed-off-by: Nick Hill <nhill@redhat.com>


          small fixes

Signed-off-by: Nick Hill <nhill@redhat.com>


          misc code improvement

bc33394

Signed-off-by: Nick Hill <nhill@redhat.com>


          simplify with context manager

e5f9634

Signed-off-by: Nick Hill <nhill@redhat.com>


          readability/simplification updates

8cba549

Signed-off-by: Nick Hill <nhill@redhat.com>

njhill force-pushed the async-sched-struct-output branch from 829ef60 to 8cba549 Compare

October 17, 2025 01:40

mergify bot removed the needs-rebase label

njhill added 13 commits

October 16, 2025 18:50


          include sample_tokens() when logging error details

66906ff

Signed-off-by: Nick Hill <nhill@redhat.com>


          reorg logic a bit for readability

ac87699

Signed-off-by: Nick Hill <nhill@redhat.com>


          Merge remote-tracking branch 'origin/main' into async-sched-struct-ou…

885760b

…tput

# Conflicts:
#	vllm/v1/engine/core.py


          update comment

eef1d44

Signed-off-by: Nick Hill <nhill@redhat.com>


          Merge remote-tracking branch 'origin/main' into async-sched-struct-ou…

2d17506

…tput


          TPU updates

ac60de7

Signed-off-by: Nick Hill <nhill@redhat.com>


          add ray compatibility

01eec54

Signed-off-by: Nick Hill <nhill@redhat.com>


          Merge remote-tracking branch 'origin/main' into async-sched-struct-ou…

866a281

…tput


          Merge remote-tracking branch 'refs/remotes/origin/main' into async-sc…

717fbad

…hed-struct-output

# Conflicts:
#	vllm/v1/core/sched/output.py
#	vllm/v1/worker/gpu_worker.py
#	vllm/v1/worker/tpu_worker.py


          fix import and test

0c03cb2

Signed-off-by: Nick Hill <nhill@redhat.com>


          Merge remote-tracking branch 'origin/main' into async-sched-struct-ou…

f6b3318

…tput


          test updates

0127d64

Signed-off-by: Nick Hill <nhill@redhat.com>


          add to e2e async scheduling test

b8208bd

Signed-off-by: Nick Hill <nhill@redhat.com>

njhill mentioned this pull request

[Core] Async Scheduling X Spec Decoding Compatibility #24799

Merged

5 tasks


          Merge remote-tracking branch 'origin/main' into async-sched-struct-ou…

09090a6

…tput

Signed-off-by: Nick Hill <nhill@redhat.com>

# Conflicts:
#	vllm/v1/engine/core.py
#	vllm/v1/executor/abstract.py
#	vllm/v1/executor/ray_distributed_executor.py

py4 mentioned this pull request

[Runner] Separate execute_model and sample_tokens to adapt upstream change. vllm-project/tpu-inference#1003

Merged

njhill mentioned this pull request

[PerfFix] Avoid separate thread for MP executor shm spin #28012

Merged

Contributor

ys950902 commented Nov 7, 2025 •

edited by njhill

Hi @njhill, I found a performance drop in pipeline-parallelism scenarios after your PR was merged. Do you have any ideas about it? Thanks in advance for your great support.

And below is the command to launch the server:

VLLM_USE_V1=1 VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 python3 -m vllm.entrypoints.openai.api_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --enforce-eager --port 8000 --host 0.0.0.0 -pp 2 --distributed_executor_backend=mp --trust-remote-code --gpu-memory-util=0.9 --no-enable-prefix-caching --max-num-batched-tokens=8192 --disable-log-requests --max-model-len=8192 --block-size 64 --quantization fp8    --dtype=float16   -tp=2

The command to send the requests:

python3 -m vllm.entrypoints.cli.main bench serve  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --ready-check-timeout-sec 1 --dataset-name random --random-input-len=1024 --random-output-len=512 --ignore-eos --port=8000 --host 0.0.0.0 --num-prompt 30 --request-rate inf --backend vllm --trust-remote-code

Throughput dropped from 617.26 tok/s to 384.40 tok/s.

Member Author

njhill commented Nov 7, 2025

Thanks @ys950902. Which commit exactly were you testing? There was a known perf regression from this PR, which was subsequently fixed in #28012. Unfortunately, that PR was just reverted due to a compatibility bug, but its re-apply (#28319) should be merged to main soon.

It would be great if you could check whether the degraded performance still shows up when that PR is included (if it wasn't already in your test). If so, could you open a new issue with the above details so that we can investigate further?

sixiang-google mentioned this pull request

Update execute_model to support async scheduling in vllm vllm-project/tpu-inference#1047

Merged

ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request


          [Core] Async scheduling + structured outputs compatibility (vllm-proj…

8f25269

…ect#26866)

Signed-off-by: Nick Hill <nhill@redhat.com>

rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request


          [Core] Async scheduling + structured outputs compatibility (vllm-proj…

3f488dd

…ect#26866)

Signed-off-by: Nick Hill <nhill@redhat.com>

weireweire mentioned this pull request

[Bug]: Pipeline parallel doesn't really do the "parallel" among GPUs. #28270

Closed

1 task

weireweire reviewed

View reviewed changes

vllm/v1/engine/core.py

+                              grammar_output = self.scheduler.get_grammar_bitmask(scheduler_output)
+                              # Block-wait for execute to return (continues running async on the GPU).
+                              with self.log_error_detail(scheduler_output):
+                                  exec_result = exec_future.result()

Contributor

weireweire Nov 10, 2025

Why do we block before the batch queue is full? Won't this break the batch queue behavior?

Contributor

weireweire Nov 11, 2025 •

edited

Could you have a look? PP mode will block here, so no parallelism will happen. Even though the intent here is to wait for the model execution, the previous sample_tokens task should also be in the queue.

Contributor

weireweire Nov 13, 2025

@ys950902 Is your PP perf issue solved? Is it also related to the blocking here?

Contributor

nvpohanh Nov 14, 2025

@njhill Could you help answer this question? Thanks!

Contributor

weireweire Nov 14, 2025

draft fix: #28286
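
The pattern in the reviewed hunk (submit execution, compute the grammar bitmask while it runs, then block on the future) can be sketched with the standard library alone. This is a toy illustration under stated assumptions, not the engine core's code: `run_one_step` and the two stand-in functions are hypothetical, and a `threading.Event` is used only to make the overlap observable.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_one_step(events):
    """Run one toy engine step and append the order of events to `events`."""
    bitmask_done = threading.Event()

    def execute_model():
        # Stand-in for the forward pass. It waits for the bitmask so that the
        # overlap between the two pieces of work is visible in `events`.
        bitmask_done.wait(timeout=5)
        events.append("execute finished")
        return "exec_result"

    def get_grammar_bitmask():
        # Stand-in for the CPU-side bitmask computation.
        events.append("bitmask computed")
        return "grammar_output"

    with ThreadPoolExecutor(max_workers=1) as pool:
        exec_future = pool.submit(execute_model)  # returns immediately
        grammar_output = get_grammar_bitmask()    # runs while execution is in flight
        bitmask_done.set()
        exec_result = exec_future.result()        # block-wait, as in the hunk
    events.append("ready to sample")
    return exec_result, grammar_output


if __name__ == "__main__":
    log = []
    print(run_one_step(log))
    print(log)
```

The concern raised in this thread is about the `exec_future.result()` line: while the loop is blocked there, it cannot schedule another batch, which matters when a batch queue is meant to keep several pipeline stages busy.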

heheda12345 mentioned this pull request

[BugFix]Fix the issue where there is no parallelism in PP mode #28286

Open

5 tasks

njhill mentioned this pull request

[BugFix] Fix PP performance and PP kv connector output regression #28768

Merged

njhill mentioned this pull request

[RFC]: Restructure the core loop to allow more asynchrony #23233

Closed

1 task

wangxiyuan mentioned this pull request

upgrade to vllm 0.11.2 vllm-project/vllm-ascend#4400

Merged

wangxiyuan added a commit to vllm-project/vllm-ascend that referenced this pull request


          upgrade to vllm 0.11.2 (#4400)

bc69d7c

Bump vLLM version to v0.11.2

What's broken and changed by vLLM:
1. structured_output is broken by
vllm-project/vllm#26866
2. get_mrope_input_positions is broken by
vllm-project/vllm#28399
3. graph mode is broken by
vllm-project/vllm#25110 we'll upgrade torch to
2.8 to fix the problem later
4. embedding is broken by
vllm-project/vllm#27583
5. `get_attn_backend_cls` and attention backend are broken by
vllm-project/vllm#28534
6. spec decode is broken by
vllm-project/vllm#28771
7. sp feature is broken by
vllm-project/vllm#27126
8. mtp is broken by vllm-project/vllm#27922
9. lora is broken by vllm-project/vllm#21068
10. execute_model is broken by
vllm-project/vllm#26866
11. `VLLM_DISABLE_SHARED_EXPERTS_STREAM` env is broken by
vllm-project/vllm#28159
12. kv cache is broken by vllm-project/vllm#27753
13. dp is broken by vllm-project/vllm#25110

 
What's broken and changed by ourselves:
1. qwen vl is broken by vllm-project/vllm#28455
We'll remove model files in the future to avoid this kind of error
2. Engine core is broken by
vllm-project/vllm#23691 We'll remove the patch
file in the future.
3. Ascend scheduler is broken by
vllm-project/vllm#28733 We'll remove ascend
scheduler later.
4. qwen3-next is broken by
vllm-project/vllm#28083 We'll remove model files
in the future to avoid this kind of error
5. qwen vl is broken by vllm-project/vllm#27764.
We'll remove model files in the future

Known issue:
1. ray doesn't work 
2. the accuracy of qwen3-next is not correct
3. qwen3-vl is broken
4. prefix cache+ ascend scheduler + deepseek v2 lite is broken.

Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: 22dimensions <waitingwind@foxmail.com>
Co-authored-by: shen-shanshan <467638484@qq.com>


- vLLM version: v0.11.2

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>

Kurumi5210 pushed a commit to lidenghui1110/vllm-ascend that referenced this pull request


          upgrade to vllm 0.11.2 (vllm-project#4400)

d5af9dc

Signed-off-by: Kurumi5210 <Jaychou1620@Gmail.com>

devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request


          [Core] Async scheduling + structured outputs compatibility (vllm-proj…

b28ccba

…ect#26866)

Signed-off-by: Nick Hill <nhill@redhat.com>

845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request


          upgrade to vllm 0.11.2 (vllm-project#4400)

c676faf


zhandaz mentioned this pull request

Update the docs for --async-scheduling compatibility vllm-project/recipes#131

Merged

Meihan-chen pushed a commit to Meihan-chen/vllm-ascend that referenced this pull request


          upgrade to vllm 0.11.2 (vllm-project#4400)

285874b


Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request


          upgrade to vllm 0.11.2 (vllm-project#4400)

4a1c72a

Signed-off-by: tanqingshan (A) <50050625@china.huawei.com>

Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request


          upgrade to vllm 0.11.2 (vllm-project#4400)

8531ec7


wjunLu mentioned this pull request

[Main2Main] Upgrade vllm commit to 1230 vllm-project/vllm-ascend#5495

Merged

wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request


          [Main2Main] Upgrade vllm commit to 1230 (#5495)

3c2d3e5

### What this PR does / why we need it?

Upgrade vllm commit to 1230

Affected by vllm-project/vllm#27614 (and the
core PR vllm-project/vllm#26866), we have to
make the following changes:

1. Modify `tests/e2e/multicard/test_aclgraph_capture_replay.py` to stay
compatible with both vLLM `v0.13.0` and the latest main commit,
now that vLLM enables async scheduling by default
2. Skip `test_guided_decoding.py` due to xgrammar errors
(#5524)

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@45c1ca1

---------

Signed-off-by: wjunLu <wjunlu217@gmail.com>

wangyibo1005 pushed a commit to wangyibo1005/vllm-ascend that referenced this pull request


          [Main2Main] Upgrade vllm commit to 1230 (vllm-project#5495)

842acc9


Rozwel-dx pushed a commit to Rozwel-dx/vllm-ascend that referenced this pull request


          [Main2Main] Upgrade vllm commit to 1230 (vllm-project#5495)

bc506e5


rebel-jaehwang mentioned this pull request

fix: port v0.12 scheduler code RBLN-SW/vllm-rbln#254

Merged

12 tasks

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request


          [Main2Main] Upgrade vllm commit to 1230 (vllm-project#5495)

e33b4e4

Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>

maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request


          [Main2Main] Upgrade vllm commit to 1230 (vllm-project#5495)

3953cf0


ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request


          [Main2Main] Upgrade vllm commit to 1230 (vllm-project#5495)

559d414

Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>

Reviewers

chatgpt-codex-connector[bot]: left review comments

WoosukKwon: approved these changes

NickLucche: awaiting requested review (code owner)

ApostaC: awaiting requested review (code owner)

robertgshaw2-redhat: awaiting requested review (code owner)

ywang96: awaiting requested review (code owner)

comaniac: awaiting requested review

alexm-redhat: awaiting requested review (code owner)

heheda12345: awaiting requested review (code owner)

mgoin: awaiting requested review (code owner)

russellb: awaiting requested review (code owner)

aarnphm: awaiting requested review (code owner)

benchislett: awaiting requested review (code owner)

chaunceyjiang: awaiting requested review

+2 more reviewers

weireweire: left review comments

nvpohanh: left review comments

Labels

frontend kv-connector ready structured-output suppress-bc-linter tpu v1