[Core] Whisper support `torch.compile` by NickLucche · Pull Request #30385 · vllm-project/vllm

NickLucche · 2025-12-10T11:11:31Z

This PR is another Whisper performance optimization: it adds torch.compile support for the decoding step.
It follows an approach very similar to #30072 (and should land after it to ensure the best defaults), in which only the second decoder step onward is compiled.
The reason is that step 0 in enc-dec models computes and caches the cross-attention KVs, which requires encoder_output as an additional input and therefore produces a different graph from the other steps.
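To illustrate why step 0 ends up with a different graph, here is a toy sketch in plain Python (no vLLM or PyTorch). `ToyCrossAttention` and its fields are hypothetical names chosen for illustration, not the Whisper implementation: the first call receives `encoder_output` and fills the cross-attention cache, while later calls take no encoder input and only read the cache.

```python
from typing import Optional


class ToyCrossAttention:
    """Toy stand-in for a cross-attention layer with a KV cache."""

    def __init__(self) -> None:
        self.kv_cache: Optional[list[float]] = None

    def forward(
        self, query: float, encoder_output: Optional[list[float]] = None
    ) -> float:
        if encoder_output is not None:
            # Step 0: "project" encoder states into K/V and cache them.
            # (The projection is just a scaling here, for illustration.)
            self.kv_cache = [2.0 * x for x in encoder_output]
        if self.kv_cache is None:
            raise RuntimeError("cross-attention KVs were never computed")
        # Every step: attend over the cached KVs (a plain weighted sum here).
        return sum(query * kv for kv in self.kv_cache)


layer = ToyCrossAttention()
step0 = layer.forward(1.0, encoder_output=[0.5, 1.5])  # extra input, writes cache
step1 = layer.forward(2.0)  # no encoder input, cache read only
```

The two calls have different input signatures and different side effects, which is the property that makes a single traced graph awkward to share between step 0 and the remaining steps.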

Updated profiling: (profiling screenshot not preserved)

Considerations

I've attempted to add the "2nd decoder step" selection logic directly in the model runner (_model_forward).
I am aware that _model_forward is currently used by out-of-tree (OOT) runners (#25084). However, no "official" runner interface contract is maintained to ensure compatibility (unlike, for example, connectors), which makes maintaining these kinds of methods without breaking external usage quite hairy.
As this change may affect those runners I am also pinging @patrick-toulme .

Happy to change to a less invasive logic if we find a cleaner way to do it @LucasWilkinson .
Other options are adding the flag to attn_metadata and then retrieving the metadata from the support_torch_compile wrapper OR hacking the WhisperDecoder __call__ (definitely not nice).

UPDATE:
Following @ProExpertProg's suggestion, I've switched to the alternative option, in which the skip_compiled logic is handled generically inside the compile decorator.
So no change to _model_forward is actually needed.
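As a rough sketch of the idea (not the vLLM implementation), the snippet below shows a per-forward context flag that a compile wrapper can read to bypass the compiled callable. The names `skip_compiled` and `set_forward_context` are borrowed from this thread, but their signatures here are assumptions. `with_optional_eager` and `fake_compile` are hypothetical helpers, with `fake_compile` standing in for torch.compile so the example runs without PyTorch.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class ForwardContext:
    # Assumption: a per-forward flag, named after `skip_compiled` in this PR.
    skip_compiled: bool = False


_current: Optional[ForwardContext] = None


@contextmanager
def set_forward_context(skip_compiled: bool = False) -> Iterator[None]:
    """Install a context for the duration of one forward pass."""
    global _current
    previous = _current
    _current = ForwardContext(skip_compiled=skip_compiled)
    try:
        yield
    finally:
        _current = previous


def with_optional_eager(compile_fn: Callable[[Callable], Callable]) -> Callable:
    """Hypothetical decorator: always compile, but allow a per-call eager bypass."""

    def decorator(eager_fn: Callable) -> Callable:
        compiled_fn = compile_fn(eager_fn)  # compilation is not optional

        def wrapper(*args, **kwargs):
            ctx = _current
            if ctx is not None and ctx.skip_compiled:
                return eager_fn(*args, **kwargs)  # e.g. decoder step 0
            return compiled_fn(*args, **kwargs)

        return wrapper

    return decorator


calls: list[str] = []


def fake_compile(fn: Callable) -> Callable:
    # Stand-in for torch.compile; records when the "compiled" path is taken.
    def compiled(*args, **kwargs):
        calls.append("compiled")
        return fn(*args, **kwargs)

    return compiled


@with_optional_eager(fake_compile)
def decoder_step(x: int) -> int:
    return x + 1


with set_forward_context(skip_compiled=True):
    first = decoder_step(0)  # runs eagerly
with set_forward_context(skip_compiled=False):
    second = decoder_step(first)  # runs the "compiled" callable
```

Keeping the decision in the context object means the caller only sets a flag per forward pass, and the model runner's forward method does not need an extra parameter.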

Related PRs #29421 #30072

Test with

Compilation is enabled by default:

vllm serve openai/whisper-large-v3-turbo

cc @DarkLight1337 @robertgshaw2-redhat


gemini-code-assist

Code Review

This pull request introduces torch.compile support for the Whisper model's decoder, aimed at improving performance during the decoding phase. The implementation cleverly compiles only the decoding steps from the second step onwards, correctly identifying that the first step has a different computation graph due to cross-attention key-value cache generation. This is achieved by adding a force_eager flag to the _model_forward method in GPUModelRunner, which is conditionally set based on the presence of encoder inputs. The changes are well-designed, backward-compatible, and the generic approach in GPUModelRunner could be beneficial for other encoder-decoder models in the future. The code appears to be correct and I could not identify any issues of high or critical severity.

DarkLight1337

I think we should have at least one test that uses Whisper with CUDA graph

patrick-toulme · 2025-12-10T18:11:05Z

Changes look fine to me. All you are doing is adding a gated option to run eager mode in model_forward. Any downstream consumers who are subclassing just have to add that variable now. LGTM

robertgshaw2-redhat · 2025-12-12T03:39:36Z

note: this generates incorrect answers

mergify · 2025-12-18T20:04:20Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @NickLucche.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

NickLucche · 2026-01-13T15:45:09Z

@robertgshaw2-redhat reviving this PR, are there any blockers? Happy to look at accuracy issues if any, can't spot them from usual tests

jikunshang · 2026-01-14T03:33:33Z

A noob question: I noticed that this only enables compilation for the decoder. Is there any blocker to enabling torch.compile for the encoder as well, just like the multimodal compile support? https://docs.vllm.ai/en/latest/design/torch_compile_multimodal/

NickLucche · 2026-01-14T10:32:15Z

@jikunshang see #30549

DarkLight1337 · 2026-01-14T10:36:06Z

Let's wait for @robertgshaw2-redhat to elaborate on the accuracy issues first

ProExpertProg · 2026-01-14T15:03:55Z

@@ -2919,6 +2920,17 @@ def _model_forward(
        Returns:
            Model output tensor
        """
+
+        if force_eager:


Let's just add force_eager to ForwardContext and read that in the compile decorator?

Lucaskabela · Jan 14, 2026

There is an enable_if option in the support_torch_compile decorator; perhaps we can leverage that? For example usage, see vllm/vllm/model_executor/models/qwen2_5_vl.py, line 522 in 3a61232:

enable_if=should_torch_compile_mm_vit,

NickLucche · Jan 14, 2026

That flag is for optional compilation. Here I need to always compile, but optionally run eager (i.e. not call the compiled graph).

NickLucche · 2026-01-14T18:08:33Z

I've implemented @ProExpertProg suggested approach and updated the description.

ProExpertProg · 2026-01-16T17:58:15Z

@@ -156,7 +156,9 @@ def test_wer_correctness(
    model_name, dataset_repo, expected_wer, n_examples=-1, max_concurrent_request=None
 ):
    # TODO refactor to use `ASRDataset`
-    with RemoteOpenAIServer(model_name, ["--enforce-eager"]) as remote_server:
+    with RemoteOpenAIServer(


-cc.mode=NONE if you want to just disable compilation but keep CUDA graphs.

NickLucche · Jan 19, 2026

Will follow up with more CUDA graph + compilation tests (which is the default vllm serve setup).
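The hunk above touches `test_wer_correctness`, which checks transcriptions against an `expected_wer`. How that test computes the metric is not shown in this thread. As a minimal sketch, assuming the common definition of word error rate (word-level edit distance divided by the number of reference words), it could look like the following; `word_error_rate` is a hypothetical helper, not the function used in the vLLM test suite.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

A regression in the compiled path would show up as a higher WER than the eager baseline on the same audio, which is why running such a test with compilation enabled (rather than with `--enforce-eager`) is useful.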

Signed-off-by: NickLucche <nlucches@redhat.com>

### What this PR does / why we need it?

1. ✅ Upgrade vllm commit to: 0115 (8471b27)
   Modify import paths due to the refactors: vllm-project/vllm#32245 vllm-project/vllm#32060
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21034239336/job/60490156965?pr=5913
2. ✅ Upgrade vllm commit to: 0119 (9a1f16d)
   Fix `WorkerProc.__init__() missing 1 required positional argument: 'is_driver_worker'` due to vllm-project/vllm#28506
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21156263050/job/60841668755?5569
3. ✅ Upgrade vllm commit to: 0120 (148117e)
   1. Add `skip_compiled` param in `set_forward_context` due to vllm-project/vllm#30385
   2. Modify `tests/ut/spec_decode/test_eagle_proposer.py` due to vllm-project/vllm#24322 change `self.max_num_tokens = vllm_config.scheduler_config.max_num_batched_tokens + max_batch_size`
   3. Modify UT import paths due to the refactors: vllm-project/vllm#32060
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21204851770/job/60999046946
4. ✅ Upgrade vllm commit to: 0121 (f23fb5a)
   1. vLLM switched `uses_mrope` from target to draft model config, making `positions`/`mrope_positions` mutually exclusive, breaking vllm-ascend's direct self.positions access and tests missing `draft_model_config.uses_mrope`. vllm-project/vllm#32048
   2. Moved bs_to_padded_graph_size from CompilationConfig to CudagraphDispatcher due to the refactor vllm-project/vllm#30143
   3. Remove unused `maybe_setup_kv_connector` due to vllm-project/vllm#32077
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21217728738/job/61043738834
6. ✅ Upgrade vllm commit to: 0122 (8ebf271)
   Updating FusedMoEParallelConfig (added enable_eplb) and FusedMoEConfig due to vllm-project/vllm#32414
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21249922546/job/61148613054
8. ✅ Upgrade vllm commit to: 0123 (dc917cc)
   Setting temperature=0.0 due to the removal of the default temperature value in vllm-project/vllm#32723
   Test result: https://github.com/vllm-project/vllm-ascend/actions/runs/21280796875

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.14.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Co-authored-by: wjunLu <wjunlu217@gmail.com>


mergify Bot added the v1 label Dec 10, 2025

gemini-code-assist Bot reviewed Dec 10, 2025


DarkLight1337 reviewed Dec 10, 2025


This was referenced Dec 11, 2025

[Bug]: The inference speed of the whisper model under the v1 engine is much slower than v0 #24946

Closed

[CI] Whisper logprobs tests #30504

Merged

NickLucche mentioned this pull request Dec 16, 2025

[Core] WhisperEncoder support torch.compile #30549

Open

NickLucche force-pushed the whisper-compile branch from b211ce8 to 3922aaa Compare December 16, 2025 17:40

mergify Bot added the needs-rebase label Dec 18, 2025

NickLucche force-pushed the whisper-compile branch from 3922aaa to 44b7a00 Compare January 13, 2026 15:32

mergify Bot removed the needs-rebase label Jan 13, 2026

cursor Bot reviewed Jan 13, 2026


Comment thread vllm/model_executor/models/whisper.py

DarkLight1337 requested review from Isotr0py and ywang96 January 14, 2026 10:35

ProExpertProg reviewed Jan 14, 2026


NickLucche requested review from youkaichao and zou3519 as code owners January 14, 2026 18:02

ProExpertProg approved these changes Jan 14, 2026


NickLucche enabled auto-merge (squash) January 15, 2026 10:31

github-actions Bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jan 15, 2026

JakubCerven mentioned this pull request Jan 15, 2026

[Feature]: FlexAttention + encoder_decoder support #25172

Closed

1 task

NickLucche force-pushed the whisper-compile branch from 5e45c90 to 60c956c Compare January 16, 2026 13:23

ProExpertProg reviewed Jan 16, 2026


NickLucche added 4 commits January 19, 2026 08:21

init

6b92996

Signed-off-by: NickLucche <nlucches@redhat.com>

rebase cruft

ddeedb5

Signed-off-by: NickLucche <nlucches@redhat.com>

move force_eager logic to forwardcontext+decorator

c047e31

Signed-off-by: NickLucche <nlucches@redhat.com>

increase startup timeout

3366e2e

Signed-off-by: NickLucche <nlucches@redhat.com>

NickLucche force-pushed the whisper-compile branch from 0b4ba1b to 3366e2e Compare January 19, 2026 08:21

NickLucche merged commit 74c583b into vllm-project:main Jan 19, 2026
58 checks passed

gopalsarda pushed a commit to gopalsarda/vllm that referenced this pull request Jan 20, 2026

[Core] Whisper support torch.compile (vllm-project#30385)

8296a7b

Signed-off-by: NickLucche <nlucches@redhat.com>

Meihan-chen mentioned this pull request Jan 21, 2026

[Main2Main] Upgrade vllm commit to 0120 vllm-project/vllm-ascend#6040

Closed

Meihan-chen mentioned this pull request Jan 26, 2026

[Main2Main] Upgrade vllm commit to 0123 vllm-project/vllm-ascend#6169

Merged

mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026

[Core] Whisper support torch.compile (vllm-project#30385)

1e04ae3

Signed-off-by: NickLucche <nlucches@redhat.com>

my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026

[Core] Whisper support torch.compile (vllm-project#30385)

32e83e6

Signed-off-by: NickLucche <nlucches@redhat.com>

my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026

[Core] Whisper support torch.compile (vllm-project#30385)

0bc3e87

Signed-off-by: NickLucche <nlucches@redhat.com>

0826joyce pushed a commit to 0826joyce/vllm-serving-optimization that referenced this pull request May 19, 2026

[Core] Whisper support torch.compile (vllm-project#30385)

84907ed

Signed-off-by: NickLucche <nlucches@redhat.com>


