[Serve.llm] Disable TP=2 VLM batch test #53825
Merged
kouroshHakha merged 2 commits into ray-project:master on Jun 16, 2025
Conversation
Contributor
Pull Request Overview
This PR temporarily disables the TP=2 VLM batch test due to a known torch_compile_cache issue (ray#53824).
- Commented out the Qwen/Qwen2.5-VL-3B-Instruct test case
- Added a TODO to re-enable the test once the underlying issue is resolved
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Force-pushed cc3abea to 22a04b1
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Comment on lines +196 to +199
# todo(seiji): Commenting out due to https://github.com/ray-project/ray/issues/53824
# Need to follow up once torch_compile_cache issue is fixed or PyTorch 2.8
if model_source == "mistral-community/pixtral-12b":
    pytest.skip("Skipping test due to torch_compile_cache issue")
Contributor
question: Would enforce_eager=True in the engine kwargs make it work? I think that'd be better than skipping?
Contributor
Author
Gave it a shot; it failed with RuntimeError: Worker failed with error 'NoTritonConfigsError: No valid triton configs.'
Traceback (worker log prefix `(MapWorker(MapBatches(vLLMEngineStageUDF)) pid=12390, ip=10.0.118.90)`, `[repeated Nx across cluster]` suffixes, and caret-only marker lines elided for readability):
`ptxas` stderr:
ptxas /tmp/tmpbc993top.ptx, line 113; fatal : Parsing error near '.': syntax error
ptxas fatal : Ptx assembly aborted due to errors
Repro command: /home/ray/anaconda3/lib/python3.11/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_89 /tmp/tmpbc993top.ptx -o /tmp/tmpbc993top.ptx.o
Traceback (most recent call last):
(VllmWorker rank=0 pid=12628) [rank0]:E0616 09:44:59.624000 12628 site-packages/torch/_inductor/runtime/triton_heuristics.py:539] [0/0]
Task exception was never retrieved
future: <Task finished name='Task-4' coro=<vLLMEngineWrapper.generate_async() done, defined at /home/ray/anaconda3/lib/python3.11/site-packages/ray/llm/_internal/batch/stages/vllm_engine_stage.py:303> exception=EngineDeadError('EngineCore encountered an issue. See stack trace (above) for the root cause.')>
File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/llm/_internal/batch/stages/vllm_engine_stage.py", line 317, in generate_async
output = await self._generate_async(request)
File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/llm/_internal/batch/stages/vllm_engine_stage.py", line 399, in generate_async_v1
async for request_output in stream:
File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 315, in generate
out = q.get_nowait() or await q.get()
File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/output_processor.py", line 51, in get
raise output
File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 366, in output_handler
outputs = await engine_core.get_output_async()
File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 806, in get_output_async
raise self._format_exception(outputs) from None
vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
File "python/ray/_raylet.pyx", line 1392, in ray._raylet.execute_streaming_generator_sync
for output in gen:
File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/execution/operators/actor_pool_map_operator.py", line 469, in submit
yield from _map_task(
File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_operator.py", line 557, in _map_task
for b_out in map_transformer.apply_transform(iter(blocks), ctx):
File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 327, in __call__
for data in iter:
File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 211, in _udf_timed_iter
output = next(input)
yield from self._batch_fn(input, ctx)
File "/home/ray/anaconda3/lib/python3.11/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 485, in transform_fn
raise out_item
Process EngineCore_0:
File "/home/ray/anaconda3/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
self._target(*self._args, **self._kwargs)
File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 493, in run_engine_core
raise e
engine_core.run_busy_loop()
File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 520, in run_busy_loop
self._process_engine_step()
File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 545, in _process_engine_step
outputs = self.step_fn()
File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 226, in step
model_output = self.execute_model(scheduler_output)
File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 158, in execute_model
raise err
return self.model_executor.execute_model(scheduler_output)
(output, ) = self.collective_rpc("execute_model",
File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 215, in collective_rpc
result = get_response(w, dequeue_timeout)
File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 202, in get_response
raise RuntimeError(
RuntimeError: Worker failed with error 'NoTritonConfigsError: No valid triton configs. PTXASError: PTXAS error: Internal Triton PTX codegen error
Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
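For reference, the reviewer's suggestion amounts to adding something like the following to the engine kwargs (a sketch only: `enforce_eager` and `tensor_parallel_size` are real vLLM engine arguments, but the exact config shape here is illustrative, and as the traceback shows this did not avoid the Triton PTX failure in this case):

```python
# Hypothetical engine kwargs for the TP=2 batch test. enforce_eager=True asks
# vLLM to run the model in eager mode, skipping CUDA graph capture and
# compilation, which is sometimes a workaround for torch.compile issues.
engine_kwargs = {
    "model": "mistral-community/pixtral-12b",  # illustrative model id
    "tensor_parallel_size": 2,
    "enforce_eager": True,
}
```

The trade-off is throughput: eager mode forgoes compiled kernels, so it is usually acceptable for tests but not a production fix.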
kouroshHakha approved these changes on Jun 16, 2025
elliot-barn pushed a commit that referenced this pull request on Jun 18, 2025
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
minerharry pushed a commit to minerharry/ray that referenced this pull request on Jun 27, 2025
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
elliot-barn pushed a commit that referenced this pull request on Jul 2, 2025
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Why are these changes needed?
Failing due to #53824. Relatively rare configuration and likely downstream of vLLM: vllm-project/vllm#18851.
Related issue number
Revisit once #53824 / vllm-project/vllm#18851 is closed, or after PyTorch 2.8, when vLLM will no longer need to monkeypatch to access torch.compile.
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.