Build vLLM on a CPU-only machine, e.g. VLLM_TARGET_DEVICE=cpu python3 setup.py bdist_wheel
Then run: vllm bench throughput and you'll hit this error:
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] EngineCore failed to start.
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] Traceback (most recent call last):
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/home/fadara01/vllm-torch-update/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 926, in run_engine_core
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/home/fadara01/vllm-torch-update/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 691, in __init__
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] super().__init__(
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/home/fadara01/vllm-torch-update/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 105, in __init__
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/home/fadara01/vllm-torch-update/venv/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] self._init_executor()
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/home/fadara01/vllm-torch-update/venv/lib/python3.10/site-packages/vllm/v1/executor/uniproc_executor.py", line 46, in _init_executor
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] self.driver_worker.init_worker(all_kwargs=[kwargs])
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/home/fadara01/vllm-torch-update/venv/lib/python3.10/site-packages/vllm/v1/worker/worker_base.py", line 252, in init_worker
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] worker_class: type[WorkerBase] = resolve_obj_by_qualname(
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/home/fadara01/vllm-torch-update/venv/lib/python3.10/site-packages/vllm/utils/import_utils.py", line 111, in resolve_obj_by_qualname
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] module = importlib.import_module(module_name)
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] return _bootstrap._gcd_import(name[level:], package, level)
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "<frozen importlib._bootstrap_external>", line 883, in exec_module
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/home/fadara01/vllm-torch-update/venv/lib/python3.10/site-packages/vllm/v1/worker/cpu_worker.py", line 18, in <module>
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] from vllm.v1.worker.gpu_worker import Worker, init_worker_distributed_environment
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/home/fadara01/vllm-torch-update/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 38, in <module>
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] from vllm.model_executor.warmup.kernel_warmup import kernel_warmup
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/home/fadara01/vllm-torch-update/venv/lib/python3.10/site-packages/vllm/model_executor/warmup/kernel_warmup.py", line 15, in <module>
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] from vllm.model_executor.warmup.deep_gemm_warmup import deep_gemm_warmup
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/home/fadara01/vllm-torch-update/venv/lib/python3.10/site-packages/vllm/model_executor/warmup/deep_gemm_warmup.py", line 22, in <module>
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] from vllm.model_executor.layers.quantization.fp8 import Fp8LinearMethod
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/home/fadara01/vllm-torch-update/venv/lib/python3.10/site-packages/vllm/model_executor/layers/quantization/fp8.py", line 33, in <module>
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] from vllm.model_executor.layers.fused_moe.oracle.fp8 import (
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/home/fadara01/vllm-torch-update/venv/lib/python3.10/site-packages/vllm/model_executor/layers/fused_moe/oracle/fp8.py", line 17, in <module>
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] from vllm.model_executor.layers.fused_moe.flashinfer_trtllm_moe import (
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/home/fadara01/vllm-torch-update/venv/lib/python3.10/site-packages/vllm/model_executor/layers/fused_moe/flashinfer_trtllm_moe.py", line 195, in <module>
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] direct_register_custom_op(
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/home/fadara01/vllm-torch-update/venv/lib/python3.10/site-packages/vllm/utils/torch_utils.py", line 753, in direct_register_custom_op
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] my_lib.define(op_name + schema_str, tags=tags)
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] File "/home/fadara01/vllm-torch-update/venv/lib/python3.10/site-packages/torch/library.py", line 172, in define
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] result = self.m.define(schema, alias_analysis, tuple(tags))
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] RuntimeError:
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] invalid numeric default value:
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] flashinfer_fused_moe_blockscale_fp8(Tensor routing_logits, Tensor routing_bias, Tensor x, Tensor w13_weight, Tensor w13_weight_scale_inv, Tensor w2_weight, Tensor w2_weight_scale_inv, SymInt global_num_experts, SymInt top_k, SymInt? num_expert_group, SymInt? topk_group, SymInt intermediate_size, SymInt expert_offset, SymInt local_num_experts, SymInt[] block_shape, SymInt routing_method_type=RoutingMethodType.DeepSeekV3, float? routed_scaling=1.0) -> Tensor
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935] ~ <--- HERE
(EngineCore_DP0 pid=29197) ERROR 01-22 09:14:05 [core.py:935]
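
The schema printed at the end of the traceback has `routing_method_type=RoutingMethodType.DeepSeekV3` as the default for a `SymInt` argument. That is an enum member's name rather than a number, which fits the "invalid numeric default value" message. The sketch below is only an illustration of how a default like that can end up in a schema string. It is not vLLM's or PyTorch's actual code: `RoutingMethod`, its values, `fake_op`, and both helper functions are hypothetical stand-ins.

```python
import enum
import inspect


class RoutingMethod(enum.Enum):
    # Hypothetical stand-in for the enum named in the schema; values are made up.
    Default = 0
    DeepSeekV3 = 2


def fake_op(x: int, routing_method_type: int = RoutingMethod.DeepSeekV3) -> int:
    # Hypothetical op: annotated as int, but the default is an enum member.
    return x


def default_fragment(func, name: str) -> str:
    """Render `name=<default>` by calling str() on the default value."""
    default = inspect.signature(func).parameters[name].default
    return f"{name}={str(default)}"


def numeric_default_fragment(func, name: str) -> str:
    """Same as above, but unwrap enum members to their underlying value first."""
    default = inspect.signature(func).parameters[name].default
    if isinstance(default, enum.Enum):
        default = default.value
    return f"{name}={str(default)}"


bad = default_fragment(fake_op, "routing_method_type")
good = numeric_default_fragment(fake_op, "routing_method_type")
print(bad)   # routing_method_type=RoutingMethod.DeepSeekV3
print(good)  # routing_method_type=2
```

If something similar is happening here, passing a plain integer (for example the member's value) as the default where the op is registered may avoid the error, but that is an untested assumption.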