[Bug] Fix TypeError when hf_config.architectures is None during model loading#38849
Force-pushed from 2983aeb to ca394a9
Code Review
This pull request updates the model architecture retrieval logic to handle None values by defaulting to an empty list. The review feedback recommends refactoring this repeated logic into a helper function or normalizing it within the ModelConfig class to improve code consistency and maintainability.
@@ -175,7 +175,7 @@ def device_loading_context(module: torch.nn.Module, target_device: torch.device)
 def _get_model_architecture(model_config: ModelConfig) -> tuple[type[nn.Module], str]:
     from vllm.model_executor.models.adapters import as_embedding_model, as_seq_cls_model

-    architectures = getattr(model_config.hf_config, "architectures", [])
+    architectures = getattr(model_config.hf_config, "architectures", None) or []
@@ -215,7 +215,7 @@ def get_model_architecture(model_config: ModelConfig) -> tuple[type[nn.Module],
     model_config.runner_type,
     model_config.trust_remote_code,
     model_config.model_impl,
-    tuple(getattr(model_config.hf_config, "architectures", [])),
+    tuple(getattr(model_config.hf_config, "architectures", None) or []),
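The review suggestion to centralize this logic could look roughly like the sketch below. The helper name `get_architectures` and the `SimpleNamespace` stand-ins for `hf_config` are hypothetical and not part of this PR or of vLLM; the sketch only illustrates why the `getattr` default alone is not enough when the attribute exists but is set to `None`.

```python
from types import SimpleNamespace
from typing import Any


def get_architectures(hf_config: Any) -> list[str]:
    """Hypothetical helper: always return a list of architecture names."""
    # getattr's default is used only when the attribute is missing. If the
    # attribute exists and is None, getattr returns None, hence the `or []`.
    return list(getattr(hf_config, "architectures", None) or [])


# Stand-in config objects (assumptions for illustration only).
missing = SimpleNamespace()
explicit_none = SimpleNamespace(architectures=None)
populated = SimpleNamespace(architectures=["MistralForCausalLM"])

# The original pattern still yields None for an explicit None value.
old_style = getattr(explicit_none, "architectures", [])

for cfg in (missing, explicit_none, populated):
    print(get_architectures(cfg))
```

Both call sites in the diff could then call the one helper instead of repeating the `None` handling inline.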
Force-pushed from ca394a9 to 9a84f51
Added the ready label. cc @hmellor PTAL.
Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>
Force-pushed from 9a84f51 to f0498d4
Hi @TihoElek, thank you for your quick response! However, I'm still encountering an error after applying the fix:
Error message ("No model architectures are specified")
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] EngineCore failed to start.
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] Traceback (most recent call last):
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] super().__init__(
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 114, in __init__
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] self.model_executor = executor_class(vllm_config)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in __init__
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] self._init_executor()
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] self.driver_worker.load_model()
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4749, in load_model
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] self.model = model_loader.load_model(
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 55, in load_model
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] model = initialize_model(
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 57, in initialize_model
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] model = model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/mistral3.py", line 437, in __init__
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] self.language_model = init_vllm_registered_model(
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 372, in init_vllm_registered_model
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] return initialize_model(vllm_config=vllm_config, prefix=prefix)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 47, in initialize_model
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] model_class, _ = get_model_architecture(model_config)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 224, in get_model_architecture
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] model_arch = _get_model_architecture(model_config)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 180, in _get_model_architecture
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] model_cls, arch = model_config.registry.resolve_model_cls(
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1138, in resolve_model_cls
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] raise ValueError("No model architectures are specified")
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ValueError: No model architectures are specified
(EngineCore pid=154) Process EngineCore:
(EngineCore pid=154) Traceback (most recent call last):
(EngineCore pid=154) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=154) self.run()
(EngineCore pid=154) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=154) self._target(*self._args, **self._kwargs)
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1112, in run_engine_core
(EngineCore pid=154) raise e
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=154) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) return func(*args, **kwargs)
(EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=154) super().__init__(
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 114, in __init__
(EngineCore pid=154) self.model_executor = executor_class(vllm_config)
(EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) return func(*args, **kwargs)
(EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in __init__
(EngineCore pid=154) self._init_executor()
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor
(EngineCore pid=154) self.driver_worker.load_model()
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model
(EngineCore pid=154) self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) return func(*args, **kwargs)
(EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4749, in load_model
(EngineCore pid=154) self.model = model_loader.load_model(
(EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) return func(*args, **kwargs)
(EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 55, in load_model
(EngineCore pid=154) model = initialize_model(
(EngineCore pid=154) ^^^^^^^^^^^^^^^^^
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) return func(*args, **kwargs)
(EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 57, in initialize_model
(EngineCore pid=154) model = model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/mistral3.py", line 437, in __init__
(EngineCore pid=154) self.language_model = init_vllm_registered_model(
(EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 372, in init_vllm_registered_model
(EngineCore pid=154) return initialize_model(vllm_config=vllm_config, prefix=prefix)
(EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) return func(*args, **kwargs)
(EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 47, in initialize_model
(EngineCore pid=154) model_class, _ = get_model_architecture(model_config)
(EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 224, in get_model_architecture
(EngineCore pid=154) model_arch = _get_model_architecture(model_config)
(EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 180, in _get_model_architecture
(EngineCore pid=154) model_cls, arch = model_config.registry.resolve_model_cls(
(EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1138, in resolve_model_cls
(EngineCore pid=154) raise ValueError("No model architectures are specified")
(EngineCore pid=154) ValueError: No model architectures are specified
[rank0]:[W403 08:34:39.355421752 ProcessGroupNCCL.cpp:1648] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=74) Traceback (most recent call last):
(APIServer pid=74) File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=74) sys.exit(main())
(APIServer pid=74) ^^^^^^
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=74) args.dispatch_function(args)
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=74) uvloop.run(run_server(args))
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=74) return __asyncio.run(
(APIServer pid=74) ^^^^^^^^^^^^^^
(APIServer pid=74) File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
(APIServer pid=74) return runner.run(main)
(APIServer pid=74) ^^^^^^^^^^^^^^^^
(APIServer pid=74) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=74) return self._loop.run_until_complete(task)
(APIServer pid=74) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=74) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=74) return await main
(APIServer pid=74) ^^^^^^^^^^
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 684, in run_server
(APIServer pid=74) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 698, in run_server_worker
(APIServer pid=74) async with build_async_engine_client(
(APIServer pid=74) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=74) return await anext(self.gen)
(APIServer pid=74) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=74) async with build_async_engine_client_from_engine_args(
(APIServer pid=74) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=74) return await anext(self.gen)
(APIServer pid=74) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=74) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=74) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=74) return cls(
(APIServer pid=74) ^^^^
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in __init__
(APIServer pid=74) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=74) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=74) return func(*args, **kwargs)
(APIServer pid=74) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 129, in make_async_mp_client
(APIServer pid=74) return AsyncMPClient(*client_args)
(APIServer pid=74) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=74) return func(*args, **kwargs)
(APIServer pid=74) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 872, in __init__
(APIServer pid=74) super().__init__(
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 534, in __init__
(APIServer pid=74) with launch_core_engines(
(APIServer pid=74) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=74) next(self.gen)
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1073, in launch_core_engines
(APIServer pid=74) wait_for_engine_startup(
(APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1132, in wait_for_engine_startup
(APIServer pid=74) raise RuntimeError(
(APIServer pid=74) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
Thanks again for your help!
Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>
@thomasmaindron The error is now clearly propagated and uncovers a wider issue. Could you check now?
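A minimal sketch of why the failure changes shape after the fix, assuming the original `TypeError` came from iterating over a `None` value. The function `resolve_model_cls` here is a hypothetical stand-in that mirrors only the empty-input check visible in the traceback above, not the real registry logic.

```python
from types import SimpleNamespace


def resolve_model_cls(architectures: tuple) -> str:
    """Hypothetical stand-in: reject empty input, otherwise pick the first name."""
    if not architectures:
        raise ValueError("No model architectures are specified")
    return architectures[0]


def describe_failure(hf_config, normalize: bool) -> str:
    """Return a short description of what happens for the given config."""
    raw = getattr(hf_config, "architectures", [])
    if normalize:
        # The PR's fix: treat an explicit None the same as a missing attribute.
        raw = raw or []
    try:
        resolve_model_cls(tuple(raw))
    except (TypeError, ValueError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return "ok"


cfg = SimpleNamespace(architectures=None)
before_fix = describe_failure(cfg, normalize=False)
after_fix = describe_failure(cfg, normalize=True)
print(before_fix)
print(after_fix)
```

Without normalization, `tuple(None)` fails before the resolver is reached. With normalization, the empty tuple reaches the resolver, which reports the missing architectures explicitly.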
@TihoElek After applying the changes in
Error message ("No module or parameter named 'model.layers.0.mlp.down_proj.activation_scale' in TransformersMultiModalForCausalLM.")
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] EngineCore failed to start.
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] Traceback (most recent call last):
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] super().__init__(
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 114, in __init__
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] self.model_executor = executor_class(vllm_config)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in __init__
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] self._init_executor()
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] self.driver_worker.load_model()
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4749, in load_model
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] self.model = model_loader.load_model(
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] self.load_weights(model, model_config)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 381, in load_weights
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/mistral3.py", line 563, in load_weights
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] yield from self._load_module(
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 268, in _load_module
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] loaded_params = module_load_weights(weights)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/transformers/base.py", line 609, in load_weights
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] yield from self._load_module(
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] yield from self._load_module(
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] yield from self._load_module(
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] [Previous line repeated 2 more times]
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 332, in _load_module
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] raise ValueError(msg)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ValueError: There is no module or parameter named 'model.layers.0.mlp.down_proj.activation_scale' in TransformersMultiModalForCausalLM. The available parameters belonging to model.layers.0.mlp.down_proj (RowParallelLinear) are: {'model.layers.0.mlp.down_proj.weight_scale', 'model.layers.0.mlp.down_proj.input_scale', 'model.layers.0.mlp.down_proj.weight'}
(EngineCore pid=195) Process EngineCore:
(EngineCore pid=195) Traceback (most recent call last):
(EngineCore pid=195) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=195) self.run()
(EngineCore pid=195) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=195) self._target(*self._args, **self._kwargs)
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1112, in run_engine_core
(EngineCore pid=195) raise e
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=195) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195) return func(*args, **kwargs)
(EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=195) super().__init__(
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 114, in __init__
(EngineCore pid=195) self.model_executor = executor_class(vllm_config)
(EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195) return func(*args, **kwargs)
(EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in __init__
(EngineCore pid=195) self._init_executor()
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor
(EngineCore pid=195) self.driver_worker.load_model()
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model
(EngineCore pid=195) self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195) return func(*args, **kwargs)
(EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4749, in load_model
(EngineCore pid=195) self.model = model_loader.load_model(
(EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195) return func(*args, **kwargs)
(EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
(EngineCore pid=195) self.load_weights(model, model_config)
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195) return func(*args, **kwargs)
(EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 381, in load_weights
(EngineCore pid=195) loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/mistral3.py", line 563, in load_weights
(EngineCore pid=195) return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=195) return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(EngineCore pid=195) autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=195) yield from self._load_module(
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 268, in _load_module
(EngineCore pid=195) loaded_params = module_load_weights(weights)
(EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/transformers/base.py", line 609, in load_weights
(EngineCore pid=195) return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=195) return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(EngineCore pid=195) autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=195) yield from self._load_module(
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=195) yield from self._load_module(
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=195) yield from self._load_module(
(EngineCore pid=195) [Previous line repeated 2 more times]
(EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 332, in _load_module
(EngineCore pid=195) raise ValueError(msg)
(EngineCore pid=195) ValueError: There is no module or parameter named 'model.layers.0.mlp.down_proj.activation_scale' in TransformersMultiModalForCausalLM. The available parameters belonging to model.layers.0.mlp.down_proj (RowParallelLinear) are: {'model.layers.0.mlp.down_proj.weight_scale', 'model.layers.0.mlp.down_proj.input_scale', 'model.layers.0.mlp.down_proj.weight'}
Loading safetensors checkpoint shards: 0% Completed | 0/6 [00:01
(APIServer pid=72) sys.exit(main())
(APIServer pid=72) ^^^^^^
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=72) args.dispatch_function(args)
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=72) uvloop.run(run_server(args))
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=72) return __asyncio.run(
(APIServer pid=72) ^^^^^^^^^^^^^^
(APIServer pid=72) File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
(APIServer pid=72) return runner.run(main)
(APIServer pid=72) ^^^^^^^^^^^^^^^^
(APIServer pid=72) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=72) return self._loop.run_until_complete(task)
(APIServer pid=72) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=72) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=72) return await main
(APIServer pid=72) ^^^^^^^^^^
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 684, in run_server
(APIServer pid=72) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 698, in run_server_worker
(APIServer pid=72) async with build_async_engine_client(
(APIServer pid=72) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=72) return await anext(self.gen)
(APIServer pid=72) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=72) async with build_async_engine_client_from_engine_args(
(APIServer pid=72) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=72) return await anext(self.gen)
(APIServer pid=72) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=72) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=72) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=72) return cls(
(APIServer pid=72) ^^^^
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in __init__
(APIServer pid=72) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=72) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=72) return func(*args, **kwargs)
(APIServer pid=72) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 129, in make_async_mp_client
(APIServer pid=72) return AsyncMPClient(*client_args)
(APIServer pid=72) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=72) return func(*args, **kwargs)
(APIServer pid=72) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 872, in __init__
(APIServer pid=72) super().__init__(
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 534, in __init__
(APIServer pid=72) with launch_core_engines(
(APIServer pid=72) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=72) next(self.gen)
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1073, in launch_core_engines
(APIServer pid=72) wait_for_engine_startup(
(APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1132, in wait_for_engine_startup
(APIServer pid=72) raise RuntimeError(
(APIServer pid=72) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

The multimodal architecture no longer seems to be the problem. Have Mistral models ever been able to run using the HF configuration? I've never tried.

Unless I'm missing something, I was able to run this fine: What the above shows is using the latest nightly base image, wrapping the precompiled bits in a wheel, then cloning and installing his code (as you can see from the commit SHA list).

@thomasmaindron The error you're seeing now (No module or parameter named 'model.layers.0.mlp.down_proj.activation_scale') is a separate issue from the original TypeError; the architecture/config fix in this PR is working correctly. A few things to note:

@Gregory-Pereira What I'm trying to load here is Devstral-Small-2-24B-Instruct-2512, not my fine-tuned model. But I guess I'll just continue to look for a solution to my original issue (which this pull request is based on). Thanks for your help anyway!

@Gregory-Pereira thank you too for the e2e test and the logs.

@patrickvonplaten could you clarify whether we are supposed to be able to load Devstral Small 2 in HF format?

@thomasmaindron @hmellor updated. Feel free to review this PR for this specific issue.
hmellor
left a comment
Thanks, this looks good now. Just a small nit so that transformers is only imported if absolutely necessary:
import torch
from packaging.version import Version
from pydantic import ConfigDict, Field, model_validator
from transformers.models.auto.modeling_auto import MODEL_FOR_CAUSAL_LM_MAPPING_NAMES
Could you delay the import as is done in https://github.com/vllm-project/vllm/pull/39293/changes#diff-bee6813076031d3ca1edc903c1b02b81e4676519afc562ce3fefe37f20c7b650
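For readers unfamiliar with the pattern, here is a minimal sketch of a delayed import. It is not the code from the linked PR: `lazy_attr` and `causal_lm_mapping_names` are hypothetical helper names chosen for illustration. The idea is that the module-level `from transformers... import ...` is replaced by a lookup that runs only on first use.

```python
import importlib
from functools import cache


@cache
def lazy_attr(module_name: str, attr: str):
    """Import `module_name` on first use and return its attribute `attr`.

    The result is cached, so the import machinery runs once per (module, attr).
    """
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def causal_lm_mapping_names():
    """Hypothetical accessor: transformers is imported only when this is called."""
    return lazy_attr(
        "transformers.models.auto.modeling_auto",
        "MODEL_FOR_CAUSAL_LM_MAPPING_NAMES",
    )
```

Code paths that never call `causal_lm_mapping_names()` never pay the cost of importing transformers. A plain `import` statement inside the function body that needs the mapping would achieve the same effect with less machinery.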

Let's merge this one (almost) as is, then @thomasmaindron we can merge the extra changes (FP8 scales for example) you added in your PR
Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>

@hmellor Got it! I'll remove my implementation when I can.
…l loading (vllm-project#38849) Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>
…l loading (vllm-project#38849) Signed-off-by: Tihomir Elek <tiho.elek@gmail.com> Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
…l loading (vllm-project#38849) Signed-off-by: Tihomir Elek <tiho.elek@gmail.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Purpose
Fixes #38818
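
The failure mode can be reproduced in isolation. The sketch below uses a hypothetical `FakeConfig` class as a stand-in for the Transformers config (an assumption for illustration, not the real class) and two hypothetical helpers that mirror the old and new lookups.

```python
class FakeConfig:
    """Hypothetical stand-in: the attribute exists on the class, but is None."""

    architectures = None


def architectures_old(cfg):
    # The default [] is used only when the attribute is missing entirely.
    return getattr(cfg, "architectures", [])


def architectures_fixed(cfg):
    # `or []` also covers the case where the attribute exists but is None.
    return getattr(cfg, "architectures", None) or []


cfg = FakeConfig()

try:
    tuple(architectures_old(cfg))  # tuple(None) raises TypeError
    old_failed = False
except TypeError:
    old_failed = True

fixed_result = tuple(architectures_fixed(cfg))  # empty tuple, no exception
```

With the old lookup, `old_failed` ends up `True`; with the fixed lookup, `fixed_result` is an empty tuple and downstream code can report that no architectures are specified instead of crashing.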
PretrainedConfig in Transformers defines architectures: list[str] | None = None as a class-level attribute. Fine-tuned models saved without "architectures" in config.json (or configs loaded programmatically) will have hf_config.architectures = None.

The existing code used getattr(hf_config, "architectures", []), which only falls back to [] when the attribute is absent. Since architectures is always present on the class (as None), the default never fires. tuple(None) then raises:

TypeError: 'NoneType' object is not iterable

The fix uses getattr(..., None) or [], which correctly normalises both the absent and the explicitly-None cases to an empty list.

Test Plan
Ran