
[Bug] Fix TypeError when hf_config.architectures is None during model loading#38849

Merged
hmellor merged 8 commits into
vllm-project:mainfrom
TihoElek:fix/hf-config-architectures-none-crash
Apr 13, 2026


@TihoElek TihoElek commented Apr 2, 2026

Contributor

Purpose

Fixes #38818

PretrainedConfig in Transformers defines architectures: list[str] | None = None
as a class-level attribute. Fine-tuned models saved without "architectures" in
config.json (or configs loaded programmatically) will have hf_config.architectures = None.

The existing code used getattr(hf_config, "architectures", []), which only falls
back to [] when the attribute is absent. Since architectures is always present
on the class (as None), the default never fires. tuple(None) then raises:
TypeError: 'NoneType' object is not iterable

The fix uses getattr(..., None) or [], which correctly normalises both the absent
and the explicitly-None cases to an empty list.
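The difference between the two patterns can be reproduced without Transformers installed. The sketch below uses a hypothetical stand-in class, FakeConfig, that only mimics a class-level architectures = None; it is not the real PretrainedConfig, and the function names old_pattern and new_pattern are made up for illustration.

```python
class FakeConfig:
    """Hypothetical stand-in for a config class; not the real PretrainedConfig."""

    # Class-level default, so the attribute always exists on instances.
    architectures = None


def old_pattern(cfg):
    # The [] default is used only if the attribute is missing entirely.
    return getattr(cfg, "architectures", [])


def new_pattern(cfg):
    # "or []" also replaces an explicit None (or any other falsy value).
    return getattr(cfg, "architectures", None) or []


cfg = FakeConfig()
print(old_pattern(cfg))         # None: the getattr default never fires
print(tuple(new_pattern(cfg)))  # (): safe to pass to tuple()
```

Note that "or []" treats every falsy value the same way, so an explicitly empty list is also returned as an empty list, which is the desired outcome here.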

Test Plan

Ran:

 # Reproduces the bug with the old pattern
 from types import SimpleNamespace
 hf = SimpleNamespace(architectures=None)
 tuple(getattr(hf, 'architectures', []))   # TypeError: 'NoneType' object is not iterable

Test Result

 # Fixed
 tuple(getattr(hf, 'architectures', None) or [])  # ()

@TihoElek TihoElek requested a review from 22quinn as a code owner April 2, 2026 20:54
@mergify mergify Bot added intel-gpu Related to Intel GPU bug Something isn't working labels Apr 2, 2026
@TihoElek TihoElek force-pushed the fix/hf-config-architectures-none-crash branch from 2983aeb to ca394a9 Compare April 2, 2026 20:56

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request updates the model architecture retrieval logic to handle None values by defaulting to an empty list. The review feedback recommends refactoring this repeated logic into a helper function or normalizing it within the ModelConfig class to improve code consistency and maintainability.

@@ -175,7 +175,7 @@ def device_loading_context(module: torch.nn.Module, target_device: torch.device)
 def _get_model_architecture(model_config: ModelConfig) -> tuple[type[nn.Module], str]:
     from vllm.model_executor.models.adapters import as_embedding_model, as_seq_cls_model

-    architectures = getattr(model_config.hf_config, "architectures", [])
+    architectures = getattr(model_config.hf_config, "architectures", None) or []

Contributor


high

The use of getattr(..., None) or [] is a robust way to handle potential None values for architectures. However, to improve readability and maintain consistency, consider using a helper function or a more explicit check if this pattern is repeated frequently across the codebase.

@@ -215,7 +215,7 @@ def get_model_architecture(model_config: ModelConfig) -> tuple[type[nn.Module],
         model_config.runner_type,
         model_config.trust_remote_code,
         model_config.model_impl,
-        tuple(getattr(model_config.hf_config, "architectures", [])),
+        tuple(getattr(model_config.hf_config, "architectures", None) or []),

Contributor


high

The repeated use of getattr(..., None) or [] in get_model_architecture suggests that model_config.hf_config.architectures should ideally be normalized once, perhaps within the ModelConfig class itself, to avoid redundant logic and potential errors.
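One way to read the suggestion above is a single helper that both call sites share. The sketch below is only an illustration: get_hf_architectures is a hypothetical name and is not part of vLLM, and the SimpleNamespace objects stand in for real config objects.

```python
from types import SimpleNamespace


def get_hf_architectures(hf_config) -> tuple[str, ...]:
    """Hypothetical helper (not part of vLLM): architectures as a tuple.

    Covers both a missing attribute and an attribute set to None.
    """
    return tuple(getattr(hf_config, "architectures", None) or ())


# Stand-in configs for illustration only.
print(get_hf_architectures(SimpleNamespace(architectures=None)))
print(get_hf_architectures(SimpleNamespace()))
print(get_hf_architectures(SimpleNamespace(architectures=["LlamaForCausalLM"])))
```

Returning a tuple keeps the result hashable, which suits the second call site, where the value is passed as part of an argument list that already wraps it in tuple().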

@TihoElek TihoElek force-pushed the fix/hf-config-architectures-none-crash branch from ca394a9 to 9a84f51 Compare April 2, 2026 21:03

TihoElek commented Apr 2, 2026

Contributor Author

Hi, @22quinn is out sick — could another maintainer add the ready label to trigger CI? This is a one-line bug fix for #38818 (TypeError when hf_config.architectures is None on nested model configs). Thanks!

@jikunshang jikunshang added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 3, 2026
@jikunshang

Member

added ready label. cc @hmellor PTAL.

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>
@TihoElek TihoElek force-pushed the fix/hf-config-architectures-none-crash branch from 9a84f51 to f0498d4 Compare April 3, 2026 07:23

thomasmaindron commented Apr 3, 2026

Contributor

Hi @TihoElek, thank you for your quick response! However, I'm still encountering an error after applying the fix:

Error message ("No model architectures are specified") (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] EngineCore failed to start. (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] Traceback (most recent call last): (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 848, in __init__ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] super().__init__( (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 114, in __init__ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] self.model_executor = executor_class(vllm_config) (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in __init__ (EngineCore 
pid=154) ERROR 04-03 08:34:39 [core.py:1108] self._init_executor() (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] self.driver_worker.load_model() (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] self.model_runner.load_model(load_dummy_weights=load_dummy_weights) (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4749, in load_model (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] self.model = model_loader.load_model( (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 55, in load_model (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] model = initialize_model( (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File 
"/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 57, in initialize_model (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] model = model_class(vllm_config=vllm_config, prefix=prefix) (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/mistral3.py", line 437, in __init__ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] self.language_model = init_vllm_registered_model( (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 372, in init_vllm_registered_model (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] return initialize_model(vllm_config=vllm_config, prefix=prefix) (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 47, in initialize_model (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] model_class, _ = 
get_model_architecture(model_config) (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 224, in get_model_architecture (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] model_arch = _get_model_architecture(model_config) (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 180, in _get_model_architecture (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] model_cls, arch = model_config.registry.resolve_model_cls( (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1138, in resolve_model_cls (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] raise ValueError("No model architectures are specified") (EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ValueError: No model architectures are specified (EngineCore pid=154) Process EngineCore: (EngineCore pid=154) Traceback (most recent call last): (EngineCore pid=154) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap (EngineCore pid=154) self.run() (EngineCore pid=154) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run (EngineCore pid=154) self._target(*self._args, **self._kwargs) (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1112, in run_engine_core (EngineCore pid=154) raise e (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core (EngineCore pid=154) 
engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=154) return func(*args, **kwargs) (EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 848, in __init__ (EngineCore pid=154) super().__init__( (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 114, in __init__ (EngineCore pid=154) self.model_executor = executor_class(vllm_config) (EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=154) return func(*args, **kwargs) (EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in __init__ (EngineCore pid=154) self._init_executor() (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor (EngineCore pid=154) self.driver_worker.load_model() (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model (EngineCore pid=154) self.model_runner.load_model(load_dummy_weights=load_dummy_weights) (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=154) return func(*args, **kwargs) (EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4749, in load_model (EngineCore pid=154) self.model = model_loader.load_model( (EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) File 
"/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=154) return func(*args, **kwargs) (EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 55, in load_model (EngineCore pid=154) model = initialize_model( (EngineCore pid=154) ^^^^^^^^^^^^^^^^^ (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=154) return func(*args, **kwargs) (EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 57, in initialize_model (EngineCore pid=154) model = model_class(vllm_config=vllm_config, prefix=prefix) (EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/mistral3.py", line 437, in __init__ (EngineCore pid=154) self.language_model = init_vllm_registered_model( (EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 372, in init_vllm_registered_model (EngineCore pid=154) return initialize_model(vllm_config=vllm_config, prefix=prefix) (EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=154) return func(*args, **kwargs) (EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 47, in initialize_model (EngineCore pid=154) model_class, _ = get_model_architecture(model_config) (EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) File 
"/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 224, in get_model_architecture (EngineCore pid=154) model_arch = _get_model_architecture(model_config) (EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 180, in _get_model_architecture (EngineCore pid=154) model_cls, arch = model_config.registry.resolve_model_cls( (EngineCore pid=154) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=154) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1138, in resolve_model_cls (EngineCore pid=154) raise ValueError("No model architectures are specified") (EngineCore pid=154) ValueError: No model architectures are specified [rank0]:[W403 08:34:39.355421752 ProcessGroupNCCL.cpp:1648] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) (APIServer pid=74) Traceback (most recent call last): (APIServer pid=74) File "/usr/local/bin/vllm", line 10, in (APIServer pid=74) sys.exit(main()) (APIServer pid=74) ^^^^^^ (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main (APIServer pid=74) args.dispatch_function(args) (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd (APIServer pid=74) uvloop.run(run_server(args)) (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run (APIServer pid=74) return __asyncio.run( (APIServer pid=74) ^^^^^^^^^^^^^^ (APIServer pid=74) File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run (APIServer pid=74) return runner.run(main) (APIServer pid=74) ^^^^^^^^^^^^^^^^ (APIServer pid=74) File 
"/usr/lib/python3.12/asyncio/runners.py", line 118, in run (APIServer pid=74) return self._loop.run_until_complete(task) (APIServer pid=74) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=74) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper (APIServer pid=74) return await main (APIServer pid=74) ^^^^^^^^^^ (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 684, in run_server (APIServer pid=74) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 698, in run_server_worker (APIServer pid=74) async with build_async_engine_client( (APIServer pid=74) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__ (APIServer pid=74) return await anext(self.gen) (APIServer pid=74) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client (APIServer pid=74) async with build_async_engine_client_from_engine_args( (APIServer pid=74) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__ (APIServer pid=74) return await anext(self.gen) (APIServer pid=74) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args (APIServer pid=74) async_llm = AsyncLLM.from_vllm_config( (APIServer pid=74) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config (APIServer pid=74) return cls( (APIServer pid=74) ^^^^ (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in __init__ (APIServer pid=74) 
self.engine_core = EngineCoreClient.make_async_mp_client( (APIServer pid=74) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=74) return func(*args, **kwargs) (APIServer pid=74) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 129, in make_async_mp_client (APIServer pid=74) return AsyncMPClient(*client_args) (APIServer pid=74) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=74) return func(*args, **kwargs) (APIServer pid=74) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 872, in __init__ (APIServer pid=74) super().__init__( (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 534, in __init__ (APIServer pid=74) with launch_core_engines( (APIServer pid=74) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__ (APIServer pid=74) next(self.gen) (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1073, in launch_core_engines (APIServer pid=74) wait_for_engine_startup( (APIServer pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1132, in wait_for_engine_startup (APIServer pid=74) raise RuntimeError( (APIServer pid=74) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

Thanks again for your help!

@jikunshang jikunshang removed the intel-gpu Related to Intel GPU label Apr 3, 2026
@mergify mergify Bot added the intel-gpu Related to Intel GPU label Apr 3, 2026
Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>
@mergify mergify Bot added the multi-modality Related to multi-modality (#4194) label Apr 3, 2026
@DarkLight1337 DarkLight1337 requested a review from hmellor April 3, 2026 11:13

TihoElek commented Apr 3, 2026

Contributor Author

@thomasmaindron The error now propagates clearly and uncovers a wider issue.
My understanding is:
The architecture in the root config.json is used to resolve the outer Mistral3ForConditionalGeneration. During nested LM initialization, vLLM switches from the root config to config.text_config, which does not carry the inner LM architecture. Without an explicit inner architecture, nested LM resolution reaches the registry with no architectures and fails with "No model architectures are specified". The fix is to pass the inner LM architecture explicitly: Ministral3ForCausalLM.

Could you check now?
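The failure mode described in this comment can be sketched in isolation. Everything below is a simplified assumption rather than vLLM's actual code: resolve_architectures is a hypothetical function, the SimpleNamespace objects stand in for the root config and its text_config, and the error message is copied from the traceback earlier in the thread.

```python
from types import SimpleNamespace


def resolve_architectures(hf_config, override=None):
    """Hypothetical resolver: prefer an explicit override, else the config's list."""
    archs = list(override or getattr(hf_config, "architectures", None) or [])
    if not archs:
        # Same message as the ValueError in the traceback.
        raise ValueError("No model architectures are specified")
    return archs


# Stand-in for a root config whose nested text_config has no architectures.
root = SimpleNamespace(
    architectures=["Mistral3ForConditionalGeneration"],
    text_config=SimpleNamespace(architectures=None),
)

# The outer model resolves from the root config.
print(resolve_architectures(root))

# The inner language model has nothing to resolve from text_config.
try:
    resolve_architectures(root.text_config)
except ValueError as exc:
    print(exc)

# Passing the inner architecture explicitly avoids the failure.
print(resolve_architectures(root.text_config, override=["Ministral3ForCausalLM"]))
```

The sketch only shows why an explicit inner architecture is needed once the None case is normalised to an empty list; it says nothing about whether weight loading succeeds afterwards.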

@thomasmaindron

Contributor

@TihoElek After applying the changes in utils.py and mistral3.py, the error now occurs after trying to load the safetensors:

Error message ("No module or parameter named 'model.layers.0.mlp.down_proj.activation.scale' in TransformersMultiModalForCausalLM.") (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] EngineCore failed to start. (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] Traceback (most recent call last): (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 848, in __init__ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] super().__init__( (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 114, in __init__ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] self.model_executor = executor_class(vllm_config) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File 
"/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in __init__ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] self._init_executor() (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] self.driver_worker.load_model() (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] self.model_runner.load_model(load_dummy_weights=load_dummy_weights) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4749, in load_model (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] self.model = model_loader.load_model( (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] self.load_weights(model, model_config) (EngineCore pid=195) ERROR 04-03 
12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 381, in load_weights (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] loaded_weights = model.load_weights(self.get_all_weights(model_config, model)) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/mistral3.py", line 563, in load_weights (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return original_load_weights(self, weights, *args, **kwargs) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] autoloaded_weights = set(self._load_module("", self.module, weights)) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) ERROR 04-03 
12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] yield from self._load_module( (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 268, in _load_module (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] loaded_params = module_load_weights(weights) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/transformers/base.py", line 609, in load_weights (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] return original_load_weights(self, weights, *args, **kwargs) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] autoloaded_weights = set(self._load_module("", self.module, weights)) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module 
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] yield from self._load_module( (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] yield from self._load_module( (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] yield from self._load_module( (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] [Previous line repeated 2 more times] (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 332, in _load_module (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] raise ValueError(msg) (EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ValueError: There is no module or parameter named 'model.layers.0.mlp.down_proj.activation_scale' in TransformersMultiModalForCausalLM. 
The available parameters belonging to model.layers.0.mlp.down_proj (RowParallelLinear) are: {'model.layers.0.mlp.down_proj.weight_scale', 'model.layers.0.mlp.down_proj.input_scale', 'model.layers.0.mlp.down_proj.weight'} (EngineCore pid=195) Process EngineCore: (EngineCore pid=195) Traceback (most recent call last): (EngineCore pid=195) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap (EngineCore pid=195) self.run() (EngineCore pid=195) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run (EngineCore pid=195) self._target(*self._args, **self._kwargs) (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1112, in run_engine_core (EngineCore pid=195) raise e (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core (EngineCore pid=195) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=195) return func(*args, **kwargs) (EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 848, in __init__ (EngineCore pid=195) super().__init__( (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 114, in __init__ (EngineCore pid=195) self.model_executor = executor_class(vllm_config) (EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=195) return func(*args, **kwargs) (EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in __init__ (EngineCore pid=195) 
self._init_executor() (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor (EngineCore pid=195) self.driver_worker.load_model() (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model (EngineCore pid=195) self.model_runner.load_model(load_dummy_weights=load_dummy_weights) (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=195) return func(*args, **kwargs) (EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4749, in load_model (EngineCore pid=195) self.model = model_loader.load_model( (EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=195) return func(*args, **kwargs) (EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model (EngineCore pid=195) self.load_weights(model, model_config) (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=195) return func(*args, **kwargs) (EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 381, in load_weights (EngineCore pid=195) loaded_weights = model.load_weights(self.get_all_weights(model_config, model)) (EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/mistral3.py", line 563, in load_weights (EngineCore pid=195) return loader.load_weights(weights, 
mapper=self.hf_to_vllm_mapper) (EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights (EngineCore pid=195) return original_load_weights(self, weights, *args, **kwargs) (EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights (EngineCore pid=195) autoloaded_weights = set(self._load_module("", self.module, weights)) (EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module (EngineCore pid=195) yield from self._load_module( (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 268, in _load_module (EngineCore pid=195) loaded_params = module_load_weights(weights) (EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/transformers/base.py", line 609, in load_weights (EngineCore pid=195) return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper) (EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights (EngineCore pid=195) return original_load_weights(self, weights, *args, **kwargs) (EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights (EngineCore pid=195) autoloaded_weights = set(self._load_module("", 
self.module, weights)) (EngineCore pid=195) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module (EngineCore pid=195) yield from self._load_module( (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module (EngineCore pid=195) yield from self._load_module( (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module (EngineCore pid=195) yield from self._load_module( (EngineCore pid=195) [Previous line repeated 2 more times] (EngineCore pid=195) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 332, in _load_module (EngineCore pid=195) raise ValueError(msg) (EngineCore pid=195) ValueError: There is no module or parameter named 'model.layers.0.mlp.down_proj.activation_scale' in TransformersMultiModalForCausalLM. 
The available parameters belonging to model.layers.0.mlp.down_proj (RowParallelLinear) are: {'model.layers.0.mlp.down_proj.weight_scale', 'model.layers.0.mlp.down_proj.input_scale', 'model.layers.0.mlp.down_proj.weight'} Loading safetensors checkpoint shards: 0% Completed | 0/6 [00:01 (APIServer pid=72) sys.exit(main()) (APIServer pid=72) ^^^^^^ (APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main (APIServer pid=72) args.dispatch_function(args) (APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd (APIServer pid=72) uvloop.run(run_server(args)) (APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run (APIServer pid=72) return __asyncio.run( (APIServer pid=72) ^^^^^^^^^^^^^^ (APIServer pid=72) File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run (APIServer pid=72) return runner.run(main) (APIServer pid=72) ^^^^^^^^^^^^^^^^ (APIServer pid=72) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run (APIServer pid=72) return self._loop.run_until_complete(task) (APIServer pid=72) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=72) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete (APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper (APIServer pid=72) return await main (APIServer pid=72) ^^^^^^^^^^ (APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 684, in run_server (APIServer pid=72) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) (APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 698, in run_server_worker (APIServer pid=72) async with build_async_engine_client( (APIServer pid=72) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__ (APIServer pid=72) return 
await anext(self.gen) (APIServer pid=72) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client (APIServer pid=72) async with build_async_engine_client_from_engine_args( (APIServer pid=72) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__ (APIServer pid=72) return await anext(self.gen) (APIServer pid=72) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args (APIServer pid=72) async_llm = AsyncLLM.from_vllm_config( (APIServer pid=72) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config (APIServer pid=72) return cls( (APIServer pid=72) ^^^^ (APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in __init__ (APIServer pid=72) self.engine_core = EngineCoreClient.make_async_mp_client( (APIServer pid=72) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=72) return func(*args, **kwargs) (APIServer pid=72) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 129, in make_async_mp_client (APIServer pid=72) return AsyncMPClient(*client_args) (APIServer pid=72) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=72) return func(*args, **kwargs) (APIServer pid=72) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 872, in __init__ (APIServer pid=72) super().__init__( (APIServer pid=72) File 
"/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 534, in __init__ (APIServer pid=72) with launch_core_engines( (APIServer pid=72) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__ (APIServer pid=72) next(self.gen) (APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1073, in launch_core_engines (APIServer pid=72) wait_for_engine_startup( (APIServer pid=72) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1132, in wait_for_engine_startup (APIServer pid=72) raise RuntimeError( (APIServer pid=72) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

@thomasmaindron

thomasmaindron commented Apr 3, 2026

Contributor

The multimodal architecture no longer seems to be the problem. Have Mistral models ever been able to run using the HF configuration? I've never tried.

@Gregory-Pereira

Gregory-Pereira commented Apr 5, 2026

Contributor

Unless I'm missing something, I was able to run this fine:

k logs pod/vllm-test-fix-38849 -f
=== [1/6] Installing git and pytest ===
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
=== [2/6] Cloning https://github.com/TihoElek/vllm.git @ fix/hf-config-architectures-none-crash ===
Cloning into '/tmp/vllm-fix'...
=== [3/6] Setting up upstream remote and syncing tags ===
HEAD: b38f61a98 Fix nested Devstral/Mistral3 LM architecture resolution
=== [4/6] Packing local wheel from container's .so files ===
Packing 96 files
Created /tmp/vllm-local-precompiled.whl (410 MB)
=== [5/6] Installing test dependencies ===
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
=== [6/6] Editable install ===
    Found existing installation: vllm 0.19.1rc1.dev29+g93726b2a1
    Uninstalling vllm-0.19.1rc1.dev29+g93726b2a1:
      Successfully uninstalled vllm-0.19.1rc1.dev29+g93726b2a1
Successfully installed vllm-0.19.1rc1.dev3+gb38f61a98.precompiled
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

=== Ready ===
vLLM 0.19.1rc1.dev3+gb38f61a98 from /tmp/vllm-fix/vllm/__init__.py

=== Running tests: tests/models/multimodal/test_mistral3.py -v ===
============================= test session starts ==============================
platform linux -- Python 3.12.13, pytest-9.0.2, pluggy-1.6.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default'
rootdir: /tmp/vllm-fix
configfile: pyproject.toml
plugins: typeguard-4.5.1, hypothesis-6.151.11, timeout-2.4.0, shard-0.1.2, rerunfailures-16.1, mock-3.15.1, forked-1.6.0, asyncio-1.3.0, hydra-core-1.3.2, buildkite-test-collector-0.1.9, cov-7.1.0, schemathesis-4.15.0, anyio-4.13.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collecting ... collected 1 item
Running 1 items in this shard: tests/models/multimodal/test_mistral3.py::test_mistral3_passes_inner_lm_architecture

tests/models/multimodal/test_mistral3.py::test_mistral3_passes_inner_lm_architecture PASSED [100%]

=============================== warnings summary ===============================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

../../usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:362: 14 warnings
  /usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:362: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================== 1 passed, 16 warnings in 2.74s ========================

=== Tests PASSED ===
=== Starting vLLM server: mistralai/Devstral-Small-2-24B-Instruct-2512 ===
(APIServer pid=1) INFO 04-05 21:18:02 [utils.py:299]
(APIServer pid=1) INFO 04-05 21:18:02 [utils.py:299]        █     █     █▄   ▄█
(APIServer pid=1) INFO 04-05 21:18:02 [utils.py:299]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.19.1rc1.dev3+gb38f61a98
(APIServer pid=1) INFO 04-05 21:18:02 [utils.py:299]   █▄█▀ █     █     █     █  model   mistralai/Devstral-Small-2-24B-Instruct-2512
(APIServer pid=1) INFO 04-05 21:18:02 [utils.py:299]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
(APIServer pid=1) INFO 04-05 21:18:02 [utils.py:299]
(APIServer pid=1) INFO 04-05 21:18:02 [utils.py:233] non-default args: {'model': 'mistralai/Devstral-Small-2-24B-Instruct-2512'}
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_TEST_UBATCH_SERVICE_HOST
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_FORK_URL
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_TEST_UBATCH_PORT_8000_TCP_ADDR
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_TEST_UBATCH_PORT
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_TEST_PATH
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_TEST_UBATCH_PORT_8000_TCP_PROTO
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_SERVE_MODEL
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_TEST_UBATCH_PORT_8000_TCP
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_SERVE_ARGS
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_TEST_UBATCH_PORT_8000_TCP_PORT
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_TEST_UBATCH_SERVICE_PORT
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_BRANCH
Parse safetensors files: 100%|██████████| 2/2 [00:00<00:00,  6.11it/s]
(APIServer pid=1) INFO 04-05 21:18:06 [config.py:289] Inferred from consolidated*.safetensors files torch.bfloat16 dtype.
(APIServer pid=1) INFO 04-05 21:18:14 [model.py:549] Resolved architecture: PixtralForConditionalGeneration
(APIServer pid=1) INFO 04-05 21:18:14 [model.py:1680] Using max model len 393216
(APIServer pid=1) INFO 04-05 21:18:15 [scheduler.py:238] Chunked prefill is enabled with max_num_batched_tokens=8192.
(APIServer pid=1) INFO 04-05 21:18:15 [vllm.py:799] Asynchronous scheduling is enabled.
(APIServer pid=1) INFO 04-05 21:18:15 [kernel.py:196] Final IR op priority after setting platform defaults: IrOpPriorityConfig(rms_norm=['native'])
(EngineCore pid=1792) INFO 04-05 21:18:25 [core.py:105] Initializing a V1 LLM engine (v0.19.1rc1.dev3+gb38f61a98) with config: model='mistralai/Devstral-Small-2-24B-Instruct-2512', speculative_config=None, tokenizer='mistralai/Devstral-Small-2-24B-Instruct-2512', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=393216, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=mistralai/Devstral-Small-2-24B-Instruct-2512, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'ir_enable_torch_wrap': True, 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 
'vllm::olmo_hybrid_gdn_full_forward', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'cudagraph_mm_encoder': False, 'encoder_cudagraph_token_budgets': [], 'encoder_cudagraph_max_images_per_batch': 0, 'compile_sizes': [], 'compile_ranges_endpoints': [8192], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'size_asserts': False, 'alignment_asserts': False, 'scalar_asserts': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []}, kernel_config=KernelConfig(ir_op_priority=IrOpPriorityConfig(rms_norm=['native']), enable_flashinfer_autotune=True, moe_backend='auto')
(EngineCore pid=1792) INFO 04-05 21:18:26 [parallel_state.py:1400] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.0.2.178:35411 backend=nccl
(EngineCore pid=1792) INFO 04-05 21:18:26 [parallel_state.py:1712] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A
(EngineCore pid=1792) INFO 04-05 21:18:27 [gpu_model_runner.py:4733] Starting to load model mistralai/Devstral-Small-2-24B-Instruct-2512...
(EngineCore pid=1792) INFO 04-05 21:18:27 [vllm.py:799] Asynchronous scheduling is enabled.
(EngineCore pid=1792) INFO 04-05 21:18:27 [kernel.py:196] Final IR op priority after setting platform defaults: IrOpPriorityConfig(rms_norm=['native'])
(EngineCore pid=1792) INFO 04-05 21:18:27 [__init__.py:261] Selected CutlassFP8ScaledMMLinearKernel for Fp8LinearMethod
(EngineCore pid=1792) INFO 04-05 21:18:27 [deep_gemm.py:115] DeepGEMM E8M0 enabled on current platform.
(EngineCore pid=1792) INFO 04-05 21:18:27 [cuda.py:362] Using FLASH_ATTN attention backend out of potential backends: ['FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION'].
(EngineCore pid=1792) INFO 04-05 21:18:27 [flash_attn.py:622] Using FlashAttention version 3
(EngineCore pid=1792) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(EngineCore pid=1792) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(EngineCore pid=1792) INFO 04-05 21:18:58 [weight_utils.py:583] Time spent downloading weights for mistralai/Devstral-Small-2-24B-Instruct-2512: 30.365908 seconds
Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:02<00:02,  2.32s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:03<00:00,  1.44s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:03<00:00,  1.57s/it]
(EngineCore pid=1792)
(EngineCore pid=1792) INFO 04-05 21:19:01 [default_loader.py:384] Loading weights took 3.19 seconds
(EngineCore pid=1792) INFO 04-05 21:19:02 [gpu_model_runner.py:4818] Model loading took 24.12 GiB memory and 34.811804 seconds
(EngineCore pid=1792) INFO 04-05 21:19:02 [gpu_model_runner.py:5758] Encoder cache will be initialized with a budget of 8192 tokens, and profiled with 2 image items of the maximum feature size.
(EngineCore pid=1792) WARNING 04-05 21:19:03 [op.py:236] Priority not set for op rms_norm, using native implementation.
(EngineCore pid=1792) INFO 04-05 21:19:12 [backends.py:1051] Using cache directory: /root/.cache/vllm/torch_compile_cache/f06ac4bc20/rank_0_0/backbone for vLLM's torch.compile
(EngineCore pid=1792) INFO 04-05 21:19:12 [backends.py:1111] Dynamo bytecode transform time: 7.42 s
(EngineCore pid=1792) INFO 04-05 21:19:16 [backends.py:372] Cache the graph of compile range (1, 8192) for later use
(EngineCore pid=1792) INFO 04-05 21:19:20 [backends.py:390] Compiling a graph for compile range (1, 8192) takes 7.22 s
(EngineCore pid=1792) INFO 04-05 21:19:23 [decorators.py:655] saved AOT compiled function to /root/.cache/vllm/torch_compile_cache/torch_aot_compile/aafd0a688359ce5b801ae7e55c557c62b559618980fe70c59e2e8e3d13f73ed7/rank_0_0/model
(EngineCore pid=1792) INFO 04-05 21:19:23 [monitor.py:48] torch.compile took 17.72 s in total
(EngineCore pid=1792) INFO 04-05 21:19:23 [monitor.py:76] Initial profiling/warmup run took 0.54 s
(EngineCore pid=1792) INFO 04-05 21:19:28 [kv_cache_utils.py:829] Overriding num_gpu_blocks=0 with num_gpu_blocks_override=512
(EngineCore pid=1792) INFO 04-05 21:19:28 [gpu_model_runner.py:5881] Profiling CUDA graph memory: PIECEWISE=51 (largest=512), FULL=51 (largest=512)
(EngineCore pid=1792) INFO 04-05 21:19:30 [gpu_model_runner.py:5960] Estimated CUDA graph memory: 0.50 GiB total
(EngineCore pid=1792) INFO 04-05 21:19:30 [gpu_worker.py:436] Available KV cache memory: 99.44 GiB
(EngineCore pid=1792) INFO 04-05 21:19:30 [gpu_worker.py:470] In v0.19, CUDA graph memory profiling will be enabled by default (VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=1), which more accurately accounts for CUDA graph memory during KV cache allocation. To try it now, set VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=1 and increase --gpu-memory-utilization from 0.9000 to 0.9035 to maintain the same effective KV cache size.
(EngineCore pid=1792) INFO 04-05 21:19:30 [kv_cache_utils.py:1319] GPU KV cache size: 651,664 tokens
(EngineCore pid=1792) INFO 04-05 21:19:30 [kv_cache_utils.py:1324] Maximum concurrency for 393,216 tokens per request: 1.66x
(EngineCore pid=1792) 2026-04-05 21:19:30,463 - INFO - autotuner.py:446 - flashinfer.jit: [Autotuner]: Autotuning process starts ...
(EngineCore pid=1792) 2026-04-05 21:19:30,472 - INFO - autotuner.py:455 - flashinfer.jit: [Autotuner]: Autotuning process ends
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|██████████| 51/51 [00:01<00:00, 31.75it/s]
Capturing CUDA graphs (decode, FULL): 100%|██████████| 51/51 [00:01<00:00, 35.96it/s]
(EngineCore pid=1792) INFO 04-05 21:19:34 [gpu_model_runner.py:6051] Graph capturing finished in 4 secs, took 0.69 GiB
(EngineCore pid=1792) INFO 04-05 21:19:34 [gpu_worker.py:597] CUDA graph pool memory: 0.69 GiB (actual), 0.5 GiB (estimated), difference: 0.2 GiB (28.2%).
(EngineCore pid=1792) INFO 04-05 21:19:34 [core.py:283] init engine (profile, create kv cache, warmup model) took 31.66 seconds
(EngineCore pid=1792) INFO 04-05 21:19:34 [kernel.py:196] Final IR op priority after setting platform defaults: IrOpPriorityConfig(rms_norm=['native'])
(APIServer pid=1) INFO 04-05 21:19:34 [api_server.py:604] Supported tasks: ['generate']
(APIServer pid=1) WARNING 04-05 21:19:35 [model.py:1437] Default vLLM sampling parameters have been overridden by the model's `generation_config.json`: `{'temperature': 0.15}`. If this is not intended, please relaunch vLLM instance with `--generation-config vllm`.
(APIServer pid=1) INFO 04-05 21:19:36 [base.py:245] Multi-modal warmup completed in 0.152s
(APIServer pid=1) INFO 04-05 21:19:37 [api_server.py:608] Starting vLLM server on http://0.0.0.0:8000
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:37] Available routes are:
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /docs, Methods: GET, HEAD
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /redoc, Methods: GET, HEAD
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/chat/completions/batch, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/messages/count_tokens, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/chat/completions/render, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/completions/render, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /generative_scoring, Methods: POST
(APIServer pid=1) INFO:     Started server process [1]
(APIServer pid=1) INFO:     Waiting for application startup.
(APIServer pid=1) INFO:     Application startup complete.
(APIServer pid=1) INFO:     127.0.0.1:57834 - "GET /v1/models HTTP/1.1" 200 OK

The log above shows the procedure: starting from the latest nightly base image, I packed the precompiled binaries into a wheel, then cloned and installed the code from this PR's branch (the HEAD commit SHA is visible in step 3 of the log).

@thomasmaindron The error you're seeing now (There is no module or parameter named 'model.layers.0.mlp.down_proj.activation_scale') is a separate issue from the original TypeError; the architecture/config fix in this PR is working correctly.
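For reference, the config-side fix described in this PR can be sketched roughly as follows. This is a minimal illustration using `SimpleNamespace` stand-ins rather than a real `PretrainedConfig`; the helper name `get_architectures` is hypothetical and is not a vLLM function.

```python
from types import SimpleNamespace


def get_architectures(hf_config) -> tuple[str, ...]:
    """Return architectures as a tuple; None or a missing attribute becomes ()."""
    # `or []` covers the explicit-None case that getattr's default cannot.
    return tuple(getattr(hf_config, "architectures", None) or [])


# Stand-ins for a real config object (hypothetical, for illustration only).
absent = SimpleNamespace()
explicit_none = SimpleNamespace(architectures=None)
populated = SimpleNamespace(architectures=["Mistral3ForConditionalGeneration"])

for cfg in (absent, explicit_none, populated):
    print(get_architectures(cfg))
```

The first two configs normalise to an empty tuple, so downstream code that iterates over the result no longer raises on `None`.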

A few things to note:

  1. This PR fixes what it claims to fix. We verified it end-to-end by serving the upstream mistralai/Devstral-Small-2-24B-Instruct-2512 model (HF format) on an H100. It starts and serves successfully — the original TypeError: 'NoneType' object is not iterable and the follow-up No model architectures are specified errors are both resolved.
  2. Your new error is a weight-loading issue specific to your fine-tuned model. The activation_scale parameter in your safetensors doesn't match what vLLM expects (activation.scale vs activation_scale). Notice the error is happening in TransformersMultiModalForCausalLM, not Mistral3ForConditionalGeneration — this suggests your fine-tuned model may have been saved with a different quantization format (likely from unsloth's FP8 export).
  3. Based on the above, yes: Mistral3 models work with an HF-format config in vLLM. The issue you're hitting now is specific to the weight format of your fine-tuned checkpoint, not the HF config format itself.
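
To make the naming mismatch in point 2 concrete, here is a minimal sketch of how one might compare checkpoint tensor names against the parameter names a model expects. The helper names and the expected key below are hypothetical and only illustrate the `activation.scale` vs `activation_scale` difference; this is not vLLM's actual weight-loading code.

```python
def normalise(name: str) -> str:
    """Treat '_' and '.' as interchangeable so near-miss names compare equal."""
    return name.replace("_", ".")


def diagnose_keys(checkpoint_keys, expected_keys):
    """Map each unexpected checkpoint key to a likely expected name, or None."""
    expected = set(expected_keys)
    expected_by_norm = {normalise(key): key for key in expected}
    report = {}
    for key in checkpoint_keys:
        if key in expected:
            continue  # exact match, nothing to report
        report[key] = expected_by_norm.get(normalise(key))
    return report


# Illustrative names only; the expected key is an assumption for this sketch.
checkpoint = [
    "model.layers.0.mlp.down_proj.weight",
    "model.layers.0.mlp.down_proj.activation_scale",
]
expected = [
    "model.layers.0.mlp.down_proj.weight",
    "model.layers.0.mlp.down_proj.activation.scale",
]
report = diagnose_keys(checkpoint, expected)
```

A key that maps to `None` has no near match at all, which would point to a different quantization layout rather than a simple rename.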

thomasmaindron commented Apr 7, 2026

Contributor

@Gregory-Pereira What I'm trying to load here is Devstral-Small-2-24B-Instruct-2512, not my fine-tuned model. But I guess I'll just continue to look for a solution on my original issue (which this pull request is based on). Thanks for your help anyway!

TihoElek commented Apr 7, 2026

Contributor Author

@Gregory-Pereira thank you for the e2e test and the logs.
@patrickvonplaten, @DarkLight1337 (or other maintainers), what are the preferred next steps here? Is this fix sufficient for this issue?

@DarkLight1337

Member

@patrickvonplaten could you clarify whether we are supposed to be able to load Devstral Small 2 in HF format?

thomasmaindron commented Apr 9, 2026

Contributor

@hmellor I actually just implemented this in #39293 while you were posting. Happy to move it here or keep it there, whichever you prefer!

TihoElek commented Apr 9, 2026

Contributor Author

@thomasmaindron @hmellor updated. Feel free to review this PR for this specific issue.

@hmellor hmellor left a comment

Member

Thanks, this looks good now. Just a small nit: transformers should only be imported if absolutely necessary.

Comment thread vllm/config/vllm.py Outdated
import torch
from packaging.version import Version
from pydantic import ConfigDict, Field, model_validator
from transformers.models.auto.modeling_auto import MODEL_FOR_CAUSAL_LM_MAPPING_NAMES
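
One common way to address this kind of nit is to move the import from module level into the function that needs it, so that `transformers` is loaded only when that code path runs. The sketch below assumes the mapping is used to look up a class name by `model_type`; the helper name is hypothetical and this is not necessarily the change that was merged.

```python
from typing import Optional


def lookup_causal_lm_class_name(model_type: str) -> Optional[str]:
    """Hypothetical helper: map a model_type to a causal-LM class name."""
    # Deferred import: transformers is loaded only when this function is
    # called, not when the enclosing module is imported.
    from transformers.models.auto.modeling_auto import (
        MODEL_FOR_CAUSAL_LM_MAPPING_NAMES,
    )

    # The mapping behaves like a dict of model_type -> class name, so an
    # unknown model_type yields None instead of raising.
    return MODEL_FOR_CAUSAL_LM_MAPPING_NAMES.get(model_type)
```

The trade-off is that an import error surfaces at call time rather than at module import, which is usually acceptable for a fallback path that most model loads never reach.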

hmellor commented Apr 9, 2026

Member

Let's merge this one (almost) as is, then @thomasmaindron we can merge the extra changes (FP8 scales for example) you added in your PR

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>
@TihoElek TihoElek requested a review from hmellor April 9, 2026 10:07
thomasmaindron commented Apr 9, 2026

Contributor

@hmellor Got it! I'll remove my implementation when I can.

@hmellor hmellor merged commit 8d825b8 into vllm-project:main Apr 13, 2026
57 of 58 checks passed
wojciech-wais pushed a commit to wojciech-wais/vllm that referenced this pull request Apr 13, 2026
…l loading (vllm-project#38849)

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>
whk-lab pushed a commit to whk-lab/vllm that referenced this pull request Apr 23, 2026
…l loading (vllm-project#38849)

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
…l loading (vllm-project#38849)

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026
…l loading (vllm-project#38849)

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…l loading (vllm-project#38849)

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
…l loading (vllm-project#38849)

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
…l loading (vllm-project#38849)

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Labels

bug Something isn't working
intel-gpu Related to Intel GPU
multi-modality Related to multi-modality (#4194)
ready ONLY add when PR is ready to merge/full CI is needed

Development

Successfully merging this pull request may close these issues.

[Bug]: Error when running Devstral Small 2 with HF format

7 participants