[Bug] Fix TypeError when hf_config.architectures is None during model loading by TihoElek · Pull Request #38849 · vllm-project/vllm

TihoElek · 2026-04-02T20:54:10Z

Purpose

PretrainedConfig in Transformers defines architectures: list[str] | None = None
as a class-level attribute. Fine-tuned models saved without "architectures" in
config.json (or configs loaded programmatically) will have hf_config.architectures = None.

The existing code used getattr(hf_config, "architectures", []), which only falls
back to [] when the attribute is absent. Since architectures is always present
on the class (as None), the default never fires. tuple(None) then raises:
TypeError: 'NoneType' object is not iterable

The fix uses getattr(..., None) or [], which correctly normalises both the absent
and the explicitly-None cases to an empty list.
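
The difference between an absent attribute and one that is present but set to None can be reproduced without Transformers. The sketch below is illustrative only: `FakeConfig`, `old_lookup`, and `new_lookup` are hypothetical names, and `FakeConfig` merely mimics a class-level `architectures = None` default rather than being the real PretrainedConfig.

```python
class FakeConfig:
    """Hypothetical stand-in for a config class with a class-level default."""

    architectures = None  # always present on the class, so getattr finds it


def old_lookup(cfg):
    # The default [] is used only when the attribute is missing entirely.
    return getattr(cfg, "architectures", [])


def new_lookup(cfg):
    # `or []` also covers the case where the attribute exists but is None.
    return getattr(cfg, "architectures", None) or []


cfg = FakeConfig()
old_value = old_lookup(cfg)  # the class-level None leaks through
new_value = new_lookup(cfg)  # normalised to an empty list
```

Passing `old_value` to `tuple()` is what fails in the old code path, while `tuple(new_value)` is simply an empty tuple.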

Test Plan

Ran

 # Reproduces the bug with the old pattern
 from types import SimpleNamespace
 hf = SimpleNamespace(architectures=None)
 tuple(getattr(hf, 'architectures', []))   # TypeError: 'NoneType' object is not iterable

Test Result

 # Fixed
 tuple(getattr(hf, 'architectures', None) or [])  # ()

gemini-code-assist

Code Review

This pull request updates the model architecture retrieval logic to handle None values by defaulting to an empty list. The review feedback recommends refactoring this repeated logic into a helper function or normalizing it within the ModelConfig class to improve code consistency and maintainability.

gemini-code-assist · 2026-04-02T20:56:55Z

@@ -175,7 +175,7 @@ def device_loading_context(module: torch.nn.Module, target_device: torch.device)
 def _get_model_architecture(model_config: ModelConfig) -> tuple[type[nn.Module], str]:
    from vllm.model_executor.models.adapters import as_embedding_model, as_seq_cls_model

-    architectures = getattr(model_config.hf_config, "architectures", [])
+    architectures = getattr(model_config.hf_config, "architectures", None) or []


The use of getattr(..., None) or [] is a robust way to handle potential None values for architectures. However, to improve readability and maintain consistency, consider using a helper function or a more explicit check if this pattern is repeated frequently across the codebase.

gemini-code-assist · 2026-04-02T20:56:55Z

@@ -215,7 +215,7 @@ def get_model_architecture(model_config: ModelConfig) -> tuple[type[nn.Module],
            model_config.runner_type,
            model_config.trust_remote_code,
            model_config.model_impl,
-            tuple(getattr(model_config.hf_config, "architectures", [])),
+            tuple(getattr(model_config.hf_config, "architectures", None) or []),


The repeated use of getattr(..., None) or [] in get_model_architecture suggests that model_config.hf_config.architectures should ideally be normalized once, perhaps within the ModelConfig class itself, to avoid redundant logic and potential errors.
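
The normalization suggested in the review could be sketched as a small helper, so that both call sites share one definition. The name `get_architectures` is hypothetical; it is not part of vLLM or of this PR, and the config objects below are plain `SimpleNamespace` stand-ins.

```python
from types import SimpleNamespace


def get_architectures(hf_config) -> list[str]:
    """Hypothetical helper: return the architectures as a list, never None."""
    # Covers both a missing attribute and an attribute explicitly set to None.
    return list(getattr(hf_config, "architectures", None) or [])


# Stand-in configs for the three cases the helper has to handle.
missing = SimpleNamespace()
explicit_none = SimpleNamespace(architectures=None)
populated = SimpleNamespace(architectures=["Mistral3ForConditionalGeneration"])

cache_key_part = tuple(get_architectures(explicit_none))
```

Returning a fresh list also means callers cannot mutate the list stored on the config by accident.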

TihoElek · 2026-04-02T21:10:01Z

Hi, @22quinn is out sick — could another maintainer add the ready label to trigger CI? This is a one-line bug fix for #38818 (TypeError when hf_config.architectures is None on nested model configs). Thanks!

jikunshang · 2026-04-03T00:55:51Z

added ready label. cc @hmellor PTAL.

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>

thomasmaindron · 2026-04-03T08:42:28Z

Hi @TihoElek, thank you for your quick response! However, I'm still encountering an error after applying the fix:

Error message ("No model architectures are specified")

(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] EngineCore failed to start.
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] Traceback (most recent call last):
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     super().__init__(
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 114, in __init__
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     self.model_executor = executor_class(vllm_config)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in __init__
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     self._init_executor()
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     self.driver_worker.load_model()
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4749, in load_model
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     self.model = model_loader.load_model(
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]                  ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 55, in load_model
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     model = initialize_model(
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]             ^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 57, in initialize_model
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     model = model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/mistral3.py", line 437, in __init__
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     self.language_model = init_vllm_registered_model(
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 372, in init_vllm_registered_model
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     return initialize_model(vllm_config=vllm_config, prefix=prefix)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 47, in initialize_model
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     model_class, _ = get_model_architecture(model_config)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 224, in get_model_architecture
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     model_arch = _get_model_architecture(model_config)
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 180, in _get_model_architecture
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     model_cls, arch = model_config.registry.resolve_model_cls(
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1138, in resolve_model_cls
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108]     raise ValueError("No model architectures are specified")
(EngineCore pid=154) ERROR 04-03 08:34:39 [core.py:1108] ValueError: No model architectures are specified
(EngineCore pid=154) Process EngineCore:
(EngineCore pid=154) Traceback (most recent call last):
(EngineCore pid=154)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=154)     self.run()
(EngineCore pid=154)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=154)     self._target(*self._args, **self._kwargs)
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1112, in run_engine_core
(EngineCore pid=154)     raise e
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=154)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=154)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154)     return func(*args, **kwargs)
(EngineCore pid=154)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=154)     super().__init__(
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 114, in __init__
(EngineCore pid=154)     self.model_executor = executor_class(vllm_config)
(EngineCore pid=154)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154)     return func(*args, **kwargs)
(EngineCore pid=154)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in __init__
(EngineCore pid=154)     self._init_executor()
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor
(EngineCore pid=154)     self.driver_worker.load_model()
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model
(EngineCore pid=154)     self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154)     return func(*args, **kwargs)
(EngineCore pid=154)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4749, in load_model
(EngineCore pid=154)     self.model = model_loader.load_model(
(EngineCore pid=154)                  ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154)     return func(*args, **kwargs)
(EngineCore pid=154)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 55, in load_model
(EngineCore pid=154)     model = initialize_model(
(EngineCore pid=154)             ^^^^^^^^^^^^^^^^^
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154)     return func(*args, **kwargs)
(EngineCore pid=154)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 57, in initialize_model
(EngineCore pid=154)     model = model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore pid=154)             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/mistral3.py", line 437, in __init__
(EngineCore pid=154)     self.language_model = init_vllm_registered_model(
(EngineCore pid=154)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 372, in init_vllm_registered_model
(EngineCore pid=154)     return initialize_model(vllm_config=vllm_config, prefix=prefix)
(EngineCore pid=154)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=154)     return func(*args, **kwargs)
(EngineCore pid=154)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 47, in initialize_model
(EngineCore pid=154)     model_class, _ = get_model_architecture(model_config)
(EngineCore pid=154)                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 224, in get_model_architecture
(EngineCore pid=154)     model_arch = _get_model_architecture(model_config)
(EngineCore pid=154)                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 180, in _get_model_architecture
(EngineCore pid=154)     model_cls, arch = model_config.registry.resolve_model_cls(
(EngineCore pid=154)                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=154)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py", line 1138, in resolve_model_cls
(EngineCore pid=154)     raise ValueError("No model architectures are specified")
(EngineCore pid=154) ValueError: No model architectures are specified
[rank0]:[W403 08:34:39.355421752 ProcessGroupNCCL.cpp:1648] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=74) Traceback (most recent call last):
(APIServer pid=74)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=74)     sys.exit(main())
(APIServer pid=74)              ^^^^^^
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=74)     args.dispatch_function(args)
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=74)     uvloop.run(run_server(args))
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=74)     return __asyncio.run(
(APIServer pid=74)            ^^^^^^^^^^^^^^
(APIServer pid=74)   File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
(APIServer pid=74)     return runner.run(main)
(APIServer pid=74)            ^^^^^^^^^^^^^^^^
(APIServer pid=74)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=74)     return self._loop.run_until_complete(task)
(APIServer pid=74)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=74)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=74)     return await main
(APIServer pid=74)            ^^^^^^^^^^
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 684, in run_server
(APIServer pid=74)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 698, in run_server_worker
(APIServer pid=74)     async with build_async_engine_client(
(APIServer pid=74)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=74)     return await anext(self.gen)
(APIServer pid=74)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=74)     async with build_async_engine_client_from_engine_args(
(APIServer pid=74)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=74)     return await anext(self.gen)
(APIServer pid=74)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=74)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=74)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=74)     return cls(
(APIServer pid=74)            ^^^^
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in __init__
(APIServer pid=74)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=74)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=74)     return func(*args, **kwargs)
(APIServer pid=74)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 129, in make_async_mp_client
(APIServer pid=74)     return AsyncMPClient(*client_args)
(APIServer pid=74)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=74)     return func(*args, **kwargs)
(APIServer pid=74)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 872, in __init__
(APIServer pid=74)     super().__init__(
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 534, in __init__
(APIServer pid=74)     with launch_core_engines(
(APIServer pid=74)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=74)     next(self.gen)
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1073, in launch_core_engines
(APIServer pid=74)     wait_for_engine_startup(
(APIServer pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1132, in wait_for_engine_startup
(APIServer pid=74)     raise RuntimeError(
(APIServer pid=74) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

Thanks again for your help!

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>

TihoElek · 2026-04-03T11:21:02Z

@thomasmaindron The error is now propagated clearly, and it uncovers a wider issue.
My understanding is:
The architecture in the root config.json is used to resolve the outer Mistral3ForConditionalGeneration. During nested LM initialization, vLLM switches from the root config to config.text_config, which does not carry the inner LM architecture. Without an explicit inner architecture, nested LM resolution reaches the registry with no architectures and fails with "No model architectures are specified". The fix is to pass the inner LM architecture explicitly: Ministral3ForCausalLM.
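
A toy model of that resolution path, assuming nothing about vLLM's actual registry code: `resolve_architectures` and its `override` parameter are hypothetical, and the configs are `SimpleNamespace` objects whose values mirror this discussion rather than a real checkpoint.

```python
from types import SimpleNamespace

# Toy configs: the root carries an architecture, the nested text config does not.
root_config = SimpleNamespace(
    architectures=["Mistral3ForConditionalGeneration"],
    text_config=SimpleNamespace(architectures=None),
)


def resolve_architectures(config, override=None):
    """Hypothetical resolver: prefer an explicit override, else the config."""
    architectures = override or getattr(config, "architectures", None) or []
    if not architectures:
        # Same message as the registry error in the traceback above.
        raise ValueError("No model architectures are specified")
    return list(architectures)


# Outer model: resolved from the root config.
outer = resolve_architectures(root_config)

# Inner LM: the nested config has no architectures, so one is passed explicitly.
inner = resolve_architectures(
    root_config.text_config, override=["Ministral3ForCausalLM"]
)
```

Calling `resolve_architectures(root_config.text_config)` without the override raises the ValueError, which corresponds to the failure reported above.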

Could you check now?

thomasmaindron · 2026-04-03T12:34:48Z

@TihoElek After applying the changes in utils.py and mistral3.py, the error now occurs after trying to load the safetensors:

Error message ("No module or parameter named 'model.layers.0.mlp.down_proj.activation_scale' in TransformersMultiModalForCausalLM.")

(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] EngineCore failed to start.
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] Traceback (most recent call last):
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     super().__init__(
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 114, in __init__
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     self.model_executor = executor_class(vllm_config)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in __init__
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     self._init_executor()
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     self.driver_worker.load_model()
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4749, in load_model
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     self.model = model_loader.load_model(
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]                  ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     self.load_weights(model, model_config)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 381, in load_weights
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/mistral3.py", line 563, in load_weights
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     yield from self._load_module(
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 268, in _load_module
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     loaded_params = module_load_weights(weights)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/transformers/base.py", line 609, in load_weights
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     yield from self._load_module(
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     yield from self._load_module(
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     yield from self._load_module(
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   [Previous line repeated 2 more times]
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 332, in _load_module
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108]     raise ValueError(msg)
(EngineCore pid=195) ERROR 04-03 12:24:00 [core.py:1108] ValueError: There is no module or parameter named 'model.layers.0.mlp.down_proj.activation_scale' in TransformersMultiModalForCausalLM. The available parameters belonging to model.layers.0.mlp.down_proj (RowParallelLinear) are: {'model.layers.0.mlp.down_proj.weight_scale', 'model.layers.0.mlp.down_proj.input_scale', 'model.layers.0.mlp.down_proj.weight'}
(EngineCore pid=195) Process EngineCore:
(EngineCore pid=195) Traceback (most recent call last):
(EngineCore pid=195)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=195)     self.run()
(EngineCore pid=195)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=195)     self._target(*self._args, **self._kwargs)
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1112, in run_engine_core
(EngineCore pid=195)     raise e
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=195)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=195)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195)     return func(*args, **kwargs)
(EngineCore pid=195)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=195)     super().__init__(
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 114, in __init__
(EngineCore pid=195)     self.model_executor = executor_class(vllm_config)
(EngineCore pid=195)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195)     return func(*args, **kwargs)
(EngineCore pid=195)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 109, in __init__
(EngineCore pid=195)     self._init_executor()
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor
(EngineCore pid=195)     self.driver_worker.load_model()
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model
(EngineCore pid=195)     self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195)     return func(*args, **kwargs)
(EngineCore pid=195)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4749, in load_model
(EngineCore pid=195)     self.model = model_loader.load_model(
(EngineCore pid=195)                  ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195)     return func(*args, **kwargs)
(EngineCore pid=195)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
(EngineCore pid=195)     self.load_weights(model, model_config)
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=195)     return func(*args, **kwargs)
(EngineCore pid=195)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 381, in load_weights
(EngineCore pid=195)     loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(EngineCore pid=195)                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/mistral3.py", line 563, in load_weights
(EngineCore pid=195)     return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(EngineCore pid=195)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=195)     return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=195)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(EngineCore pid=195)     autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=195)                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=195)     yield from self._load_module(
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 268, in _load_module
(EngineCore pid=195)     loaded_params = module_load_weights(weights)
(EngineCore pid=195)                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/transformers/base.py", line 609, in load_weights
(EngineCore pid=195)     return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(EngineCore pid=195)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(EngineCore pid=195)     return original_load_weights(self, weights, *args, **kwargs)
(EngineCore pid=195)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(EngineCore pid=195)     autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore pid=195)                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=195)     yield from self._load_module(
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=195)     yield from self._load_module(
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(EngineCore pid=195)     yield from self._load_module(
(EngineCore pid=195)   [Previous line repeated 2 more times]
(EngineCore pid=195)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 332, in _load_module
(EngineCore pid=195)     raise ValueError(msg)
(EngineCore pid=195) ValueError: There is no module or parameter named 'model.layers.0.mlp.down_proj.activation_scale' in TransformersMultiModalForCausalLM. The available parameters belonging to model.layers.0.mlp.down_proj (RowParallelLinear) are: {'model.layers.0.mlp.down_proj.weight_scale', 'model.layers.0.mlp.down_proj.input_scale', 'model.layers.0.mlp.down_proj.weight'}
Loading safetensors checkpoint shards:   0% Completed | 0/6 [00:01
(APIServer pid=72)     sys.exit(main())
(APIServer pid=72)              ^^^^^^
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=72)     args.dispatch_function(args)
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=72)     uvloop.run(run_server(args))
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=72)     return __asyncio.run(
(APIServer pid=72)            ^^^^^^^^^^^^^^
(APIServer pid=72)   File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
(APIServer pid=72)     return runner.run(main)
(APIServer pid=72)            ^^^^^^^^^^^^^^^^
(APIServer pid=72)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=72)     return self._loop.run_until_complete(task)
(APIServer pid=72)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=72)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=72)     return await main
(APIServer pid=72)            ^^^^^^^^^^
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 684, in run_server
(APIServer pid=72)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 698, in run_server_worker
(APIServer pid=72)     async with build_async_engine_client(
(APIServer pid=72)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=72)     return await anext(self.gen)
(APIServer pid=72)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=72)     async with build_async_engine_client_from_engine_args(
(APIServer pid=72)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=72)     return await anext(self.gen)
(APIServer pid=72)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=72)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=72)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=72)     return cls(
(APIServer pid=72)            ^^^^
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in __init__
(APIServer pid=72)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=72)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=72)     return func(*args, **kwargs)
(APIServer pid=72)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 129, in make_async_mp_client
(APIServer pid=72)     return AsyncMPClient(*client_args)
(APIServer pid=72)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=72)     return func(*args, **kwargs)
(APIServer pid=72)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 872, in __init__
(APIServer pid=72)     super().__init__(
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 534, in __init__
(APIServer pid=72)     with launch_core_engines(
(APIServer pid=72)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=72)     next(self.gen)
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1073, in launch_core_engines
(APIServer pid=72)     wait_for_engine_startup(
(APIServer pid=72)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1132, in wait_for_engine_startup
(APIServer pid=72)     raise RuntimeError(
(APIServer pid=72) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

thomasmaindron · 2026-04-03T12:45:02Z

The multimodal architecture no longer seems to be the problem. Have Mistral models ever been able to run using the HF configuration? I've never tried.

Gregory-Pereira · 2026-04-05T21:23:42Z

Unless I'm missing something, I was able to run this fine:

k logs pod/vllm-test-fix-38849 -f
=== [1/6] Installing git and pytest ===
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
=== [2/6] Cloning https://github.com/TihoElek/vllm.git @ fix/hf-config-architectures-none-crash ===
Cloning into '/tmp/vllm-fix'...
=== [3/6] Setting up upstream remote and syncing tags ===
HEAD: b38f61a98 Fix nested Devstral/Mistral3 LM architecture resolution
=== [4/6] Packing local wheel from container's .so files ===
Packing 96 files
Created /tmp/vllm-local-precompiled.whl (410 MB)
=== [5/6] Installing test dependencies ===
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
=== [6/6] Editable install ===
    Found existing installation: vllm 0.19.1rc1.dev29+g93726b2a1
    Uninstalling vllm-0.19.1rc1.dev29+g93726b2a1:
      Successfully uninstalled vllm-0.19.1rc1.dev29+g93726b2a1
Successfully installed vllm-0.19.1rc1.dev3+gb38f61a98.precompiled
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

=== Ready ===
vLLM 0.19.1rc1.dev3+gb38f61a98 from /tmp/vllm-fix/vllm/__init__.py

=== Running tests: tests/models/multimodal/test_mistral3.py -v ===
============================= test session starts ==============================
platform linux -- Python 3.12.13, pytest-9.0.2, pluggy-1.6.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default'
rootdir: /tmp/vllm-fix
configfile: pyproject.toml
plugins: typeguard-4.5.1, hypothesis-6.151.11, timeout-2.4.0, shard-0.1.2, rerunfailures-16.1, mock-3.15.1, forked-1.6.0, asyncio-1.3.0, hydra-core-1.3.2, buildkite-test-collector-0.1.9, cov-7.1.0, schemathesis-4.15.0, anyio-4.13.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collecting ... collected 1 item
Running 1 items in this shard: tests/models/multimodal/test_mistral3.py::test_mistral3_passes_inner_lm_architecture

tests/models/multimodal/test_mistral3.py::test_mistral3_passes_inner_lm_architecture PASSED [100%]

=============================== warnings summary ===============================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

../../usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:362: 14 warnings
  /usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:362: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================== 1 passed, 16 warnings in 2.74s ========================

=== Tests PASSED ===
=== Starting vLLM server: mistralai/Devstral-Small-2-24B-Instruct-2512 ===
(APIServer pid=1) INFO 04-05 21:18:02 [utils.py:299]
(APIServer pid=1) INFO 04-05 21:18:02 [utils.py:299]        █     █     █▄   ▄█
(APIServer pid=1) INFO 04-05 21:18:02 [utils.py:299]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.19.1rc1.dev3+gb38f61a98
(APIServer pid=1) INFO 04-05 21:18:02 [utils.py:299]   █▄█▀ █     █     █     █  model   mistralai/Devstral-Small-2-24B-Instruct-2512
(APIServer pid=1) INFO 04-05 21:18:02 [utils.py:299]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
(APIServer pid=1) INFO 04-05 21:18:02 [utils.py:299]
(APIServer pid=1) INFO 04-05 21:18:02 [utils.py:233] non-default args: {'model': 'mistralai/Devstral-Small-2-24B-Instruct-2512'}
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_TEST_UBATCH_SERVICE_HOST
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_FORK_URL
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_TEST_UBATCH_PORT_8000_TCP_ADDR
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_TEST_UBATCH_PORT
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_TEST_PATH
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_TEST_UBATCH_PORT_8000_TCP_PROTO
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_SERVE_MODEL
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_TEST_UBATCH_PORT_8000_TCP
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_SERVE_ARGS
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_TEST_UBATCH_PORT_8000_TCP_PORT
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_TEST_UBATCH_SERVICE_PORT
(APIServer pid=1) WARNING 04-05 21:18:02 [envs.py:1783] Unknown vLLM environment variable detected: VLLM_BRANCH
Parse safetensors files: 100%|██████████| 2/2 [00:00<00:00,  6.11it/s]
(APIServer pid=1) INFO 04-05 21:18:06 [config.py:289] Inferred from consolidated*.safetensors files torch.bfloat16 dtype.
(APIServer pid=1) INFO 04-05 21:18:14 [model.py:549] Resolved architecture: PixtralForConditionalGeneration
(APIServer pid=1) INFO 04-05 21:18:14 [model.py:1680] Using max model len 393216
(APIServer pid=1) INFO 04-05 21:18:15 [scheduler.py:238] Chunked prefill is enabled with max_num_batched_tokens=8192.
(APIServer pid=1) INFO 04-05 21:18:15 [vllm.py:799] Asynchronous scheduling is enabled.
(APIServer pid=1) INFO 04-05 21:18:15 [kernel.py:196] Final IR op priority after setting platform defaults: IrOpPriorityConfig(rms_norm=['native'])
(EngineCore pid=1792) INFO 04-05 21:18:25 [core.py:105] Initializing a V1 LLM engine (v0.19.1rc1.dev3+gb38f61a98) with config: model='mistralai/Devstral-Small-2-24B-Instruct-2512', speculative_config=None, tokenizer='mistralai/Devstral-Small-2-24B-Instruct-2512', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=393216, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=mistralai/Devstral-Small-2-24B-Instruct-2512, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'ir_enable_torch_wrap': True, 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 
'vllm::olmo_hybrid_gdn_full_forward', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'cudagraph_mm_encoder': False, 'encoder_cudagraph_token_budgets': [], 'encoder_cudagraph_max_images_per_batch': 0, 'compile_sizes': [], 'compile_ranges_endpoints': [8192], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'size_asserts': False, 'alignment_asserts': False, 'scalar_asserts': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []}, kernel_config=KernelConfig(ir_op_priority=IrOpPriorityConfig(rms_norm=['native']), enable_flashinfer_autotune=True, moe_backend='auto')
(EngineCore pid=1792) INFO 04-05 21:18:26 [parallel_state.py:1400] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.0.2.178:35411 backend=nccl
(EngineCore pid=1792) INFO 04-05 21:18:26 [parallel_state.py:1712] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A
(EngineCore pid=1792) INFO 04-05 21:18:27 [gpu_model_runner.py:4733] Starting to load model mistralai/Devstral-Small-2-24B-Instruct-2512...
(EngineCore pid=1792) INFO 04-05 21:18:27 [vllm.py:799] Asynchronous scheduling is enabled.
(EngineCore pid=1792) INFO 04-05 21:18:27 [kernel.py:196] Final IR op priority after setting platform defaults: IrOpPriorityConfig(rms_norm=['native'])
(EngineCore pid=1792) INFO 04-05 21:18:27 [__init__.py:261] Selected CutlassFP8ScaledMMLinearKernel for Fp8LinearMethod
(EngineCore pid=1792) INFO 04-05 21:18:27 [deep_gemm.py:115] DeepGEMM E8M0 enabled on current platform.
(EngineCore pid=1792) INFO 04-05 21:18:27 [cuda.py:362] Using FLASH_ATTN attention backend out of potential backends: ['FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION'].
(EngineCore pid=1792) INFO 04-05 21:18:27 [flash_attn.py:622] Using FlashAttention version 3
(EngineCore pid=1792) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(EngineCore pid=1792) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(EngineCore pid=1792) INFO 04-05 21:18:58 [weight_utils.py:583] Time spent downloading weights for mistralai/Devstral-Small-2-24B-Instruct-2512: 30.365908 seconds
Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:02<00:02,  2.32s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:03<00:00,  1.44s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:03<00:00,  1.57s/it]
(EngineCore pid=1792)
(EngineCore pid=1792) INFO 04-05 21:19:01 [default_loader.py:384] Loading weights took 3.19 seconds
(EngineCore pid=1792) INFO 04-05 21:19:02 [gpu_model_runner.py:4818] Model loading took 24.12 GiB memory and 34.811804 seconds
(EngineCore pid=1792) INFO 04-05 21:19:02 [gpu_model_runner.py:5758] Encoder cache will be initialized with a budget of 8192 tokens, and profiled with 2 image items of the maximum feature size.
(EngineCore pid=1792) WARNING 04-05 21:19:03 [op.py:236] Priority not set for op rms_norm, using native implementation.
(EngineCore pid=1792) INFO 04-05 21:19:12 [backends.py:1051] Using cache directory: /root/.cache/vllm/torch_compile_cache/f06ac4bc20/rank_0_0/backbone for vLLM's torch.compile
(EngineCore pid=1792) INFO 04-05 21:19:12 [backends.py:1111] Dynamo bytecode transform time: 7.42 s
(EngineCore pid=1792) INFO 04-05 21:19:16 [backends.py:372] Cache the graph of compile range (1, 8192) for later use
(EngineCore pid=1792) INFO 04-05 21:19:20 [backends.py:390] Compiling a graph for compile range (1, 8192) takes 7.22 s
(EngineCore pid=1792) INFO 04-05 21:19:23 [decorators.py:655] saved AOT compiled function to /root/.cache/vllm/torch_compile_cache/torch_aot_compile/aafd0a688359ce5b801ae7e55c557c62b559618980fe70c59e2e8e3d13f73ed7/rank_0_0/model
(EngineCore pid=1792) INFO 04-05 21:19:23 [monitor.py:48] torch.compile took 17.72 s in total
(EngineCore pid=1792) INFO 04-05 21:19:23 [monitor.py:76] Initial profiling/warmup run took 0.54 s
(EngineCore pid=1792) INFO 04-05 21:19:28 [kv_cache_utils.py:829] Overriding num_gpu_blocks=0 with num_gpu_blocks_override=512
(EngineCore pid=1792) INFO 04-05 21:19:28 [gpu_model_runner.py:5881] Profiling CUDA graph memory: PIECEWISE=51 (largest=512), FULL=51 (largest=512)
(EngineCore pid=1792) INFO 04-05 21:19:30 [gpu_model_runner.py:5960] Estimated CUDA graph memory: 0.50 GiB total
(EngineCore pid=1792) INFO 04-05 21:19:30 [gpu_worker.py:436] Available KV cache memory: 99.44 GiB
(EngineCore pid=1792) INFO 04-05 21:19:30 [gpu_worker.py:470] In v0.19, CUDA graph memory profiling will be enabled by default (VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=1), which more accurately accounts for CUDA graph memory during KV cache allocation. To try it now, set VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=1 and increase --gpu-memory-utilization from 0.9000 to 0.9035 to maintain the same effective KV cache size.
(EngineCore pid=1792) INFO 04-05 21:19:30 [kv_cache_utils.py:1319] GPU KV cache size: 651,664 tokens
(EngineCore pid=1792) INFO 04-05 21:19:30 [kv_cache_utils.py:1324] Maximum concurrency for 393,216 tokens per request: 1.66x
(EngineCore pid=1792) 2026-04-05 21:19:30,463 - INFO - autotuner.py:446 - flashinfer.jit: [Autotuner]: Autotuning process starts ...
(EngineCore pid=1792) 2026-04-05 21:19:30,472 - INFO - autotuner.py:455 - flashinfer.jit: [Autotuner]: Autotuning process ends
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|██████████| 51/51 [00:01<00:00, 31.75it/s]
Capturing CUDA graphs (decode, FULL): 100%|██████████| 51/51 [00:01<00:00, 35.96it/s]
(EngineCore pid=1792) INFO 04-05 21:19:34 [gpu_model_runner.py:6051] Graph capturing finished in 4 secs, took 0.69 GiB
(EngineCore pid=1792) INFO 04-05 21:19:34 [gpu_worker.py:597] CUDA graph pool memory: 0.69 GiB (actual), 0.5 GiB (estimated), difference: 0.2 GiB (28.2%).
(EngineCore pid=1792) INFO 04-05 21:19:34 [core.py:283] init engine (profile, create kv cache, warmup model) took 31.66 seconds
(EngineCore pid=1792) INFO 04-05 21:19:34 [kernel.py:196] Final IR op priority after setting platform defaults: IrOpPriorityConfig(rms_norm=['native'])
(APIServer pid=1) INFO 04-05 21:19:34 [api_server.py:604] Supported tasks: ['generate']
(APIServer pid=1) WARNING 04-05 21:19:35 [model.py:1437] Default vLLM sampling parameters have been overridden by the model's `generation_config.json`: `{'temperature': 0.15}`. If this is not intended, please relaunch vLLM instance with `--generation-config vllm`.
(APIServer pid=1) INFO 04-05 21:19:36 [base.py:245] Multi-modal warmup completed in 0.152s
(APIServer pid=1) INFO 04-05 21:19:37 [api_server.py:608] Starting vLLM server on http://0.0.0.0:8000
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:37] Available routes are:
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /docs, Methods: GET, HEAD
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /redoc, Methods: GET, HEAD
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/chat/completions/batch, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/messages/count_tokens, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/chat/completions/render, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /v1/completions/render, Methods: POST
(APIServer pid=1) INFO 04-05 21:19:37 [launcher.py:46] Route: /generative_scoring, Methods: POST
(APIServer pid=1) INFO:     Started server process [1]
(APIServer pid=1) INFO:     Waiting for application startup.
(APIServer pid=1) INFO:     Application startup complete.
(APIServer pid=1) INFO:     127.0.0.1:57834 - "GET /v1/models HTTP/1.1" 200 OK

The above shows a build that uses the latest nightly base image, wraps the precompiled bits in a wheel, then clones and installs his code (as you can see from the commit SHA list).

@thomasmaindron The error you're seeing now (No module or parameter named 'model.layers.0.mlp.down_proj.activation_scale') is a separate issue from the original TypeError — the architecture/config fix in this PR is working correctly.

A few things to note:

This PR fixes what it claims to fix. We verified it end-to-end by serving the upstream mistralai/Devstral-Small-2-24B-Instruct-2512 model (HF format) on an H100. It starts and serves successfully — the original TypeError: 'NoneType' object is not iterable and the follow-up No model architectures are specified errors are both resolved.
Your new error is a weight-loading issue specific to your fine-tuned model. The activation_scale parameter in your safetensors doesn't match what vLLM expects (activation.scale vs activation_scale). Notice the error is happening in TransformersMultiModalForCausalLM, not Mistral3ForConditionalGeneration — this suggests your fine-tuned model may have been saved with a different quantization format (likely from unsloth's FP8 export).
Based on the above: yes, Mistral3 models work with an HF-format config in vLLM. The issue you're hitting now is specific to the weight format of your fine-tuned checkpoint, not the HF config format itself.
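
The naming mismatch described above (activation_scale versus activation.scale) can be checked before attempting a load. The following is a minimal sketch, not vLLM's loader: find_unexpected_keys is a hypothetical helper, and the expected parameter names in the example are assumed for illustration rather than taken from vLLM.

```python
from collections.abc import Iterable


def find_unexpected_keys(
    checkpoint_keys: Iterable[str],
    expected_names: Iterable[str],
) -> dict[str, list[str]]:
    """Map each checkpoint key with no exact match to near-miss candidates.

    A candidate is an expected name that differs from the key only in
    '.' versus '_' separators. Hypothetical helper, for diagnosis only.
    """

    def normalise(name: str) -> str:
        # Treat '_' and '.' as the same separator for comparison purposes.
        return name.replace("_", ".")

    expected = list(expected_names)
    exact = set(expected)

    # Index expected names by their separator-insensitive form.
    by_normalised: dict[str, list[str]] = {}
    for name in expected:
        by_normalised.setdefault(normalise(name), []).append(name)

    report: dict[str, list[str]] = {}
    for key in checkpoint_keys:
        if key not in exact:
            report[key] = by_normalised.get(normalise(key), [])
    return report


# Assumed names, chosen to mirror the error message quoted in this thread.
checkpoint_keys = [
    "model.layers.0.mlp.down_proj.weight",
    "model.layers.0.mlp.down_proj.activation_scale",
]
expected_names = [
    "model.layers.0.mlp.down_proj.weight",
    "model.layers.0.mlp.down_proj.activation.scale",
]
report = find_unexpected_keys(checkpoint_keys, expected_names)
```

Here report contains only the activation_scale key, paired with the activation.scale name as its likely counterpart. In practice the checkpoint keys would come from the safetensors files and the expected names from the instantiated model, neither of which is shown here.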

thomasmaindron · 2026-04-07T07:09:19Z

@Gregory-Pereira What I'm trying to load here is Devstral-Small-2-24B-Instruct-2512, not my fine-tuned model. But I guess I'll just continue to look for a solution on my original issue (which this pull request is based on). Thanks for your help anyway!

TihoElek · 2026-04-07T08:14:18Z

@Gregory-Pereira thank you for the e2e test and the logs.
@patrickvonplaten, @DarkLight1337 (or other maintainers), what are the preferred next steps here? Is this fix sufficient for this issue?

DarkLight1337 · 2026-04-07T08:19:26Z

@patrickvonplaten could you clarify whether we are supposed to be able to load Devstral Small 2 in HF format?

thomasmaindron · 2026-04-09T09:47:06Z

@hmellor I actually just implemented this in #39293 while you were posting. Happy to move it here or keep it there, whichever you prefer!

TihoElek · 2026-04-09T09:50:59Z

@thomasmaindron @hmellor updated. Feel free to review this PR for this specific issue.

hmellor

Thanks, this looks good now. Just a small nit so that transformers is only imported if absolutely necessary:

hmellor · 2026-04-09T09:57:57Z

 import torch
 from packaging.version import Version
 from pydantic import ConfigDict, Field, model_validator
+from transformers.models.auto.modeling_auto import MODEL_FOR_CAUSAL_LM_MAPPING_NAMES


Could you delay the import as is done in https://github.com/vllm-project/vllm/pull/39293/changes#diff-bee6813076031d3ca1edc903c1b02b81e4676519afc562ce3fefe37f20c7b650
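
A deferred import of this kind can be sketched as follows. This is an illustration under assumptions, not the code merged in either PR: is_causal_lm_model_type is a hypothetical function name, and how the PR actually uses the mapping is not shown in this thread. Only the import path and the MODEL_FOR_CAUSAL_LM_MAPPING_NAMES name come from the diff above.

```python
def is_causal_lm_model_type(model_type: str) -> bool:
    """Return True if transformers lists a causal-LM class for model_type.

    Hypothetical helper. The import sits inside the function body, so
    importing the enclosing module does not pull in transformers; the
    cost is paid only on the first call that needs the mapping.
    """
    # Deferred import: resolved at call time, then cached in sys.modules.
    from transformers.models.auto.modeling_auto import (
        MODEL_FOR_CAUSAL_LM_MAPPING_NAMES,
    )

    # The mapping is keyed by config model_type strings.
    return model_type in MODEL_FOR_CAUSAL_LM_MAPPING_NAMES
```

The trade-off is that an import failure surfaces at the first call rather than at module load, which is usually acceptable for an optional or expensive dependency.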

hmellor · 2026-04-09T09:59:23Z

Let's merge this one (almost) as is, then @thomasmaindron we can merge the extra changes (FP8 scales for example) you added in your PR

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>

thomasmaindron · 2026-04-09T10:13:54Z

@hmellor Got it! I'll remove my implementation when I can.

TihoElek requested a review from 22quinn as a code owner April 2, 2026 20:54

mergify Bot added the intel-gpu ("Related to Intel GPU") and bug ("Something isn't working") labels Apr 2, 2026

TihoElek force-pushed the fix/hf-config-architectures-none-crash branch from 2983aeb to ca394a9 Compare April 2, 2026 20:56

gemini-code-assist Bot reviewed Apr 2, 2026

TihoElek force-pushed the fix/hf-config-architectures-none-crash branch from ca394a9 to 9a84f51 Compare April 2, 2026 21:03

jikunshang added the ready ("ONLY add when PR is ready to merge/full CI is needed") label Apr 3, 2026

Fix TypeError when hf_config.architectures

f0498d4

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>

TihoElek force-pushed the fix/hf-config-architectures-none-crash branch from 9a84f51 to f0498d4 Compare April 3, 2026 07:23

jikunshang removed the intel-gpu ("Related to Intel GPU") label Apr 3, 2026

mergify Bot added the intel-gpu ("Related to Intel GPU") label Apr 3, 2026

Fix nested Devstral/Mistral3 LM architecture resolution

b38f61a

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>

TihoElek requested review from DarkLight1337, patrickvonplaten and ywang96 as code owners April 3, 2026 11:11

mergify Bot added the multi-modality ("Related to multi-modality (#4194)") label Apr 3, 2026

DarkLight1337 requested a review from hmellor April 3, 2026 11:13

thomasmaindron mentioned this pull request Apr 8, 2026

[Bugfix][Model] Fix Devstral Small 2 HF format weight loading #39293

Merged

3 tasks

TihoElek requested review from ProExpertProg, houseroad, mgoin, tlrmchlsmth and yewentao256 as code owners April 9, 2026 09:46

hmellor reviewed Apr 9, 2026

Move MODEL_FOR_CAUSAL_LM_MAPPING_NAMES import

33da69b

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>

TihoElek requested a review from hmellor April 9, 2026 10:07

hmellor approved these changes Apr 9, 2026

TihoElek added 3 commits April 9, 2026 14:59

Merge branch 'main' into fix/hf-config-architectures-none-crash

0a1478a

Merge branch 'main' into fix/hf-config-architectures-none-crash

550cdde

Merge branch 'main' into fix/hf-config-architectures-none-crash

b8a0864

hmellor merged commit 8d825b8 into vllm-project:main Apr 13, 2026
57 of 58 checks passed

wojciech-wais pushed a commit to wojciech-wais/vllm that referenced this pull request Apr 13, 2026

[Bug] Fix TypeError when hf_config.architectures is None during model loading (vllm-project#38849)

73af0bd

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>

whk-lab pushed a commit to whk-lab/vllm that referenced this pull request Apr 23, 2026

[Bug] Fix TypeError when hf_config.architectures is None during model loading (vllm-project#38849)

505acc1

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>

mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026

[Bug] Fix TypeError when hf_config.architectures is None during model loading (vllm-project#38849)

3466797

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>

my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026

[Bug] Fix TypeError when hf_config.architectures is None during model loading (vllm-project#38849)

9e01d65

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>

my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026

[Bug] Fix TypeError when hf_config.architectures is None during model loading (vllm-project#38849)

c6abea3

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>

jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026

[Bug] Fix TypeError when hf_config.architectures is None during model loading (vllm-project#38849)

71c9dac

Signed-off-by: Tihomir Elek <tiho.elek@gmail.com>

7 participants
