
[Bug] start Google Gemma3 with SGLang 0.4.4.post1: ValueError: '<class 'sglang.srt.configs.qwen2_5_vl_config.Qwen2_5_VLConfig'>' is already used by a Transformers model. #4607

@didier-durand

Description


Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

Hi,

I am trying to start Google Gemma 3 with SGLang v0.4.4.post1.

After upgrading to the latest transformers (4.49.0), it fails with the following:

[ec2-user@ip-10-0-0-66 ~]$ python3.12 -m sglang.launch_server   --model google/gemma-3-12b-pt --model-path /home/model/google/gemma-3-12b-pt   --host 0.0.0.0 --port 30000 --tensor-parallel-size 4   --log-level info   --enable-metrics --trust-remote-code --enable-p2p-check
-bash: python3.12: command not found
[ec2-user@ip-10-0-0-66 ~]$ docker exec -it cc9d452cddc2 python3.12 -m sglang.launch_server   --model google/gemma-3-12b-pt --model-path /home/model/google/gemma-3-12b-pt   --host 0.0.0.0 --port 30000 --tensor-parallel-size 4   --log-level info   --enable-metrics --trust-remote-code --enable-p2p-check

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.12/site-packages/sglang/launch_server.py", line 6, in <module>
    from sglang.srt.entrypoints.http_server import launch_server
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/entrypoints/http_server.py", line 44, in <module>
    from sglang.srt.entrypoints.engine import _launch_subprocesses
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/entrypoints/engine.py", line 36, in <module>
    from sglang.srt.managers.data_parallel_controller import (
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/data_parallel_controller.py", line 27, in <module>
    from sglang.srt.managers.io_struct import (
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/io_struct.py", line 25, in <module>
    from sglang.srt.managers.schedule_batch import BaseFinishReason
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/schedule_batch.py", line 43, in <module>
    from sglang.srt.configs.model_config import ModelConfig
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/configs/__init__.py", line 5, in <module>
    from sglang.srt.configs.qwen2_5_vl_config import (
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/configs/qwen2_5_vl_config.py", line 1005, in <module>
    AutoImageProcessor.register(Qwen2_5_VLConfig, None, Qwen2_5_VLImageProcessor, None)
  File "/usr/local/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py", line 628, in register
    IMAGE_PROCESSOR_MAPPING.register(
  File "/usr/local/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 833, in register
    raise ValueError(f"'{key}' is already used by a Transformers model.")
ValueError: '<class 'sglang.srt.configs.qwen2_5_vl_config.Qwen2_5_VLConfig'>' is already used by a Transformers model.
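The failure happens at import time: SGLang registers its own `Qwen2_5_VLConfig` with transformers' `AutoImageProcessor`, and the auto-registry rejects a key that a Transformers model already claims. A minimal sketch of that mechanism (hypothetical `AutoMapping` class; the real logic lives in `transformers/models/auto/auto_factory.py`, whose `register` accepts an `exist_ok` flag in recent releases):

```python
# Minimal simulation of transformers' auto-registry duplicate check
# (hypothetical names; not SGLang or transformers source).
class AutoMapping:
    def __init__(self):
        self._registry = {}

    def register(self, key, value, exist_ok=False):
        # transformers raises exactly this ValueError on a duplicate key
        if key in self._registry and not exist_ok:
            raise ValueError(f"'{key}' is already used by a Transformers model.")
        self._registry[key] = value

mapping = AutoMapping()
mapping.register("Qwen2_5_VLConfig", "Qwen2_5_VLImageProcessor")

# A second registration -- what happens once transformers itself ships
# Qwen2.5-VL support -- fails unless the duplicate is tolerated:
try:
    mapping.register("Qwen2_5_VLConfig", "Qwen2_5_VLImageProcessor")
except ValueError as e:
    print(e)

# Tolerating the duplicate avoids the crash:
mapping.register("Qwen2_5_VLConfig", "Qwen2_5_VLImageProcessor", exist_ok=True)
```

This suggests the fix belongs in SGLang's `qwen2_5_vl_config.py`: either skip the registration when the installed transformers already provides the model, or wrap the `AutoImageProcessor.register` call so a duplicate key is not fatal.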

List of packages:

[ec2-user@ip-10-0-0-66 ~]$ docker exec -it cc9d452cddc2 pip list
Package                           Version
--------------------------------- -------------------
aiohappyeyeballs                  2.6.1
aiohttp                           3.11.14
aiohttp-cors                      0.8.0
aiosignal                         1.3.2
airportsdata                      20250224
annotated-types                   0.7.0
anthropic                         0.49.0
anyio                             4.9.0
astor                             0.8.1
asttokens                         3.0.0
attrs                             25.3.0
blake3                            1.0.4
cachetools                        5.5.2
certifi                           2025.1.31
charset-normalizer                3.4.1
click                             8.1.8
cloudpickle                       3.1.1
colorful                          0.5.6
compressed-tensors                0.9.1
cuda-bindings                     12.8.0
cuda-python                       12.8.0
datasets                          3.4.1
decorator                         5.2.1
decord                            0.6.0
depyf                             0.18.0
dill                              0.3.8
diskcache                         5.6.3
distlib                           0.3.9
distro                            1.9.0
einops                            0.8.1
executing                         2.2.0
fastapi                           0.115.11
filelock                          3.18.0
flashinfer-python                 0.2.3+cu124torch2.5
frozenlist                        1.5.0
fsspec                            2024.12.0
gguf                              0.10.0
google-api-core                   2.24.2
google-auth                       2.38.0
googleapis-common-protos          1.69.2
grpcio                            1.71.0
h11                               0.14.0
hf_transfer                       0.1.9
httpcore                          1.0.7
httptools                         0.6.4
httpx                             0.28.1
huggingface-hub                   0.29.3
idna                              3.10
importlib_metadata                8.6.1
interegular                       0.3.3
ipython                           9.0.2
ipython_pygments_lexers           1.1.1
jedi                              0.19.2
Jinja2                            3.1.6
jiter                             0.9.0
jsonschema                        4.23.0
jsonschema-specifications         2024.10.1
lark                              1.2.2
litellm                           1.63.11
llguidance                        0.7.2
lm-format-enforcer                0.10.11
MarkupSafe                        3.0.2
matplotlib-inline                 0.1.7
mistral_common                    1.5.4
modelscope                        1.24.0
mpmath                            1.3.0
msgpack                           1.1.0
msgspec                           0.19.0
multidict                         6.2.0
multiprocess                      0.70.16
nest-asyncio                      1.6.0
networkx                          3.4.2
ninja                             1.11.1.3
numpy                             1.26.4
nvidia-cublas-cu12                12.4.5.8
nvidia-cuda-cupti-cu12            12.4.127
nvidia-cuda-nvrtc-cu12            12.4.127
nvidia-cuda-runtime-cu12          12.4.127
nvidia-cudnn-cu12                 9.1.0.70
nvidia-cufft-cu12                 11.2.1.3
nvidia-curand-cu12                10.3.5.147
nvidia-cusolver-cu12              11.6.1.9
nvidia-cusparse-cu12              12.3.1.170
nvidia-ml-py                      12.570.86
nvidia-nccl-cu12                  2.21.5
nvidia-nvjitlink-cu12             12.4.127
nvidia-nvtx-cu12                  12.4.127
openai                            1.66.5
opencensus                        0.11.4
opencensus-context                0.1.3
opencv-python-headless            4.11.0.86
orjson                            3.10.15
outlines                          0.1.11
outlines_core                     0.1.26
packaging                         24.2
pandas                            2.2.3
parso                             0.8.4
partial-json-parser               0.2.1.1.post5
pexpect                           4.9.0
pillow                            11.1.0
pip                               25.0.1
platformdirs                      4.3.6
prometheus_client                 0.21.1
prometheus-fastapi-instrumentator 7.0.2
prompt_toolkit                    3.0.50
propcache                         0.3.0
proto-plus                        1.26.1
protobuf                          6.30.1
psutil                            7.0.0
ptyprocess                        0.7.0
pure_eval                         0.2.3
py-cpuinfo                        9.0.0
py-spy                            0.4.0
pyarrow                           19.0.1
pyasn1                            0.6.1
pyasn1_modules                    0.4.1
pycountry                         24.6.1
pydantic                          2.10.6
pydantic_core                     2.27.2
Pygments                          2.19.1
python-dateutil                   2.9.0.post0
python-dotenv                     1.0.1
python-multipart                  0.0.20
pytz                              2025.1
PyYAML                            6.0.2
pyzmq                             26.3.0
ray                               2.43.0
referencing                       0.36.2
regex                             2024.11.6
requests                          2.32.3
rpds-py                           0.23.1
rsa                               4.9
safetensors                       0.5.3
sentencepiece                     0.2.0
setproctitle                      1.3.5
setuptools                        76.1.0
sgl-kernel                        0.0.5
sglang                            0.4.4.post1
six                               1.17.0
smart-open                        7.1.0
sniffio                           1.3.1
stack-data                        0.6.3
starlette                         0.46.1
sympy                             1.13.1
tiktoken                          0.9.0
tokenizers                        0.21.1
torch                             2.5.1
torchao                           0.9.0
torchaudio                        2.5.1
torchvision                       0.20.1
tqdm                              4.67.1
traitlets                         5.14.3
transformers                      4.49.0
triton                            3.1.0
typing_extensions                 4.12.2
tzdata                            2025.1
urllib3                           2.3.0
uvicorn                           0.34.0
uvloop                            0.21.0
virtualenv                        20.29.3
vllm                              0.7.2
watchfiles                        1.0.4
wcwidth                           0.2.13
websockets                        15.0.1
wrapt                             1.17.2
xformers                          0.0.28.post3
xgrammar                          0.1.15
xxhash                            3.5.0
yarl                              1.18.3
zipp                              3.21.0

Initially, before the transformers update (v0.4.4.post1 installs transformers v4.48.3), it failed with:

sgl start command: python3.12 -m sglang.launch_server   --model google/gemma-3-12b-pt --model-path /home/model/google/gemma-3-12b-pt   --host 0.0.0.0 --port 30000 --tensor-parallel-size 4   --log-level info   --enable-metrics --trust-remote-code --enable-p2p-check
INFO 03-20 05:06:07 __init__.py:190] Automatically detected platform cuda.
[2025-03-20 05:06:09] server_args=ServerArgs(model_path='/home/model/google/gemma-3-12b-pt', tokenizer_path='/home/model/google/gemma-3-12b-pt', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', trust_remote_code=True, dtype='auto', kv_cache_dtype='auto', quantization=None, quantization_param_path=None, context_length=None, device='cuda', served_model_name='/home/model/google/gemma-3-12b-pt', chat_template=None, is_embedding=False, revision=None, host='0.0.0.0', port=30000, mem_fraction_static=0.85, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=2048, max_prefill_tokens=16384, schedule_policy='fcfs', schedule_conservativeness=1.0, cpu_offload_gb=0, page_size=1, tp_size=4, stream_interval=1, stream_output=False, random_seed=156562109, constrained_json_whitespace_pattern=None, watchdog_timeout=300, dist_timeout=None, download_dir=None, base_gpu_id=0, gpu_id_step=1, log_level='info', log_level_http=None, log_requests=False, log_requests_level=0, show_time_cost=False, enable_metrics=True, decode_log_interval=40, api_key=None, file_storage_path='sglang_storage', enable_cache_report=False, reasoning_parser=None, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', lora_paths=None, max_loras_per_batch=8, lora_backend='triton', attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', speculative_algorithm=None, speculative_draft_model_path=None, speculative_num_steps=5, speculative_eagle_topk=4, speculative_num_draft_tokens=8, speculative_accept_threshold_single=1.0, speculative_accept_threshold_acc=1.0, speculative_token_map=None, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, enable_nccl_nvls=False, 
disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=80, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=True, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None, enable_hierarchical_cache=False, enable_flashinfer_mla=False, flashinfer_mla_disable_ragged=False, warmups=None, debug_tensor_dump_output_folder=None, debug_tensor_dump_input_file=None, debug_tensor_dump_inject=False)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/transformers/models/auto/configuration_auto.py", line 1071, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
                   ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/transformers/models/auto/configuration_auto.py", line 773, in __getitem__
    raise KeyError(key)
KeyError: 'gemma3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.12/site-packages/sglang/launch_server.py", line 14, in <module>
    launch_server(server_args)
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/entrypoints/http_server.py", line 619, in launch_server
    tokenizer_manager, scheduler_info = _launch_subprocesses(server_args=server_args)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/entrypoints/engine.py", line 499, in _launch_subprocesses
    tokenizer_manager = TokenizerManager(server_args, port_args)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/tokenizer_manager.py", line 155, in __init__
    self.model_config = ModelConfig(
                        ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/configs/model_config.py", line 59, in __init__
    self.hf_config = get_config(
                     ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sglang/srt/hf_transformers_utils.py", line 73, in get_config
    config = AutoConfig.from_pretrained(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/transformers/models/auto/configuration_auto.py", line 1073, in from_pretrained
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `gemma3` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
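The second traceback shows the underlying dispatch: transformers 4.48.3 has no `gemma3` entry in its `CONFIG_MAPPING`, so the lookup raises `KeyError`, which `AutoConfig.from_pretrained` converts into the `ValueError` above. A self-contained sketch of that lookup (toy mapping with made-up entries, standing in for the real `CONFIG_MAPPING`):

```python
# Sketch of the model-type dispatch from the traceback (toy mapping;
# the real one is transformers' CONFIG_MAPPING in configuration_auto.py).
CONFIG_MAPPING = {"gemma2": "Gemma2Config", "qwen2_5_vl": "Qwen2_5_VLConfig"}

def config_class_for(config_dict: dict) -> str:
    model_type = config_dict["model_type"]
    try:
        # transformers 4.48.3 has no "gemma3" key here, hence KeyError('gemma3')
        return CONFIG_MAPPING[model_type]
    except KeyError:
        # from_pretrained re-raises the KeyError as this ValueError
        raise ValueError(
            f"The checkpoint you are trying to load has model type `{model_type}` "
            "but Transformers does not recognize this architecture."
        ) from None

try:
    config_class_for({"model_type": "gemma3"})
except ValueError as e:
    print(e)
```

So the upgrade to 4.49.0 fixes this `gemma3` lookup path only to hit the duplicate-registration crash above before the server can even start.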

Update of transformers to 4.49.0

[ec2-user@ip-10-0-0-66 ~]$ docker exec -it cc9d452cddc2 pip install -U transformers
Requirement already satisfied: transformers in /usr/local/lib/python3.12/site-packages (4.48.3)
Collecting transformers
  Downloading transformers-4.49.0-py3-none-any.whl.metadata (44 kB)
Requirement already satisfied: filelock in /usr/local/lib/python3.12/site-packages (from transformers) (3.18.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.26.0 in /usr/local/lib/python3.12/site-packages (from transformers) (0.29.3)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib64/python3.12/site-packages (from transformers) (1.26.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.12/site-packages (from transformers) (24.2)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib64/python3.12/site-packages (from transformers) (6.0.2)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib64/python3.12/site-packages (from transformers) (2024.11.6)
Requirement already satisfied: requests in /usr/local/lib/python3.12/site-packages (from transformers) (2.32.3)
Requirement already satisfied: tokenizers<0.22,>=0.21 in /usr/local/lib64/python3.12/site-packages (from transformers) (0.21.1)
Requirement already satisfied: safetensors>=0.4.1 in /usr/local/lib64/python3.12/site-packages (from transformers) (0.5.3)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.12/site-packages (from transformers) (4.67.1)
Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.12/site-packages (from huggingface-hub<1.0,>=0.26.0->transformers) (2024.12.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.12/site-packages (from huggingface-hub<1.0,>=0.26.0->transformers) (4.12.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib64/python3.12/site-packages (from requests->transformers) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.12/site-packages (from requests->transformers) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/site-packages (from requests->transformers) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.12/site-packages (from requests->transformers) (2025.1.31)
Downloading transformers-4.49.0-py3-none-any.whl (10.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.0/10.0 MB 139.8 MB/s eta 0:00:00
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.48.3
    Uninstalling transformers-4.48.3:
      Successfully uninstalled transformers-4.48.3
Successfully installed transformers-4.49.0

Reproduction

  • Install sglang at v0.4.4.post1 or use our image on Docker Hub: didierdurand/lic-sglang:al2023-latest
  • Load Gemma 3

Environment

Amazon Linux 2023 latest version

Docker build file at https://github.com/didier-durand/llms-in-clouds-private/blob/main/docker/Dockerfile-al2023-sglang

Docker image executed on an AWS ECS Cluster with EC2 instance of type https://aws.amazon.com/ec2/instance-types/g6/
