Hi, I am trying to start Google Gemma 3 with SGLang v0.4.4.post1. After upgrading to the latest transformers (4.49.0), it fails with the following:
[ec2-user@ip-10-0-0-66 ~]$ python3.12 -m sglang.launch_server --model google/gemma-3-12b-pt --model-path /home/model/google/gemma-3-12b-pt --host 0.0.0.0 --port 30000 --tensor-parallel-size 4 --log-level info --enable-metrics --trust-remote-code --enable-p2p-check
-bash: python3.12: command not found
[ec2-user@ip-10-0-0-66 ~]$ docker exec -it cc9d452cddc2 python3.12 -m sglang.launch_server --model google/gemma-3-12b-pt --model-path /home/model/google/gemma-3-12b-pt --host 0.0.0.0 --port 30000 --tensor-parallel-size 4 --log-level info --enable-metrics --trust-remote-code --enable-p2p-check
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/usr/local/lib/python3.12/site-packages/sglang/launch_server.py", line 6, in <module>
from sglang.srt.entrypoints.http_server import launch_server
File "/usr/local/lib/python3.12/site-packages/sglang/srt/entrypoints/http_server.py", line 44, in <module>
from sglang.srt.entrypoints.engine import _launch_subprocesses
File "/usr/local/lib/python3.12/site-packages/sglang/srt/entrypoints/engine.py", line 36, in <module>
from sglang.srt.managers.data_parallel_controller import (
File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/data_parallel_controller.py", line 27, in <module>
from sglang.srt.managers.io_struct import (
File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/io_struct.py", line 25, in <module>
from sglang.srt.managers.schedule_batch import BaseFinishReason
File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/schedule_batch.py", line 43, in <module>
from sglang.srt.configs.model_config import ModelConfig
File "/usr/local/lib/python3.12/site-packages/sglang/srt/configs/__init__.py", line 5, in <module>
from sglang.srt.configs.qwen2_5_vl_config import (
File "/usr/local/lib/python3.12/site-packages/sglang/srt/configs/qwen2_5_vl_config.py", line 1005, in <module>
AutoImageProcessor.register(Qwen2_5_VLConfig, None, Qwen2_5_VLImageProcessor, None)
File "/usr/local/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py", line 628, in register
IMAGE_PROCESSOR_MAPPING.register(
File "/usr/local/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 833, in register
raise ValueError(f"'{key}' is already used by a Transformers model.")
ValueError: '<class 'sglang.srt.configs.qwen2_5_vl_config.Qwen2_5_VLConfig'>' is already used by a Transformers model.
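For context, the conflict can be reproduced in isolation: transformers 4.49.0 ships its own Qwen2.5-VL support, so sglang's unconditional re-registration of `Qwen2_5_VLConfig` collides with the built-in entry. A minimal, stdlib-only sketch of that registry behavior (a simplified stand-in for transformers' lazy auto-mapping, not the real class):

```python
# Simplified stand-in for transformers' auto-mapping registry: registering a
# key that is already present raises ValueError unless exist_ok=True is passed.
class Registry:
    def __init__(self):
        self._extra = {}

    def register(self, key, value, exist_ok=False):
        if key in self._extra and not exist_ok:
            # Mirrors the message seen in the traceback above.
            raise ValueError(f"'{key}' is already used by a Transformers model.")
        self._extra[key] = value


reg = Registry()
# First registration succeeds (analogous to transformers' built-in entry).
reg.register("Qwen2_5_VLConfig", "Qwen2_5_VLImageProcessor")
try:
    # Second registration (analogous to sglang's) blows up.
    reg.register("Qwen2_5_VLConfig", "Qwen2_5_VLImageProcessor")
except ValueError as e:
    print(e)
```

This suggests the sglang side would need to skip (or `exist_ok`) the registration when the installed transformers already provides the model.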
[ec2-user@ip-10-0-0-66 ~]$ docker exec -it cc9d452cddc2 pip list
Package Version
--------------------------------- -------------------
aiohappyeyeballs 2.6.1
aiohttp 3.11.14
aiohttp-cors 0.8.0
aiosignal 1.3.2
airportsdata 20250224
annotated-types 0.7.0
anthropic 0.49.0
anyio 4.9.0
astor 0.8.1
asttokens 3.0.0
attrs 25.3.0
blake3 1.0.4
cachetools 5.5.2
certifi 2025.1.31
charset-normalizer 3.4.1
click 8.1.8
cloudpickle 3.1.1
colorful 0.5.6
compressed-tensors 0.9.1
cuda-bindings 12.8.0
cuda-python 12.8.0
datasets 3.4.1
decorator 5.2.1
decord 0.6.0
depyf 0.18.0
dill 0.3.8
diskcache 5.6.3
distlib 0.3.9
distro 1.9.0
einops 0.8.1
executing 2.2.0
fastapi 0.115.11
filelock 3.18.0
flashinfer-python 0.2.3+cu124torch2.5
frozenlist 1.5.0
fsspec 2024.12.0
gguf 0.10.0
google-api-core 2.24.2
google-auth 2.38.0
googleapis-common-protos 1.69.2
grpcio 1.71.0
h11 0.14.0
hf_transfer 0.1.9
httpcore 1.0.7
httptools 0.6.4
httpx 0.28.1
huggingface-hub 0.29.3
idna 3.10
importlib_metadata 8.6.1
interegular 0.3.3
ipython 9.0.2
ipython_pygments_lexers 1.1.1
jedi 0.19.2
Jinja2 3.1.6
jiter 0.9.0
jsonschema 4.23.0
jsonschema-specifications 2024.10.1
lark 1.2.2
litellm 1.63.11
llguidance 0.7.2
lm-format-enforcer 0.10.11
MarkupSafe 3.0.2
matplotlib-inline 0.1.7
mistral_common 1.5.4
modelscope 1.24.0
mpmath 1.3.0
msgpack 1.1.0
msgspec 0.19.0
multidict 6.2.0
multiprocess 0.70.16
nest-asyncio 1.6.0
networkx 3.4.2
ninja 1.11.1.3
numpy 1.26.4
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-ml-py 12.570.86
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
openai 1.66.5
opencensus 0.11.4
opencensus-context 0.1.3
opencv-python-headless 4.11.0.86
orjson 3.10.15
outlines 0.1.11
outlines_core 0.1.26
packaging 24.2
pandas 2.2.3
parso 0.8.4
partial-json-parser 0.2.1.1.post5
pexpect 4.9.0
pillow 11.1.0
pip 25.0.1
platformdirs 4.3.6
prometheus_client 0.21.1
prometheus-fastapi-instrumentator 7.0.2
prompt_toolkit 3.0.50
propcache 0.3.0
proto-plus 1.26.1
protobuf 6.30.1
psutil 7.0.0
ptyprocess 0.7.0
pure_eval 0.2.3
py-cpuinfo 9.0.0
py-spy 0.4.0
pyarrow 19.0.1
pyasn1 0.6.1
pyasn1_modules 0.4.1
pycountry 24.6.1
pydantic 2.10.6
pydantic_core 2.27.2
Pygments 2.19.1
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.20
pytz 2025.1
PyYAML 6.0.2
pyzmq 26.3.0
ray 2.43.0
referencing 0.36.2
regex 2024.11.6
requests 2.32.3
rpds-py 0.23.1
rsa 4.9
safetensors 0.5.3
sentencepiece 0.2.0
setproctitle 1.3.5
setuptools 76.1.0
sgl-kernel 0.0.5
sglang 0.4.4.post1
six 1.17.0
smart-open 7.1.0
sniffio 1.3.1
stack-data 0.6.3
starlette 0.46.1
sympy 1.13.1
tiktoken 0.9.0
tokenizers 0.21.1
torch 2.5.1
torchao 0.9.0
torchaudio 2.5.1
torchvision 0.20.1
tqdm 4.67.1
traitlets 5.14.3
transformers 4.49.0
triton 3.1.0
typing_extensions 4.12.2
tzdata 2025.1
urllib3 2.3.0
uvicorn 0.34.0
uvloop 0.21.0
virtualenv 20.29.3
vllm 0.7.2
watchfiles 1.0.4
wcwidth 0.2.13
websockets 15.0.1
wrapt 1.17.2
xformers 0.0.28.post3
xgrammar 0.1.15
xxhash 3.5.0
yarl 1.18.3
zipp 3.21.0
Initially, before the transformers upgrade (v0.4.4.post1 installs transformers v4.48.3), it failed with:
sgl start command: python3.12 -m sglang.launch_server --model google/gemma-3-12b-pt --model-path /home/model/google/gemma-3-12b-pt --host 0.0.0.0 --port 30000 --tensor-parallel-size 4 --log-level info --enable-metrics --trust-remote-code --enable-p2p-check
INFO 03-20 05:06:07 __init__.py:190] Automatically detected platform cuda.
[2025-03-20 05:06:09] server_args=ServerArgs(model_path='/home/model/google/gemma-3-12b-pt', tokenizer_path='/home/model/google/gemma-3-12b-pt', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', trust_remote_code=True, dtype='auto', kv_cache_dtype='auto', quantization=None, quantization_param_path=None, context_length=None, device='cuda', served_model_name='/home/model/google/gemma-3-12b-pt', chat_template=None, is_embedding=False, revision=None, host='0.0.0.0', port=30000, mem_fraction_static=0.85, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=2048, max_prefill_tokens=16384, schedule_policy='fcfs', schedule_conservativeness=1.0, cpu_offload_gb=0, page_size=1, tp_size=4, stream_interval=1, stream_output=False, random_seed=156562109, constrained_json_whitespace_pattern=None, watchdog_timeout=300, dist_timeout=None, download_dir=None, base_gpu_id=0, gpu_id_step=1, log_level='info', log_level_http=None, log_requests=False, log_requests_level=0, show_time_cost=False, enable_metrics=True, decode_log_interval=40, api_key=None, file_storage_path='sglang_storage', enable_cache_report=False, reasoning_parser=None, dp_size=1, load_balance_method='round_robin', ep_size=1, dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', lora_paths=None, max_loras_per_batch=8, lora_backend='triton', attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', speculative_algorithm=None, speculative_draft_model_path=None, speculative_num_steps=5, speculative_eagle_topk=4, speculative_num_draft_tokens=8, speculative_accept_threshold_single=1.0, speculative_accept_threshold_acc=1.0, speculative_token_map=None, enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, disable_radix_cache=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, enable_nccl_nvls=False, 
disable_outlines_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_overlap_schedule=False, enable_mixed_chunk=False, enable_dp_attention=False, enable_ep_moe=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=80, cuda_graph_bs=None, torchao_config='', enable_nan_detection=False, enable_p2p_check=True, triton_attention_reduce_in_fp32=False, triton_attention_num_kv_splits=8, num_continuous_decode_steps=1, delete_ckpt_after_loading=False, enable_memory_saver=False, allow_auto_truncate=False, enable_custom_logit_processor=False, tool_call_parser=None, enable_hierarchical_cache=False, enable_flashinfer_mla=False, flashinfer_mla_disable_ragged=False, warmups=None, debug_tensor_dump_output_folder=None, debug_tensor_dump_input_file=None, debug_tensor_dump_inject=False)
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/transformers/models/auto/configuration_auto.py", line 1071, in from_pretrained
config_class = CONFIG_MAPPING[config_dict["model_type"]]
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/transformers/models/auto/configuration_auto.py", line 773, in __getitem__
raise KeyError(key)
KeyError: 'gemma3'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/usr/local/lib/python3.12/site-packages/sglang/launch_server.py", line 14, in <module>
launch_server(server_args)
File "/usr/local/lib/python3.12/site-packages/sglang/srt/entrypoints/http_server.py", line 619, in launch_server
tokenizer_manager, scheduler_info = _launch_subprocesses(server_args=server_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sglang/srt/entrypoints/engine.py", line 499, in _launch_subprocesses
tokenizer_manager = TokenizerManager(server_args, port_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sglang/srt/managers/tokenizer_manager.py", line 155, in __init__
self.model_config = ModelConfig(
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sglang/srt/configs/model_config.py", line 59, in __init__
self.hf_config = get_config(
^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/sglang/srt/hf_transformers_utils.py", line 73, in get_config
config = AutoConfig.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/transformers/models/auto/configuration_auto.py", line 1073, in from_pretrained
raise ValueError(
ValueError: The checkpoint you are trying to load has model type `gemma3` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
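The `KeyError: 'gemma3'` above simply means the installed transformers has no entry for that `model_type` in its config mapping. A hypothetical pre-flight check for this, using only the stdlib (the minimum-version table is an assumption for illustration, not taken from transformers):

```python
# Hypothetical pre-flight check: read model_type from a checkpoint's
# config.json and compare the installed transformers version against an
# assumed minimum version that recognizes that architecture.
import json

# Assumption for illustration: gemma3 lands in transformers >= 4.50.0.
MIN_TRANSFORMERS = {"gemma3": (4, 50, 0)}


def parse_version(v):
    # "4.49.0" -> (4, 49, 0); non-numeric suffixes like "post1" are dropped.
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())


def supported(model_type, installed_version):
    need = MIN_TRANSFORMERS.get(model_type)
    return need is None or parse_version(installed_version) >= need


config = json.loads('{"model_type": "gemma3"}')  # stand-in for config.json
print(supported(config["model_type"], "4.49.0"))
```

Under that assumption, even the 4.49.0 upgrade below would not be enough to load the gemma3 checkpoint; the launch fails earlier on the registration conflict before ever reaching the config lookup.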
[ec2-user@ip-10-0-0-66 ~]$ docker exec -it cc9d452cddc2 pip install -U transformers
Requirement already satisfied: transformers in /usr/local/lib/python3.12/site-packages (4.48.3)
Collecting transformers
Downloading transformers-4.49.0-py3-none-any.whl.metadata (44 kB)
Requirement already satisfied: filelock in /usr/local/lib/python3.12/site-packages (from transformers) (3.18.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.26.0 in /usr/local/lib/python3.12/site-packages (from transformers) (0.29.3)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib64/python3.12/site-packages (from transformers) (1.26.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.12/site-packages (from transformers) (24.2)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib64/python3.12/site-packages (from transformers) (6.0.2)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib64/python3.12/site-packages (from transformers) (2024.11.6)
Requirement already satisfied: requests in /usr/local/lib/python3.12/site-packages (from transformers) (2.32.3)
Requirement already satisfied: tokenizers<0.22,>=0.21 in /usr/local/lib64/python3.12/site-packages (from transformers) (0.21.1)
Requirement already satisfied: safetensors>=0.4.1 in /usr/local/lib64/python3.12/site-packages (from transformers) (0.5.3)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.12/site-packages (from transformers) (4.67.1)
Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.12/site-packages (from huggingface-hub<1.0,>=0.26.0->transformers) (2024.12.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.12/site-packages (from huggingface-hub<1.0,>=0.26.0->transformers) (4.12.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib64/python3.12/site-packages (from requests->transformers) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.12/site-packages (from requests->transformers) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/site-packages (from requests->transformers) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.12/site-packages (from requests->transformers) (2025.1.31)
Downloading transformers-4.49.0-py3-none-any.whl (10.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.0/10.0 MB 139.8 MB/s eta 0:00:00
Installing collected packages: transformers
Attempting uninstall: transformers
Found existing installation: transformers 4.48.3
Uninstalling transformers-4.48.3:
Successfully uninstalled transformers-4.48.3
Successfully installed transformers-4.49.0
Reproduction
Run the sglang.launch_server command shown above against the google/gemma-3-12b-pt checkpoint.
Environment
Amazon Linux 2023 latest version
Docker build file at https://github.com/didier-durand/llms-in-clouds-private/blob/main/docker/Dockerfile-al2023-sglang
Docker image executed on an AWS ECS cluster backed by an EC2 instance of the g6 family (https://aws.amazon.com/ec2/instance-types/g6/)