Checklist
Describe the bug
LoRA adapters trained with HuggingFace Transformers + PEFT often use prefixed module names such as feed_forward.gate_proj and feed_forward.up_proj. However, SGLang's current SUPPORTED_LORA_TARGET_MODULES only includes base module names (gate_proj, up_proj, etc.), so adapter loading fails even with --lora-target-modules all.
The failure happens at the validation stage, before SGLang's existing normalization logic ever gets a chance to handle the weight format conversion. SGLang already has a proper normalize_gate_up_proj path to bridge the architectural difference between training (separate gate_proj/up_proj) and inference (merged gate_up_proj), but adapter loading is blocked earlier by the target module validation.
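For context, this is roughly how such prefixed target modules arise on the training side. A minimal sketch using PEFT's LoraConfig; the hyperparameters are illustrative (only r=64 matches the adapter config shown under Reproduction):

from peft import LoraConfig

# PEFT matches entries in target_modules against module-name suffixes, so
# prefixed entries like "feed_forward.gate_proj" are a natural way to target
# the MLP projections on architectures (e.g. Llama 4) that nest them under
# `feed_forward`.
lora_config = LoraConfig(
    r=64,            # rank; matches the adapter config below
    lora_alpha=128,  # illustrative value, not taken from the adapter
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "feed_forward.gate_proj",
        "feed_forward.up_proj",
        "feed_forward.down_proj",
    ],
)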
Proposed Solutions
Simple fix: Add the common prefixed module names to SUPPORTED_LORA_TARGET_MODULES, as shown below:
SUPPORTED_LORA_TARGET_MODULES = [
    "q_proj",
    "k_proj",
    "v_proj",
    "o_proj",
    "gate_proj",
    "up_proj",
    "down_proj",
    "qkv_proj",
    "gate_up_proj",
    # new: prefixed module names produced by PEFT-trained adapters
    "feed_forward.gate_proj",
    "feed_forward.up_proj",
    "feed_forward.down_proj",
]
Better fix: Enhance the validation logic to automatically map prefixed names to their base names, so that the existing normalization can work as intended (taking a look at this now).
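A minimal sketch of what that mapping could look like, assuming a hypothetical helper in SGLang's validation path (the function name and its call site are illustrative, not the actual SGLang internals):

def normalize_target_module(name: str) -> str:
    """Map a possibly prefixed adapter module name, e.g.
    'feed_forward.gate_proj', to its base name ('gate_proj')."""
    base = name.rsplit(".", 1)[-1]
    if base not in SUPPORTED_LORA_TARGET_MODULES:
        raise ValueError(f"Unsupported LoRA target module: {name}")
    return base

# During adapter validation, normalize before the membership check:
# target_modules = [normalize_target_module(m) for m in config["target_modules"]]

With a mapping like this applied at validation time, the existing normalize_gate_up_proj logic would receive the base names it already knows how to handle.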
Reproduction
- My engine launch command:
python -u -m sglang.launch_server \
  --model-path /path/to/llama4-maverick \
  --host 0.0.0.0 --port 30000 \
  --tensor-parallel-size 8 \
  --enable-lora --max-lora-rank 64 \
  --lora-target-modules all \
  --mem-fraction-static 0.85
Model page: https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
- Attempt to load a LoRA adapter with prefixed target modules:
curl -X POST http://localhost:30000/load_lora_adapter \
  -H 'Content-Type: application/json' \
  -d '{
    "lora_name": "test_adapter",
    "lora_path": "/path/to/adapter"
  }'
- The adapter config contains prefixed target modules:
{
  "target_modules": [
    "feed_forward.gate_proj",
    "feed_forward.up_proj",
    "feed_forward.down_proj",
    "o_proj", "q_proj", "k_proj", "v_proj"
  ],
  "r": 64
}
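As a quick standalone way to see which entries trip the current validation (illustrative only: the supported set is copied from today's SUPPORTED_LORA_TARGET_MODULES, and the adapter path is a placeholder):

import json

SUPPORTED = {
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
    "qkv_proj", "gate_up_proj",
}

with open("/path/to/adapter/adapter_config.json") as f:
    cfg = json.load(f)

for name in cfg["target_modules"]:
    # Exact-match check, mirroring the validation described above: prefixed
    # names are rejected even though their base names are supported.
    print(name, "->", "ok" if name in SUPPORTED else "rejected")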
I can provide a sample adapter file offline if needed.
Environment
SGLang version: Latest main branch (commit f4e3ebe)
Model: Llama4-Maverick (though, if I understand correctly, this affects any model with prefixed module names)
LoRA training framework: HuggingFace Transformers + PEFT
Hardware: 8x H100
python3 -m sglang.check_env output below:
/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
Python: 3.12.11 (main, Jun 4 2025, 08:56:18) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H100 80GB HBM3
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.9, V12.9.86
CUDA Driver Version: 565.57.01
PyTorch: 2.8.0+cu129
sglang: 0.5.3rc0
sgl_kernel: 0.3.12
flashinfer_python: 0.4.0rc1
triton: 3.4.0
transformers: 4.56.1
torchao: 0.9.0
numpy: 2.3.3
aiohttp: 3.12.15
fastapi: 0.117.1
hf_transfer: 0.1.9
huggingface_hub: 0.35.1
interegular: 0.3.3
modelscope: 1.30.0
orjson: 3.11.3
outlines: 0.1.11
packaging: 25.0
psutil: 7.1.0
pydantic: 2.11.9
python-multipart: 0.0.20
pyzmq: 27.1.0
uvicorn: 0.37.0
uvloop: 0.21.0
vllm: Module Not Found
xgrammar: 0.1.24
openai: 1.99.1
tiktoken: 0.11.0
anthropic: 0.68.0
litellm: Module Not Found
decord: 0.6.0
NVIDIA Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 NIC8 NIC9 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV18 NV18 NV18 NV18 NV18 NV18 NV18 PIX PIX SYS SYS SYS SYS SYS SYS SYS SYS 0-11 0 N/A
GPU1 NV18 X NV18 NV18 NV18 NV18 NV18 NV18 SYS SYS PIX SYS SYS SYS SYS SYS SYS SYS 24-35 2 N/A
GPU2 NV18 NV18 X NV18 NV18 NV18 NV18 NV18 SYS SYS SYS PIX SYS SYS SYS SYS SYS SYS 36-47 3 N/A
GPU3 NV18 NV18 NV18 X NV18 NV18 NV18 NV18 SYS SYS SYS SYS PIX SYS SYS SYS SYS SYS 12-23 1 N/A
GPU4 NV18 NV18 NV18 NV18 X NV18 NV18 NV18 SYS SYS SYS SYS SYS PIX PIX SYS SYS SYS 48-59 4 N/A
GPU5 NV18 NV18 NV18 NV18 NV18 X NV18 NV18 SYS SYS SYS SYS SYS SYS SYS PIX SYS SYS 72-83 6 N/A
GPU6 NV18 NV18 NV18 NV18 NV18 NV18 X NV18 SYS SYS SYS SYS SYS SYS SYS SYS PIX SYS 84-95 7 N/A
GPU7 NV18 NV18 NV18 NV18 NV18 NV18 NV18 X SYS SYS SYS SYS SYS SYS SYS SYS SYS PIX 60-71 5 N/A
NIC0 PIX SYS SYS SYS SYS SYS SYS SYS X PIX SYS SYS SYS SYS SYS SYS SYS SYS
NIC1 PIX SYS SYS SYS SYS SYS SYS SYS PIX X SYS SYS SYS SYS SYS SYS SYS SYS
NIC2 SYS PIX SYS SYS SYS SYS SYS SYS SYS SYS X SYS SYS SYS SYS SYS SYS SYS
NIC3 SYS SYS PIX SYS SYS SYS SYS SYS SYS SYS SYS X SYS SYS SYS SYS SYS SYS
NIC4 SYS SYS SYS PIX SYS SYS SYS SYS SYS SYS SYS SYS X SYS SYS SYS SYS SYS
NIC5 SYS SYS SYS SYS PIX SYS SYS SYS SYS SYS SYS SYS SYS X PIX SYS SYS SYS
NIC6 SYS SYS SYS SYS PIX SYS SYS SYS SYS SYS SYS SYS SYS PIX X SYS SYS SYS
NIC7 SYS SYS SYS SYS SYS PIX SYS SYS SYS SYS SYS SYS SYS SYS SYS X SYS SYS
NIC8 SYS SYS SYS SYS SYS SYS PIX SYS SYS SYS SYS SYS SYS SYS SYS SYS X SYS
NIC9 SYS SYS SYS SYS SYS SYS SYS PIX SYS SYS SYS SYS SYS SYS SYS SYS SYS X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
NIC4: mlx5_4
NIC5: mlx5_5
NIC6: mlx5_6
NIC7: mlx5_7
NIC8: mlx5_8
NIC9: mlx5_9
ulimit soft: 1048576