I believe this issue affects both R1 FP4 and R1-0528 FP4.
For R1 FP4, the GSM8K score is only 0.886. I am not trying to reproduce the official result exactly, but it should be somewhere around 0.95, and NVIDIA reports a much higher GSM8K score with TensorRT-LLM here.
python3 -m sglang.launch_server --port=7080 --model-path=nvidia/DeepSeek-R1-FP4 --trust-remote-code --tp=8 --host=0.0.0.0 --quantization=modelopt_fp4 --kv-cache-dtype=auto
python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1319 --parallel 1319 --port=7080
100%| 1319/1319 [00:42<00:00, 31.20it/s]
Accuracy: 0.886
Invalid: 0.001
Latency: 42.603 s
Output throughput: 3734.827 token/s
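For context on what the "Invalid" line above counts: GSM8K harnesses typically extract the last number in each completion as the predicted answer, and completions containing no number at all are scored as invalid. The sketch below illustrates that convention; it is an assumption about the general approach, not the exact logic in benchmark/gsm8k/bench_sglang.py.

```python
import re

def extract_answer(completion: str):
    """Return the last number in the completion, or None if there is none."""
    # Allow optional sign, thousands separators (1,319), and decimals.
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", completion)
    if not matches:
        return None  # counted toward the "Invalid" rate
    return matches[-1].replace(",", "")

def score(preds, golds):
    """Return (accuracy, invalid_rate) over extracted predictions."""
    invalid = sum(p is None for p in preds)
    correct = sum(p == g for p, g in zip(preds, golds))
    return correct / len(golds), invalid / len(golds)
```

Under this convention, a low accuracy with a near-zero invalid rate (0.001 here) suggests the model is emitting well-formed numeric answers that are simply wrong, pointing at a quality issue in the FP4 path rather than an answer-parsing problem.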
python3 -m sglang.check_env
Python: 3.12.3 (main, Feb 4 2025, 14:48:35) [GCC 13.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA B200
GPU 0,1,2,3,4,5,6,7 Compute Capability: 10.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.9, V12.9.41
CUDA Driver Version: 570.124.06
PyTorch: 2.7.1+cu128
sglang: 0.4.7
sgl_kernel: 0.1.8
flashinfer_python: 0.2.6.post1
triton: 3.3.1
transformers: 4.52.3
torchao: 0.9.0
numpy: 2.1.2
aiohttp: 3.12.12
fastapi: 0.115.12
hf_transfer: 0.1.9
huggingface_hub: 0.33.0
interegular: 0.3.3
modelscope: 1.27.0
orjson: 3.10.18
outlines: 0.1.11
packaging: 25.0
psutil: 7.0.0
pydantic: 2.11.5
python-multipart: 0.0.20
pyzmq: 26.4.0
uvicorn: 0.34.3
uvloop: 0.21.0
vllm: Module Not Found
xgrammar: 0.1.19
openai: 1.86.0
tiktoken: 0.9.0
anthropic: Module Not Found
litellm: Module Not Found
decord: Module Not Found
NVIDIA Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV18 NV18 NV18 NV18 NV18 NV18 NV18 0-55,112-167 0 N/A
GPU1 NV18 X NV18 NV18 NV18 NV18 NV18 NV18 0-55,112-167 0 N/A
GPU2 NV18 NV18 X NV18 NV18 NV18 NV18 NV18 0-55,112-167 0 N/A
GPU3 NV18 NV18 NV18 X NV18 NV18 NV18 NV18 0-55,112-167 0 N/A
GPU4 NV18 NV18 NV18 NV18 X NV18 NV18 NV18 56-111,168-223 1 N/A
GPU5 NV18 NV18 NV18 NV18 NV18 X NV18 NV18 56-111,168-223 1 N/A
GPU6 NV18 NV18 NV18 NV18 NV18 NV18 X NV18 56-111,168-223 1 N/A
GPU7 NV18 NV18 NV18 NV18 NV18 NV18 NV18 X 56-111,168-223 1 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
Hypervisor vendor: KVM
ulimit soft: 1048576
Any help is really appreciated!