
[Bug] incorrect inference result when using tensor parallel at mi250 #7641

@whitememory

Description


Checklist

  1. I have searched related issues but cannot get the expected help.
  2. The bug has not been fixed in the latest version.
  3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  5. Please use English, otherwise it will be closed.

Describe the bug

After applying the "fix aiter failure at gfx90a" patch to the Docker image "lmsysorg/sglang:v0.4.7-rocm630", single-GPU inference with sglang works. However, when the --tp-size option is used, the inference result is incorrect.

Tested with Llama 3 8B, Llama 3 70B, and Llama 2 7B on a single MI250 node (8 GPUs).

This does not reproduce on MI300.

Reproduction

  • docker pull lmsysorg/sglang:v0.4.7-rocm630
  • patch the fp8.py code as suggested in the PR "fix aiter failure at gfx90a in docker"
  • reinstall hipblaslt, since the Docker image ships only the gfx942 build (apt remove hipblaslt; apt install hipblaslt)
  • reinstall any packages that were removed along with hipblaslt
  • (SERVER) python3 -m sglang.launch_server --attention-backend triton --sampling-backend pytorch --model-path /model/llama3_8b --host 0.0.0.0 --port 30000 --tp-size 8
  • (CLIENT test code)
import requests
from sglang.utils import print_highlight

port = 30000
response = requests.post(
    f"http://localhost:{port}/generate",
    json={
        "text": "The capital of France is",
        "sampling_params": {
            "temperature": 0,
            "max_new_tokens": 32,
        },
    },
)
print_highlight(response.json())
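To confirm the problem is specific to tensor parallelism, the same greedy request can be sent to two servers and the completions diffed: with temperature 0 the outputs should match character-for-character. The sketch below is my own diagnostic, not part of sglang; it assumes a --tp-size 1 server on port 30000 and a --tp-size 8 server on a hypothetical second port 30001.

```python
import json
import urllib.request

PROMPT = "The capital of France is"

def generate(port: int, prompt: str = PROMPT) -> str:
    """Greedy /generate request against a running sglang server on `port`."""
    payload = json.dumps({
        "text": prompt,
        "sampling_params": {"temperature": 0, "max_new_tokens": 32},
    }).encode()
    req = urllib.request.Request(
        f"http://localhost:{port}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]

def first_divergence(a: str, b: str) -> int:
    """Index of the first differing character, or -1 if the strings are equal."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    return -1 if len(a) == len(b) else min(len(a), len(b))

# Usage (requires both servers running; port 30001 is an assumed assignment):
#   single = generate(30000)  # server launched with --tp-size 1
#   tp8 = generate(30001)     # server launched with --tp-size 8
#   print(first_divergence(single, tp8))  # -1 means identical outputs
```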

SAMPLE RESULT

# python3 -m sglang.launch_server --attention-backend triton --sampling-backend pytorch --model-path /model/llama3_8b --tp-size 8 --host 0.0.0.0 --port 30000

# python3 test_req.py
{'text': 'zemควควควemouthemouthemouthemouthemouthemouthemouthemouthemouthemouth442442442442ets759unganungan(___(___羊laceongyangongyangongyangongyang drill drill', 'meta_info': {'id': '548ae1102ed44f0a89a5dfb915ed4f40', 'finish_reason': {'type': 'length', 'length': 32}, 'prompt_tokens': 6, 'completion_tokens': 32, 'cached_tokens': 0, 'e2e_latency': 0.6615102291107178}}
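The mojibake above can also be flagged mechanically. A rough heuristic (my own, not an sglang utility): a correct greedy English completion of this prompt should be almost entirely ASCII, so a high non-ASCII character ratio indicates corrupted tensor-parallel output.

```python
def looks_garbled(text: str, threshold: float = 0.2) -> bool:
    """Flag a completion whose non-ASCII character ratio exceeds `threshold`.

    Heuristic only: the broken tp-size 8 output above is dominated by
    Thai/CJK tokens, while a sane completion would be plain ASCII.
    """
    if not text:
        return False
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return non_ascii / len(text) > threshold

print(looks_garbled("The capital of France is Paris."))   # False
print(looks_garbled("zemควควคว羊ongyangongyang"))          # True
```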

Environment

root@mi250:/sgl-workspace# python3 -m sglang.check_env
Python: 3.12.8 (main, Dec 4 2024, 08:54:12) [GCC 11.4.0]
ROCM available: True
GPU 0,1,2,3,4,5,6,7: AMD Instinct MI250X/MI250
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
ROCM_HOME: /opt/rocm
HIPCC: HIP version: 6.3.42131-fa1d09cbd
ROCM Driver Version: 6.8.5
PyTorch: 2.6.0a0+git8d4926e
sglang: 0.4.7
sgl_kernel: 0.1.7
flashinfer_python: Module Not Found
triton: 3.2.0+gitcddf0fc3
transformers: 4.52.3
torchao: 0.9.0
numpy: 1.26.4
aiohttp: 3.11.11
fastapi: 0.115.6
hf_transfer: 0.1.9
huggingface_hub: 0.32.4
interegular: 0.3.3
modelscope: 1.26.0
orjson: 3.10.18
outlines: 0.1.11
packaging: 24.2
psutil: 6.1.1
pydantic: 2.10.5
python-multipart: 0.0.20
pyzmq: 26.2.0
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.6.7.dev2+g113274a0.rocm630
xgrammar: 0.1.19
openai: 1.85.0
tiktoken: 0.7.0
anthropic: 0.53.0
litellm: 1.72.2
decord: 0.6.0
AMD Topology:

============================ ROCm System Management Interface ============================
=============================== Link Type between two GPUs ===============================
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7
GPU0 0 XGMI XGMI XGMI XGMI XGMI XGMI XGMI
GPU1 XGMI 0 XGMI XGMI XGMI XGMI XGMI XGMI
GPU2 XGMI XGMI 0 XGMI XGMI XGMI XGMI XGMI
GPU3 XGMI XGMI XGMI 0 XGMI XGMI XGMI XGMI
GPU4 XGMI XGMI XGMI XGMI 0 XGMI XGMI XGMI
GPU5 XGMI XGMI XGMI XGMI XGMI 0 XGMI XGMI
GPU6 XGMI XGMI XGMI XGMI XGMI XGMI 0 XGMI
GPU7 XGMI XGMI XGMI XGMI XGMI XGMI XGMI 0
================================== End of ROCm SMI Log ===================================

ulimit soft: 1048576
