## Summary
I am attempting to reproduce the InferenceMAX GPT-OSS-120B benchmarks on a RunPod B200, but my vLLM results show a significant performance gap relative to the published SemiAnalysis numbers. I would appreciate clarification on the environment setup and configuration used.
## My Environment
| Component | Version |
|---|---|
| GPU | NVIDIA B200 (183GB VRAM, SM100) |
| Driver | 570.195.03 |
| CUDA | 12.8.93 |
| Platform | RunPod |
| vLLM | 0.13.0 |
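For completeness, our server was launched roughly as follows. This is a sketch: the model id `openai/gpt-oss-120b` and the flags shown approximate our setup and are not necessarily what the InferenceMAX harness uses.

```bash
# Approximate launch command; flags are illustrative, not the exact
# InferenceMAX configuration.
vllm serve openai/gpt-oss-120b \
    --host 0.0.0.0 --port 8000 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.90
```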
## Performance Gap
Comparing at similar throughput levels shows a large latency gap:
| Source | Output Throughput | E2E Latency | Concurrency |
|---|---|---|---|
| SemiAnalysis vLLM | ~4,666 tok/s | ~10s | C=128 |
| Our vLLM | ~3,663 tok/s | ~27s | C=100 |
| Our vLLM | ~5,051 tok/s | ~40s | C=200 |
At comparable latency (~10s), SemiAnalysis achieves ~4,666 tok/s, while interpolating our sweep puts us at roughly ~2,000 tok/s, a roughly 2x performance gap.
## Our Full Results
| Concurrency | Output Throughput | E2E Latency |
|---|---|---|
| C=1 | 215 tok/s | 4.6s |
| C=20 | 1,370 tok/s | 14.6s |
| C=50 | 2,427 tok/s | 20.6s |
| C=100 | 3,663 tok/s | 27.3s |
| C=200 | 5,051 tok/s | 39.6s |
| C=300 | 5,725 tok/s | 52.4s |
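Each row above came from a fixed-concurrency run. Roughly, the sweep looked like the sketch below, using vLLM's built-in benchmark client against the running server; the input/output lengths and prompt counts are illustrative, not the exact InferenceMAX settings.

```bash
# Sweep over concurrency levels against a running server; lengths and
# prompt counts are illustrative, not the exact InferenceMAX settings.
for c in 1 20 50 100 200 300; do
  vllm bench serve \
      --model openai/gpt-oss-120b \
      --dataset-name random \
      --random-input-len 1024 \
      --random-output-len 1024 \
      --max-concurrency "$c" \
      --num-prompts $((c * 10))
done
```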
## Questions

- What driver version was used? RunPod B200 nodes currently ship driver 570.x (CUDA 12.8). Is driver 575+ (CUDA 13) required for optimal performance?
- What cloud platform was used? Different platforms may ship different driver/software stacks.
- Is Docker required? The benchmark scripts reference Docker containers; a sketch of what I would try is shown after this list.
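Regarding the Docker question, this is the kind of invocation I would try. It is a guess based on the public `vllm/vllm-openai` image; the InferenceMAX scripts may pin a different image or tag with B200-specific builds.

```bash
# Hypothetical invocation using the public vLLM image; the InferenceMAX
# scripts may use a different, pinned container.
docker run --gpus all --ipc=host -p 8000:8000 \
    -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
    vllm/vllm-openai:latest \
    --model openai/gpt-oss-120b \
    --max-model-len 8192
```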
## References
- vLLM benchmark script: https://github.com/InferenceMAX/InferenceMAX/blob/main/benchmarks/gptoss_fp4_b200_docker.sh
- TRT-LLM benchmark script: https://github.com/InferenceMAX/InferenceMAX/blob/main/benchmarks/gptoss_fp4_b200_trt_docker.sh
- NVIDIA deployment guide: https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/blogs/tech_blog/blog9_Deploying_GPT_OSS_on_TRTLLM.md
Any guidance on configuration or environment requirements would be appreciated. Thank you!