[CI] Update B200 est_times to prevent timeouts on slower machine#22609

Merged
hnyls2002 merged 2 commits into main from ci/update-b200-est-times
Apr 12, 2026

Conversation

@alisonshao
Collaborator

@alisonshao alisonshao commented Apr 12, 2026

Summary

  • Update est_time for 10 B200 tests based on actual elapsed times + 20% buffer
  • The second B200 machine runs ~1.8x slower than the first due to hardware differences:
    | Metric                | Machine A | Machine B | Ratio       |
    |-----------------------|-----------|-----------|-------------|
    | HBM bandwidth         | 5528 GB/s | 3518 GB/s | 1.6x slower |
    | Host-to-Device PCIe   | 56.9 GB/s | 55.2 GB/s | ~same       |
    | Matmul TFLOPS (bf16)  | 1543      | 1499      | ~same       |
    | Disk read             | 3667 MB/s | 1301 MB/s | 2.8x slower |
  • Tests pass on both machines but partitions time out on Machine B because est_times were calibrated on the faster Machine A
    | Test                                   | Old est | Machine B actual | New est             |
    |----------------------------------------|---------|------------------|---------------------|
    | test_nvfp4_gemm.py                     | 322     | 459              | 550                 |
    | test_gpt_oss_4gpu.py                   | 312     | 615              | 740                 |
    | test_fp8_blockwise_gemm.py             | 302     | 527              | 630                 |
    | test_eagle_infer_beta_dp_attention.py  | 68      | 113              | 136                 |
    | test_nvidia_nemotron_3_super_nvfp4.py  | 294     | 591              | 710                 |
    | test_cutedsl_moe.py                    | 13      | 491              | 590                 |
    | test_deepseek_v3_fp4_4gpu.py           | 1146    | 1149             | 1380                |
    | test_deepseek_v3_fp4_mtp_small.py      | 416     | 424              | 510                 |
    | test_flash_attention_4.py              | 259     | 276              | 332                 |
    | test_lora_qwen3_30b...py               | 160     | 87               | (no change, faster) |
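The "actual + 20% buffer" rule above can be sketched as a small helper. This is illustrative only, not code from the sglang repository; the PR's published values were also hand-rounded (e.g. 551 -> 550, 738 -> 740), so the helper reproduces the buffer, not the final rounding.

```python
def buffered_est_time(actual_seconds: int, buffer_pct: int = 20) -> int:
    """Return ceil(actual * (1 + buffer_pct/100)) using integer
    arithmetic to avoid float rounding surprises. Hypothetical helper
    mirroring this PR's 'actual elapsed time + 20% buffer' rule."""
    return -(-actual_seconds * (100 + buffer_pct) // 100)

# Machine B actuals from the table above:
print(buffered_est_time(113))  # 136, matches the new est for test_eagle_infer_beta_dp_attention.py
print(buffered_est_time(615))  # 738, hand-rounded to 740 in the PR
```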

Example timeout: https://github.com/sgl-project/sglang/actions/runs/24288516804/job/70933367476

Test plan

  • est_time changes only, no logic changes
  • Benchmarked both machines to confirm hardware difference (HBM bandwidth, disk I/O)

The Innomatrix B200 machine runs ~1.8x slower than the Novita B200
due to running 2 concurrent CI containers sharing CPU/memory bandwidth.
Update est_time for 6 B200 tests based on actual Innomatrix elapsed
times + 20% buffer to prevent partition timeouts.

Changes (old -> new est_time):
- test_nvfp4_gemm.py: 322 -> 550 (actual: 459s)
- test_gpt_oss_4gpu.py: 312 -> 740 (actual: 615s)
- test_fp8_blockwise_gemm.py: 302 -> 630 (actual: 527s)
- test_eagle_infer_beta_dp_attention.py: 68 -> 136 (actual: 113s)
- test_nvidia_nemotron_3_super_nvfp4.py: 294 -> 710 (actual: 591s)
- test_cutedsl_moe.py: 13 -> 322 (actual: 268s)

Example timeout: https://github.com/sgl-project/sglang/actions/runs/24288516804/job/70933367476
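The failure mode described above is that est_time drives how tests are packed into partitions, so a badly underestimated test (13s estimated vs. 491s actual) blows its partition's wall-clock budget even though every test passes. A minimal sketch of such packing, assuming a simple greedy scheduler (hypothetical; the real sglang CI scheduler may differ):

```python
def pack_partitions(est_times: dict, budget_s: int) -> list:
    """Greedily pack tests (longest first) into partitions whose summed
    est_time stays under a per-partition budget. Hypothetical sketch of
    why a wrong est_time causes a partition timeout: the scheduler
    budgets on the estimate, but the wall clock runs on the actual."""
    partitions, current, total = [], [], 0
    for name, est in sorted(est_times.items(), key=lambda kv: -kv[1]):
        if current and total + est > budget_s:
            partitions.append(current)
            current, total = [], 0
        current.append(name)
        total += est
    if current:
        partitions.append(current)
    return partitions
```

With est_time=13 for test_cutedsl_moe.py, the scheduler would happily co-locate it with other long tests; at an actual 491s the partition overruns its budget and the job is killed.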
@github-actions github-actions Bot added the blackwell SM100/SM120 label Apr 12, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the estimated execution times (est_time) for several test files within the stage-c-test-4-gpu-b200 suite, including GPT OSS, Nemotron, CuteDSL MoE, and various quantization and speculative decoding tests. I have no feedback to provide.

Additional tests found from other Inno runs:
- test_cutedsl_moe.py: 322 -> 590 (worst case: 491s)
- test_deepseek_v3_fp4_4gpu.py: 1146 -> 1380 (actual: 1149s)
- test_deepseek_v3_fp4_mtp_small.py: 416 -> 510 (actual: 424s)
- test_flash_attention_4.py: 259 -> 332 (actual: 276s)
@Kangyan-Zhou
Collaborator

/rerun-stage stage-c-4-gpu-b200

@github-actions
Contributor

❌ Stage stage-c-4-gpu-b200 doesn't support isolated runs yet.

NVIDIA stages:

  • stage-a-test-1-gpu-small
  • stage-a-test-cpu
  • stage-b-test-1-gpu-small
  • stage-b-test-1-gpu-large
  • stage-b-test-2-gpu-large
  • stage-b-test-4-gpu-b200
  • stage-c-test-4-gpu-h100
  • stage-c-test-8-gpu-h200
  • stage-c-test-8-gpu-h20
  • stage-c-test-4-gpu-b200
  • stage-c-test-4-gpu-gb200
  • stage-c-test-deepep-4-gpu-h100
  • stage-c-test-deepep-8-gpu-h200
  • multimodal-gen-test-1-gpu
  • multimodal-gen-test-2-gpu
  • multimodal-gen-component-accuracy-1-gpu
  • multimodal-gen-component-accuracy-2-gpu
  • multimodal-gen-test-1-b200

AMD stages:

  • sgl-kernel-unit-test-amd
  • sgl-kernel-unit-test-2-gpu-amd
  • stage-a-test-1-gpu-small-amd
  • stage-b-test-1-gpu-small-amd
  • stage-b-test-1-gpu-small-amd-nondeterministic
  • stage-b-test-1-gpu-small-amd-mi35x
  • stage-b-test-1-gpu-large-amd
  • stage-b-test-2-gpu-large-amd
  • multimodal-gen-test-1-gpu-amd
  • multimodal-gen-test-2-gpu-amd
  • stage-c-test-large-8-gpu-amd
  • stage-c-test-large-8-gpu-amd-mi35x

Other stages will be added soon. For now, use /rerun-failed-ci for those stages.

@Kangyan-Zhou
Collaborator

/rerun-stage stage-c-test-4-gpu-b200

@github-actions
Contributor

✅ Triggered stage-c-test-4-gpu-b200 to run independently (skipping dependencies). View workflow run

@hnyls2002 hnyls2002 changed the title [CI] Update B200 est_times to prevent Innomatrix timeouts [CI] Update B200 est_times to prevent timeouts on slower machine Apr 12, 2026
@hnyls2002 hnyls2002 merged commit d6c9d91 into main Apr 12, 2026
104 of 115 checks passed
@hnyls2002 hnyls2002 deleted the ci/update-b200-est-times branch April 12, 2026 04:40
pyc96 pushed a commit to pyc96/sglang that referenced this pull request Apr 14, 2026
…-project#22609)

Co-authored-by: Alison Shao <alison.shao@MacBook-Pro-D2W773R9CD.local>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
…-project#22609)

Co-authored-by: Alison Shao <alison.shao@MacBook-Pro-D2W773R9CD.local>

3 participants