Re-enable unit-test-deepep-8-gpu and unit-test-backend-4-gpu-gb200 #17438

Merged
Fridge003 merged 5 commits into main from ci/re-enable-disabled-jobs
Jan 23, 2026

@alisonshao
Collaborator

Summary

Successful GB200 run: https://github.com/sgl-project/sglang/actions/runs/21139104876/job/60851136785

Both runners have been fixed:
- 8-GPU H200 runner: IBGDA environment issues resolved (#17175)
- 4-GPU GB200 runner: repaired and working (#17367)

Successful run: https://github.com/sgl-project/sglang/actions/runs/21139104876/job/60851136785
@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@alisonshao
Collaborator Author

/rerun-stage unit-test-backend-4-gpu-gb200

@github-actions
Contributor

✅ Triggered unit-test-backend-4-gpu-gb200 to run independently (skipping dependencies).

@github-actions
Contributor

🔗 View workflow run

@alisonshao
Collaborator Author

/rerun-stage unit-test-deepep-8-gpu

@github-actions
Contributor

✅ Triggered unit-test-deepep-8-gpu to run independently (skipping dependencies).

@github-actions
Contributor

🔗 View workflow run

Uncomment the suite that was disabled in #17175. The IBGDA/cudaHostRegister
environment issues on the 8-GPU runner have been fixed.
@alisonshao
Collaborator Author

/rerun-stage unit-test-deepep-8-gpu

@github-actions
Contributor

✅ Triggered unit-test-deepep-8-gpu to run independently (skipping dependencies).

@github-actions
Contributor

🔗 View workflow run

When call-gate fails, all stage-b jobs are skipped. Without this fix,
wait-for-stage-b would run and wait forever because it expects 23 matrix
jobs but only sees 4 skipped jobs (one per matrix).

Add call-gate to the needs and check its result to skip wait-for-stage-b
when call-gate fails.

Same issue as wait-for-stage-b: when call-gate fails, stage-a-test-1 is
skipped, but wait-for-stage-a would still run and treat skipped as success.

Add call-gate to needs and skip wait-for-stage-a when call-gate fails.
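
The two commit messages above describe related failure modes. A minimal Python sketch can make the reasoning concrete. It is only a model of the behaviour described, not the workflow's actual implementation: the function names and both completion rules are hypothetical. Only the job counts (23 expected, 4 visible) and the job names come from the commit messages.

```python
from __future__ import annotations

# Figures quoted in the commit message: 23 expected matrix jobs, 4 matrices.
EXPECTED_STAGE_B_JOBS = 23
STAGE_B_MATRICES = 4


def visible_jobs_when_gate_fails(num_matrices: int = STAGE_B_MATRICES) -> list[str]:
    """When call-gate fails, each matrix shows up as a single skipped job."""
    return ["skipped"] * num_matrices


def wait_would_finish(conclusions: list[str], expected: int) -> bool:
    """Hypothetical completion rule: done once `expected` jobs are visible."""
    return len(conclusions) >= expected


def stage_a_looks_green(conclusions: list[str]) -> bool:
    """Hypothetical rule that counts a skipped job as a success."""
    return all(c in ("success", "skipped") for c in conclusions)


def should_run_wait_job(gate_result: str) -> bool:
    """Guard modelled on the fix: only wait when call-gate succeeded."""
    return gate_result == "success"


if __name__ == "__main__":
    jobs = visible_jobs_when_gate_fails()
    # Without the guard, the stage-b waiter can never reach its target.
    print("stage-b wait finishes:", wait_would_finish(jobs, EXPECTED_STAGE_B_JOBS))
    # Without the guard, a skipped stage-a-test-1 is reported as passing.
    print("stage-a looks green:", stage_a_looks_green(["skipped"]))
    # With the guard, neither waiter runs after a gate failure.
    print("run waiters:", should_run_wait_job("failure"))
```

In this model the guard is evaluated before either waiter starts, which is the effect of adding call-gate to `needs` and checking its result: a gate failure short-circuits both the endless wait and the false success.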
@Fridge003 merged commit d7dd0b8 into main on Jan 23, 2026
61 of 67 checks passed
@Fridge003 deleted the ci/re-enable-disabled-jobs branch on January 23, 2026 06:31
3 participants