ci: increase Slurm time limits by +5 min for 5 nightly tests #2769

Merged
terrykong merged 4 commits into main from increase-slurm-time-limits on Jun 11, 2026
@kajalj22 kajalj22 commented Jun 10, 2026

Summary

  • Adds 5 minutes to `NUM_MINUTES` for 5 nightly tests on cw_dfw
  • `grpo-nanov3-30BA3B-2n8g-fsdp2`: 45 → 50 min
  • `grpo-nanov3-30BA3B-2n8g-megatron-pack-cp-tq_simple`, `grpo-qwen2.5-32b-32n8g-fsdp2tp8-actckpt.v3`, `sft-nanov3-30BA3B-2n8g-fsdp2-lora`, `sft-qwen2.5-32b-4n8g-fsdp2tp8sp-actckpt.v3`: 30 → 35 min
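
The change listed above can be sketched as a small table of limits with a flat bump applied. This is only an illustration of the arithmetic: the test names and minute values come from the summary, while `OLD_LIMITS`, `BUMP_MINUTES`, and `bump_limits` are hypothetical names that do not appear in the repository, where each test presumably sets its own `NUM_MINUTES`.

```python
# Previous NUM_MINUTES values, as listed in the PR summary.
# The dictionary itself is a hypothetical stand-in for the per-test settings.
OLD_LIMITS = {
    "grpo-nanov3-30BA3B-2n8g-fsdp2": 45,
    "grpo-nanov3-30BA3B-2n8g-megatron-pack-cp-tq_simple": 30,
    "grpo-qwen2.5-32b-32n8g-fsdp2tp8-actckpt.v3": 30,
    "sft-nanov3-30BA3B-2n8g-fsdp2-lora": 30,
    "sft-qwen2.5-32b-4n8g-fsdp2tp8sp-actckpt.v3": 30,
}

BUMP_MINUTES = 5  # flat increase described in the PR title


def bump_limits(limits, bump=BUMP_MINUTES):
    """Return a new mapping with every time limit raised by `bump` minutes."""
    return {name: minutes + bump for name, minutes in limits.items()}


NEW_LIMITS = bump_limits(OLD_LIMITS)

for name, minutes in NEW_LIMITS.items():
    print(f"{name}: {OLD_LIMITS[name]} -> {minutes} min")
```

Keeping the bump as a single constant makes it easy to see that all five tests receive the same increase, regardless of their starting limit.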

🤖 Generated with Claude Code

Bumps NUM_MINUTES from 45→50 and 30→33 for tests that have been
timing out on cw_dfw H100 nodes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kajal Jain <kajalj@nvidia.com>
@kajalj22 kajalj22 requested a review from a team as a code owner June 10, 2026 21:38
copy-pr-bot Bot commented Jun 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Use flat +5 min increase instead of 10%: 30→35 for the four 30-min tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kajal Jain <kajalj@nvidia.com>
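
For comparison, the two rules mentioned in the commit messages can be worked through numerically. This is a sketch under an assumption: the rounding rule for the 10% variant is not stated anywhere in the thread, and rounding up to the next whole minute is simply one rule that reproduces the 45→50 and 30→33 values from the first commit. The function names are hypothetical.

```python
def bump_percent(minutes, percent=10):
    """Raise a limit by `percent`, rounding up to a whole minute.

    Integer arithmetic avoids floating-point surprises such as
    30 * 1.1 evaluating to slightly more than 33.
    The round-up rule is an assumption, not taken from the PR.
    """
    return (minutes * (100 + percent) + 99) // 100


def bump_flat(minutes, extra=5):
    """Raise a limit by a fixed number of minutes."""
    return minutes + extra


for old in (45, 30):
    print(f"{old} min: 10% rule -> {bump_percent(old)}, flat +5 -> {bump_flat(old)}")
```

Under these assumptions both rules give 50 minutes for the 45-minute test, while the flat rule gives the 30-minute tests two more minutes of headroom than the percentage rule would.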
@kajalj22 kajalj22 changed the title ci: increase Slurm time limits by 10% for 5 nightly tests ci: increase Slurm time limits by +5 min for 5 nightly tests Jun 10, 2026
@kajalj22 kajalj22 added the CI:docs Run doctest label Jun 10, 2026
@kajalj22

/ok to test 4b19a89

@kajalj22

/ok to test 2c9cbd6

@terrykong terrykong enabled auto-merge (squash) June 10, 2026 21:56
@kajalj22

/ok to test fe2a945

@terrykong terrykong merged commit f8478e8 into main Jun 11, 2026
43 checks passed
@terrykong terrykong deleted the increase-slurm-time-limits branch June 11, 2026 02:13
pengdurice pushed a commit to pengdurice/RL that referenced this pull request Jun 12, 2026
…NeMo#2769)

Signed-off-by: Kajal Jain <kajalj@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

3 participants