
Re-enable away scheduling for gang jobs #4614

Merged
JamesMurkin merged 1 commit into master from reenable_gang_away_scheduling
Jan 21, 2026


@JamesMurkin
Contributor

Away scheduling for gang jobs was originally disabled due to a bug that caused gangs to be partially scheduled.

Since then, we have largely rewritten how gang jobs are modelled internally in the scheduler, so we believe this bug is now fixed.

  • I've added several unit tests around this, targeting what we believe was the cause of the bug (jobs being scheduled and preempted in the same round).

I have added a `DisableGangAwayScheduling` config option because:

  • If the bug still exists, it lets us disable the feature again with a config change.
  • In some configurations it isn't appropriate to schedule gang jobs away: away-scheduled jobs are typically more volatile (more likely to be preempted), which isn't always acceptable for gangs.
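As a rough illustration of what such a config switch does, here is a minimal Go sketch of an eligibility check that skips away scheduling for gang members when the flag is set. Only the `DisableGangAwayScheduling` name comes from this PR; `SchedulingConfig`, `Job`, `GangCardinality`, `HasAwayNodes` and `canScheduleAway` are hypothetical stand-ins, not Armada's actual types or functions.

```go
package main

import "fmt"

// SchedulingConfig is a hypothetical, trimmed-down config struct.
// Only the DisableGangAwayScheduling field name is taken from this PR.
type SchedulingConfig struct {
	DisableGangAwayScheduling bool
}

// Job is a hypothetical job model used for illustration only.
type Job struct {
	ID              string
	GangCardinality int  // 1 for a standalone job, >1 for a gang member
	HasAwayNodes    bool // hypothetical: the job is otherwise eligible to run away
}

// IsGang reports whether the job belongs to a gang of more than one job.
func (j Job) IsGang() bool { return j.GangCardinality > 1 }

// canScheduleAway reports whether a job may be considered for away scheduling.
// Gang jobs are excluded only when the (hypothetical) config flag is set.
func canScheduleAway(cfg SchedulingConfig, job Job) bool {
	if !job.HasAwayNodes {
		return false
	}
	if job.IsGang() && cfg.DisableGangAwayScheduling {
		return false
	}
	return true
}

func main() {
	cfg := SchedulingConfig{DisableGangAwayScheduling: true}
	jobs := []Job{
		{ID: "single", GangCardinality: 1, HasAwayNodes: true},
		{ID: "gang-member", GangCardinality: 4, HasAwayNodes: true},
	}
	for _, j := range jobs {
		// With the flag set, only the standalone job remains eligible.
		fmt.Printf("%s: away scheduling allowed = %v\n", j.ID, canScheduleAway(cfg, j))
	}
}
```

In this sketch the flag's zero value (`false`) leaves gang jobs on the same away-scheduling path as other jobs, so turning the feature back off is a one-line config change rather than a code change.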


Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
JamesMurkin marked this pull request as ready for review January 21, 2026 13:15
JamesMurkin merged commit 28fc636 into master Jan 21, 2026
15 checks passed
JamesMurkin deleted the reenable_gang_away_scheduling branch January 21, 2026 14:06
Sigele pushed a commit to Sigele/armada that referenced this pull request Jan 30, 2026
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: Sigele Nickerson-Adams <sigele.nickerson-adams@nmc2.ai>

2 participants