Skip to content

Introduce per queue time limit for scheduling new jobs#4710

Merged
JamesMurkin merged 18 commits intomasterfrom
queue_max_scheduling_time_2
Feb 27, 2026
Merged

Introduce per queue time limit for scheduling new jobs#4710
JamesMurkin merged 18 commits intomasterfrom
queue_max_scheduling_time_2

Conversation

@JamesMurkin
Copy link
Contributor

We have an existing configuration NewJobsSchedulingTimeout which limits how long the scheduler will allow for new job scheduling before it stops allowing new jobs to be scheduled that round

That works well - however what often happens is when the scheduler is slow, it is slow for a single queue

This PR therefore adds the same limit but per queue:

  • This limits impact to a single queue
  • This means the config can be set to a more aggressive value, as it only impacts a single queue when hit

This PR also:

  • Updates how NewJobsSchedulingTimeout works to act like other constraints (adding logic to constraints.go)
  • Updates where scheduling duration config lives
    • Move MaxSchedulingDuration -> Scheduling.MaxSchedulingDuration
    • Move NewJobsSchedulingTimeout -> Scheduling.MaxNewJobSchedulingDuration
    • The motivation here is:
      • More consistent naming
      • Scheduling config moved to scheduling section of config
    • For now this will be backwards compatible with warning

The CandidateGangIterator assumes that iterators it was passed were ordered
 - Evicted jobs
 - Non-evicted jobs

So if it ever saw a non-evicted job, it could them just drop the whole queue when `OnlyYieldEvicted` was true

However in market based scheduling, that assumption does not hold true and you can get evicted and non-evicted jobs in a mixed ordering

This meant if we ever hit something that caused `OnlyYieldEvicted` when using market based scheduling, you could end up preempting many (or all) evicted jobs, depending on the job ordering - as the CandidateGangIterator stopped considering the queue as soon as it saw a non-evicted job

Now I've pushed that logic down through the iterator stack - as they are better placed to make the right assumptions
 - I've also removed the logic about OnlyYieldEvicted dotted all over CandidateGangIterator and it is now contained to their OnlyYieldEvicted/OnlyYieldEvictedForQueue functions

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
We have an existing configuration `NewJobsSchedulingTimeout` which limits how long the scheduler will allow for new job scheduling before it stops allowing new jobs to be scheduled that round

That works well - however what often happens is when the scheduler is slow, it is slow for a single queue

This PR therefore adds the same limit but per queue:
 - This limits impact to a single queue
 - This means the config can be set to a more aggressive value, as it only impacts a single queue when hit

This PR also:
 - Updates how `NewJobsSchedulingTimeout` works to act like other constraints (adding logic to constraints.go)
 - Updates where scheduling duration config lives
   - Move MaxSchedulingDuration -> Scheduling.MaxSchedulingDuration
   - Move NewJobsSchedulingTimeout -> Scheduling.MaxNewJobSchedulingDuration
   - The motivation here is:
     - More consistent naming
     - Scheduling config moved to scheduling section of config
   - For now this will be backwards compatible with warning

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

# Conflicts:
#	internal/scheduler/scheduling/queue_scheduler.go
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
@JamesMurkin JamesMurkin marked this pull request as ready for review February 23, 2026 14:25
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
c := sl.Current().Interface().(SchedulingConfig)

// Validate scheduling timeouts
if c.MaxNewJobSchedulingDuration > 0 && c.MaxNewJobSchedulingDuration >= c.MaxSchedulingDuration {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
@JamesMurkin JamesMurkin enabled auto-merge (squash) February 27, 2026 13:17
@JamesMurkin JamesMurkin merged commit 476c8cc into master Feb 27, 2026
15 checks passed
@JamesMurkin JamesMurkin deleted the queue_max_scheduling_time_2 branch February 27, 2026 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants