Skip to content

Fix bug in market driven queue iteration#4701

Merged
JamesMurkin merged 8 commits intomasterfrom
fix_market_drive_queue_iteration_bug
Feb 20, 2026
Merged

Fix bug in market driven queue iteration#4701
JamesMurkin merged 8 commits intomasterfrom
fix_market_drive_queue_iteration_bug

Conversation

@JamesMurkin
Copy link
Contributor

The CandidateGangIterator assumes that iterators it was passed were ordered

  • Evicted jobs
  • Non-evicted jobs

So if it ever saw a non-evicted job, it could them just drop the whole queue when OnlyYieldEvicted was true

However in market based scheduling, that assumption does not hold true and you can get evicted and non-evicted jobs in a mixed ordering

This meant if we ever hit something that caused OnlyYieldEvicted when using market based scheduling, you could end up preempting many (or all) evicted jobs, depending on the job ordering - as the CandidateGangIterator stopped considering the queue as soon as it saw a non-evicted job

Now I've pushed that logic down through the iterator stack - as they are better placed to make the right assumptions

  • I've also removed the logic about OnlyYieldEvicted dotted all over CandidateGangIterator and it is now contained to their OnlyYieldEvicted/OnlyYieldEvictedForQueue functions

The CandidateGangIterator assumes that iterators it was passed were ordered
 - Evicted jobs
 - Non-evicted jobs

So if it ever saw a non-evicted job, it could them just drop the whole queue when `OnlyYieldEvicted` was true

However in market based scheduling, that assumption does not hold true and you can get evicted and non-evicted jobs in a mixed ordering

This meant if we ever hit something that caused `OnlyYieldEvicted` when using market based scheduling, you could end up preempting many (or all) evicted jobs, depending on the job ordering - as the CandidateGangIterator stopped considering the queue as soon as it saw a non-evicted job

Now I've pushed that logic down through the iterator stack - as they are better placed to make the right assumptions
 - I've also removed the logic about OnlyYieldEvicted dotted all over CandidateGangIterator and it is now contained to their OnlyYieldEvicted/OnlyYieldEvictedForQueue functions

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
},
PriorityFactorByQueue: map[string]float64{"A": 1, "B": 1},
},
"reschedules evicted jobs if scheduling limits hit": {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was what covered the original bug

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
@JamesMurkin JamesMurkin marked this pull request as ready for review February 20, 2026 13:22
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
jobDbTxn := jobDb.WriteTxn()

// Add all the initial jobs, creating runs for them
for nodeIdx, jobs := range tc.InitialRunningJobs {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused code

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

type JobContextIterator interface {
Next() (*schedulercontext.JobSchedulingContext, error)
OnlyYieldEvicted()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: SetOnlyYieldEvicted(onlyYieldEvicted bool) would be nicer

@JamesMurkin JamesMurkin merged commit e5b52dc into master Feb 20, 2026
16 checks passed
@JamesMurkin JamesMurkin deleted the fix_market_drive_queue_iteration_bug branch February 20, 2026 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants