Add preempt and cancel jobs on node for specified executor #4612

Merged: nikola-jokic merged 6 commits into master from nikola-jokic/node-preemption on Jan 29, 2026

@nikola-jokic (Contributor)

What type of PR is this?

Enhancement

What this PR does / why we need it

Jobs may need to be preempted or canceled on a node. The server and armadactl should support these use cases natively.

@nikola-jokic force-pushed the nikola-jokic/node-preemption branch 3 times, most recently from b5a798b to b084b17, on January 19, 2026 17:33
@nikola-jokic changed the title from "Nikola jokic/node preemption" to "Add preempt and cancel jobs on node for specified executor" on Jan 19, 2026
@nikola-jokic force-pushed the nikola-jokic/node-preemption branch from 39147de to eda1819 on January 19, 2026 18:40
@@ -0,0 +1,117 @@
package node
Contributor commented:

Ideally there would be some tests on this, as it's our API.

PriorityClasses []string
}

type PreemptOnNode struct {
Contributor commented:

Can we double-check that there are no sensible tests we can add for these?

Signed-off-by: Nikola Jokic <jokicnikola07@gmail.com>
@nikola-jokic force-pushed the nikola-jokic/node-preemption branch from e24f0af to e524318 on January 22, 2026 14:42
WHERE jr.node = @node
AND jr.executor = @executor
AND j.queue = ANY(@queues::text[])
AND jr.succeeded = false AND jr.failed = false AND jr.cancelled = false AND jr.preempted = false;
Member commented:

Instead of `AND jr.succeeded = false AND jr.failed = false AND jr.cancelled = false AND jr.preempted = false`, we could add a `terminated` field that is autogenerated from `succeeded OR failed OR cancelled OR preempted`. That way, an index would need to be updated for only one additional field instead of four, producing less index bloat.
Here is an example of how we implemented it for the jobs table: https://github.com/armadaproject/armada/blob/master/internal/scheduler/database/migrations/027_add_terminated_column.sql

cc @masipauskas
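
A minimal sketch of this suggestion, assuming PostgreSQL 12 or later (which supports stored generated columns) and a hypothetical table name `runs` for the rows aliased as `jr`. The migration that eventually lands may look different.

```sql
-- Hypothetical migration: the table name `runs` is an assumption.
-- `terminated` is derived from the four existing state flags and is
-- maintained by PostgreSQL itself, so writers never set it directly.
ALTER TABLE runs
    ADD COLUMN terminated boolean
    GENERATED ALWAYS AS (succeeded OR failed OR cancelled OR preempted) STORED;

-- The four-flag predicate in the query above would then reduce to:
--   AND jr.terminated = false
```

With this in place, an index meant to serve the "still active" filter only has to include `terminated` rather than all four flags.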

Contributor (author) commented:

@JamesMurkin If we go with that, should we update other queries?
It makes sense to me to go with these changes, and the next PR would address jr.queue and termination.

Member commented:

Actually, I need this new field for some other work, so I'll create a PR that adds it in a migration.

ON jr.job_id = j.job_id
WHERE jr.node = @node
AND jr.executor = @executor
AND j.queue = ANY(@queues::text[])
Member commented:

Also, can you use `jr.queue = ANY(@queues::text[])` so the filter can be applied before the join? That should speed up the query.
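
A sketch of the query with that change applied. The table names `runs` and `jobs` and the selected column are assumptions for illustration; only the join condition and the filters come from the diff.

```sql
-- Assumed for illustration: table names `runs` / `jobs`, selected column.
SELECT jr.job_id
FROM runs jr
JOIN jobs j
  ON jr.job_id = j.job_id
WHERE jr.node = @node
  AND jr.executor = @executor
  AND jr.queue = ANY(@queues::text[])  -- was: j.queue = ANY(@queues::text[])
  AND jr.succeeded = false AND jr.failed = false
  AND jr.cancelled = false AND jr.preempted = false;
```

Because every filter now refers to `jr`, the planner can narrow the run rows first and join only the survivors against the jobs table.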

Contributor (author) commented:

Should we update `SelectJobsByExecutorAndQueues` as well, then? I based this query on that one.

Member commented:

I'd say just update the one you added. I can prepare a PR with the migration and the changes to the other queries.

Signed-off-by: Nikola Jokic <jokicnikola07@gmail.com>
@nikola-jokic nikola-jokic enabled auto-merge (squash) January 29, 2026 13:15
@nikola-jokic nikola-jokic merged commit ef1e327 into master Jan 29, 2026
26 checks passed
@nikola-jokic nikola-jokic deleted the nikola-jokic/node-preemption branch January 29, 2026 13:26
nikola-jokic added a commit that referenced this pull request Jan 29, 2026
#### What type of PR is this?

Enhancement

#### What this PR does / why we need it

jr.queue filtering should be faster:
#4612 (comment)

Signed-off-by: Nikola Jokic <jokicnikola07@gmail.com>
Sigele pushed a commit to Sigele/armada that referenced this pull request Jan 30, 2026
…ject#4612)

#### What type of PR is this?

Enhancement

#### What this PR does / why we need it

Jobs may need to be preempted or canceled on a node. The server and
`armadactl` should support these use cases natively.

---------

Signed-off-by: Nikola Jokic <jokicnikola07@gmail.com>
Signed-off-by: Sigele Nickerson-Adams <sigele.nickerson-adams@nmc2.ai>
Sigele pushed a commit to Sigele/armada that referenced this pull request Jan 30, 2026
#### What type of PR is this?

Enhancement

#### What this PR does / why we need it

jr.queue filtering should be faster:
armadaproject#4612 (comment)

Signed-off-by: Nikola Jokic <jokicnikola07@gmail.com>
Signed-off-by: Sigele Nickerson-Adams <sigele.nickerson-adams@nmc2.ai>
3 participants