feat: preserve partial scheduling progress on context timeout instead of rolling back all work #4559

Merged
dejanzele merged 6 commits into armadaproject:master from dejanzele:feat/scheduler-graceful-shutdown
Feb 6, 2026
Conversation

@dejanzele
Member

@dejanzele dejanzele commented Dec 2, 2025

What type of PR is this?

Enhancement

What this PR does / why we need it

Previously, when the scheduler hit its timeout during the scheduling cycle, it would return an error and discard all work, even jobs that were successfully scheduled before the timeout.

This change implements a two-tier timeout system for the scheduler to handle long scheduling cycles gracefully.

This change introduces a new config field in the scheduler, `newJobsSchedulingTimeout`:

```
# scheduler.config.yaml
scheduling:
  # Hard timeout - absolute maximum duration for a scheduling cycle.
  # When exceeded, the cycle aborts with an error and discards all work.
  maxSchedulingDuration: 10s

  # Soft timeout - stop scheduling new jobs after this duration.
  # Evicted jobs continue to be rescheduled until the hard timeout.
  # Set to 0 to disable soft timeout behavior.
  # Must be less than maxSchedulingDuration when non-zero.
  newJobsSchedulingTimeout: 8s
```

Expected output

When the soft timeout fires:

```
INFO Soft timeout reached for pool default, switching to evicted-only mode
INFO Looping through candidate gangs for pool default...
INFO Scheduled 873 jobs for pool default
```

When the hard timeout fires (unchanged behavior):

```
ERROR hard timeout: context deadline exceeded
```

How to test

Check the section at the end called Additional Files for the test script and test Armada job.

1. Configure the scheduler with a short timeout in `_local/scheduler/config.yaml`:
   ```
   maxSchedulingDuration: 200ms
   ```
2. Configure the fake executor with enough capacity in `_local/fakeexecutor/config.yaml`:
   ```
   nodes:
     - name: "fake-node"
       count: 50
       allocatable:
         cpu: "64"
         memory: "256Gi"
   ```
3. Start the local environment: `goreman -f _local/procfiles/fake-executor.Procfile start`
4. Create two test queues:
   ```
   armadactl create queue queue-a
   armadactl create queue queue-b
   ```
5. Run the following commands to generate jobs:
   ```
   ./scripts/submit-jobs.sh -c 5000 -q queue-a -j jobset-timeout-a example/fair-share-test.yaml
   ./scripts/submit-jobs.sh -c 5000 -q queue-b -j jobset-timeout-b example/fair-share-test.yaml
   ```
6. Assert that the following logs appear in the scheduler output:
   ```
   INFO Timeout reached for pool default, switching to evicted-only mode
   INFO Scheduling cycle interrupted by context deadline exceeded: scheduled 873 jobs for pool default
   INFO Scheduled on executor pool default in 19.983083ms with error <nil>
   ```
Additional Files

```
# scripts/submit-jobs.sh

#!/bin/bash
set -e

COUNT=1
JOBSET="test-jobset"
QUEUE="test-queue"
JOB_TEMPLATE=""
MAX_PARALLEL=50

while [[ $# -gt 0 ]]; do
    case $1 in
        -c|--count) COUNT="$2"; shift 2 ;;
        -j|--jobset) JOBSET="$2"; shift 2 ;;
        -q|--queue) QUEUE="$2"; shift 2 ;;
        -p|--parallel) MAX_PARALLEL="$2"; shift 2 ;;
        -*) echo "Unknown option $1"; exit 1 ;;
        *) JOB_TEMPLATE="$1"; shift ;;
    esac
done

[[ -z "$JOB_TEMPLATE" ]] && JOB_TEMPLATE="example/fair-share-test.yaml"
[[ ! -f "$JOB_TEMPLATE" ]] && echo "Error: $JOB_TEMPLATE not found" && exit 1

ARMADACTL="./armadactl"
[[ ! -f "$ARMADACTL" ]] && ARMADACTL="armadactl"

TEMP_DIR=$(mktemp -d)
# Single quotes defer expansion to trap time and survive paths with spaces.
trap 'rm -rf "$TEMP_DIR"' EXIT

JOB_FILE="$TEMP_DIR/job.yaml"
sed -e "s/^jobSetId:.*/jobSetId: $JOBSET/" -e "s/^queue:.*/queue: $QUEUE/" "$JOB_TEMPLATE" > "$JOB_FILE"

$ARMADACTL create queue "$QUEUE" 2>/dev/null || true

echo "Submitting $COUNT batches to queue '$QUEUE' jobset '$JOBSET'..."

PIDS=()
for ((i=1; i<=COUNT; i++)); do
    $ARMADACTL submit "$JOB_FILE" >/dev/null 2>&1 &
    PIDS+=($!)
    if ((${#PIDS[@]} >= MAX_PARALLEL)) || ((i == COUNT)); then
        for pid in "${PIDS[@]}"; do wait "$pid"; done
        PIDS=()
        echo "Progress: $i/$COUNT"
    fi
done

echo "Done. Submitted $COUNT batches to queue '$QUEUE'"
```
```
# example/fair-share-test.yaml

queue: test-queue
jobSetId: fair-share-test
jobs:
  - namespace: default
    priority: 1000
    podSpec: &podspec
      terminationGracePeriodSeconds: 0
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox:latest
          command: ["sleep", "3600"]
          resources:
            limits:
              memory: 64Mi
              cpu: 50m
            requests:
              memory: 64Mi
              cpu: 50m
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
```

nikola-jokic
nikola-jokic previously approved these changes Dec 2, 2025
d80tb7
d80tb7 previously requested changes Dec 3, 2025
Collaborator

@d80tb7 d80tb7 left a comment


I'm not convinced this works in the case of preemption.

@dejanzele dejanzele force-pushed the feat/scheduler-graceful-shutdown branch 9 times, most recently from 27cd367 to fa91955 Compare December 18, 2025 00:04
@dejanzele dejanzele force-pushed the feat/scheduler-graceful-shutdown branch 8 times, most recently from 9eaf27f to 800f2c1 Compare February 5, 2026 12:46
… of rolling back all work

Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
@dejanzele dejanzele force-pushed the feat/scheduler-graceful-shutdown branch from 800f2c1 to 196b8d3 Compare February 5, 2026 12:47
JamesMurkin
JamesMurkin previously approved these changes Feb 5, 2026
@JamesMurkin JamesMurkin dismissed d80tb7’s stale review February 6, 2026 12:29

We've changed approach and I'm happy this one should work

@dejanzele dejanzele enabled auto-merge (squash) February 6, 2026 12:29
@dejanzele dejanzele merged commit e694299 into armadaproject:master Feb 6, 2026
15 checks passed
dslear pushed a commit to dslear/armada that referenced this pull request Feb 9, 2026
… of rolling back all work (armadaproject#4559)