
Handle Azure Batch ActiveJobAndScheduleQuotaReached with retry #6874

Merged
pditommaso merged 3 commits into master from azure-batch-handle-job-quota on Feb 27, 2026

adamrtalbot (Collaborator) commented Feb 27, 2026

Summary

Fixes #5575

When Azure Batch reports that the active job and schedule quota has been reached (HTTP 409 with error code ActiveJobAndScheduleQuotaReached), Nextflow now retries job creation after a configurable fixed delay instead of failing immediately.

Changes

  • AzBatchService.groovy: Added applyCreateJob(), which uses a Failsafe RetryPolicy with a fixed delay to handle ActiveJobAndScheduleQuotaReached errors. It layers on top of the existing apply() method: apply() handles short-lived transient errors (408, 429, 500, etc.), while applyCreateJob() handles longer-lived quota exhaustion with a configurable fixed delay. createJobRequest() and isJobQuotaError() were extracted as protected methods for testability.
  • AzBatchOpts.groovy: Added two new config options:
    • azure.batch.maxJobQuotaRetries (int, default 3) — maximum number of retries when quota is reached
    • azure.batch.jobQuotaRetryDelay (Duration, default 2 min) — delay between retries
  • AzBatchServiceTest.groovy: Added 6 unit tests covering: success on first try, retry then succeed, exhaust retries, zero retries, non-quota 409 errors, and non-HTTP exceptions.
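
As a rough illustration of the two options listed above, a nextflow.config fragment might look like the following. The values are arbitrary examples rather than recommendations, and the quoted duration string is an assumption about how the Duration option is written in config.

```groovy
// nextflow.config (illustrative sketch; values are examples only)
azure {
    batch {
        // Maximum number of retries when the job quota is reached (PR default: 3)
        maxJobQuotaRetries = 5
        // Fixed delay between retries (PR default: 2 min); string form is assumed
        jobQuotaRetryDelay = '5 min'
    }
}
```

With these example values, a run would wait up to roughly 25 minutes in total for quota to free up before job creation fails.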

Behavior

When job creation fails with ActiveJobAndScheduleQuotaReached:

  1. A warning is logged: Azure Batch active job quota reached - waiting 2m before retry (attempt 1 of 3)
  2. Failsafe waits for the configured delay
  3. Job creation is retried
  4. If retries are exhausted, an IllegalStateException is thrown with a helpful message suggesting config changes
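
The four steps above can be sketched as a plain retry loop. This is not the PR's code: the PR uses a Failsafe RetryPolicy, while the sketch below swaps in a hand-written loop so that it runs without dependencies. QuotaException, createJobWithQuotaRetry, and the sleeper parameter are hypothetical names introduced only for illustration; isJobQuotaError here only mimics the idea of the method named in the PR, and the exception message is invented.

```groovy
import java.time.Duration

// Hypothetical stand-in for the HTTP error the Azure SDK would raise.
class QuotaException extends RuntimeException {
    int status
    String code
    QuotaException(int status, String code) {
        super("HTTP ${status}: ${code}")
        this.status = status
        this.code = code
    }
}

// Only a 409 carrying this specific error code counts as a quota error.
boolean isJobQuotaError(Throwable t) {
    t instanceof QuotaException && t.status == 409 &&
        t.code == 'ActiveJobAndScheduleQuotaReached'
}

// Fixed-delay retry loop (assumption: a simplified substitute for Failsafe).
// 'sleeper' is injectable so the wait can be skipped in tests.
def createJobWithQuotaRetry(int maxRetries, Duration delay, Closure createJob,
                            Closure sleeper = { Duration d -> Thread.sleep(d.toMillis()) }) {
    int attempt = 0
    while (true) {
        try {
            return createJob.call()
        }
        catch (Exception e) {
            // Non-quota failures are rethrown untouched
            if (!isJobQuotaError(e))
                throw e
            // Step 4: retries exhausted, fail with a hint about the config options
            if (attempt >= maxRetries)
                throw new IllegalStateException(
                    "Azure Batch active job quota still reached after ${maxRetries} retries; " +
                    "consider raising azure.batch.maxJobQuotaRetries or azure.batch.jobQuotaRetryDelay", e)
            attempt++
            // Step 1: log a warning; step 2: wait; step 3: loop around and retry
            println "WARN: Azure Batch active job quota reached - waiting ${delay.toMinutes()}m " +
                    "before retry (attempt ${attempt} of ${maxRetries})"
            sleeper.call(delay)
        }
    }
}

// Usage: two simulated quota errors, then success; the wait is a no-op here.
int simulatedFailures = 2
def jobId = createJobWithQuotaRetry(3, Duration.ofMinutes(2), {
    if (simulatedFailures-- > 0)
        throw new QuotaException(409, 'ActiveJobAndScheduleQuotaReached')
    'job-1'
}, { Duration d -> })
println "created ${jobId}"
```

Keeping the quota check separate from the loop reflects the PR's stated reason for extracting isJobQuotaError(): the predicate can be tested on its own, and non-quota 409 errors fall through without being retried.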

Add retry logic when Azure Batch returns HTTP 409 with
ActiveJobAndScheduleQuotaReached error code during job creation,
instead of failing immediately. Configurable via
azure.batch.maxJobQuotaRetries (default 3) and
azure.batch.jobQuotaRetryDelay (default 2 min).

Generated by Claude Code

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>

netlify bot commented Feb 27, 2026

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit d712f72
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/69a219c92f26da000851af96
😎 Deploy Preview https://deploy-preview-6874--nextflow-docs-staging.netlify.app

pditommaso and others added 2 commits February 27, 2026 21:37
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@pditommaso pditommaso merged commit 6e66aaa into master Feb 27, 2026
24 checks passed
@pditommaso pditommaso deleted the azure-batch-handle-job-quota branch February 27, 2026 22:39
pditommaso added a commit that referenced this pull request Mar 3, 2026
* Handle Azure Batch ActiveJobAndScheduleQuotaReached with retry (#5575)

* Refactor job quota retry to use Failsafe RetryPolicy

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>

---------

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-authored-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Wait for Azure Batch job quota to clear before submitting a new job
