Handle Azure Batch ActiveJobAndScheduleQuotaReached with retry #6874
Merged
pditommaso merged 3 commits into master on Feb 27, 2026
Add retry logic when Azure Batch returns HTTP 409 with ActiveJobAndScheduleQuotaReached error code during job creation, instead of failing immediately. Configurable via azure.batch.maxJobQuotaRetries (default 3) and azure.batch.jobQuotaRetryDelay (default 2 min).

Generated by Claude Code

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
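For illustration, the two options named in this commit message could be set in a `nextflow.config` along these lines. This is a sketch only: the option names come from the PR, while the values shown are arbitrary examples rather than recommendations, and the string form of the duration is assumed to follow the usual Nextflow duration syntax.

```groovy
// nextflow.config (illustrative values, not recommendations)
azure {
    batch {
        // Maximum number of retries when the job quota is reached (PR default: 3)
        maxJobQuotaRetries = 5
        // Fixed delay between retries (PR default: 2 min)
        jobQuotaRetryDelay = '5 min'
    }
}
```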
pditommaso reviewed on Feb 27, 2026
plugins/nf-azure/src/main/nextflow/cloud/azure/batch/AzBatchService.groovy (outdated)
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
pditommaso approved these changes on Feb 27, 2026
pditommaso added a commit that referenced this pull request on Mar 3, 2026
* Handle Azure Batch ActiveJobAndScheduleQuotaReached with retry (#5575)

Add retry logic when Azure Batch returns HTTP 409 with ActiveJobAndScheduleQuotaReached error code during job creation, instead of failing immediately. Configurable via azure.batch.maxJobQuotaRetries (default 3) and azure.batch.jobQuotaRetryDelay (default 2 min).

Generated by Claude Code

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>

* Refactor job quota retry to use Failsafe RetryPolicy

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>

---------

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-authored-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Fixes #5575
When Azure Batch hits the active job and schedule quota (HTTP 409 ActiveJobAndScheduleQuotaReached), Nextflow now retries after a configurable fixed delay instead of failing immediately.

Changes
- AzBatchService.groovy: Added applyCreateJob(), which uses a Failsafe RetryPolicy with a fixed delay to handle ActiveJobAndScheduleQuotaReached quota errors. This layers on top of the existing apply() method: apply() handles short-lived transient errors (408, 429, 500, etc.), while applyCreateJob() handles the longer-lived quota exhaustion with configurable fixed delays. Extracted createJobRequest() and isJobQuotaError() as protected methods for testability.
- AzBatchOpts.groovy: Added two new config options:
  - azure.batch.maxJobQuotaRetries (int, default 3): maximum number of retries when the quota is reached
  - azure.batch.jobQuotaRetryDelay (Duration, default 2 min): delay between retries
- AzBatchServiceTest.groovy: Added 6 unit tests covering: success on first try, retry then succeed, exhaust retries, zero retries, non-quota 409 errors, and non-HTTP exceptions.

Behavior
When job creation fails with ActiveJobAndScheduleQuotaReached:
- "Azure Batch active job quota reached - waiting 2m before retry (attempt 1 of 3)"
- IllegalStateException is thrown with a helpful message suggesting config changes
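
The retry behavior described above can be sketched as a plain Groovy loop. This is a minimal illustration under stated assumptions, not the code in AzBatchService.groovy, which per this PR uses a Failsafe RetryPolicy. BatchHttpException, createJobWithQuotaRetry() and the sleeper parameter are hypothetical names, the wording of the final exception message is invented, and the sketch assumes the IllegalStateException is raised once retries are exhausted. Only isJobQuotaError(), the two option names, and the log line are taken from the PR description.

```groovy
import java.time.Duration

// Hypothetical stand-in for the HTTP error raised by the Azure SDK.
class BatchHttpException extends RuntimeException {
    final int statusCode
    final String errorCode
    BatchHttpException(int statusCode, String errorCode) {
        super("HTTP ${statusCode}: ${errorCode}".toString())
        this.statusCode = statusCode
        this.errorCode = errorCode
    }
}

// Only a 409 carrying the quota error code qualifies for the long retry.
boolean isJobQuotaError(Throwable t) {
    t instanceof BatchHttpException &&
        t.statusCode == 409 &&
        t.errorCode == 'ActiveJobAndScheduleQuotaReached'
}

// Hypothetical helper: retry createJob with a fixed delay on quota errors only.
// The sleeper closure is injectable so the wait can be skipped in tests.
def createJobWithQuotaRetry(int maxRetries, Duration delay, Closure createJob,
                            Closure sleeper = { Duration d -> Thread.sleep(d.toMillis()) }) {
    int attempt = 0
    while (true) {
        try {
            return createJob.call()
        }
        catch (Exception e) {
            // Other failures (non-quota 409s, non-HTTP errors) are not retried here
            if (!isJobQuotaError(e))
                throw e
            if (attempt >= maxRetries) {
                // Assumed message wording; the PR only says it suggests config changes
                String msg = "Azure Batch active job quota still reached after ${maxRetries} retries; " +
                        "consider raising azure.batch.maxJobQuotaRetries or azure.batch.jobQuotaRetryDelay"
                throw new IllegalStateException(msg, e)
            }
            attempt++
            println "Azure Batch active job quota reached - waiting ${delay.toMinutes()}m before retry (attempt ${attempt} of ${maxRetries})"
            sleeper.call(delay)
        }
    }
}

// Usage: the quota error clears on the third call; the wait is skipped for the demo.
int calls = 0
def jobId = createJobWithQuotaRetry(3, Duration.ofMinutes(2), {
    if (++calls < 3)
        throw new BatchHttpException(409, 'ActiveJobAndScheduleQuotaReached')
    'job-1'
}, { Duration d -> })
assert jobId == 'job-1'
assert calls == 3
```

Keeping the quota check in its own predicate mirrors the PR's choice to extract isJobQuotaError() for testability: the same predicate could equally be passed to a retry library as its failure filter.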