fix: stop workflows retrying forever when no retries are configured#16450
Conversation
| }) | ||
| }, | ||
| } | ||
| let workflowConfig: undefined | WorkflowConfig = undefined |
There was a problem hiding this comment.
Previously the missing-slug check only triggered when a workflowSlug didn't match, and even then it returned error without updating the job - leaving it stuck in processing: true forever.
We now also check for missing taskSlugs and properly fail the job with hasError: true.
The fixed workflow-retry behavior would already catch the missing-task case (the inline workflow throws a TypeError and runs through handleWorkflowError), but checking up front lets us log a clear "Task 'foo' is not registered" error instead of an opaque TypeError, as well as match the behavior of when a workflow slug is not found.
📦 esbuild Bundle Analysis for payloadThis analysis was generated by esbuild-bundle-analyzer. 🤖
Largest pathsThese visualization shows top 20 largest paths in the bundle.Meta file: packages/next/meta_index.json, Out file: esbuild/index.js
Meta file: packages/payload/meta_index.json, Out file: esbuild/index.js
Meta file: packages/payload/meta_shared.json, Out file: esbuild/exports/shared.js
Meta file: packages/richtext-lexical/meta_client.json, Out file: esbuild/exports/client_optimized/index.js
Meta file: packages/ui/meta_client.json, Out file: esbuild/exports/client_optimized/index.js
Meta file: packages/ui/meta_shared.json, Out file: esbuild/exports/shared_optimized/index.js
DetailsNext to the size is how much the size has increased or decreased compared with the base branch of this PR.
|
DanRibbens
left a comment
There was a problem hiding this comment.
Just one nit-pick, going to approve so consider the feedback as optional
| const { hasFinalError, maxWorkflowRetries, waitUntil } = hasNoRetriesConfigured | ||
| ? { hasFinalError: true, maxWorkflowRetries: undefined, waitUntil: undefined } | ||
| : getWorkflowRetryBehavior({ |
There was a problem hiding this comment.
Would be cleaner to read if you change this by passing workflowConfig to getWorkflowRetryBehavior eliminate the ternary and hasNoRetriesConfigured from this file.
Jobs with a workflow that had no
retriesconfigured would retry forever with no backoff if the workflow handler threw an error. We hit this when a task slug was removed from config in a deploy after the job was queued - the inline single-task workflow would throw aTypeErrorcalling the missing task.Why the bug existed: when a task throws,
handleTaskErrorruns first and uses the task's ownretriesvalue to cap retries, so a workflow with noretriesis fine - the task's retry limit is what stops the loop. But when the workflow handler itself throws, the error skipshandleTaskErrorand goes straight tohandleWorkflowError. The existing default there for "no workflow retries set" was "retry indefinitely, the task's limit will stop it" - except no task was ever involved, so nothing stopped the loop.handleWorkflowErrornow permanently fails the job when the workflow has no retries set. The task case is unchanged on purpose: a workflow without retries that calls a task withretries: 3still retries the task up to 3 times like before.There's also a smaller fix for jobs whose
workflowSlugis no longer in config - those previously returned early without callingupdateJob, leaving them stuck inprocessing: trueindefinitely.