[One workflow] Workflow execution error trigger (workflows.failed)#257633
Pull request overview
Implements a new event-driven trigger (workflows.executionFailed) emitted when a workflow run reaches a failed terminal state, enabling subscriber workflows (notifications/cleanup/retries) to react with the failure event context.
Changes:
- Adds common/server/public trigger definitions + schema for `workflows.executionFailed` and registers them in `workflows_extensions`.
- Emits `workflows.executionFailed` from the execution engine on failed runs (run + resume paths) with step failure context captured at failure time.
- Adds unit + Scout API tests and updates trigger-definition approval fixtures.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/platform/plugins/shared/workflows_management/test/scout_workflows_ui/api/tests/workflow_execution/workflow_error_trigger.spec.ts | Adds Scout test validating a subscriber workflow runs on execution failure. |
| src/platform/plugins/shared/workflows_extensions/test/scout/api/tests/trigger_definitions_approval.spec.ts | Updates tags configuration for trigger definitions approval test coverage. |
| src/platform/plugins/shared/workflows_extensions/test/scout/api/fixtures/approved_trigger_definitions.ts | Approves the new trigger id + schema hash. |
| src/platform/plugins/shared/workflows_extensions/server/triggers/workflow_execution_failed.ts | Introduces server trigger definition wrapper/export for the new trigger. |
| src/platform/plugins/shared/workflows_extensions/server/triggers/index.ts | Registers internal trigger definitions in the server plugin. |
| src/platform/plugins/shared/workflows_extensions/server/plugin.ts | Calls internal trigger registration during setup. |
| src/platform/plugins/shared/workflows_extensions/server/index.ts | Exposes trigger id/type from the server package. |
| src/platform/plugins/shared/workflows_extensions/public/triggers/workflow_execution_failed.ts | Adds UI-facing trigger metadata, docs, examples, and snippet. |
| src/platform/plugins/shared/workflows_extensions/public/triggers/index.ts | Registers public trigger definition. |
| src/platform/plugins/shared/workflows_extensions/public/plugin.ts | Calls public trigger registration during setup. |
| src/platform/plugins/shared/workflows_extensions/common/triggers/workflow_execution_failed.ts | Defines trigger id + Zod schema + common trigger definition. |
| src/platform/plugins/shared/workflows_extensions/common/triggers/index.ts | Re-exports common trigger artifacts. |
| src/platform/plugins/shared/workflows_extensions/common/index.ts | Exposes trigger artifacts from common entrypoint. |
| src/platform/plugins/shared/workflows_execution_engine/server/workflow_context_manager/workflow_execution_state.ts | Adds in-memory failed-step context storage for event payload building. |
| src/platform/plugins/shared/workflows_execution_engine/server/workflow_context_manager/step_execution_runtime.ts | Captures failed step context in failStep() for later emission. |
| src/platform/plugins/shared/workflows_execution_engine/server/lib/build_workflow_execution_failed_payload.ts | Adds payload builder for the emitted event. |
| src/platform/plugins/shared/workflows_execution_engine/server/lib/build_workflow_execution_failed_payload.test.ts | Adds unit tests for the payload builder. |
| src/platform/plugins/shared/workflows_execution_engine/server/execution_functions/run_workflow.ts | Emits the event in a finally block when status is FAILED. |
| src/platform/plugins/shared/workflows_execution_engine/server/execution_functions/run_workflow.test.ts | Adds unit tests asserting emission behavior in run path. |
| src/platform/plugins/shared/workflows_execution_engine/server/execution_functions/resume_workflow.ts | Emits the event in a finally block when a resumed execution fails. |
| src/platform/plugins/shared/workflows_execution_engine/server/execution_functions/resume_workflow.test.ts | Adds unit tests asserting emission behavior in resume path. |
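
The emission pattern listed for `run_workflow.ts` and `resume_workflow.ts` (emit in a `finally` block only when the status is FAILED) can be illustrated with a minimal sketch. It is deliberately synchronous and self-contained. `Execution`, `EmitEventParams`, and `runWithFailureEvent` are hypothetical stand-ins and not the engine's actual types or functions; only the trigger id comes from the PR.

```typescript
// Hypothetical, simplified stand-ins; not the actual Kibana types or functions.
type ExecutionStatus = 'RUNNING' | 'COMPLETED' | 'FAILED';

interface Execution {
  id: string;
  spaceId: string;
  status: ExecutionStatus;
  isTestRun: boolean;
}

interface EmitEventParams {
  triggerId: string;
  spaceId: string;
  payload: { executionId: string };
}

const WORKFLOW_EXECUTION_FAILED_TRIGGER_ID = 'workflows.executionFailed';

function runWithFailureEvent(
  execution: Execution,
  runSteps: () => void,
  emitEvent: (params: EmitEventParams) => void
): void {
  try {
    runSteps();
    execution.status = 'COMPLETED';
  } catch {
    // Assumption: a step error marks the execution as failed instead of propagating.
    execution.status = 'FAILED';
  } finally {
    // Single emission point: fires at most once per run, and never for test runs.
    if (execution.status === 'FAILED' && !execution.isTestRun) {
      emitEvent({
        triggerId: WORKFLOW_EXECUTION_FAILED_TRIGGER_ID,
        spaceId: execution.spaceId,
        payload: { executionId: execution.id },
      });
    }
  }
}
```

Keeping the check in one `finally` block means every exit path goes through the same emission point, which is what makes a one-emit-per-failed-run guarantee straightforward to test.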
jbudz left a comment:
packages/kbn-optimizer/limits.yml LGTM
💚 Build Succeeded
cc @yngrdyn
Closes https://github.com/elastic/security-team/issues/14421.
This PR implements the workflow execution error trigger: when a workflow run fails, the platform emits a `workflows.executionFailed` event so that other workflows subscribed to it can run (notifications, cleanup, retries). It also serves as a reference implementation for solution teams adding their own event-driven triggers.

Summary
When a workflow execution reaches a failed terminal state, the execution engine builds an event payload (workflow id/name, execution id, error message, failed step id/name, optional stack trace) and calls `workflowsExtensions.emitEvent()` with trigger id `workflows.executionFailed`. The existing trigger event handler (from #254964) resolves workflows subscribed to that trigger in the same space, evaluates the optional KQL `on.condition` against the event, and runs only matching workflows with the event as `context.event`. The payload includes `workflow.isErrorHandler: true` when the failed run was itself triggered by an error event, so subscribers can filter out error-handler failures and avoid infinite loops.

    graph TB
      subgraph Engine["workflows_execution_engine"]
        Run["runWorkflow() / resumeWorkflow()"]
        Fail["Execution fails → failStep()"]
        Finally["finally: load execution, build payload"]
        Emit["emitEvent(workflows.executionFailed, payload, spaceId, request)"]
      end
      subgraph Subscriber["Error-handling workflow"]
        Steps["Steps use {{ event.workflow.id }}, {{ event.error.message }}, etc."]
      end
      Run --> Fail
      Fail --> Finally
      Finally --> Emit
      Emit --> Steps
      style Engine fill:#e1f5ff
      style Subscriber fill:#e8f5e9

Event payload (and thus `context.event` in subscriber workflows):
- `workflow`: `id`, `name`, `spaceId`, `isErrorHandler`
- `execution`: `id`, `startedAt`, `failedAt`
- `error`: `message`, `stepId`, `stepName`, optional `stepExecutionId`, optional `stackTrace`

Conditions and steps can use e.g. `event.workflow.name`, `event.error.stepName`, `event.execution.id`, and `not event.workflow.isErrorHandler:true` to avoid handling failures from error-handler workflows.

What's in this PR
Trigger registration (`workflows_extensions`)
- Common: `WORKFLOW_EXECUTION_FAILED_TRIGGER_ID`, Zod `workflowExecutionFailedEventSchema` (workflow, execution, error with optional `stepExecutionId` and `stackTrace`), i18n for schema descriptions.
- Server: registered in `workflows_extensions` plugin setup; used for validation when emitting and for the internal trigger-definitions API.
- Public: `PublicTriggerDefinition` with i18n title, description, documentation, and examples so the workflow authoring UI shows the trigger and its event shape.

Emit on failure (`workflows_execution_engine`)
- `buildWorkflowExecutionFailedPayload(execution, failedStepContext?)` in `server/lib/build_workflow_execution_failed_payload.ts`. Step context (stepId, stepName, stepExecutionId, stackTrace) comes from the in-memory `FailedStepContext` set in `failStep()`, not from `execution.error.details` or ES step executions (avoids refresh delays).
- In `step_execution_runtime.ts`, `failStep()` calls `workflowExecutionState.setLastFailedStepContext({ stepId, stepName, stepExecutionId, stack })` so the payload builder can read it in the same run.
- In `run_workflow.ts` and `resume_workflow.ts`, in a `finally` block: if `execution?.status === FAILED` and not a test run, build the payload (with `workflowExecutionState.getLastFailedStepContext()`) and call `workflowsExtensions.emitEvent({ triggerId: WORKFLOW_EXECUTION_FAILED_TRIGGER_ID, spaceId, payload, request })`. This ensures one emit per failed run and consistent metering.

How to verify
- Create a workflow with the trigger `workflows.executionFailed` and a step that logs or notifies.
- Run a workflow that fails and confirm the subscriber workflow ran (`triggeredBy: 'workflows.executionFailed'`, step sees `event.workflow.name`, `event.error.message`).

Release note
Added `workflows.executionFailed` trigger so you can run workflows when another workflow fails. Use it to send notifications (e.g. Slack), run cleanup, or trigger retries. Subscriber workflows receive an event with workflow and execution details, the error message, and the failed step. #257633
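
As a rough illustration of the payload builder and the loop guard described under "Emit on failure", here is a minimal self-contained sketch. The field names in the event follow the payload listed in the summary. Everything else is an assumption made for the example and is not taken from the PR's code: the `FailedExecution` input shape, the `'unknown'` fallback when no step context was captured, and the helper names `buildFailedPayloadSketch` and `shouldHandle`.

```typescript
// Hypothetical shapes for illustration; event field names follow the PR description,
// but these are not the actual Kibana types.
interface FailedStepContext {
  stepId: string;
  stepName: string;
  stepExecutionId?: string;
  stack?: string;
}

// Assumed input shape; the real execution document likely differs.
interface FailedExecution {
  id: string;
  workflowId: string;
  workflowName: string;
  spaceId: string;
  startedAt: string;
  finishedAt: string;
  errorMessage: string;
  triggeredBy?: string;
}

interface WorkflowExecutionFailedEvent {
  workflow: { id: string; name: string; spaceId: string; isErrorHandler: boolean };
  execution: { id: string; startedAt: string; failedAt: string };
  error: {
    message: string;
    stepId: string;
    stepName: string;
    stepExecutionId?: string;
    stackTrace?: string;
  };
}

const WORKFLOW_EXECUTION_FAILED_TRIGGER_ID = 'workflows.executionFailed';

function buildFailedPayloadSketch(
  execution: FailedExecution,
  step?: FailedStepContext
): WorkflowExecutionFailedEvent {
  return {
    workflow: {
      id: execution.workflowId,
      name: execution.workflowName,
      spaceId: execution.spaceId,
      // True when the failed run was itself started by a failure event.
      isErrorHandler: execution.triggeredBy === WORKFLOW_EXECUTION_FAILED_TRIGGER_ID,
    },
    execution: {
      id: execution.id,
      startedAt: execution.startedAt,
      failedAt: execution.finishedAt,
    },
    error: {
      message: execution.errorMessage,
      // Assumption: fall back to 'unknown' when no step context was captured.
      stepId: step?.stepId ?? 'unknown',
      stepName: step?.stepName ?? 'unknown',
      stepExecutionId: step?.stepExecutionId,
      stackTrace: step?.stack,
    },
  };
}

// Mirrors the KQL condition `not event.workflow.isErrorHandler:true`.
function shouldHandle(event: WorkflowExecutionFailedEvent): boolean {
  return !event.workflow.isErrorHandler;
}
```

A subscriber that applies the `shouldHandle` check skips failures raised by other error-handler workflows, which is how the `isErrorHandler` flag prevents a failing handler from retriggering itself indefinitely.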