[One Workflow] Workflows execute automatically on events#254964

Merged
yngrdyn merged 46 commits into elastic:main from yngrdyn:14419-feature-workflows-execute-automatically-on-events
Mar 9, 2026

Conversation

@yngrdyn
Contributor

@yngrdyn yngrdyn commented Feb 25, 2026

Closes https://github.com/elastic/security-team/issues/14419.
Relates to https://github.com/elastic/security-team/issues/14416, https://github.com/elastic/security-team/issues/14417, https://github.com/elastic/security-team/issues/14418, https://github.com/elastic/security-team/issues/14420.

This PR implements conditional execution for event-driven workflows: when an event is emitted, only workflows whose trigger `with.condition` (KQL) matches the event are run.

Summary

When a trigger event is emitted (workflowsExtensions.emitEvent()), the platform fetches all workflows subscribed to that trigger type and then filters them in memory using each workflow’s optional KQL with.condition. Only workflows whose condition matches the event payload are run.

```mermaid
graph TB
    subgraph Emitter["Plugin emitting event"]
        EmitEvent["emitEvent(triggerId, payload)"]
    end

    subgraph Handler["Trigger event handler"]
        BuildContext["Build event context: payload + timestamp + spaceId"]
        ResolveSubs["resolveMatchingWorkflowSubscriptions(triggerId, spaceId, eventContext)"]
        WriteAudit["Write trigger event to data stream (audit)"]
        RunWorkflows["Run matching workflows with event context"]
        subgraph Resolver["Subscription resolver"]
            GetWorkflows["api.getWorkflowsSubscribedToTrigger(triggerId, spaceId)"]
            FilterCondition["workflowMatchesTriggerCondition(workflow, triggerId, eventContext)"]
            subgraph FilterLogic["Filter (per workflow)"]
                GetCondition["Read workflow trigger with.condition (KQL)"]
                EvalKql["evaluateKql(condition, { event: eventContext })"]
                Match["Match? → include in list"]
            end
        end
    end

    EmitEvent --> BuildContext
    BuildContext --> ResolveSubs
    ResolveSubs --> GetWorkflows
    GetWorkflows --> FilterCondition
    FilterCondition --> GetCondition
    GetCondition --> EvalKql
    EvalKql --> Match
    Match --> WriteAudit
    Match --> RunWorkflows

    style Emitter fill:#e1f5ff
    style Handler fill:#fff4e1
    style Resolver fill:#f3e5f5
    style FilterLogic fill:#e8f5e9
```

Event context passed to both the filter and workflow execution:

  • Payload: whatever the emitter sent.
  • Base fields: timestamp and spaceId (space where the event was emitted), injected by the handler.

So conditions and steps can use e.g. event.timestamp, event.spaceId, and any custom fields (event.category, event.severity, etc.).
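As a minimal sketch of this context assembly (names are illustrative, not the actual handler code in `trigger_event_handler.ts`), the handler spreads the base fields on top of the payload:

```typescript
// Illustrative sketch: the trigger event handler injects the base fields
// (timestamp, spaceId) on top of whatever payload the emitter sent.
// All names here are assumptions for illustration, not the real Kibana code.
interface TriggerEventContext {
  [key: string]: unknown;
  timestamp: string; // injected by the handler
  spaceId: string; // space where the event was emitted
}

function buildEventContext(
  payload: Record<string, unknown>,
  spaceId: string,
  now: Date = new Date()
): TriggerEventContext {
  // Base fields are spread last so a payload cannot shadow them.
  return { ...payload, timestamp: now.toISOString(), spaceId };
}
```

Conditions and steps then read the same object, e.g. `event.severity` alongside the injected `event.timestamp` and `event.spaceId`.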

What's in this PR

In-memory trigger condition filtering

  • filter_workflows_by_trigger_condition.ts

    • workflowMatchesTriggerCondition(workflow, triggerId, payload, logger?) evaluates the workflow’s trigger with.condition (KQL) against the event using @kbn/eval-kql.
    • Context for KQL is { event: payload } so authors write e.g. event.severity: "high" or event.category: "alerts" or event.category: "notifications".
    • No condition or empty condition → workflow is included. Evaluation errors are logged and the workflow is excluded.
  • trigger_event_handler.ts

    • After getWorkflowsSubscribedToTrigger(triggerId, spaceId), workflows are filtered with workflowMatchesTriggerCondition(..., eventContext, logger).
    • Only the filtered list is used for audit (trigger-events data stream) and for runWorkflow(..., { event: eventContext }, ...).
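The filter semantics above can be sketched as follows. The KQL evaluator is injected so the example is self-contained; the real code uses `@kbn/eval-kql`, and the shapes here are assumptions for illustration:

```typescript
// Sketch of the per-workflow filter semantics: a missing or empty condition
// includes the workflow, and an evaluation error excludes it. Illustrative
// only; the real logic lives in filter_workflows_by_trigger_condition.ts.
type KqlEvaluator = (condition: string, context: { event: unknown }) => boolean;

interface WorkflowLike {
  id: string;
  triggers: Array<{ type: string; with?: { condition?: string } }>;
}

function workflowMatchesTriggerCondition(
  workflow: WorkflowLike,
  triggerId: string,
  event: unknown,
  evaluate: KqlEvaluator,
  logError: (msg: string) => void = () => {}
): boolean {
  const trigger = workflow.triggers.find((t) => t.type === triggerId);
  const condition = trigger?.with?.condition?.trim();
  if (!condition) {
    return true; // no condition or empty condition -> workflow is included
  }
  try {
    return evaluate(condition, { event });
  } catch (err) {
    logError(`condition evaluation failed for workflow ${workflow.id}: ${err}`);
    return false; // evaluation errors -> workflow is excluded
  }
}
```

The handler would then keep only matching workflows, e.g. `workflows.filter((w) => workflowMatchesTriggerCondition(w, triggerId, eventContext, evaluate, log))`.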

Base event schema

  • @kbn/workflows

    • Defines base fields present on every trigger event: spaceId, timestamp (with descriptions).
    • Used when building the workflow context schema for the YAML editor; custom trigger eventSchema shapes are merged on top, so autocomplete and validation always offer event.timestamp and event.spaceId.
  • Runtime event context

    • Handler builds eventContext = { ...payload, timestamp, spaceId } and passes it to the filter and to runWorkflow, so steps and conditions see the same base + payload.
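A hedged sketch of the schema merge for the YAML editor — field shapes and names below are assumptions, not the real `@kbn/workflows` definitions:

```typescript
// Sketch: custom trigger eventSchema fields are merged on top of the base
// fields, so event.timestamp and event.spaceId are always offered to
// autocomplete/validation. Shapes are illustrative assumptions.
interface FieldDef {
  type: string;
  description?: string;
}

const baseTriggerEventFields: Record<string, FieldDef> = {
  timestamp: { type: 'string', description: 'Time the event was emitted' },
  spaceId: { type: 'string', description: 'Space where the event was emitted' },
};

function buildEventSchema(
  custom: Record<string, FieldDef> = {}
): Record<string, FieldDef> {
  // Base first, custom merged on top: triggers may add fields, but the
  // base fields are always present in the resulting schema.
  return { ...baseTriggerEventFields, ...custom };
}
```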

Example: workflow with a condition

```yaml
triggers:
  - type: example.custom_trigger
    with:
      condition: 'event.severity: "high" and (event.category: "alerts" or event.category: "notifications") and not event.source: "legacy"'
steps:
  - name: log_event
    type: console
    with:
      message: "Event at {{ event.timestamp }}: {{ event.message }}"
```

Only events that satisfy the KQL condition will run this workflow; steps can use event.timestamp, event.spaceId, and any payload fields.


How to verify

  1. Start Kibana with the workflows extensions example:
    yarn start --run-examples
  2. Create a workflow that uses example.custom_trigger with a with.condition (e.g. event.category: "alerts").
  3. From the example app, emit an event with category: "alerts" → workflow should run.
  4. Emit an event without that category → workflow should not run.
  5. In the workflow YAML editor, confirm that event.timestamp and event.spaceId appear in autocomplete for the trigger event.

🎥 Demo

In the demo, we have two spaces: default and sec. Emitting the event from one space does not trigger workflows in a different space.

Screen.Recording.2026-03-04.at.11.27.04.mov

@yngrdyn yngrdyn self-assigned this Feb 25, 2026
@yngrdyn yngrdyn added release_note:skip Skip the PR/issue when compiling release notes backport:skip This PR does not require backporting Team:One Workflow Team label for One Workflow (Workflow automation) labels Feb 25, 2026
@yngrdyn
Contributor Author

yngrdyn commented Feb 26, 2026

/ci

@yngrdyn
Contributor Author

yngrdyn commented Feb 27, 2026

/ci


@yngrdyn yngrdyn enabled auto-merge (squash) March 9, 2026 11:08
@yngrdyn yngrdyn merged commit 3216a17 into elastic:main Mar 9, 2026
18 checks passed
@elasticmachine
Contributor

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats comments` for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| workflowsExtensions | 7 | 18 | +11 |

Any counts in public APIs

Total count of every `any` typed public API. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats any` for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/workflows | 96 | 97 | +1 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| workflowsManagement | 1.6MB | 1.6MB | +127.0B |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| @kbn/workflows | 507 | 508 | +1 |
| workflowsExtensions | 74 | 96 | +22 |
| total | | | +23 |

History

cc @yngrdyn

DennisKo pushed a commit to DennisKo/kibana that referenced this pull request Mar 9, 2026
Kiryous added a commit that referenced this pull request Mar 10, 2026
…orkflow authoring (#256871)

Close elastic/security-team#15740

## Summary

Re-lands #255180, which was reverted in d9e2f4b due to a merge conflict
with #254964 (event-driven trigger execution). Fixed the duplicate
`const api` declaration in `plugin.ts` that resulted from both PRs
adding code in the same method.

No other changes from the original PR — see #255180 for full
description.
juliaElastic added a commit that referenced this pull request Mar 11, 2026
## Summary

Closes elastic/kibana-team#3020

Tests started failing because #254964 creates a hidden `.workflows-events`
data stream that the Fleet data streams API then tries to query, failing
with missing access.

Tested by checking out the linked PR locally and applying the changes in
this PR.

```
 └-: fleet
   └-> "before all" hook: beforeTestSuite.trigger in "fleet"
   └-> "before all" hook in "fleet"
     │ debg Creating new local SAML session for a user 'elastic_admin' with role 'admin'
     │ debg Created API key for role: [admin]
     │ debg Waiting up to 30000ms for get default fleet server...
     │ debg Waiting up to 30000ms for get default Elasticsearch output...
   └-: datastreams API
     └-> "before all" hook: beforeTestSuite.trigger for "it works"
     └-> "before all" hook for "it works"
     └-> it works
       └-> "before each" hook: global before each for "it works"
       └- ✓ pass  (64ms)
     └-> "after all" hook for "it works"
     └-> "after all" hook: afterTestSuite.trigger for "it works"
   └-> "after all" hook in "fleet"
   └-> "after all" hook: afterTestSuite.trigger in "fleet"

1 passing (7.2s)

✨  Done in 15.50s.
```

Printed out results of data stream queries from ES:
https://github.com/elastic/kibana/pull/256625/changes#diff-3315d99712b4c4534754edf2249ca9b2a9553ccc84db4ab8e716dae526807a43R102-R108

```
 proc [kibana] dataStreamsInfoByName [
 proc [kibana]   '.workflows-events',
 proc [kibana]   '.alerts-transform.health.alerts-default',
 proc [kibana]   '.alerts-streams.alerts-default',
 proc [kibana]   '.alerts-default.alerts-default',
 proc [kibana]   '.alerts-ml.anomaly-detection.alerts-default',
 proc [kibana]   '.alerts-security.alerts-default',
 proc [kibana]   '.alerts-dataset.quality.alerts-default',
 proc [kibana]   'logs-nginx.access-default',
 proc [kibana]   '.alerts-ml.anomaly-detection-health.alerts-default',
 proc [kibana]   '.kibana-event-log-ds',
 proc [kibana]   '.alerts-stack.alerts-default',
 proc [kibana]   '.alerts-security.attack.discovery.alerts-default',
 proc [kibana]   '.edr-workflow-insights-default'
 proc [kibana] ]
 proc [kibana] dataStreamsStatsByName []
 proc [kibana] dataStreamsMeteringStatsByName [
 proc [kibana]   '.workflows-events',
 proc [kibana]   '.alerts-transform.health.alerts-default',
 proc [kibana]   '.alerts-streams.alerts-default',
 proc [kibana]   '.alerts-default.alerts-default',
 proc [kibana]   '.alerts-ml.anomaly-detection.alerts-default',
 proc [kibana]   '.alerts-security.alerts-default',
 proc [kibana]   '.alerts-dataset.quality.alerts-default',
 proc [kibana]   'logs-nginx.access-default',
 proc [kibana]   '.alerts-ml.anomaly-detection-health.alerts-default',
 proc [kibana]   '.kibana-event-log-ds',
 proc [kibana]   '.alerts-stack.alerts-default',
 proc [kibana]   '.alerts-security.attack.discovery.alerts-default',
 proc [kibana]   '.edr-workflow-insights-default'
 proc [kibana] ]
 proc [kibana] dataStreamNames [ 'logs-nginx.access-default' ]
```

### Checklist

Check that the PR satisfies the following conditions.

Reviewers should verify this PR satisfies this list as well.

- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [ ] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [ ] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.

### Identify risks

Does this PR introduce any risks? For example, consider risks like hard
to test bugs, performance regression, potential of data loss.

Describe the risk, its severity, and mitigation for each identified
risk. Invite stakeholders and evaluate how to proceed before merging.

- [ ] [See some risk
examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)
- [ ] ...
qn895 pushed a commit to qn895/kibana that referenced this pull request Mar 11, 2026
qn895 pushed a commit to qn895/kibana that referenced this pull request Mar 11, 2026
yngrdyn added a commit that referenced this pull request Mar 16, 2026
Addresses open review comments on [event-driven workflow
execution](#254964):
concurrency/back pressure, pagination and related tests.

## Summary

When many workflows are subscribed to a trigger, emitting an event can
overload the node if we run them all inline. This PR **schedules**
event-driven workflow runs via **Task Manager (TM)** (the same model as
alerts) and caps concurrent **scheduling** with **p-limit** so
Elasticsearch and Task Manager are not flooded. It also fixes **silent
truncation** when more than 1000 workflows match a trigger by implementing
**PIT + search_after** in `getWorkflowsSubscribedToTrigger`, so all
matching workflows are returned.
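The pagination fix can be illustrated with a small sketch. The page-fetching function is injected so the example is self-contained; in the real implementation it would run an Elasticsearch search under a point-in-time and pass `search_after` between pages, and all names below are assumptions:

```typescript
// Sketch of search_after-style pagination: keep pulling pages, resuming
// after the last hit of each page, until a page comes back empty, so no
// results are silently truncated at a fixed page size.
interface Page<T> {
  hits: T[];
  lastSort?: unknown[]; // sort values of the last hit, used to resume
}
type SearchPage<T> = (searchAfter?: unknown[]) => Promise<Page<T>>;

async function fetchAllSubscribedWorkflows<T>(
  searchPage: SearchPage<T>
): Promise<T[]> {
  const all: T[] = [];
  let searchAfter: unknown[] | undefined;
  for (;;) {
    const page = await searchPage(searchAfter);
    all.push(...page.hits);
    if (page.hits.length === 0 || !page.lastSort) {
      break; // exhausted: no more hits to resume from
    }
    searchAfter = page.lastSort;
  }
  return all;
}
```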
sorenlouv pushed a commit that referenced this pull request Mar 17, 2026
sorenlouv pushed a commit that referenced this pull request Mar 17, 2026
sorenlouv pushed a commit that referenced this pull request Mar 17, 2026
jeramysoucy pushed a commit to jeramysoucy/kibana that referenced this pull request Mar 26, 2026
yngrdyn added a commit that referenced this pull request Apr 6, 2026
…257633)

Closes elastic/security-team#14421.

This PR implements the **workflow execution error trigger**: when a
workflow run fails, the platform emits a `workflows.executionFailed` event
so that other workflows subscribed to it can run (notifications, cleanup,
retries). It also serves as a **reference implementation** for solution
teams adding their own event-driven triggers.

## Summary

When a workflow execution reaches a **failed** terminal state, the
execution engine builds an event payload (workflow id/name, execution
id, error message, failed step id/name, optional stack trace) and calls
`workflowsExtensions.emitEvent()` with trigger id
`workflows.executionFailed`. The existing trigger event handler (from
[#254964](#254964)) resolves
workflows subscribed to that trigger in the same space, evaluates
optional KQL `on.condition` against the event, and runs only matching
workflows with the event as `context.event`. The payload includes
`workflow.isErrorHandler: true` when the failed run was itself triggered
by an error event, so subscribers can filter out error-handler failures
and avoid infinite loops.
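
The subscription-resolution step described above can be sketched roughly as below. All names here (`Workflow`, `resolveMatchingWorkflows`, the toy `matchesCondition` evaluator that only understands `field.path:value` with an optional `not ` prefix, and the assumption that the payload sits under an `event` key in the evaluation context) are illustrative stand-ins, not Kibana's actual APIs; the real handler evaluates full KQL against the event context.

```typescript
// Illustrative sketch of the subscription resolver described in this PR.
// `Workflow`, `EventContext`, and the toy condition evaluator are stand-ins;
// the real implementation evaluates full KQL.
interface Workflow {
  id: string;
  triggers: Array<{ type: string; condition?: string }>;
}

type EventContext = Record<string, unknown>;

// Toy evaluator supporting only `field.path:value`, optionally prefixed with `not `.
function matchesCondition(condition: string | undefined, event: EventContext): boolean {
  if (!condition) return true; // no condition means the workflow always matches
  const negated = condition.startsWith('not ');
  const expr = negated ? condition.slice(4) : condition;
  const [path, expected] = expr.split(':');
  const actual = path
    .split('.')
    .reduce<unknown>((obj, key) => (obj as Record<string, unknown> | undefined)?.[key], event);
  const matched = String(actual) === expected;
  return negated ? !matched : matched;
}

// Keep only workflows subscribed to this trigger whose condition matches the event.
function resolveMatchingWorkflows(
  workflows: Workflow[],
  triggerId: string,
  event: EventContext
): Workflow[] {
  return workflows.filter((workflow) =>
    workflow.triggers.some(
      (trigger) => trigger.type === triggerId && matchesCondition(trigger.condition, event)
    )
  );
}
```

With an event context of `{ event: { workflow: { isErrorHandler: true } } }`, a subscriber whose condition is `not event.workflow.isErrorHandler:true` is filtered out, which is exactly the loop-avoidance pattern described above.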

```mermaid
graph TB
    subgraph Engine["workflows_execution_engine"]
        Run["runWorkflow() / resumeWorkflow()"]
        Fail["Execution fails → failStep()"]
        Finally["finally: load execution, build payload"]
        Emit["emitEvent(workflows.executionFailed, payload, spaceId, request)"]
    end

    subgraph Subscriber["Error-handling workflow"]
        Steps["Steps use {{ event.workflow.id }}, {{ event.error.message }}, etc."]
    end

    Run --> Fail
    Fail --> Finally
    Finally --> Emit
    Emit --> Steps

    style Engine fill:#e1f5ff
    style Subscriber fill:#e8f5e9
```

**Event payload** (and thus `context.event` in subscriber workflows):

- **workflow:** `id`, `name`, `spaceId`, `isErrorHandler`
- **execution:** `id`, `startedAt`, `failedAt`
- **error:** `message`, `stepId`, `stepName`, optional
`stepExecutionId`, optional `stackTrace`

Conditions and steps can use e.g. `event.workflow.name`,
`event.error.stepName`, `event.execution.id`, and `not
event.workflow.isErrorHandler:true` to avoid handling failures from
error-handler workflows.

---

## What's in this PR

### Trigger registration (workflows_extensions)

- **Common:** `WORKFLOW_EXECUTION_FAILED_TRIGGER_ID`, Zod
`workflowExecutionFailedEventSchema` (workflow, execution, error with
optional `stepExecutionId` and `stackTrace`), i18n for schema
descriptions.
- **Server:** Trigger definition registered in `workflows_extensions`
plugin setup; used for validation when emitting and for internal
trigger-definitions API.
- **Public:** `PublicTriggerDefinition` with i18n title, description,
documentation, and examples so the workflow authoring UI shows the
trigger and its event shape.

### Emit on failure (`workflows_execution_engine`)

- **Payload builder:** `buildWorkflowExecutionFailedPayload(execution,
failedStepContext?)` in
`server/lib/build_workflow_execution_failed_payload.ts`. Step context
(stepId, stepName, stepExecutionId, stackTrace) comes from the in-memory
`FailedStepContext` set in `failStep()`, not from
`execution.error.details` or ES step executions (avoiding refresh delays).
- **Failure context:** In `step_execution_runtime.ts`, `failStep()`
calls `workflowExecutionState.setLastFailedStepContext({ stepId,
stepName, stepExecutionId, stack })` so the payload builder can read it
in the same run.
- **Emission:** In `run_workflow.ts` and `resume_workflow.ts`, in a
`finally` block: if `execution?.status === FAILED` and not a test run,
build payload (with `workflowExecutionState.getLastFailedStepContext()`)
and call `workflowsExtensions.emitEvent({ triggerId:
WORKFLOW_EXECUTION_FAILED_TRIGGER_ID, spaceId, payload, request })`.
Ensures one emit per failed run and consistent metering.
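
A minimal sketch of that `finally`-block guard, with illustrative types (`WorkflowExecution` and `runAndEmitOnFailure` are stand-ins, not the engine's real signatures):

```typescript
// Illustrative sketch of the emit-on-failure guard; types and names here
// are stand-ins for the execution engine's real ones.
type ExecutionStatus = 'completed' | 'failed' | 'cancelled';

interface WorkflowExecution {
  status: ExecutionStatus;
  isTestRun: boolean;
}

// Mirrors the guard described above: emit only for failed, non-test runs.
function shouldEmitFailureEvent(execution: WorkflowExecution | undefined): boolean {
  return execution !== undefined && execution.status === 'failed' && !execution.isTestRun;
}

// The `finally` placement guarantees at most one emit per run, even when
// the run itself throws before producing a result.
async function runAndEmitOnFailure(
  run: () => Promise<WorkflowExecution | undefined>,
  emitEvent: (execution: WorkflowExecution) => Promise<void>
): Promise<void> {
  let execution: WorkflowExecution | undefined;
  try {
    execution = await run();
  } finally {
    if (execution && shouldEmitFailureEvent(execution)) {
      await emitEvent(execution);
    }
  }
}
```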

## How to verify

1. Start Kibana with the workflows extensions example:
   ```bash
   yarn start
   ```
2. Create a workflow that always fails.
```yaml
name: Always fails
enabled: true
triggers:
  - type: manual
steps:
  - name: log_start
    type: console
    with:
      message: "Workflow started; next step will fail."
  - name: http_always_500
    type: http
    with:
      url: "https://httpstat.us/500"
      method: GET
```
3. Create a second workflow with trigger `workflows.executionFailed` and
a step that logs or notifies:
````yaml
name: Workflow failure monitor
description: Sends a Slack notification with full details when any workflow in the space fails.
enabled: true
triggers:
  - type: workflows.executionFailed
    on:
      condition: not event.workflow.isErrorHandler:true
steps:
  - name: slack_alert
    type: slack
    connector-id: c57c5a7b-dc2b-4d64-b9bd-a02c92696e03
    with:
      message: |
        :alert: *Workflow execution failed*

        *Workflow:* {{ event.workflow.name }}
        *Workflow ID:* `{{ event.workflow.id }}`
        *Space:* {{ event.workflow.spaceId }}

        *Failed step:* {{ event.error.stepName }}
        *Error:* {{ event.error.message }}

        *Execution ID:* {{ event.execution.id }}
        *Started:* {{ event.execution.startedAt }}
        *Failed at:* {{ event.execution.failedAt }}
        {% if event.error.stackTrace %}
        *Stack trace:*
        ```
        {{ event.error.stackTrace }}
        ```
        {% endif %}

        *View execution in Kibana:*
        {{kibanaUrl}}{% if event.workflow.spaceId != 'default' %}/s/{{ event.workflow.spaceId }}{% endif %}/app/workflows/{{ event.workflow.id }}?executionId={{ event.execution.id }}&tab=executions&stepExecutionId={{ event.error.stepExecutionId }}
````
4. Run the first workflow; wait for it to fail.
5. Confirm the second workflow runs and receives the event (e.g.
`triggeredBy: 'workflows.executionFailed'`, step sees
`event.workflow.name`, `event.error.message`).

## Release note
Added `workflows.executionFailed` trigger so you can run workflows when
another workflow fails. Use it to send notifications (e.g. Slack), run
cleanup, or trigger retries. Subscriber workflows receive an event with
workflow and execution details, the error message, and the failed step.
[#257633](#257633)

Labels

- `backport:skip`: This PR does not require backporting
- `release_note:skip`: Skip the PR/issue when compiling release notes
- `Team:One Workflow`: Team label for One Workflow (Workflow automation)
- `v9.4.0`
