[ResponseOps] Alerting v2: Initialize all resources#248696
[ResponseOps] Alerting v2: Initialize all resources#248696cnasikas merged 14 commits intoelastic:alerting_v2from
Conversation
| const resourceManager = container.get(ResourceManager); | ||
| const esClient = container.get(CoreStart('elasticsearch')).client.asInternalUser; | ||
|
|
||
| registerResources({ | ||
| resourceManager, | ||
| esClient, | ||
| }); | ||
|
|
||
| resourceManager.startInitialization(); |
There was a problem hiding this comment.
I find this setup a bit difficult to follow. Why do we need both registerResources callback, and resourceManager? can it be handled just by resourceManager, inside startInitialization method?
There was a problem hiding this comment.
This way, we keep the ResourceManager generic. registerResources is where the plugin declares what to initialize. That way, adding/removing resources doesn’t require changing the ResourceManager, and we can reuse the same manager pattern across different resource types.
The ResourceManager is the orchestrator. The registerResources is the composition/configuration. If we put the configuration inside the ResourceManager, we would have to import resource definitions, know how many they are, and know how to build templates/mappings/ILM policies. It becomes harder to reuse it for other resource types, unit test it in isolation, change resource definitions without touching orchestration code (most important), and keep dependencies acyclic/separate.
There was a problem hiding this comment.
The ResourceManager and ResourceInitializer services feel a bit over-engineered for installing 3 datastreams. But if it works it works™️
That being said, I think we could at least change the way we initiate the ResourceManager, and make it impossible to call resourceManager.startInitialization() without registering the resources, by using its constructor, e.g. new ResourceManager(resourceDefinitions, other dep).startInitialization() or something similar. I think that could remove the wiring layer done in registerResources().
And more importantly, are we planning to use a lock when installing the resources in this PR or a followup?
There was a problem hiding this comment.
The ResourceManager and ResourceInitializer services feel a bit over-engineered for installing 3 datastreams
I get the concern, but the goal here is not to install only three data streams. It is to establish a reliable async-initializer pattern for resources that may grow (more streams, more resource types, more fields in the mappings, etc). It is the same pattern the existing alerting framework follows. We have been burned in the past in situations where we choose not to follow a robust design pattern that scales. I believe that for the new alerting framework, in every functionality we add, we should be ready and scalable from the start. The goal is not to touch the core code unless it is to fix a bug. Features should be built on top of it with extensions, not modifications. This is the philosophy I am trying to achieve with the new framework.
make it impossible to call resourceManager.startInitialization() without registering the resources
I agree that we should reduce the ways to make mistakes, but without the cost of coupling we are trying to avoid. I believe integration tests would catch any mistake here (start the resource manager without registering the resources). But, we could either a) change the call to a single entry point (initializeResources({ resouceManager, esClient })) that will register and initialize the resources or b) change ResourceManager to accept a registry callback (resourceManager.startInitialization(() => registerResources(resourceManager, esClient))). I would go with a). Wdyt?
And more importantly, are we planning to use a lock when installing the resources in this PR or a followup?
What do you mean by a lock?
There was a problem hiding this comment.
when many kibana instances start, they all going to try to install the resources. Are we planning to add a lock to ensure only one kibana instance does the job?
We can also let them compete, the resources are idempotent anyway (except the final datastream creation but we handle the case where it already exists so that's fine)
There was a problem hiding this comment.
and to answer the previous comment, a) seems simpler than b) so I would go with a) as well.
I'm all in adding abstraction when it makes sense, and making everything scalable from start. You have a better domain knowledge on alerting, so I trust your instinct when you say we'll need to install more resources soon.
I have a random thoughts about versioning? That's a problem I ran into with SLO, and it's not easy to solve. We can also defer that to when we hit the problem, but how would we approach a mapping change for example? Since we are using datastream, I suppose we create a new index template with a higher priority, and we do a datastream rollover? How does alerting v1 solved this?
There was a problem hiding this comment.
Oh yes, we are handling this scenario. All Kibana nodes are going to try to create/update the datastreams. Some of them will get a 400 (resource_already_exists_exception) and will fail silently. I am going to improve a bit the handling of this to also check for error.body?.error.type === 'resource_already_exists_exception'.
There was a problem hiding this comment.
We can also defer that to when we hit the problem, but how would we approach a mapping change, for example?
We are not allow breaking changes. We allow only adding optional fields. This happens when updating the template. I put a note to think about it and act if needed on another PR.
|
Pinging @elastic/response-ops (Team:ResponseOps) |
| action_type: { type: 'keyword' }, | ||
| episode_id: { type: 'keyword' }, | ||
| rule_id: { type: 'keyword' }, | ||
| source: { type: 'keyword' }, |
There was a problem hiding this comment.
It is from where the action is generated. Internally by the system or by external services (external alerts). I will put an open question about it because you are right, we need to figure out what this value is for someone calling the actions APIs.
| grouping: z.object({ | ||
| key: z.string(), | ||
| value: z.string(), | ||
| }), |
There was a problem hiding this comment.
Can't we have more than one group?
There was a problem hiding this comment.
I don't think so. The group can be combine from multiple grouping keys though.
There was a problem hiding this comment.
So does it mean we would have grouping.key: "host,region" and grouping.value: "elast.co,us-east-1" ?
There was a problem hiding this comment.
yes, something like that. for now we use : for concatenation
| await this.esClient.ilm.putLifecycle({ | ||
| name: this.resourceDefinition.ilmPolicy.name, | ||
| policy: this.resourceDefinition.ilmPolicy.policy, | ||
| }); |
There was a problem hiding this comment.
There was a problem hiding this comment.
Yes, it will be ignored in serverless. We need it for ECS and on-premises deployments.
There was a problem hiding this comment.
Dumb question mainly for my understanding, but how does it work on serverless then?
There was a problem hiding this comment.
There are no dumb questions! I am not aware tbh 🙂 . @darnautov Any idea?
| }, | ||
| }; | ||
|
|
||
| await this.esClient.cluster.putComponentTemplate(componentTemplate); |
There was a problem hiding this comment.
Should we retry on transient errors (for all calls) ? https://github.com/kdelemme/kibana/blob/9a75f0ff5aabe5be8472c76b8385779349705e37/x-pack/solutions/observability/plugins/slo/server/utils/retry.ts#L18
There was a problem hiding this comment.
Ah I see you have the retryService that retry the whole flow if one call fails.
kdelemme
left a comment
There was a problem hiding this comment.
Some questions but otherwise lgtm
| coreStartServices: Promise<[CoreStart, AlertingServerStartDependencies, unknown]>, | ||
| config: PluginConfig, | ||
| resourcesService: AlertingResourcesService | ||
| container: Container |
There was a problem hiding this comment.
FYI I updated the task definition to use DI here #248866
There was a problem hiding this comment.
How should I approach it? Merge this PR and then in your PR update with the new task definition?
There was a problem hiding this comment.
it's just FYI, I update my PR after we merge this one
...form/plugins/shared/alerting_v2/server/lib/services/resource_service/resource_initializer.ts
Outdated
Show resolved
Hide resolved
...platform/plugins/shared/alerting_v2/server/lib/services/resource_service/resource_manager.ts
Outdated
Show resolved
Hide resolved
...platform/plugins/shared/alerting_v2/server/lib/services/resource_service/resource_manager.ts
Outdated
Show resolved
Hide resolved
7461d63 to
90a5604
Compare
💔 Build Failed
Failed CI StepsTest Failures
Metrics [docs]
History
cc @cnasikas |
## Summary ### Key capabilities - **ES|QL-native rule evaluation** — Rules are defined as ES|QL queries with optional WHERE clause conditions, evaluated on a configurable schedule - **Alert lifecycle management** — Full episode tracking with pending → active → recovering → inactive state transitions, including configurable alert delay (consecutive breaches / duration) - **Event-driven architecture** — Alert events and actions are stored in dedicated data streams (`.alerting-events`, `.alerting-actions`) with ES|QL views for querying - **Notification dispatch pipeline** — A multi-step dispatcher that matches alert episodes to notification policies, handles throttling/suppression, and triggers Kibana Workflows using encrypted API keys - **Notification policies** — CRUD APIs and UI for creating notification policies with KQL-based rule matching, workflow integration, and API key management - **Rule authoring UI** — A shared rule form package (`@kbn/alerting-v2-rule-form`) usable standalone or embedded in Discover, with ES|QL editor, WHERE clause condition editing, recovery configuration, and live query preview - **Rule management UI** — Full rule list with pagination, enable/disable, clone, edit, and delete operations - **APM instrumentation** — Middleware and decorators for tracing rule execution and client operations ### Architecture highlights - **InversifyJS DI** — All services use constructor injection with typed tokens, scoped per-request or singleton as appropriate - **Pipeline pattern** — Rule executor and dispatcher use composable step-based pipelines - **Saved Objects** — Rules stored as hidden saved objects; notification policies stored as encrypted saved objects (for API key protection) - **Feature privileges** — Dedicated Kibana feature with read/all privileges for RBAC --- ## Contained PRs <details> <summary><strong>Core Engine & Plugin Init</strong> (12 PRs)</summary> - #247283 — Init alerting v2 plugin (@cnasikas) - #247452 — Add the alerting v2 feature privileges (@cnasikas) - #247673 — Director (@cnasikas) - #248306 — Create basic services (@cnasikas) - #248696 — Initialize all resources (@cnasikas) - #250023 — Schema package (@cnasikas) - #250010 — YML Editor (@cnasikas) - #251064 — Remove index.mode: lookup for RnA alert indices (@cnasikas) - #251707 — Simplify task registration pattern (@kdelemme) - #251876 — Dedicated user service (@cnasikas) - #252073 — Use `kbn/data-streams` in alerting_v2 (@cnasikas) - #255120 — Update alerting-v2 owner to new rna project team (@cnasikas) </details> <details> <summary><strong>Rule Execution Pipeline</strong> (12 PRs)</summary> - #247472 — Add alerting v2 Rule Executor (@darnautov) - #248285 — Alerting v2 rule HTTP APIs (@darnautov) - #248728 — Add basic alert actions route (@darnautov) - #250161 — Refactor rule executor to use a pipeline pattern (@darnautov) - #252292 — Implement the CountTimeframeStrategy for the director (@cnasikas) - #252544 — Add support of streaming in the rule executor (@darnautov) - #252754 — Update rule attributes (@kdelemme) - #253355 — Add getRules client method (@kdelemme) - #253668 — Make evaluation.query.condition optional (@kdelemme) - #254031 — Add recovery event generation to rule execution pipeline (@kdelemme) - #255968 — ES|QL views (@adcoelho) - #256697 — Create episodes ES|QL view (@adcoelho) </details> <details> <summary><strong>Alert Suppression & Episodes</strong> (3 PRs)</summary> - #252174 — Alert suppression (@kdelemme) - #256486 — Fix suppression query (@kdelemme) - #256527 — Store 'unmatched' action for unmatched alert episodes (@kdelemme) </details> <details> <summary><strong>Dispatcher & Notification Engine</strong> (6 PRs)</summary> - #250822 — Alerting v2 dispatcher (@kdelemme) - #251529 — Use query service in dispatcher (@kdelemme) - #251679 — Dispatcher task (@kdelemme) - #252758 — Dispatcher notification policy (@kdelemme) - #255332 — Wait for resources before scheduling dispatcher task (@kdelemme) - #256536 — Use stored encrypted API keys from Notification Policy in dispatcher step (@kdelemme) </details> <details> <summary><strong>Notification Policies (Server)</strong> (4 PRs)</summary> - #251336 — Introduce notification policy CRUD APIs and client (@cnasikas) - #253134 — Update notification policy (@cnasikas) - #254808 — Store API key owner on Notification Policy (@kdelemme) - #256940 — Make notification policies global with optional rule-label scoping (@kdelemme) </details> <details> <summary><strong>Notification Policies UI</strong> (1 PR)</summary> - #255599 — Add notification policies UI and Storybook form story (@adcoelho) </details> <details> <summary><strong>Rule Authoring UI</strong> (13 PRs)</summary> - #250961 — Add create rule flyout in Discover (@adcoelho) - #255111 — Add activation configuration fields to alerting V2 rule form (@yiannisnikolopoulos) - #255427 — Rule form: provide services via context (@dominiqueclarke) - #255876 — MVP rule form, Split evaluation condition, and Recovery configuration (@dominiqueclarke) - #256260 — Foundational rule list (@dominiqueclarke) - #256756 — Wire up edit flow (@dominiqueclarke) - #256801 — Move consecutive breaches max to shared constants (@yiannisnikolopoulos) - #256818 — Preview query and design parity (@dominiqueclarke) - #256938 — Allow clearing number inputs in state transition fields (@yiannisnikolopoulos) - #257017 — Add enable/disable and clone rule to rule list (@dominiqueclarke) - #257246 — Remove all React.FC (@dominiqueclarke) - #257415 — Rule form - fix test (@dominiqueclarke) - #257454 — Block comma key in number input component (@yiannisnikolopoulos) </details> <details> <summary><strong>API Documentation & Schema</strong> (2 PRs)</summary> - #254901 — Rename indexes for alert events and actions (@adcoelho) - #255810 — OAS for alert action routes (@adcoelho) </details> <details> <summary><strong>Observability & Monitoring</strong> (3 PRs)</summary> - #254925 — Add ApmMiddleware to the rule executor (@adcoelho) - #255115 — Add the withAPM decorator and apply it to the rules_client (@adcoelho) - #255999 — Fix linting problem in apm middleware (@adcoelho) </details> <details> <summary><strong>CI & Maintenance</strong> (2 PRs)</summary> - #257409 — Refactor SO services to use inversify DI for client initialization (@darnautov) - Fix alerting-v2-schema jest config (@darnautov) </details> --- --------- Co-authored-by: Dima Arnautov <dmitrii.arnautov@elastic.co> Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kevin Delemme <kevin.delemme@elastic.co> Co-authored-by: Mike Côté <mikecote@users.noreply.github.com> Co-authored-by: Antonio <antonio.coelho@elastic.co> Co-authored-by: Kevin Delemme <kdelemme@gmail.com> Co-authored-by: Dominique Clarke <dominique.clarke@elastic.co> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.rhodes@elastic.co> Co-authored-by: Yiannis Nikolopoulos <yiannis.nikolopoulos@elastic.co> Co-authored-by: Alexi Doak <109488926+doakalexi@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Bailey Cash <bailey.cash@elastic.co> Co-authored-by: Anna Davydova <ana.davydova@elastic.co> Co-authored-by: Umberto Pepato <umbopepato@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.matthew.rhodes@gmail.com> Co-authored-by: Joana Cardoso <169058851+joana-cps@users.noreply.github.com>
## Summary ### Key capabilities - **ES|QL-native rule evaluation** — Rules are defined as ES|QL queries with optional WHERE clause conditions, evaluated on a configurable schedule - **Alert lifecycle management** — Full episode tracking with pending → active → recovering → inactive state transitions, including configurable alert delay (consecutive breaches / duration) - **Event-driven architecture** — Alert events and actions are stored in dedicated data streams (`.alerting-events`, `.alerting-actions`) with ES|QL views for querying - **Notification dispatch pipeline** — A multi-step dispatcher that matches alert episodes to notification policies, handles throttling/suppression, and triggers Kibana Workflows using encrypted API keys - **Notification policies** — CRUD APIs and UI for creating notification policies with KQL-based rule matching, workflow integration, and API key management - **Rule authoring UI** — A shared rule form package (`@kbn/alerting-v2-rule-form`) usable standalone or embedded in Discover, with ES|QL editor, WHERE clause condition editing, recovery configuration, and live query preview - **Rule management UI** — Full rule list with pagination, enable/disable, clone, edit, and delete operations - **APM instrumentation** — Middleware and decorators for tracing rule execution and client operations ### Architecture highlights - **InversifyJS DI** — All services use constructor injection with typed tokens, scoped per-request or singleton as appropriate - **Pipeline pattern** — Rule executor and dispatcher use composable step-based pipelines - **Saved Objects** — Rules stored as hidden saved objects; notification policies stored as encrypted saved objects (for API key protection) - **Feature privileges** — Dedicated Kibana feature with read/all privileges for RBAC --- ## Contained PRs <details> <summary><strong>Core Engine & Plugin Init</strong> (12 PRs)</summary> - elastic#247283 — Init alerting v2 plugin (@cnasikas) - elastic#247452 — Add the alerting v2 feature privileges (@cnasikas) - elastic#247673 — Director (@cnasikas) - elastic#248306 — Create basic services (@cnasikas) - elastic#248696 — Initialize all resources (@cnasikas) - elastic#250023 — Schema package (@cnasikas) - elastic#250010 — YML Editor (@cnasikas) - elastic#251064 — Remove index.mode: lookup for RnA alert indices (@cnasikas) - elastic#251707 — Simplify task registration pattern (@kdelemme) - elastic#251876 — Dedicated user service (@cnasikas) - elastic#252073 — Use `kbn/data-streams` in alerting_v2 (@cnasikas) - elastic#255120 — Update alerting-v2 owner to new rna project team (@cnasikas) </details> <details> <summary><strong>Rule Execution Pipeline</strong> (12 PRs)</summary> - elastic#247472 — Add alerting v2 Rule Executor (@darnautov) - elastic#248285 — Alerting v2 rule HTTP APIs (@darnautov) - elastic#248728 — Add basic alert actions route (@darnautov) - elastic#250161 — Refactor rule executor to use a pipeline pattern (@darnautov) - elastic#252292 — Implement the CountTimeframeStrategy for the director (@cnasikas) - elastic#252544 — Add support of streaming in the rule executor (@darnautov) - elastic#252754 — Update rule attributes (@kdelemme) - elastic#253355 — Add getRules client method (@kdelemme) - elastic#253668 — Make evaluation.query.condition optional (@kdelemme) - elastic#254031 — Add recovery event generation to rule execution pipeline (@kdelemme) - elastic#255968 — ES&elastic#124;QL views (@adcoelho) - elastic#256697 — Create episodes ES&elastic#124;QL view (@adcoelho) </details> <details> <summary><strong>Alert Suppression & Episodes</strong> (3 PRs)</summary> - elastic#252174 — Alert suppression (@kdelemme) - elastic#256486 — Fix suppression query (@kdelemme) - elastic#256527 — Store 'unmatched' action for unmatched alert episodes (@kdelemme) </details> <details> <summary><strong>Dispatcher & Notification Engine</strong> (6 PRs)</summary> - elastic#250822 — Alerting v2 dispatcher (@kdelemme) - elastic#251529 — Use query service in dispatcher (@kdelemme) - elastic#251679 — Dispatcher task (@kdelemme) - elastic#252758 — Dispatcher notification policy (@kdelemme) - elastic#255332 — Wait for resources before scheduling dispatcher task (@kdelemme) - elastic#256536 — Use stored encrypted API keys from Notification Policy in dispatcher step (@kdelemme) </details> <details> <summary><strong>Notification Policies (Server)</strong> (4 PRs)</summary> - elastic#251336 — Introduce notification policy CRUD APIs and client (@cnasikas) - elastic#253134 — Update notification policy (@cnasikas) - elastic#254808 — Store API key owner on Notification Policy (@kdelemme) - elastic#256940 — Make notification policies global with optional rule-label scoping (@kdelemme) </details> <details> <summary><strong>Notification Policies UI</strong> (1 PR)</summary> - elastic#255599 — Add notification policies UI and Storybook form story (@adcoelho) </details> <details> <summary><strong>Rule Authoring UI</strong> (13 PRs)</summary> - elastic#250961 — Add create rule flyout in Discover (@adcoelho) - elastic#255111 — Add activation configuration fields to alerting V2 rule form (@yiannisnikolopoulos) - elastic#255427 — Rule form: provide services via context (@dominiqueclarke) - elastic#255876 — MVP rule form, Split evaluation condition, and Recovery configuration (@dominiqueclarke) - elastic#256260 — Foundational rule list (@dominiqueclarke) - elastic#256756 — Wire up edit flow (@dominiqueclarke) - elastic#256801 — Move consecutive breaches max to shared constants (@yiannisnikolopoulos) - elastic#256818 — Preview query and design parity (@dominiqueclarke) - elastic#256938 — Allow clearing number inputs in state transition fields (@yiannisnikolopoulos) - elastic#257017 — Add enable/disable and clone rule to rule list (@dominiqueclarke) - elastic#257246 — Remove all React.FC (@dominiqueclarke) - elastic#257415 — Rule form - fix test (@dominiqueclarke) - elastic#257454 — Block comma key in number input component (@yiannisnikolopoulos) </details> <details> <summary><strong>API Documentation & Schema</strong> (2 PRs)</summary> - elastic#254901 — Rename indexes for alert events and actions (@adcoelho) - elastic#255810 — OAS for alert action routes (@adcoelho) </details> <details> <summary><strong>Observability & Monitoring</strong> (3 PRs)</summary> - elastic#254925 — Add ApmMiddleware to the rule executor (@adcoelho) - elastic#255115 — Add the withAPM decorator and apply it to the rules_client (@adcoelho) - elastic#255999 — Fix linting problem in apm middleware (@adcoelho) </details> <details> <summary><strong>CI & Maintenance</strong> (2 PRs)</summary> - elastic#257409 — Refactor SO services to use inversify DI for client initialization (@darnautov) - Fix alerting-v2-schema jest config (@darnautov) </details> --- --------- Co-authored-by: Dima Arnautov <dmitrii.arnautov@elastic.co> Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kevin Delemme <kevin.delemme@elastic.co> Co-authored-by: Mike Côté <mikecote@users.noreply.github.com> Co-authored-by: Antonio <antonio.coelho@elastic.co> Co-authored-by: Kevin Delemme <kdelemme@gmail.com> Co-authored-by: Dominique Clarke <dominique.clarke@elastic.co> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.rhodes@elastic.co> Co-authored-by: Yiannis Nikolopoulos <yiannis.nikolopoulos@elastic.co> Co-authored-by: Alexi Doak <109488926+doakalexi@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Bailey Cash <bailey.cash@elastic.co> Co-authored-by: Anna Davydova <ana.davydova@elastic.co> Co-authored-by: Umberto Pepato <umbopepato@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.matthew.rhodes@gmail.com> Co-authored-by: Joana Cardoso <169058851+joana-cps@users.noreply.github.com>
- **ES|QL-native rule evaluation** — Rules are defined as ES|QL queries with optional WHERE clause conditions, evaluated on a configurable schedule - **Alert lifecycle management** — Full episode tracking with pending → active → recovering → inactive state transitions, including configurable alert delay (consecutive breaches / duration) - **Event-driven architecture** — Alert events and actions are stored in dedicated data streams (`.alerting-events`, `.alerting-actions`) with ES|QL views for querying - **Notification dispatch pipeline** — A multi-step dispatcher that matches alert episodes to notification policies, handles throttling/suppression, and triggers Kibana Workflows using encrypted API keys - **Notification policies** — CRUD APIs and UI for creating notification policies with KQL-based rule matching, workflow integration, and API key management - **Rule authoring UI** — A shared rule form package (`@kbn/alerting-v2-rule-form`) usable standalone or embedded in Discover, with ES|QL editor, WHERE clause condition editing, recovery configuration, and live query preview - **Rule management UI** — Full rule list with pagination, enable/disable, clone, edit, and delete operations - **APM instrumentation** — Middleware and decorators for tracing rule execution and client operations - **InversifyJS DI** — All services use constructor injection with typed tokens, scoped per-request or singleton as appropriate - **Pipeline pattern** — Rule executor and dispatcher use composable step-based pipelines - **Saved Objects** — Rules stored as hidden saved objects; notification policies stored as encrypted saved objects (for API key protection) - **Feature privileges** — Dedicated Kibana feature with read/all privileges for RBAC --- <details> <summary><strong>Core Engine & Plugin Init</strong> (12 PRs)</summary> - elastic#247283 — Init alerting v2 plugin (@cnasikas) - elastic#247452 — Add the alerting v2 feature privileges (@cnasikas) - elastic#247673 — Director (@cnasikas) - elastic#248306 — Create basic services (@cnasikas) - elastic#248696 — Initialize all resources (@cnasikas) - elastic#250023 — Schema package (@cnasikas) - elastic#250010 — YML Editor (@cnasikas) - elastic#251064 — Remove index.mode: lookup for RnA alert indices (@cnasikas) - elastic#251707 — Simplify task registration pattern (@kdelemme) - elastic#251876 — Dedicated user service (@cnasikas) - elastic#252073 — Use `kbn/data-streams` in alerting_v2 (@cnasikas) - elastic#255120 — Update alerting-v2 owner to new rna project team (@cnasikas) </details> <details> <summary><strong>Rule Execution Pipeline</strong> (12 PRs)</summary> - elastic#247472 — Add alerting v2 Rule Executor (@darnautov) - elastic#248285 — Alerting v2 rule HTTP APIs (@darnautov) - elastic#248728 — Add basic alert actions route (@darnautov) - elastic#250161 — Refactor rule executor to use a pipeline pattern (@darnautov) - elastic#252292 — Implement the CountTimeframeStrategy for the director (@cnasikas) - elastic#252544 — Add support of streaming in the rule executor (@darnautov) - elastic#252754 — Update rule attributes (@kdelemme) - elastic#253355 — Add getRules client method (@kdelemme) - elastic#253668 — Make evaluation.query.condition optional (@kdelemme) - elastic#254031 — Add recovery event generation to rule execution pipeline (@kdelemme) - elastic#255968 — ES&elastic#124;QL views (@adcoelho) - elastic#256697 — Create episodes ES&elastic#124;QL view (@adcoelho) </details> <details> <summary><strong>Alert Suppression & Episodes</strong> (3 PRs)</summary> - elastic#252174 — Alert suppression (@kdelemme) - elastic#256486 — Fix suppression query (@kdelemme) - elastic#256527 — Store 'unmatched' action for unmatched alert episodes (@kdelemme) </details> <details> <summary><strong>Dispatcher & Notification Engine</strong> (6 PRs)</summary> - elastic#250822 — Alerting v2 dispatcher (@kdelemme) - elastic#251529 — Use query service in dispatcher (@kdelemme) - elastic#251679 — Dispatcher task (@kdelemme) - elastic#252758 — Dispatcher notification policy (@kdelemme) - elastic#255332 — Wait for resources before scheduling dispatcher task (@kdelemme) - elastic#256536 — Use stored encrypted API keys from Notification Policy in dispatcher step (@kdelemme) </details> <details> <summary><strong>Notification Policies (Server)</strong> (4 PRs)</summary> - elastic#251336 — Introduce notification policy CRUD APIs and client (@cnasikas) - elastic#253134 — Update notification policy (@cnasikas) - elastic#254808 — Store API key owner on Notification Policy (@kdelemme) - elastic#256940 — Make notification policies global with optional rule-label scoping (@kdelemme) </details> <details> <summary><strong>Notification Policies UI</strong> (1 PR)</summary> - elastic#255599 — Add notification policies UI and Storybook form story (@adcoelho) </details> <details> <summary><strong>Rule Authoring UI</strong> (13 PRs)</summary> - elastic#250961 — Add create rule flyout in Discover (@adcoelho) - elastic#255111 — Add activation configuration fields to alerting V2 rule form (@yiannisnikolopoulos) - elastic#255427 — Rule form: provide services via context (@dominiqueclarke) - elastic#255876 — MVP rule form, Split evaluation condition, and Recovery configuration (@dominiqueclarke) - elastic#256260 — Foundational rule list (@dominiqueclarke) - elastic#256756 — Wire up edit flow (@dominiqueclarke) - elastic#256801 — Move consecutive breaches max to shared constants (@yiannisnikolopoulos) - elastic#256818 — Preview query and design parity (@dominiqueclarke) - elastic#256938 — Allow clearing number inputs in state transition fields (@yiannisnikolopoulos) - elastic#257017 — Add enable/disable and clone rule to rule list (@dominiqueclarke) - elastic#257246 — Remove all React.FC (@dominiqueclarke) - elastic#257415 — Rule form - fix test (@dominiqueclarke) - elastic#257454 — Block comma key in number input component (@yiannisnikolopoulos) </details> <details> <summary><strong>API Documentation & Schema</strong> (2 PRs)</summary> - elastic#254901 — Rename indexes for alert events and actions (@adcoelho) - elastic#255810 — OAS for alert action routes (@adcoelho) </details> <details> <summary><strong>Observability & Monitoring</strong> (3 PRs)</summary> - elastic#254925 — Add ApmMiddleware to the rule executor (@adcoelho) - elastic#255115 — Add the withAPM decorator and apply it to the rules_client (@adcoelho) - elastic#255999 — Fix linting problem in apm middleware (@adcoelho) </details> <details> <summary><strong>CI & Maintenance</strong> (2 PRs)</summary> - elastic#257409 — Refactor SO services to use inversify DI for client initialization (@darnautov) - Fix alerting-v2-schema jest config (@darnautov) </details> --- --------- Co-authored-by: Dima Arnautov <dmitrii.arnautov@elastic.co> Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kevin Delemme <kevin.delemme@elastic.co> Co-authored-by: Mike Côté <mikecote@users.noreply.github.com> Co-authored-by: Antonio <antonio.coelho@elastic.co> Co-authored-by: Kevin Delemme <kdelemme@gmail.com> Co-authored-by: Dominique Clarke <dominique.clarke@elastic.co> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.rhodes@elastic.co> Co-authored-by: Yiannis Nikolopoulos <yiannis.nikolopoulos@elastic.co> Co-authored-by: Alexi Doak <109488926+doakalexi@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Bailey Cash <bailey.cash@elastic.co> Co-authored-by: Anna Davydova <ana.davydova@elastic.co> Co-authored-by: Umberto Pepato <umbopepato@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.matthew.rhodes@gmail.com> Co-authored-by: Joana Cardoso <169058851+joana-cps@users.noreply.github.com>
- **ES|QL-native rule evaluation** — Rules are defined as ES|QL queries with optional WHERE clause conditions, evaluated on a configurable schedule - **Alert lifecycle management** — Full episode tracking with pending → active → recovering → inactive state transitions, including configurable alert delay (consecutive breaches / duration) - **Event-driven architecture** — Alert events and actions are stored in dedicated data streams (`.alerting-events`, `.alerting-actions`) with ES|QL views for querying - **Notification dispatch pipeline** — A multi-step dispatcher that matches alert episodes to notification policies, handles throttling/suppression, and triggers Kibana Workflows using encrypted API keys - **Notification policies** — CRUD APIs and UI for creating notification policies with KQL-based rule matching, workflow integration, and API key management - **Rule authoring UI** — A shared rule form package (`@kbn/alerting-v2-rule-form`) usable standalone or embedded in Discover, with ES|QL editor, WHERE clause condition editing, recovery configuration, and live query preview - **Rule management UI** — Full rule list with pagination, enable/disable, clone, edit, and delete operations - **APM instrumentation** — Middleware and decorators for tracing rule execution and client operations - **InversifyJS DI** — All services use constructor injection with typed tokens, scoped per-request or singleton as appropriate - **Pipeline pattern** — Rule executor and dispatcher use composable step-based pipelines - **Saved Objects** — Rules stored as hidden saved objects; notification policies stored as encrypted saved objects (for API key protection) - **Feature privileges** — Dedicated Kibana feature with read/all privileges for RBAC --- <details> <summary><strong>Core Engine & Plugin Init</strong> (12 PRs)</summary> - elastic#247283 — Init alerting v2 plugin (@cnasikas) - elastic#247452 — Add the alerting v2 feature privileges (@cnasikas) - elastic#247673 — Director (@cnasikas) - elastic#248306 — Create basic services (@cnasikas) - elastic#248696 — Initialize all resources (@cnasikas) - elastic#250023 — Schema package (@cnasikas) - elastic#250010 — YML Editor (@cnasikas) - elastic#251064 — Remove index.mode: lookup for RnA alert indices (@cnasikas) - elastic#251707 — Simplify task registration pattern (@kdelemme) - elastic#251876 — Dedicated user service (@cnasikas) - elastic#252073 — Use `kbn/data-streams` in alerting_v2 (@cnasikas) - elastic#255120 — Update alerting-v2 owner to new rna project team (@cnasikas) </details> <details> <summary><strong>Rule Execution Pipeline</strong> (12 PRs)</summary> - elastic#247472 — Add alerting v2 Rule Executor (@darnautov) - elastic#248285 — Alerting v2 rule HTTP APIs (@darnautov) - elastic#248728 — Add basic alert actions route (@darnautov) - elastic#250161 — Refactor rule executor to use a pipeline pattern (@darnautov) - elastic#252292 — Implement the CountTimeframeStrategy for the director (@cnasikas) - elastic#252544 — Add support of streaming in the rule executor (@darnautov) - elastic#252754 — Update rule attributes (@kdelemme) - elastic#253355 — Add getRules client method (@kdelemme) - elastic#253668 — Make evaluation.query.condition optional (@kdelemme) - elastic#254031 — Add recovery event generation to rule execution pipeline (@kdelemme) - elastic#255968 — ES&elastic#124;QL views (@adcoelho) - elastic#256697 — Create episodes ES&elastic#124;QL view (@adcoelho) </details> <details> <summary><strong>Alert Suppression & Episodes</strong> (3 PRs)</summary> - elastic#252174 — Alert suppression (@kdelemme) - elastic#256486 — Fix suppression query (@kdelemme) - elastic#256527 — Store 'unmatched' action for unmatched alert episodes (@kdelemme) </details> <details> <summary><strong>Dispatcher & Notification Engine</strong> (6 PRs)</summary> - elastic#250822 — Alerting v2 dispatcher (@kdelemme) - elastic#251529 — Use query service in dispatcher (@kdelemme) - elastic#251679 — Dispatcher task (@kdelemme) - elastic#252758 — Dispatcher notification policy (@kdelemme) - elastic#255332 — Wait for resources before scheduling dispatcher task (@kdelemme) - elastic#256536 — Use stored encrypted API keys from Notification Policy in dispatcher step (@kdelemme) </details> <details> <summary><strong>Notification Policies (Server)</strong> (4 PRs)</summary> - elastic#251336 — Introduce notification policy CRUD APIs and client (@cnasikas) - elastic#253134 — Update notification policy (@cnasikas) - elastic#254808 — Store API key owner on Notification Policy (@kdelemme) - elastic#256940 — Make notification policies global with optional rule-label scoping (@kdelemme) </details> <details> <summary><strong>Notification Policies UI</strong> (1 PR)</summary> - elastic#255599 — Add notification policies UI and Storybook form story (@adcoelho) </details> <details> <summary><strong>Rule Authoring UI</strong> (13 PRs)</summary> - elastic#250961 — Add create rule flyout in Discover (@adcoelho) - elastic#255111 — Add activation configuration fields to alerting V2 rule form (@yiannisnikolopoulos) - elastic#255427 — Rule form: provide services via context (@dominiqueclarke) - elastic#255876 — MVP rule form, Split evaluation condition, and Recovery configuration (@dominiqueclarke) - elastic#256260 — Foundational rule list (@dominiqueclarke) - elastic#256756 — Wire up edit flow (@dominiqueclarke) - elastic#256801 — Move consecutive breaches max to shared constants (@yiannisnikolopoulos) - elastic#256818 — Preview query and design parity (@dominiqueclarke) - elastic#256938 — Allow clearing number inputs in state transition fields (@yiannisnikolopoulos) - elastic#257017 — Add enable/disable and clone rule to rule list (@dominiqueclarke) - elastic#257246 — Remove all React.FC (@dominiqueclarke) - elastic#257415 — Rule form - fix test (@dominiqueclarke) - elastic#257454 — Block comma key in number input component (@yiannisnikolopoulos) </details> <details> <summary><strong>API Documentation & Schema</strong> (2 PRs)</summary> - elastic#254901 — Rename indexes for alert events and actions (@adcoelho) - elastic#255810 — OAS for alert action routes (@adcoelho) </details> <details> <summary><strong>Observability & Monitoring</strong> (3 PRs)</summary> - elastic#254925 — Add ApmMiddleware to the rule executor (@adcoelho) - elastic#255115 — Add the withAPM decorator and apply it to the rules_client (@adcoelho) - elastic#255999 — Fix linting problem in apm middleware (@adcoelho) </details> <details> <summary><strong>CI & Maintenance</strong> (2 PRs)</summary> - elastic#257409 — Refactor SO services to use inversify DI for client initialization (@darnautov) - Fix alerting-v2-schema jest config (@darnautov) </details> --- --------- Co-authored-by: Dima Arnautov <dmitrii.arnautov@elastic.co> Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kevin Delemme <kevin.delemme@elastic.co> Co-authored-by: Mike Côté <mikecote@users.noreply.github.com> Co-authored-by: Antonio <antonio.coelho@elastic.co> Co-authored-by: Kevin Delemme <kdelemme@gmail.com> Co-authored-by: Dominique Clarke <dominique.clarke@elastic.co> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.rhodes@elastic.co> Co-authored-by: Yiannis Nikolopoulos <yiannis.nikolopoulos@elastic.co> Co-authored-by: Alexi Doak <109488926+doakalexi@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Bailey Cash <bailey.cash@elastic.co> Co-authored-by: Anna Davydova <ana.davydova@elastic.co> Co-authored-by: Umberto Pepato <umbopepato@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.matthew.rhodes@gmail.com> Co-authored-by: Joana Cardoso <169058851+joana-cps@users.noreply.github.com>
## Summary ### Key capabilities - **ES|QL-native rule evaluation** — Rules are defined as ES|QL queries with optional WHERE clause conditions, evaluated on a configurable schedule - **Alert lifecycle management** — Full episode tracking with pending → active → recovering → inactive state transitions, including configurable alert delay (consecutive breaches / duration) - **Event-driven architecture** — Alert events and actions are stored in dedicated data streams (`.alerting-events`, `.alerting-actions`) with ES|QL views for querying - **Notification dispatch pipeline** — A multi-step dispatcher that matches alert episodes to notification policies, handles throttling/suppression, and triggers Kibana Workflows using encrypted API keys - **Notification policies** — CRUD APIs and UI for creating notification policies with KQL-based rule matching, workflow integration, and API key management - **Rule authoring UI** — A shared rule form package (`@kbn/alerting-v2-rule-form`) usable standalone or embedded in Discover, with ES|QL editor, WHERE clause condition editing, recovery configuration, and live query preview - **Rule management UI** — Full rule list with pagination, enable/disable, clone, edit, and delete operations - **APM instrumentation** — Middleware and decorators for tracing rule execution and client operations ### Architecture highlights - **InversifyJS DI** — All services use constructor injection with typed tokens, scoped per-request or singleton as appropriate - **Pipeline pattern** — Rule executor and dispatcher use composable step-based pipelines - **Saved Objects** — Rules stored as hidden saved objects; notification policies stored as encrypted saved objects (for API key protection) - **Feature privileges** — Dedicated Kibana feature with read/all privileges for RBAC --- ## Contained PRs <details> <summary><strong>Core Engine & Plugin Init</strong> (12 PRs)</summary> - elastic#247283 — Init alerting v2 plugin (@cnasikas) - elastic#247452 — Add the alerting v2 feature privileges (@cnasikas) - elastic#247673 — Director (@cnasikas) - elastic#248306 — Create basic services (@cnasikas) - elastic#248696 — Initialize all resources (@cnasikas) - elastic#250023 — Schema package (@cnasikas) - elastic#250010 — YML Editor (@cnasikas) - elastic#251064 — Remove index.mode: lookup for RnA alert indices (@cnasikas) - elastic#251707 — Simplify task registration pattern (@kdelemme) - elastic#251876 — Dedicated user service (@cnasikas) - elastic#252073 — Use `kbn/data-streams` in alerting_v2 (@cnasikas) - elastic#255120 — Update alerting-v2 owner to new rna project team (@cnasikas) </details> <details> <summary><strong>Rule Execution Pipeline</strong> (12 PRs)</summary> - elastic#247472 — Add alerting v2 Rule Executor (@darnautov) - elastic#248285 — Alerting v2 rule HTTP APIs (@darnautov) - elastic#248728 — Add basic alert actions route (@darnautov) - elastic#250161 — Refactor rule executor to use a pipeline pattern (@darnautov) - elastic#252292 — Implement the CountTimeframeStrategy for the director (@cnasikas) - elastic#252544 — Add support of streaming in the rule executor (@darnautov) - elastic#252754 — Update rule attributes (@kdelemme) - elastic#253355 — Add getRules client method (@kdelemme) - elastic#253668 — Make evaluation.query.condition optional (@kdelemme) - elastic#254031 — Add recovery event generation to rule execution pipeline (@kdelemme) - elastic#255968 — ES&elastic#124;QL views (@adcoelho) - elastic#256697 — Create episodes ES&elastic#124;QL view (@adcoelho) </details> <details> <summary><strong>Alert Suppression & Episodes</strong> (3 PRs)</summary> - elastic#252174 — Alert suppression (@kdelemme) - elastic#256486 — Fix suppression query (@kdelemme) - elastic#256527 — Store 'unmatched' action for unmatched alert episodes (@kdelemme) </details> <details> <summary><strong>Dispatcher & Notification Engine</strong> (6 PRs)</summary> - elastic#250822 — Alerting v2 dispatcher (@kdelemme) - elastic#251529 — Use query service in dispatcher (@kdelemme) - elastic#251679 — Dispatcher task (@kdelemme) - elastic#252758 — Dispatcher notification policy (@kdelemme) - elastic#255332 — Wait for resources before scheduling dispatcher task (@kdelemme) - elastic#256536 — Use stored encrypted API keys from Notification Policy in dispatcher step (@kdelemme) </details> <details> <summary><strong>Notification Policies (Server)</strong> (4 PRs)</summary> - elastic#251336 — Introduce notification policy CRUD APIs and client (@cnasikas) - elastic#253134 — Update notification policy (@cnasikas) - elastic#254808 — Store API key owner on Notification Policy (@kdelemme) - elastic#256940 — Make notification policies global with optional rule-label scoping (@kdelemme) </details> <details> <summary><strong>Notification Policies UI</strong> (1 PR)</summary> - elastic#255599 — Add notification policies UI and Storybook form story (@adcoelho) </details> <details> <summary><strong>Rule Authoring UI</strong> (13 PRs)</summary> - elastic#250961 — Add create rule flyout in Discover (@adcoelho) - elastic#255111 — Add activation configuration fields to alerting V2 rule form (@yiannisnikolopoulos) - elastic#255427 — Rule form: provide services via context (@dominiqueclarke) - elastic#255876 — MVP rule form, Split evaluation condition, and Recovery configuration (@dominiqueclarke) - elastic#256260 — Foundational rule list (@dominiqueclarke) - elastic#256756 — Wire up edit flow (@dominiqueclarke) - elastic#256801 — Move consecutive breaches max to shared constants (@yiannisnikolopoulos) - elastic#256818 — Preview query and design parity (@dominiqueclarke) - elastic#256938 — Allow clearing number inputs in state transition fields (@yiannisnikolopoulos) - elastic#257017 — Add enable/disable and clone rule to rule list (@dominiqueclarke) - elastic#257246 — Remove all React.FC (@dominiqueclarke) - elastic#257415 — Rule form - fix test (@dominiqueclarke) - elastic#257454 — Block comma key in number input component (@yiannisnikolopoulos) </details> <details> <summary><strong>API Documentation & Schema</strong> (2 PRs)</summary> - elastic#254901 — Rename indexes for alert events and actions (@adcoelho) - elastic#255810 — OAS for alert action routes (@adcoelho) </details> <details> <summary><strong>Observability & Monitoring</strong> (3 PRs)</summary> - elastic#254925 — Add ApmMiddleware to the rule executor (@adcoelho) - elastic#255115 — Add the withAPM decorator and apply it to the rules_client (@adcoelho) - elastic#255999 — Fix linting problem in apm middleware (@adcoelho) </details> <details> <summary><strong>CI & Maintenance</strong> (2 PRs)</summary> - elastic#257409 — Refactor SO services to use inversify DI for client initialization (@darnautov) - Fix alerting-v2-schema jest config (@darnautov) </details> --- --------- Co-authored-by: Dima Arnautov <dmitrii.arnautov@elastic.co> Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kevin Delemme <kevin.delemme@elastic.co> Co-authored-by: Mike Côté <mikecote@users.noreply.github.com> Co-authored-by: Antonio <antonio.coelho@elastic.co> Co-authored-by: Kevin Delemme <kdelemme@gmail.com> Co-authored-by: Dominique Clarke <dominique.clarke@elastic.co> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.rhodes@elastic.co> Co-authored-by: Yiannis Nikolopoulos <yiannis.nikolopoulos@elastic.co> Co-authored-by: Alexi Doak <109488926+doakalexi@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Bailey Cash <bailey.cash@elastic.co> Co-authored-by: Anna Davydova <ana.davydova@elastic.co> Co-authored-by: Umberto Pepato <umbopepato@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.matthew.rhodes@gmail.com> Co-authored-by: Joana Cardoso <169058851+joana-cps@users.noreply.github.com>
Summary
Note
Dear reviewers, this PR is getting merged into a feature branch. Only the ResponseOps review is needed it at the moment. We will request for your review when we open the feature branch PR to be merged on
main.This PR initializes all resources needed for the alerting new engine.
Test
Ensure that the
.alert-events,.alert-transitions, and.alert-actionsdatastreams are created correctly, along with their ILM policies. Also, ensure that restarting Kibana after the creation of the datastreams do not produce errors.Fixes: https://github.com/elastic/rna-program/issues/72
Checklist
Check the PR satisfies following conditions.
Reviewers should verify this PR satisfies this list as well.