[Alerting] Alerting v2: Implement the CountTimeframeStrategy for the director#252292
Conversation
6f24b73 to
c9a8118
Compare
There was a problem hiding this comment.
Same tests but they are updated to use the new contract.
1325c6e to
a3ba038
Compare
|
Pinging @elastic/response-ops (Team:ResponseOps) |
…tion_tests/ci_checks
…b.com:cnasikas/kibana into alerting_v2_director_count_timeframe_strategy
…tion_tests/ci_checks
…b.com:cnasikas/kibana into alerting_v2_director_count_timeframe_strategy
…tion_tests/ci_checks
kdelemme
left a comment
There was a problem hiding this comment.
LGTM. Some small comments and nits and questions, but otherwise implementation looks solid (pun intended).
x-pack/platform/packages/shared/response-ops/alerting-v2-schemas/src/rule_data_schema.ts
Outdated
Show resolved
Hide resolved
| override readonly name = 'count_timeframe'; | ||
|
|
||
| override canHandle(rule: RuleResponse): boolean { | ||
| return rule.stateTransition != null; |
There was a problem hiding this comment.
should we also check for stateTransition not empty object? In this case, this strategy would act like the basic strategy but would do more work
There was a problem hiding this comment.
I considered it, and I was not sure what was best. What does it mean to provide an empty object for stateTransition? Probably, I want a transition but with the defaults. But without a stateTransition, you still get a transition, but the basic one :). I think the only difference is that if the defaults of the count strategy change, a null transition would be different from a {}, which is not ideal. So, yeah, you are right, I can fall back on the basic strategy to be consistent.
| if (!stateTransition) { | ||
| return basicResult; | ||
| } |
There was a problem hiding this comment.
does not hurt to have it but this is handled by the canHandle(). I guess you're doing that as a type guard
| schema.nullable( | ||
| schema.object({ | ||
| pendingOperator: schema.maybe(schema.oneOf([schema.literal('AND'), schema.literal('OR')])), | ||
| pendingCount: schema.maybe(schema.number()), |
There was a problem hiding this comment.
Does schema supports .min(0) as well? to be coherent with the runtime zod validation
There was a problem hiding this comment.
On SO schemas, we avoid using validation to avoid breaking Kibana on old, malformed data. But this is not the case with the rule because it is a new one. I will put a note to check with core and update if needed on another PR if you don't mind. We may need to do it to all of our attributes.
| stateTransition: NonNullable<StateTransition>, | ||
| nextStatus: AlertEpisodeStatus | ||
| ): boolean { | ||
| return stateTransition.recoveringCount === 0 && nextStatus === alertEpisodeStatus.recovering; |
There was a problem hiding this comment.
same question about recoveringCount being undefined?
There was a problem hiding this comment.
Yes it can be. If it is undefined we fall back to the default, which is 1.
There was a problem hiding this comment.
The fallback happens when we read the rule?
There was a problem hiding this comment.
No. It is implicit, not explicit. If you follow the code of the getNextState method, it will pass the stateTransition.<status>Count === 0 because it is undefined. It will go to either "Pending → Active threshold" or "Recovering → Inactive threshold". For this, the getNextStateTransition is being called, which in turn calls the isThresholdMet. The isThresholdMet will return true if the counts are undefined, simulating basically the counts being 1. I can change the code to make it explicit and have defaults to be sure. Wdyt?
|
|
||
| // --- Handle pending count of 0: skip pending, go directly to active --- | ||
| if (this.shouldSkipPending(stateTransition, basicResult.status)) { | ||
| return { status: alertEpisodeStatus.active, statusCount: DEFAULT_STATUS_COUNT }; |
There was a problem hiding this comment.
I thought we would not include the status_count for active and inactive, but only for recovering and pending. Maybe that's just an internal state not persisted on the alert-events later. Also, it does not really matter
There was a problem hiding this comment.
Maybe that's just an internal state not persisted on the alert-events later
It is persistent. I thought it would not hurt to be consistent. Let me ask and come back to you.
| operator: stateTransition.pendingOperator ?? 'OR', | ||
| count: stateTransition.pendingCount, | ||
| timeframeMs: stateTransition.pendingTimeframe | ||
| ? parseDurationToMs(stateTransition.pendingTimeframe) |
There was a problem hiding this comment.
Do we want to have a fallback around parseDurationMs failures?
| it('returns the count_timeframe strategy when stateTransition is an empty object', () => { | ||
| const rule = createRuleResponse({ stateTransition: {} }); | ||
| const resolved = factory.getStrategy(rule); | ||
| expect(resolved.name).toBe('count_timeframe'); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
should we not delegate to the basic in this case? It does the same thing but with a simpler code path
| const alertsWithNextEpisode = alertEvents.map((currentAlertEvent) => | ||
| this.getAlertEventWithNextEpisode({ | ||
| rule, | ||
| currentAlertEvent, | ||
| previousAlertEvent: alertStateByGroupHash.get(currentAlertEvent.group_hash), | ||
| strategy, |
There was a problem hiding this comment.
Is it possible we have multiple alert events for the same episode_id?
There was a problem hiding this comment.
Yes, it is possible. The rule execution will return multiple breached (or recovered) events, and for each alert event, the director calculates if it is a new episode or not. Check the resolveEpisodeId of the director.
There was a problem hiding this comment.
Ok, so my next question is should we compute the next state based always on the same previousAlertEvent?
There was a problem hiding this comment.
For each alert event, we always compute it to the latest alert event. If you see the director's queries, it fetches the latest episode of each rule/group pair. The previousAlertEvent is a bit misleading 🙂.
…b.com:cnasikas/kibana into alerting_v2_director_count_timeframe_strategy
...ck/platform/plugins/shared/alerting_v2/server/lib/director/strategies/basic_strategy.test.ts
Outdated
Show resolved
Hide resolved
x-pack/platform/plugins/shared/alerting_v2/server/lib/director/strategies/types.ts
Show resolved
Hide resolved
...m/plugins/shared/alerting_v2/server/lib/director/strategies/count_timeframe_strategy.test.ts
Show resolved
Hide resolved
...m/plugins/shared/alerting_v2/server/lib/director/strategies/count_timeframe_strategy.test.ts
Outdated
Show resolved
Hide resolved
...atform/plugins/shared/alerting_v2/server/lib/director/strategies/count_timeframe_strategy.ts
Outdated
Show resolved
Hide resolved
...atform/plugins/shared/alerting_v2/server/lib/director/strategies/count_timeframe_strategy.ts
Outdated
Show resolved
Hide resolved
darnautov
left a comment
There was a problem hiding this comment.
LGTM! Left some suggestions about the test helpers
…tion_tests/ci_checks
…b.com:cnasikas/kibana into alerting_v2_director_count_timeframe_strategy
💔 Build Failed
Failed CI StepsMetrics [docs]
History
cc @cnasikas |
## Summary ### Key capabilities - **ES|QL-native rule evaluation** — Rules are defined as ES|QL queries with optional WHERE clause conditions, evaluated on a configurable schedule - **Alert lifecycle management** — Full episode tracking with pending → active → recovering → inactive state transitions, including configurable alert delay (consecutive breaches / duration) - **Event-driven architecture** — Alert events and actions are stored in dedicated data streams (`.alerting-events`, `.alerting-actions`) with ES|QL views for querying - **Notification dispatch pipeline** — A multi-step dispatcher that matches alert episodes to notification policies, handles throttling/suppression, and triggers Kibana Workflows using encrypted API keys - **Notification policies** — CRUD APIs and UI for creating notification policies with KQL-based rule matching, workflow integration, and API key management - **Rule authoring UI** — A shared rule form package (`@kbn/alerting-v2-rule-form`) usable standalone or embedded in Discover, with ES|QL editor, WHERE clause condition editing, recovery configuration, and live query preview - **Rule management UI** — Full rule list with pagination, enable/disable, clone, edit, and delete operations - **APM instrumentation** — Middleware and decorators for tracing rule execution and client operations ### Architecture highlights - **InversifyJS DI** — All services use constructor injection with typed tokens, scoped per-request or singleton as appropriate - **Pipeline pattern** — Rule executor and dispatcher use composable step-based pipelines - **Saved Objects** — Rules stored as hidden saved objects; notification policies stored as encrypted saved objects (for API key protection) - **Feature privileges** — Dedicated Kibana feature with read/all privileges for RBAC --- ## Contained PRs <details> <summary><strong>Core Engine & Plugin Init</strong> (12 PRs)</summary> - #247283 — Init alerting v2 plugin (@cnasikas) - #247452 — Add the alerting v2 feature privileges (@cnasikas) - #247673 — Director (@cnasikas) - #248306 — Create basic services (@cnasikas) - #248696 — Initialize all resources (@cnasikas) - #250023 — Schema package (@cnasikas) - #250010 — YML Editor (@cnasikas) - #251064 — Remove index.mode: lookup for RnA alert indices (@cnasikas) - #251707 — Simplify task registration pattern (@kdelemme) - #251876 — Dedicated user service (@cnasikas) - #252073 — Use `kbn/data-streams` in alerting_v2 (@cnasikas) - #255120 — Update alerting-v2 owner to new rna project team (@cnasikas) </details> <details> <summary><strong>Rule Execution Pipeline</strong> (12 PRs)</summary> - #247472 — Add alerting v2 Rule Executor (@darnautov) - #248285 — Alerting v2 rule HTTP APIs (@darnautov) - #248728 — Add basic alert actions route (@darnautov) - #250161 — Refactor rule executor to use a pipeline pattern (@darnautov) - #252292 — Implement the CountTimeframeStrategy for the director (@cnasikas) - #252544 — Add support of streaming in the rule executor (@darnautov) - #252754 — Update rule attributes (@kdelemme) - #253355 — Add getRules client method (@kdelemme) - #253668 — Make evaluation.query.condition optional (@kdelemme) - #254031 — Add recovery event generation to rule execution pipeline (@kdelemme) - #255968 — ES|QL views (@adcoelho) - #256697 — Create episodes ES|QL view (@adcoelho) </details> <details> <summary><strong>Alert Suppression & Episodes</strong> (3 PRs)</summary> - #252174 — Alert suppression (@kdelemme) - #256486 — Fix suppression query (@kdelemme) - #256527 — Store 'unmatched' action for unmatched alert episodes (@kdelemme) </details> <details> <summary><strong>Dispatcher & Notification Engine</strong> (6 PRs)</summary> - #250822 — Alerting v2 dispatcher (@kdelemme) - #251529 — Use query service in dispatcher (@kdelemme) - #251679 — Dispatcher task (@kdelemme) - #252758 — Dispatcher notification policy (@kdelemme) - #255332 — Wait for resources before scheduling dispatcher task (@kdelemme) - #256536 — Use stored encrypted API keys from Notification Policy in dispatcher step (@kdelemme) </details> <details> <summary><strong>Notification Policies (Server)</strong> (4 PRs)</summary> - #251336 — Introduce notification policy CRUD APIs and client (@cnasikas) - #253134 — Update notification policy (@cnasikas) - #254808 — Store API key owner on Notification Policy (@kdelemme) - #256940 — Make notification policies global with optional rule-label scoping (@kdelemme) </details> <details> <summary><strong>Notification Policies UI</strong> (1 PR)</summary> - #255599 — Add notification policies UI and Storybook form story (@adcoelho) </details> <details> <summary><strong>Rule Authoring UI</strong> (13 PRs)</summary> - #250961 — Add create rule flyout in Discover (@adcoelho) - #255111 — Add activation configuration fields to alerting V2 rule form (@yiannisnikolopoulos) - #255427 — Rule form: provide services via context (@dominiqueclarke) - #255876 — MVP rule form, Split evaluation condition, and Recovery configuration (@dominiqueclarke) - #256260 — Foundational rule list (@dominiqueclarke) - #256756 — Wire up edit flow (@dominiqueclarke) - #256801 — Move consecutive breaches max to shared constants (@yiannisnikolopoulos) - #256818 — Preview query and design parity (@dominiqueclarke) - #256938 — Allow clearing number inputs in state transition fields (@yiannisnikolopoulos) - #257017 — Add enable/disable and clone rule to rule list (@dominiqueclarke) - #257246 — Remove all React.FC (@dominiqueclarke) - #257415 — Rule form - fix test (@dominiqueclarke) - #257454 — Block comma key in number input component (@yiannisnikolopoulos) </details> <details> <summary><strong>API Documentation & Schema</strong> (2 PRs)</summary> - #254901 — Rename indexes for alert events and actions (@adcoelho) - #255810 — OAS for alert action routes (@adcoelho) </details> <details> <summary><strong>Observability & Monitoring</strong> (3 PRs)</summary> - #254925 — Add ApmMiddleware to the rule executor (@adcoelho) - #255115 — Add the withAPM decorator and apply it to the rules_client (@adcoelho) - #255999 — Fix linting problem in apm middleware (@adcoelho) </details> <details> <summary><strong>CI & Maintenance</strong> (2 PRs)</summary> - #257409 — Refactor SO services to use inversify DI for client initialization (@darnautov) - Fix alerting-v2-schema jest config (@darnautov) </details> --- --------- Co-authored-by: Dima Arnautov <dmitrii.arnautov@elastic.co> Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kevin Delemme <kevin.delemme@elastic.co> Co-authored-by: Mike Côté <mikecote@users.noreply.github.com> Co-authored-by: Antonio <antonio.coelho@elastic.co> Co-authored-by: Kevin Delemme <kdelemme@gmail.com> Co-authored-by: Dominique Clarke <dominique.clarke@elastic.co> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.rhodes@elastic.co> Co-authored-by: Yiannis Nikolopoulos <yiannis.nikolopoulos@elastic.co> Co-authored-by: Alexi Doak <109488926+doakalexi@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Bailey Cash <bailey.cash@elastic.co> Co-authored-by: Anna Davydova <ana.davydova@elastic.co> Co-authored-by: Umberto Pepato <umbopepato@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.matthew.rhodes@gmail.com> Co-authored-by: Joana Cardoso <169058851+joana-cps@users.noreply.github.com>
## Summary ### Key capabilities - **ES|QL-native rule evaluation** — Rules are defined as ES|QL queries with optional WHERE clause conditions, evaluated on a configurable schedule - **Alert lifecycle management** — Full episode tracking with pending → active → recovering → inactive state transitions, including configurable alert delay (consecutive breaches / duration) - **Event-driven architecture** — Alert events and actions are stored in dedicated data streams (`.alerting-events`, `.alerting-actions`) with ES|QL views for querying - **Notification dispatch pipeline** — A multi-step dispatcher that matches alert episodes to notification policies, handles throttling/suppression, and triggers Kibana Workflows using encrypted API keys - **Notification policies** — CRUD APIs and UI for creating notification policies with KQL-based rule matching, workflow integration, and API key management - **Rule authoring UI** — A shared rule form package (`@kbn/alerting-v2-rule-form`) usable standalone or embedded in Discover, with ES|QL editor, WHERE clause condition editing, recovery configuration, and live query preview - **Rule management UI** — Full rule list with pagination, enable/disable, clone, edit, and delete operations - **APM instrumentation** — Middleware and decorators for tracing rule execution and client operations ### Architecture highlights - **InversifyJS DI** — All services use constructor injection with typed tokens, scoped per-request or singleton as appropriate - **Pipeline pattern** — Rule executor and dispatcher use composable step-based pipelines - **Saved Objects** — Rules stored as hidden saved objects; notification policies stored as encrypted saved objects (for API key protection) - **Feature privileges** — Dedicated Kibana feature with read/all privileges for RBAC --- ## Contained PRs <details> <summary><strong>Core Engine & Plugin Init</strong> (12 PRs)</summary> - elastic#247283 — Init alerting v2 plugin (@cnasikas) - elastic#247452 — Add the alerting v2 feature privileges (@cnasikas) - elastic#247673 — Director (@cnasikas) - elastic#248306 — Create basic services (@cnasikas) - elastic#248696 — Initialize all resources (@cnasikas) - elastic#250023 — Schema package (@cnasikas) - elastic#250010 — YML Editor (@cnasikas) - elastic#251064 — Remove index.mode: lookup for RnA alert indices (@cnasikas) - elastic#251707 — Simplify task registration pattern (@kdelemme) - elastic#251876 — Dedicated user service (@cnasikas) - elastic#252073 — Use `kbn/data-streams` in alerting_v2 (@cnasikas) - elastic#255120 — Update alerting-v2 owner to new rna project team (@cnasikas) </details> <details> <summary><strong>Rule Execution Pipeline</strong> (12 PRs)</summary> - elastic#247472 — Add alerting v2 Rule Executor (@darnautov) - elastic#248285 — Alerting v2 rule HTTP APIs (@darnautov) - elastic#248728 — Add basic alert actions route (@darnautov) - elastic#250161 — Refactor rule executor to use a pipeline pattern (@darnautov) - elastic#252292 — Implement the CountTimeframeStrategy for the director (@cnasikas) - elastic#252544 — Add support of streaming in the rule executor (@darnautov) - elastic#252754 — Update rule attributes (@kdelemme) - elastic#253355 — Add getRules client method (@kdelemme) - elastic#253668 — Make evaluation.query.condition optional (@kdelemme) - elastic#254031 — Add recovery event generation to rule execution pipeline (@kdelemme) - elastic#255968 — ES&elastic#124;QL views (@adcoelho) - elastic#256697 — Create episodes ES&elastic#124;QL view (@adcoelho) </details> <details> <summary><strong>Alert Suppression & Episodes</strong> (3 PRs)</summary> - elastic#252174 — Alert suppression (@kdelemme) - elastic#256486 — Fix suppression query (@kdelemme) - elastic#256527 — Store 'unmatched' action for unmatched alert episodes (@kdelemme) </details> <details> <summary><strong>Dispatcher & Notification Engine</strong> (6 PRs)</summary> - elastic#250822 — Alerting v2 dispatcher (@kdelemme) - elastic#251529 — Use query service in dispatcher (@kdelemme) - elastic#251679 — Dispatcher task (@kdelemme) - elastic#252758 — Dispatcher notification policy (@kdelemme) - elastic#255332 — Wait for resources before scheduling dispatcher task (@kdelemme) - elastic#256536 — Use stored encrypted API keys from Notification Policy in dispatcher step (@kdelemme) </details> <details> <summary><strong>Notification Policies (Server)</strong> (4 PRs)</summary> - elastic#251336 — Introduce notification policy CRUD APIs and client (@cnasikas) - elastic#253134 — Update notification policy (@cnasikas) - elastic#254808 — Store API key owner on Notification Policy (@kdelemme) - elastic#256940 — Make notification policies global with optional rule-label scoping (@kdelemme) </details> <details> <summary><strong>Notification Policies UI</strong> (1 PR)</summary> - elastic#255599 — Add notification policies UI and Storybook form story (@adcoelho) </details> <details> <summary><strong>Rule Authoring UI</strong> (13 PRs)</summary> - elastic#250961 — Add create rule flyout in Discover (@adcoelho) - elastic#255111 — Add activation configuration fields to alerting V2 rule form (@yiannisnikolopoulos) - elastic#255427 — Rule form: provide services via context (@dominiqueclarke) - elastic#255876 — MVP rule form, Split evaluation condition, and Recovery configuration (@dominiqueclarke) - elastic#256260 — Foundational rule list (@dominiqueclarke) - elastic#256756 — Wire up edit flow (@dominiqueclarke) - elastic#256801 — Move consecutive breaches max to shared constants (@yiannisnikolopoulos) - elastic#256818 — Preview query and design parity (@dominiqueclarke) - elastic#256938 — Allow clearing number inputs in state transition fields (@yiannisnikolopoulos) - elastic#257017 — Add enable/disable and clone rule to rule list (@dominiqueclarke) - elastic#257246 — Remove all React.FC (@dominiqueclarke) - elastic#257415 — Rule form - fix test (@dominiqueclarke) - elastic#257454 — Block comma key in number input component (@yiannisnikolopoulos) </details> <details> <summary><strong>API Documentation & Schema</strong> (2 PRs)</summary> - elastic#254901 — Rename indexes for alert events and actions (@adcoelho) - elastic#255810 — OAS for alert action routes (@adcoelho) </details> <details> <summary><strong>Observability & Monitoring</strong> (3 PRs)</summary> - elastic#254925 — Add ApmMiddleware to the rule executor (@adcoelho) - elastic#255115 — Add the withAPM decorator and apply it to the rules_client (@adcoelho) - elastic#255999 — Fix linting problem in apm middleware (@adcoelho) </details> <details> <summary><strong>CI & Maintenance</strong> (2 PRs)</summary> - elastic#257409 — Refactor SO services to use inversify DI for client initialization (@darnautov) - Fix alerting-v2-schema jest config (@darnautov) </details> --- --------- Co-authored-by: Dima Arnautov <dmitrii.arnautov@elastic.co> Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kevin Delemme <kevin.delemme@elastic.co> Co-authored-by: Mike Côté <mikecote@users.noreply.github.com> Co-authored-by: Antonio <antonio.coelho@elastic.co> Co-authored-by: Kevin Delemme <kdelemme@gmail.com> Co-authored-by: Dominique Clarke <dominique.clarke@elastic.co> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.rhodes@elastic.co> Co-authored-by: Yiannis Nikolopoulos <yiannis.nikolopoulos@elastic.co> Co-authored-by: Alexi Doak <109488926+doakalexi@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Bailey Cash <bailey.cash@elastic.co> Co-authored-by: Anna Davydova <ana.davydova@elastic.co> Co-authored-by: Umberto Pepato <umbopepato@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.matthew.rhodes@gmail.com> Co-authored-by: Joana Cardoso <169058851+joana-cps@users.noreply.github.com>
- **ES|QL-native rule evaluation** — Rules are defined as ES|QL queries with optional WHERE clause conditions, evaluated on a configurable schedule - **Alert lifecycle management** — Full episode tracking with pending → active → recovering → inactive state transitions, including configurable alert delay (consecutive breaches / duration) - **Event-driven architecture** — Alert events and actions are stored in dedicated data streams (`.alerting-events`, `.alerting-actions`) with ES|QL views for querying - **Notification dispatch pipeline** — A multi-step dispatcher that matches alert episodes to notification policies, handles throttling/suppression, and triggers Kibana Workflows using encrypted API keys - **Notification policies** — CRUD APIs and UI for creating notification policies with KQL-based rule matching, workflow integration, and API key management - **Rule authoring UI** — A shared rule form package (`@kbn/alerting-v2-rule-form`) usable standalone or embedded in Discover, with ES|QL editor, WHERE clause condition editing, recovery configuration, and live query preview - **Rule management UI** — Full rule list with pagination, enable/disable, clone, edit, and delete operations - **APM instrumentation** — Middleware and decorators for tracing rule execution and client operations - **InversifyJS DI** — All services use constructor injection with typed tokens, scoped per-request or singleton as appropriate - **Pipeline pattern** — Rule executor and dispatcher use composable step-based pipelines - **Saved Objects** — Rules stored as hidden saved objects; notification policies stored as encrypted saved objects (for API key protection) - **Feature privileges** — Dedicated Kibana feature with read/all privileges for RBAC --- <details> <summary><strong>Core Engine & Plugin Init</strong> (12 PRs)</summary> - elastic#247283 — Init alerting v2 plugin (@cnasikas) - elastic#247452 — Add the alerting v2 feature privileges (@cnasikas) - elastic#247673 — Director (@cnasikas) - elastic#248306 — Create basic services (@cnasikas) - elastic#248696 — Initialize all resources (@cnasikas) - elastic#250023 — Schema package (@cnasikas) - elastic#250010 — YML Editor (@cnasikas) - elastic#251064 — Remove index.mode: lookup for RnA alert indices (@cnasikas) - elastic#251707 — Simplify task registration pattern (@kdelemme) - elastic#251876 — Dedicated user service (@cnasikas) - elastic#252073 — Use `kbn/data-streams` in alerting_v2 (@cnasikas) - elastic#255120 — Update alerting-v2 owner to new rna project team (@cnasikas) </details> <details> <summary><strong>Rule Execution Pipeline</strong> (12 PRs)</summary> - elastic#247472 — Add alerting v2 Rule Executor (@darnautov) - elastic#248285 — Alerting v2 rule HTTP APIs (@darnautov) - elastic#248728 — Add basic alert actions route (@darnautov) - elastic#250161 — Refactor rule executor to use a pipeline pattern (@darnautov) - elastic#252292 — Implement the CountTimeframeStrategy for the director (@cnasikas) - elastic#252544 — Add support of streaming in the rule executor (@darnautov) - elastic#252754 — Update rule attributes (@kdelemme) - elastic#253355 — Add getRules client method (@kdelemme) - elastic#253668 — Make evaluation.query.condition optional (@kdelemme) - elastic#254031 — Add recovery event generation to rule execution pipeline (@kdelemme) - elastic#255968 — ES&elastic#124;QL views (@adcoelho) - elastic#256697 — Create episodes ES&elastic#124;QL view (@adcoelho) </details> <details> <summary><strong>Alert Suppression & Episodes</strong> (3 PRs)</summary> - elastic#252174 — Alert suppression (@kdelemme) - elastic#256486 — Fix suppression query (@kdelemme) - elastic#256527 — Store 'unmatched' action for unmatched alert episodes (@kdelemme) </details> <details> <summary><strong>Dispatcher & Notification Engine</strong> (6 PRs)</summary> - elastic#250822 — Alerting v2 dispatcher (@kdelemme) - elastic#251529 — Use query service in dispatcher (@kdelemme) - elastic#251679 — Dispatcher task (@kdelemme) - elastic#252758 — Dispatcher notification policy (@kdelemme) - elastic#255332 — Wait for resources before scheduling dispatcher task (@kdelemme) - elastic#256536 — Use stored encrypted API keys from Notification Policy in dispatcher step (@kdelemme) </details> <details> <summary><strong>Notification Policies (Server)</strong> (4 PRs)</summary> - elastic#251336 — Introduce notification policy CRUD APIs and client (@cnasikas) - elastic#253134 — Update notification policy (@cnasikas) - elastic#254808 — Store API key owner on Notification Policy (@kdelemme) - elastic#256940 — Make notification policies global with optional rule-label scoping (@kdelemme) </details> <details> <summary><strong>Notification Policies UI</strong> (1 PR)</summary> - elastic#255599 — Add notification policies UI and Storybook form story (@adcoelho) </details> <details> <summary><strong>Rule Authoring UI</strong> (13 PRs)</summary> - elastic#250961 — Add create rule flyout in Discover (@adcoelho) - elastic#255111 — Add activation configuration fields to alerting V2 rule form (@yiannisnikolopoulos) - elastic#255427 — Rule form: provide services via context (@dominiqueclarke) - elastic#255876 — MVP rule form, Split evaluation condition, and Recovery configuration (@dominiqueclarke) - elastic#256260 — Foundational rule list (@dominiqueclarke) - elastic#256756 — Wire up edit flow (@dominiqueclarke) - elastic#256801 — Move consecutive breaches max to shared constants (@yiannisnikolopoulos) - elastic#256818 — Preview query and design parity (@dominiqueclarke) - elastic#256938 — Allow clearing number inputs in state transition fields (@yiannisnikolopoulos) - elastic#257017 — Add enable/disable and clone rule to rule list (@dominiqueclarke) - elastic#257246 — Remove all React.FC (@dominiqueclarke) - elastic#257415 — Rule form - fix test (@dominiqueclarke) - elastic#257454 — Block comma key in number input component (@yiannisnikolopoulos) </details> <details> <summary><strong>API Documentation & Schema</strong> (2 PRs)</summary> - elastic#254901 — Rename indexes for alert events and actions (@adcoelho) - elastic#255810 — OAS for alert action routes (@adcoelho) </details> <details> <summary><strong>Observability & Monitoring</strong> (3 PRs)</summary> - elastic#254925 — Add ApmMiddleware to the rule executor (@adcoelho) - elastic#255115 — Add the withAPM decorator and apply it to the rules_client (@adcoelho) - elastic#255999 — Fix linting problem in apm middleware (@adcoelho) </details> <details> <summary><strong>CI & Maintenance</strong> (2 PRs)</summary> - elastic#257409 — Refactor SO services to use inversify DI for client initialization (@darnautov) - Fix alerting-v2-schema jest config (@darnautov) </details> --- --------- Co-authored-by: Dima Arnautov <dmitrii.arnautov@elastic.co> Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kevin Delemme <kevin.delemme@elastic.co> Co-authored-by: Mike Côté <mikecote@users.noreply.github.com> Co-authored-by: Antonio <antonio.coelho@elastic.co> Co-authored-by: Kevin Delemme <kdelemme@gmail.com> Co-authored-by: Dominique Clarke <dominique.clarke@elastic.co> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.rhodes@elastic.co> Co-authored-by: Yiannis Nikolopoulos <yiannis.nikolopoulos@elastic.co> Co-authored-by: Alexi Doak <109488926+doakalexi@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Bailey Cash <bailey.cash@elastic.co> Co-authored-by: Anna Davydova <ana.davydova@elastic.co> Co-authored-by: Umberto Pepato <umbopepato@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.matthew.rhodes@gmail.com> Co-authored-by: Joana Cardoso <169058851+joana-cps@users.noreply.github.com>
- **ES|QL-native rule evaluation** — Rules are defined as ES|QL queries with optional WHERE clause conditions, evaluated on a configurable schedule - **Alert lifecycle management** — Full episode tracking with pending → active → recovering → inactive state transitions, including configurable alert delay (consecutive breaches / duration) - **Event-driven architecture** — Alert events and actions are stored in dedicated data streams (`.alerting-events`, `.alerting-actions`) with ES|QL views for querying - **Notification dispatch pipeline** — A multi-step dispatcher that matches alert episodes to notification policies, handles throttling/suppression, and triggers Kibana Workflows using encrypted API keys - **Notification policies** — CRUD APIs and UI for creating notification policies with KQL-based rule matching, workflow integration, and API key management - **Rule authoring UI** — A shared rule form package (`@kbn/alerting-v2-rule-form`) usable standalone or embedded in Discover, with ES|QL editor, WHERE clause condition editing, recovery configuration, and live query preview - **Rule management UI** — Full rule list with pagination, enable/disable, clone, edit, and delete operations - **APM instrumentation** — Middleware and decorators for tracing rule execution and client operations - **InversifyJS DI** — All services use constructor injection with typed tokens, scoped per-request or singleton as appropriate - **Pipeline pattern** — Rule executor and dispatcher use composable step-based pipelines - **Saved Objects** — Rules stored as hidden saved objects; notification policies stored as encrypted saved objects (for API key protection) - **Feature privileges** — Dedicated Kibana feature with read/all privileges for RBAC --- <details> <summary><strong>Core Engine & Plugin Init</strong> (12 PRs)</summary> - elastic#247283 — Init alerting v2 plugin (@cnasikas) - elastic#247452 — Add the alerting v2 feature privileges (@cnasikas) - elastic#247673 — Director (@cnasikas) - elastic#248306 — Create basic services (@cnasikas) - elastic#248696 — Initialize all resources (@cnasikas) - elastic#250023 — Schema package (@cnasikas) - elastic#250010 — YML Editor (@cnasikas) - elastic#251064 — Remove index.mode: lookup for RnA alert indices (@cnasikas) - elastic#251707 — Simplify task registration pattern (@kdelemme) - elastic#251876 — Dedicated user service (@cnasikas) - elastic#252073 — Use `kbn/data-streams` in alerting_v2 (@cnasikas) - elastic#255120 — Update alerting-v2 owner to new rna project team (@cnasikas) </details> <details> <summary><strong>Rule Execution Pipeline</strong> (12 PRs)</summary> - elastic#247472 — Add alerting v2 Rule Executor (@darnautov) - elastic#248285 — Alerting v2 rule HTTP APIs (@darnautov) - elastic#248728 — Add basic alert actions route (@darnautov) - elastic#250161 — Refactor rule executor to use a pipeline pattern (@darnautov) - elastic#252292 — Implement the CountTimeframeStrategy for the director (@cnasikas) - elastic#252544 — Add support of streaming in the rule executor (@darnautov) - elastic#252754 — Update rule attributes (@kdelemme) - elastic#253355 — Add getRules client method (@kdelemme) - elastic#253668 — Make evaluation.query.condition optional (@kdelemme) - elastic#254031 — Add recovery event generation to rule execution pipeline (@kdelemme) - elastic#255968 — ES&elastic#124;QL views (@adcoelho) - elastic#256697 — Create episodes ES&elastic#124;QL view (@adcoelho) </details> <details> <summary><strong>Alert Suppression & Episodes</strong> (3 PRs)</summary> - elastic#252174 — Alert suppression (@kdelemme) - elastic#256486 — Fix suppression query (@kdelemme) - elastic#256527 — Store 'unmatched' action for unmatched alert episodes (@kdelemme) </details> <details> <summary><strong>Dispatcher & Notification Engine</strong> (6 PRs)</summary> - elastic#250822 — Alerting v2 dispatcher (@kdelemme) - elastic#251529 — Use query service in dispatcher (@kdelemme) - elastic#251679 — Dispatcher task (@kdelemme) - elastic#252758 — Dispatcher notification policy (@kdelemme) - elastic#255332 — Wait for resources before scheduling dispatcher task (@kdelemme) - elastic#256536 — Use stored encrypted API keys from Notification Policy in dispatcher step (@kdelemme) </details> <details> <summary><strong>Notification Policies (Server)</strong> (4 PRs)</summary> - elastic#251336 — Introduce notification policy CRUD APIs and client (@cnasikas) - elastic#253134 — Update notification policy (@cnasikas) - elastic#254808 — Store API key owner on Notification Policy (@kdelemme) - elastic#256940 — Make notification policies global with optional rule-label scoping (@kdelemme) </details> <details> <summary><strong>Notification Policies UI</strong> (1 PR)</summary> - elastic#255599 — Add notification policies UI and Storybook form story (@adcoelho) </details> <details> <summary><strong>Rule Authoring UI</strong> (13 PRs)</summary> - elastic#250961 — Add create rule flyout in Discover (@adcoelho) - elastic#255111 — Add activation configuration fields to alerting V2 rule form (@yiannisnikolopoulos) - elastic#255427 — Rule form: provide services via context (@dominiqueclarke) - elastic#255876 — MVP rule form, Split evaluation condition, and Recovery configuration (@dominiqueclarke) - elastic#256260 — Foundational rule list (@dominiqueclarke) - elastic#256756 — Wire up edit flow (@dominiqueclarke) - elastic#256801 — Move consecutive breaches max to shared constants (@yiannisnikolopoulos) - elastic#256818 — Preview query and design parity (@dominiqueclarke) - elastic#256938 — Allow clearing number inputs in state transition fields (@yiannisnikolopoulos) - elastic#257017 — Add enable/disable and clone rule to rule list (@dominiqueclarke) - elastic#257246 — Remove all React.FC (@dominiqueclarke) - elastic#257415 — Rule form - fix test (@dominiqueclarke) - elastic#257454 — Block comma key in number input component (@yiannisnikolopoulos) </details> <details> <summary><strong>API Documentation & Schema</strong> (2 PRs)</summary> - elastic#254901 — Rename indexes for alert events and actions (@adcoelho) - elastic#255810 — OAS for alert action routes (@adcoelho) </details> <details> <summary><strong>Observability & Monitoring</strong> (3 PRs)</summary> - elastic#254925 — Add ApmMiddleware to the rule executor (@adcoelho) - elastic#255115 — Add the withAPM decorator and apply it to the rules_client (@adcoelho) - elastic#255999 — Fix linting problem in apm middleware (@adcoelho) </details> <details> <summary><strong>CI & Maintenance</strong> (2 PRs)</summary> - elastic#257409 — Refactor SO services to use inversify DI for client initialization (@darnautov) - Fix alerting-v2-schema jest config (@darnautov) </details> --- --------- Co-authored-by: Dima Arnautov <dmitrii.arnautov@elastic.co> Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kevin Delemme <kevin.delemme@elastic.co> Co-authored-by: Mike Côté <mikecote@users.noreply.github.com> Co-authored-by: Antonio <antonio.coelho@elastic.co> Co-authored-by: Kevin Delemme <kdelemme@gmail.com> Co-authored-by: Dominique Clarke <dominique.clarke@elastic.co> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.rhodes@elastic.co> Co-authored-by: Yiannis Nikolopoulos <yiannis.nikolopoulos@elastic.co> Co-authored-by: Alexi Doak <109488926+doakalexi@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Bailey Cash <bailey.cash@elastic.co> Co-authored-by: Anna Davydova <ana.davydova@elastic.co> Co-authored-by: Umberto Pepato <umbopepato@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.matthew.rhodes@gmail.com> Co-authored-by: Joana Cardoso <169058851+joana-cps@users.noreply.github.com>
## Summary ### Key capabilities - **ES|QL-native rule evaluation** — Rules are defined as ES|QL queries with optional WHERE clause conditions, evaluated on a configurable schedule - **Alert lifecycle management** — Full episode tracking with pending → active → recovering → inactive state transitions, including configurable alert delay (consecutive breaches / duration) - **Event-driven architecture** — Alert events and actions are stored in dedicated data streams (`.alerting-events`, `.alerting-actions`) with ES|QL views for querying - **Notification dispatch pipeline** — A multi-step dispatcher that matches alert episodes to notification policies, handles throttling/suppression, and triggers Kibana Workflows using encrypted API keys - **Notification policies** — CRUD APIs and UI for creating notification policies with KQL-based rule matching, workflow integration, and API key management - **Rule authoring UI** — A shared rule form package (`@kbn/alerting-v2-rule-form`) usable standalone or embedded in Discover, with ES|QL editor, WHERE clause condition editing, recovery configuration, and live query preview - **Rule management UI** — Full rule list with pagination, enable/disable, clone, edit, and delete operations - **APM instrumentation** — Middleware and decorators for tracing rule execution and client operations ### Architecture highlights - **InversifyJS DI** — All services use constructor injection with typed tokens, scoped per-request or singleton as appropriate - **Pipeline pattern** — Rule executor and dispatcher use composable step-based pipelines - **Saved Objects** — Rules stored as hidden saved objects; notification policies stored as encrypted saved objects (for API key protection) - **Feature privileges** — Dedicated Kibana feature with read/all privileges for RBAC --- ## Contained PRs <details> <summary><strong>Core Engine & Plugin Init</strong> (12 PRs)</summary> - elastic#247283 — Init alerting v2 plugin (@cnasikas) - elastic#247452 — Add the alerting v2 feature privileges (@cnasikas) - elastic#247673 — Director (@cnasikas) - elastic#248306 — Create basic services (@cnasikas) - elastic#248696 — Initialize all resources (@cnasikas) - elastic#250023 — Schema package (@cnasikas) - elastic#250010 — YML Editor (@cnasikas) - elastic#251064 — Remove index.mode: lookup for RnA alert indices (@cnasikas) - elastic#251707 — Simplify task registration pattern (@kdelemme) - elastic#251876 — Dedicated user service (@cnasikas) - elastic#252073 — Use `kbn/data-streams` in alerting_v2 (@cnasikas) - elastic#255120 — Update alerting-v2 owner to new rna project team (@cnasikas) </details> <details> <summary><strong>Rule Execution Pipeline</strong> (12 PRs)</summary> - elastic#247472 — Add alerting v2 Rule Executor (@darnautov) - elastic#248285 — Alerting v2 rule HTTP APIs (@darnautov) - elastic#248728 — Add basic alert actions route (@darnautov) - elastic#250161 — Refactor rule executor to use a pipeline pattern (@darnautov) - elastic#252292 — Implement the CountTimeframeStrategy for the director (@cnasikas) - elastic#252544 — Add support of streaming in the rule executor (@darnautov) - elastic#252754 — Update rule attributes (@kdelemme) - elastic#253355 — Add getRules client method (@kdelemme) - elastic#253668 — Make evaluation.query.condition optional (@kdelemme) - elastic#254031 — Add recovery event generation to rule execution pipeline (@kdelemme) - elastic#255968 — ES&elastic#124;QL views (@adcoelho) - elastic#256697 — Create episodes ES&elastic#124;QL view (@adcoelho) </details> <details> <summary><strong>Alert Suppression & Episodes</strong> (3 PRs)</summary> - elastic#252174 — Alert suppression (@kdelemme) - elastic#256486 — Fix suppression query (@kdelemme) - elastic#256527 — Store 'unmatched' action for unmatched alert episodes (@kdelemme) </details> <details> <summary><strong>Dispatcher & Notification Engine</strong> (6 PRs)</summary> - elastic#250822 — Alerting v2 dispatcher (@kdelemme) - elastic#251529 — Use query service in dispatcher (@kdelemme) - elastic#251679 — Dispatcher task (@kdelemme) - elastic#252758 — Dispatcher notification policy (@kdelemme) - elastic#255332 — Wait for resources before scheduling dispatcher task (@kdelemme) - elastic#256536 — Use stored encrypted API keys from Notification Policy in dispatcher step (@kdelemme) </details> <details> <summary><strong>Notification Policies (Server)</strong> (4 PRs)</summary> - elastic#251336 — Introduce notification policy CRUD APIs and client (@cnasikas) - elastic#253134 — Update notification policy (@cnasikas) - elastic#254808 — Store API key owner on Notification Policy (@kdelemme) - elastic#256940 — Make notification policies global with optional rule-label scoping (@kdelemme) </details> <details> <summary><strong>Notification Policies UI</strong> (1 PR)</summary> - elastic#255599 — Add notification policies UI and Storybook form story (@adcoelho) </details> <details> <summary><strong>Rule Authoring UI</strong> (13 PRs)</summary> - elastic#250961 — Add create rule flyout in Discover (@adcoelho) - elastic#255111 — Add activation configuration fields to alerting V2 rule form (@yiannisnikolopoulos) - elastic#255427 — Rule form: provide services via context (@dominiqueclarke) - elastic#255876 — MVP rule form, Split evaluation condition, and Recovery configuration (@dominiqueclarke) - elastic#256260 — Foundational rule list (@dominiqueclarke) - elastic#256756 — Wire up edit flow (@dominiqueclarke) - elastic#256801 — Move consecutive breaches max to shared constants (@yiannisnikolopoulos) - elastic#256818 — Preview query and design parity (@dominiqueclarke) - elastic#256938 — Allow clearing number inputs in state transition fields (@yiannisnikolopoulos) - elastic#257017 — Add enable/disable and clone rule to rule list (@dominiqueclarke) - elastic#257246 — Remove all React.FC (@dominiqueclarke) - elastic#257415 — Rule form - fix test (@dominiqueclarke) - elastic#257454 — Block comma key in number input component (@yiannisnikolopoulos) </details> <details> <summary><strong>API Documentation & Schema</strong> (2 PRs)</summary> - elastic#254901 — Rename indexes for alert events and actions (@adcoelho) - elastic#255810 — OAS for alert action routes (@adcoelho) </details> <details> <summary><strong>Observability & Monitoring</strong> (3 PRs)</summary> - elastic#254925 — Add ApmMiddleware to the rule executor (@adcoelho) - elastic#255115 — Add the withAPM decorator and apply it to the rules_client (@adcoelho) - elastic#255999 — Fix linting problem in apm middleware (@adcoelho) </details> <details> <summary><strong>CI & Maintenance</strong> (2 PRs)</summary> - elastic#257409 — Refactor SO services to use inversify DI for client initialization (@darnautov) - Fix alerting-v2-schema jest config (@darnautov) </details> --- --------- Co-authored-by: Dima Arnautov <dmitrii.arnautov@elastic.co> Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com> Co-authored-by: Kevin Delemme <kevin.delemme@elastic.co> Co-authored-by: Mike Côté <mikecote@users.noreply.github.com> Co-authored-by: Antonio <antonio.coelho@elastic.co> Co-authored-by: Kevin Delemme <kdelemme@gmail.com> Co-authored-by: Dominique Clarke <dominique.clarke@elastic.co> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.rhodes@elastic.co> Co-authored-by: Yiannis Nikolopoulos <yiannis.nikolopoulos@elastic.co> Co-authored-by: Alexi Doak <109488926+doakalexi@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Bailey Cash <bailey.cash@elastic.co> Co-authored-by: Anna Davydova <ana.davydova@elastic.co> Co-authored-by: Umberto Pepato <umbopepato@users.noreply.github.com> Co-authored-by: Jason Rhodes <jason.matthew.rhodes@gmail.com> Co-authored-by: Joana Cardoso <169058851+joana-cps@users.noreply.github.com>
Summary
Note
Dear reviewers, this PR is getting merged into a feature branch. Only the ResponseOps review is needed it at the moment. We will request for your review when we open the feature branch PR to be merged on
main.This PR implements the CountTimeframeStrategy for the director. It takes into account the rule state transition configuration and applies state changes based on that.
The
CountTimeframeStrategyextendsBasicTransitionStrategyand preserves the same state machine behavior by default, while adding threshold logic for:pending -> activeandrecovering -> inactive. Thresholds are calculated from a) counts (pendingCount,recoveringCount), b) timeframe (pendingTimeframe,recoveringTimeframe) or both, combined with operator (AND/OR). If a rule has nostateTransition, the director usesBasicTransitionStrategyas a fallback.Checklist
Check the PR satisfies following conditions.
Reviewers should verify this PR satisfies this list as well.