Skip to content

[One workflow] event-driven scheduling and trigger pagination#257283

Merged
yngrdyn merged 10 commits intoelastic:mainfrom
yngrdyn:event-driven-triggers-perf-improvements
Mar 16, 2026
Merged

[One workflow] event-driven scheduling and trigger pagination#257283
yngrdyn merged 10 commits intoelastic:mainfrom
yngrdyn:event-driven-triggers-perf-improvements

Conversation

@yngrdyn
Copy link
Copy Markdown
Contributor

@yngrdyn yngrdyn commented Mar 11, 2026

Addresses open review comments on event-driven workflow execution: concurrency/back pressure, pagination and related tests.

Summary

When many workflows are subscribed to a trigger, emitting an event can overload the node if we run them all inline. This PR schedules event-driven workflow runs via TM (same model as alerts) and caps concurrent scheduling with p-limit so es and TM are not flooded. It also fixes silent truncation when more than 1000 workflows match a trigger by implementing PIT + search_after in getWorkflowsSubscribedToTrigger, so all matching workflows are returned.

@yngrdyn yngrdyn self-assigned this Mar 11, 2026
@yngrdyn yngrdyn requested a review from a team as a code owner March 11, 2026 20:22
@botelastic botelastic bot added the Team:One Workflow Team label for One Workflow (Workflow automation) label Mar 11, 2026
@yngrdyn yngrdyn added backport:skip This PR does not require backporting release_note:skip Skip the PR/issue when compiling release notes labels Mar 11, 2026
let searchAfter: estypes.SearchHit['sort'] | undefined;
let hasMore = true;

while (hasMore) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to limit this to something?

Something like:

while (hasMore && pageCount < MAX_PAGES) {

to avoid unbounded sequential ES requests.
Alternatively the pageCount can be introduced to a more precise hasMore check (see below).

Copy link
Copy Markdown
Contributor Author

@yngrdyn yngrdyn Mar 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 8b987ae, although I'm not sure about the number I used. This number will be confirmed on our next sync (it's arbitrary) but I want this PR to not be blocked by this

@yngrdyn yngrdyn requested a review from dej611 March 13, 2026 17:44
Comment on lines +739 to +741
const pageSize = 1000;
const MAX_PAGES = 100;
const keepAlive = '1m';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1000 * 100 workflows in 1 minute sounds quite a challenging number.
Perhaps the amount of workflow can be lowered down a bit here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the moment this is an arbitrary number, I'd like to take this discussion to our next sync

expect(maxConcurrent).toBeLessThanOrEqual(20);
});

it('should resolve after all scheduleWorkflow calls are invoked (fire-and-forget scheduling)', async () => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is relaly fire-and-forget? The code is waiting with allSettled inside

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's waiting for the scheduling not the run of those workflows

@yngrdyn yngrdyn enabled auto-merge (squash) March 16, 2026 10:56
@yngrdyn yngrdyn merged commit 2676084 into elastic:main Mar 16, 2026
18 checks passed
@elasticmachine
Copy link
Copy Markdown
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

✅ unchanged

History

cc @yngrdyn

sorenlouv pushed a commit that referenced this pull request Mar 17, 2026
Addresses open review comments on [event-driven workflow
execution](#254964):
concurrency/back pressure, pagination and related tests.

## Summary

When many workflows are subscribed to a trigger, emitting an event can
overload the node if we run them all inline. This PR **schedules**
event-driven workflow runs via **TM** (same model as alerts) and caps
concurrent **scheduling** with **p-limit** so `es` and `TM` are not
flooded. It also fixes **silent truncation** when more than 1000
workflows match a trigger by implementing **PIT + search_after** in
`getWorkflowsSubscribedToTrigger`, so all matching workflows are
returned.
jeramysoucy pushed a commit to jeramysoucy/kibana that referenced this pull request Mar 26, 2026
…c#257283)

Addresses open review comments on [event-driven workflow
execution](elastic#254964):
concurrency/back pressure, pagination and related tests.

## Summary

When many workflows are subscribed to a trigger, emitting an event can
overload the node if we run them all inline. This PR **schedules**
event-driven workflow runs via **TM** (same model as alerts) and caps
concurrent **scheduling** with **p-limit** so `es` and `TM` are not
flooded. It also fixes **silent truncation** when more than 1000
workflows match a trigger by implementing **PIT + search_after** in
`getWorkflowsSubscribedToTrigger`, so all matching workflows are
returned.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport:skip This PR does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:One Workflow Team label for One Workflow (Workflow automation) v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants