
Clear watch schedules when starting trigger engine#145325

Merged
michalborek merged 8 commits into elastic:main from michalborek:issue-131964
Apr 2, 2026

Conversation

@michalborek
Contributor

@michalborek michalborek commented Mar 31, 2026

When new nodes appear in the cluster, a watch may change the node it is running on.
If we don't reset the schedules for a given node, there is a chance that multiple nodes will execute the watch.

This could be observed in #131964, where some watcher actions were unexpectedly throttled (the same watch was run on two nodes and one of the executions was throttled).

Solving this problem uncovered another issue related to watchers. When the watcher service is being reloaded (e.g. due to watcher allocation changes) and a new watch is added at the same time, the WatcherIndexingListener directly modifies the TickerScheduleTriggerEngine, and the trigger engine's start method reverts this operation.

To fix that, recently added watches are treated separately from the rest and are re-added during the start operation, even if the fresh watch is not yet searchable.

Closes #131964
Closes #137562
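
The restart logic described above can be sketched as follows. The field names (`schedules`, `recentlyAddedSchedules`) follow the PR discussion, but the class shape, types, and method signatures here are simplified illustrations, not the actual engine code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of the trigger-engine restart behavior described above.
// Field names follow the PR discussion; everything else is illustrative.
class TriggerEngineSketch {
    // Active schedules for this node, keyed by watch id.
    final Map<String, String> schedules = new ConcurrentHashMap<>();
    // Watches added while the engine is reloading; re-applied on start.
    final Map<String, String> recentlyAddedSchedules = new ConcurrentHashMap<>();

    void add(String watchId, String schedule) {
        // A watch added mid-reload is remembered separately so that
        // start() does not silently drop it.
        recentlyAddedSchedules.put(watchId, schedule);
        schedules.put(watchId, schedule);
    }

    void start(Map<String, String> loadedSchedules) {
        // Clear stale entries first, so a watch reallocated to another
        // node is no longer triggered here. Previously only a merge ran,
        // leaving leftovers that could cause duplicate executions.
        schedules.clear();
        schedules.putAll(loadedSchedules);
        // Re-add watches created during the reload, even if they were
        // not yet searchable when loadedSchedules was built.
        schedules.putAll(recentlyAddedSchedules);
        recentlyAddedSchedules.clear();
    }
}
```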

@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Collaborator

Hi @michalborek, I've created a changelog YAML for you.

@coderabbitai
Contributor

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

This pull request addresses an issue in the Watcher trigger engine where watch schedules were not being properly cleared on restart. The TickerScheduleTriggerEngine.start() method now explicitly clears the existing schedules map before repopulating it with new schedules, instead of only using putAll() to merge entries. A corresponding test has been added to verify that previous watch schedules are cleaned up when the engine restarts with a new set of watches. Additionally, logging has been enhanced in PeriodThrottler to track throttling decisions. A changelog entry documents this bug fix.
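
The distinction the walkthrough draws, merging with putAll() versus clearing and repopulating, can be seen with a plain map. This is an illustrative sketch, not the engine's actual types:

```java
import java.util.HashMap;
import java.util.Map;

class PutAllVsClear {
    // putAll() merges: entries absent from the fresh snapshot survive,
    // which is how stale watch schedules lingered after a restart.
    static Map<String, String> mergeOnly(Map<String, String> current, Map<String, String> fresh) {
        current.putAll(fresh);
        return current;
    }

    // The fix: drop stale entries first, then repopulate from the snapshot.
    static Map<String, String> replaceAll(Map<String, String> current, Map<String, String> fresh) {
        current.clear();
        current.putAll(fresh);
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> fresh = Map.of("watchB", "5m");
        Map<String, String> merged = mergeOnly(new HashMap<>(Map.of("watchA", "1m")), fresh);
        Map<String, String> replaced = replaceAll(new HashMap<>(Map.of("watchA", "1m")), fresh);
        System.out.println("merge keeps stale watchA: " + merged.containsKey("watchA"));
        System.out.println("replace drops stale watchA: " + !replaced.containsKey("watchA"));
    }
}
```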

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
  • Linked Issues check — Passed. The PR implementation directly addresses the root cause identified in #131964: clearing watch schedules on trigger engine start prevents duplicate concurrent executions when node allocations change.
  • Out of Scope Changes check — Passed. All changes are scoped to the stated objective: clear() in TickerScheduleTriggerEngine, debug logging in PeriodThrottler, test coverage, and changelog entry.



@coderabbitai
Contributor

coderabbitai bot commented Mar 31, 2026

⏳ Starting custom recipe Update Documentation (Local Commit)...

@michalborek
Contributor Author

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai bot commented Mar 31, 2026

✅ Actions performed

Full review triggered.

@elastic elastic deleted a comment from coderabbitai bot Mar 31, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1


Inline comments:
In
`x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/watcher/actions/throttler/PeriodThrottler.java`:
- Around lines 9-10: Replace the Log4j imports and logger usage with the
Elasticsearch logging types. Remove the imports of
org.apache.logging.log4j.LogManager and org.apache.logging.log4j.Logger,
import org.elasticsearch.logging.LogManager and org.elasticsearch.logging.Logger
instead, then update the logger declaration (the static LOGGER variable created
via LogManager.getLogger(...) in PeriodThrottler) to use the Elasticsearch
LogManager/Logger types, so the class uses org.elasticsearch.logging.Logger
throughout.
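
The import swap the review asks for amounts to a two-line change plus the matching logger declaration. This diff is an illustration of the suggestion, not the actual patch from the PR:

```diff
-import org.apache.logging.log4j.LogManager;
-import org.apache.logging.log4j.Logger;
+import org.elasticsearch.logging.LogManager;
+import org.elasticsearch.logging.Logger;
```

The static `LOGGER` field created via `LogManager.getLogger(PeriodThrottler.class)` then resolves to the Elasticsearch logging facade without further changes at the call sites.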

📥 Commits

Reviewing files that changed from the base of the PR and between b1bfa5b and 30e3c49.

📒 Files selected for processing (4)
  • docs/changelog/145325.yaml
  • x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/watcher/actions/throttler/PeriodThrottler.java
  • x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/trigger/schedule/engine/TickerScheduleTriggerEngine.java
  • x-pack/plugin/watcher/src/test/java/org/elasticsearch/xpack/watcher/trigger/schedule/engine/TickerScheduleEngineTests.java

@masseyke masseyke self-requested a review March 31, 2026 13:14
@elastic elastic deleted a comment from coderabbitai bot Mar 31, 2026
@michalborek michalborek marked this pull request as draft March 31, 2026 14:26
@elasticsearchmachine
Collaborator

Hi @michalborek, I've created a changelog YAML for you.

@michalborek michalborek marked this pull request as ready for review April 1, 2026 11:19
michalborek and others added 6 commits April 1, 2026 14:08
When new nodes appear in the cluster, the watcher allocation may change.
If we don't reset the schedules for a given node, there is a chance that
multiple nodes will execute the watch.
@elasticsearchmachine
Collaborator

Hi @michalborek, I've updated the changelog YAML for you.

Member

@masseyke masseyke left a comment


I think this is a big improvement. I think there might be another incredibly unlikely race condition or two (for example, remove doesn't look in recentlyAddedSchedules, but that would only be an issue if you called remove and add quickly, and I think only while the engine was also pausing and restarting -- if you do all 4 of those things in rapid succession, you've got to expect some weirdness). But I think this fixes the ones we've seen in production and in tests.

@michalborek michalborek merged commit a60205c into elastic:main Apr 2, 2026
35 checks passed
@michalborek michalborek deleted the issue-131964 branch April 2, 2026 08:00
@michalborek
Contributor Author

I think this is a big improvement. I think there might be another incredibly unlikely race condition or two (for example, remove doesn't look in recentlyAddedSchedules, but that would only be an issue if you called remove and add quickly, and I think only while the engine was also pausing and restarting -- if you do all 4 of those things in rapid succession, you've got to expect some weirdness). But I think this fixes the ones we've seen in production and in tests.

Yes, I agree. The symmetric to this one would be removing a watch while a reload happens. To fix this we'd need to have something like recentlyDeletedWatches...
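
The residual gap being discussed, remove() not consulting recentlyAddedSchedules, could be closed along these lines. This is a hypothetical sketch of the idea, not code from the PR:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: if remove() only touches the live schedules map,
// a watch still sitting in recentlyAddedSchedules would be resurrected
// by the next start().
class RemoveAwareEngine {
    final Map<String, String> schedules = new ConcurrentHashMap<>();
    final Map<String, String> recentlyAddedSchedules = new ConcurrentHashMap<>();

    void add(String watchId, String schedule) {
        recentlyAddedSchedules.put(watchId, schedule);
        schedules.put(watchId, schedule);
    }

    void remove(String watchId) {
        schedules.remove(watchId);
        // Without this line, start() would re-add a watch that was
        // deleted between add() and the reload completing.
        recentlyAddedSchedules.remove(watchId);
    }

    void start(Map<String, String> loaded) {
        schedules.clear();
        schedules.putAll(loaded);
        schedules.putAll(recentlyAddedSchedules);
        recentlyAddedSchedules.clear();
    }
}
```

A symmetric `recentlyDeletedWatches` set, as suggested above, would instead record deletions during the reload and filter them out in `start()`; the sketch here takes the simpler route of purging both maps in `remove()`.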

mromaios pushed a commit to mromaios/elasticsearch that referenced this pull request Apr 9, 2026
When new nodes appear in the cluster, the watcher allocation may change.
If we don't reset the schedules for a given node, there is a chance that
multiple nodes will execute the watch.

This fix revealed a race condition on watch insertion, which has been fixed by
adding a recentlyAddedSchedules collection that handles schedules added
during the watcher service reload.

Development

Successfully merging this pull request may close these issues:
  • Duplicate Watch Executions
  • [CI] FullClusterRestartIT testWatcherWithApiKey {cluster=UPGRADED} failing

3 participants