Clear watch schedules when starting trigger engine #145325
michalborek merged 8 commits into elastic:main
Conversation
Pinging @elastic/es-distributed (Team:Distributed)
Hi @michalborek, I've created a changelog YAML for you.
📝 Walkthrough: This pull request addresses an issue in the Watcher trigger engine where watch schedules were not being properly cleared on restart.
@coderabbitai full review
✅ Actions performed: Full review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/watcher/actions/throttler/PeriodThrottler.java`:
- Around lines 9-10: Replace the Log4j imports and logger usage with the Elasticsearch logging types: remove the imports of org.apache.logging.log4j.LogManager and org.apache.logging.log4j.Logger, import org.elasticsearch.logging.LogManager and org.elasticsearch.logging.Logger instead, and update the logger declaration (the static LOGGER field created via LogManager.getLogger(...) in PeriodThrottler) so the class uses org.elasticsearch.logging.Logger throughout.
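The suggested change can be sketched as follows. This is a hypothetical fragment based only on the review comment, not the actual file contents; everything except the logging-related lines is elided:

```java
// Before (Log4j types), per the review comment:
// import org.apache.logging.log4j.LogManager;
// import org.apache.logging.log4j.Logger;

// After: the Elasticsearch logging facade
import org.elasticsearch.logging.LogManager;
import org.elasticsearch.logging.Logger;

public class PeriodThrottler {
    // The declaration keeps the same shape; only the imported types change.
    private static final Logger LOGGER = LogManager.getLogger(PeriodThrottler.class);
    // ... rest of the class elided ...
}
```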
📒 Files selected for processing (4)
- docs/changelog/145325.yaml
- x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/watcher/actions/throttler/PeriodThrottler.java
- x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/trigger/schedule/engine/TickerScheduleTriggerEngine.java
- x-pack/plugin/watcher/src/test/java/org/elasticsearch/xpack/watcher/trigger/schedule/engine/TickerScheduleEngineTests.java
When new nodes appear in the cluster, the watcher allocation may change. If we don't reset the schedules for a given node, there is a chance multiple nodes will execute the watch.
Hi @michalborek, I've updated the changelog YAML for you.
masseyke left a comment
I think this is a big improvement. I think there might be another incredibly unlikely race condition or two (for example, remove doesn't look in recentlyAddedSchedules, but that would only be an issue if you called remove and add quickly, and I think only while the engine was also pausing and restarting -- if you do all 4 of those things in rapid succession, you've got to expect some weirdness). But I think this fixes the ones we've seen in production and in tests.
Yes, I agree. The symmetric case would be removing a watch while a reload happens. To fix this we'd need to have something like
When new nodes appear in the cluster, the watcher allocation may change. If we don't reset the schedules for a given node, there is a chance multiple nodes will execute the watch. This fix revealed a race condition on watch insertion, which has been fixed by adding a recentlyAddedSchedules collection that handles schedules added during the watcher service reload.
When new nodes appear in the cluster, a watch may change the node it is running on.
If we don't reset the schedules for a given node, there is a chance multiple nodes will execute the watch.
This could be observed in #131964, where some watcher actions were unexpectedly throttled (the same watch was run on two nodes and one of the executions was throttled).
Solving this problem uncovered another issue, related to watch insertion. When the watcher service is being reloaded (e.g. due to watcher allocation changes) and a new watch is added at the same time, the `WatcherIndexingListener` directly modifies the `TickerScheduleTriggerEngine`, and the trigger engine's `start` method reverts this operation. To fix that, recently added watches are treated separately from the rest and are re-added during the start operation, even if the fresh watch is not yet searchable.
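The mechanism can be sketched in a minimal, self-contained form. This is not the actual Elasticsearch code; the class, method, and map types below are illustrative stand-ins for the pattern the description names: `start` clears stale schedules before installing the fresh (searchable) set, and watches added concurrently, tracked in a `recentlyAdded` collection, are re-applied afterwards so a reload cannot drop them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the reload/insert race fix described above.
class SketchTriggerEngine {
    private final Map<String, String> schedules = new ConcurrentHashMap<>();
    // Watches put while the engine is running; they may not yet be searchable
    // when a reload fetches the "fresh" set of watches.
    private final Map<String, String> recentlyAdded = new ConcurrentHashMap<>();

    // Called by the indexing listener when a new watch is added.
    void add(String watchId, String schedule) {
        recentlyAdded.put(watchId, schedule);
        schedules.put(watchId, schedule);
    }

    // Called on reload with the watches currently visible to search.
    void start(Map<String, String> freshWatches) {
        schedules.clear();               // drop schedules from the previous allocation
        schedules.putAll(freshWatches);  // install the searchable set
        schedules.putAll(recentlyAdded); // keep watches added during the reload
        recentlyAdded.clear();
    }

    Map<String, String> activeSchedules() {
        return Map.copyOf(schedules);
    }
}
```

Without the `recentlyAdded` re-apply step, a watch added between the search and `start` would silently lose its schedule on this node.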
Closes #131964
Closes #137562