Skip to content

Prevent Duplicate ILM Cluster State Updates from Being Created#78390

Merged
original-brownbear merged 4 commits intoelastic:masterfrom
original-brownbear:dedup-ilm
Sep 29, 2021
Merged

Prevent Duplicate ILM Cluster State Updates from Being Created#78390
original-brownbear merged 4 commits intoelastic:masterfrom
original-brownbear:dedup-ilm

Conversation

@original-brownbear
Copy link
Copy Markdown
Contributor

@original-brownbear original-brownbear commented Sep 28, 2021

Prevent duplicate ILM tasks from being enqueued to fix the most immediate issues around #78246. The ILM logic should be further improved though. I did not include MoveToErrorStepUpdateTask in this change yet as I wasn't entirely sure how valid/safe hashing/comparing arbitrary Exceptions would be. That could be looked into in a follow-up as well.

Relates #77466

Closes #78246

@original-brownbear original-brownbear added WIP :Data Management/ILM+SLM DO NOT USE. Use ":StorageEngine/ILM" or ":Distributed Coordination/SLM" instead. labels Sep 28, 2021
@original-brownbear original-brownbear marked this pull request as ready for review September 28, 2021 14:37
@elasticmachine elasticmachine added the Team:Data Management (obsolete) DO NOT USE. This team no longer exists. label Sep 28, 2021
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Copy link
Copy Markdown
Contributor

@probakowski probakowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @original-brownbear, this change looks good. I left just one question and small suggestion.
I tested this locally and with creating 1000 indices at the same time and tasks count (_cat/pending_tasks) never went above 5000 comparing to 1000000 without this change.

}

private boolean registerTask(IndexLifecycleClusterStateUpdateTask task) {
synchronized (executingTasks) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason for explicit synchronized block and not using Collections.synchronizedSet(new HashSet<>()) or ConcurrentHashMap?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right ... not sure why I did it this used the sync set now and just inlined everything since it's all one liners now :)

Thanks!

Comment on lines +538 to +542
final boolean removed;
synchronized (executingTasks) {
removed = executingTasks.remove(task);
}
assert removed : "tried to unregister unknown task [" + task + "]";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
final boolean removed;
synchronized (executingTasks) {
removed = executingTasks.remove(task);
}
assert removed : "tried to unregister unknown task [" + task + "]";
synchronized (executingTasks) {
final boolean removed = executingTasks.remove(task);
assert removed : "tried to unregister unknown task [" + task + "]";
}

Copy link
Copy Markdown
Contributor

@probakowski probakowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @original-brownbear!

@original-brownbear
Copy link
Copy Markdown
Contributor Author

Thanks @probakowski !

@original-brownbear original-brownbear merged commit 990aa34 into elastic:master Sep 29, 2021
@original-brownbear original-brownbear deleted the dedup-ilm branch September 29, 2021 05:49
original-brownbear added a commit that referenced this pull request Sep 29, 2021
… (#78427)

Prevent duplicate ILM tasks from being enqueued to fix the most immediate issues around #78246. The ILM logic should be further improved though. I did not include `MoveToErrorStepUpdateTask` in this change yet as I wasn't entirely sure how valid/safe hashing/comparing arbitrary `Exception`s would be. That could be looked into in a follow-up as well.

Relates #77466 

Closes #78246
original-brownbear added a commit that referenced this pull request Sep 29, 2021
Follow up to #78390. The `EmptyInfo` would not compare
correctly because it doesn't implement equals or hashcode,
breaking deduplication for `SetStepInfoUpdateTask`.

=> just making it a singleton to fix this and have a fast comp via
instance equality.
original-brownbear added a commit that referenced this pull request Sep 29, 2021
Follow up to #78390. The `EmptyInfo` would not compare
correctly because it doesn't implement equals or hashcode,
breaking deduplication for `SetStepInfoUpdateTask`.

=> just making it a singleton to fix this and have a fast comp via
instance equality.
original-brownbear added a commit that referenced this pull request Sep 30, 2021
If the current combination of current-step and index has a running CS update task
enqueued there is no point in adding yet another task for this combination on the applier
and we can skip the expensive inspection for the index.

follow up to #78390
original-brownbear added a commit that referenced this pull request Sep 30, 2021
If the current combination of current-step and index has a running CS update task
enqueued there is no point in adding yet another task for this combination on the applier
and we can skip the expensive inspection for the index.

follow up to #78390
@original-brownbear original-brownbear restored the dedup-ilm branch April 18, 2023 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Data Management/ILM+SLM DO NOT USE. Use ":StorageEngine/ILM" or ":Distributed Coordination/SLM" instead. Team:Data Management (obsolete) DO NOT USE. This team no longer exists. v7.16.0 v8.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ILM Can Create an Unlimited Number of Pending Clusterstate Updates on Slow Master Nodes

5 participants