Speed up ILM cluster task execution #85405

Merged
joegallo merged 27 commits into elastic:master from
joegallo:faster-ilm-batches
Mar 29, 2022
Contributor

@joegallo joegallo commented Mar 28, 2022

Closes #82708

The central change here is the addition of the special-purpose Metadata#withLifecycleState method, which bypasses some of the expensive validation that we perform in Metadata.Builder#build. Since we're not adding or removing indices, that validation isn't relevant.

In the many-shards scalability profiling that @original-brownbear did against this code, this change decreased ILM execution time from 93.33% of the profile to less than 1%, so it's extraordinarily effective at decreasing the time the master spends executing the ILM cluster task batches.
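
To make the idea concrete, here is a minimal sketch of the fast-path pattern described above. It is not the Elasticsearch implementation: `SketchMetadata`, `SketchIndexMetadata`, `buildWithValidation`, and the `validationRuns` counter are hypothetical stand-ins, and the "validation" is just a trivial loop over every index. It only shows the shape of the change: an update that touches one existing index can copy the index map, swap a single entry, and construct the new object directly instead of going back through the validating build.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, heavily simplified stand-ins; NOT the real Elasticsearch classes.
final class SketchIndexMetadata {
    final String name;
    final String lifecycleState;

    SketchIndexMetadata(String name, String lifecycleState) {
        this.name = name;
        this.lifecycleState = lifecycleState;
    }
}

final class SketchMetadata {
    // Counts runs of the whole-cluster validation so the two paths can be compared.
    static int validationRuns = 0;

    private final Map<String, SketchIndexMetadata> indices;

    // Private: callers go through buildWithValidation or withLifecycleState.
    private SketchMetadata(Map<String, SketchIndexMetadata> indices) {
        this.indices = indices;
    }

    // Slow path (assumption): stands in for a builder-style build() that
    // re-checks every index each time it is called.
    static SketchMetadata buildWithValidation(Map<String, SketchIndexMetadata> indices) {
        validationRuns++;
        for (Map.Entry<String, SketchIndexMetadata> e : indices.entrySet()) {
            if (!e.getKey().equals(e.getValue().name)) {
                throw new IllegalStateException("inconsistent entry for " + e.getKey());
            }
        }
        return new SketchMetadata(new HashMap<>(indices));
    }

    SketchIndexMetadata index(String name) {
        return indices.get(name);
    }

    // Fast path: only valid for an index that already exists, so the set of
    // indices cannot change and the whole-cluster validation is skipped.
    SketchMetadata withLifecycleState(String indexName, String newState) {
        Objects.requireNonNull(indexName, "indexName must not be null");
        Objects.requireNonNull(newState, "newState must not be null");
        SketchIndexMetadata current = indices.get(indexName);
        if (current == null) {
            // Refuse to introduce a new index through the fast path.
            throw new IllegalArgumentException("no such index: " + indexName);
        }
        Map<String, SketchIndexMetadata> copy = new HashMap<>(indices);
        copy.put(indexName, new SketchIndexMetadata(indexName, newState));
        // Construct directly instead of calling buildWithValidation(copy).
        return new SketchMetadata(copy);
    }
}
```

In this sketch the original object is left untouched and a new one is returned, and an unknown index name is rejected rather than added, which mirrors the intent that the method is only for updating index metadata that is already part of the cluster state.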

joegallo added 20 commits March 28, 2022 13:33
The other ilm UpdateTasks call it idxMeta in the same context (for
better or worse).
This is only for updating an indexMetadata that already exists and is
associated with the cluster state, it's not for sneakily injecting new
indices.
Since the index already exists, and the lifecycleState is the only
thing that's being changed, we can bypass most of the expense of
building a new cluster state (the internal validation that
Metadata.Builder runs).
A little whitespace, cleanup some comments, rename some variables.
in order to prevent the non-performant code path from resembling the
performant code path. InitializePolicyContextStep now explicitly uses
the non-performant code path, while all the other callers still use
the performant path.
@joegallo joegallo added the >enhancement, :Data Management/ILM+SLM, and v8.2.0 labels Mar 28, 2022
@joegallo joegallo requested review from DaveCTurner and dakrone March 28, 2022 17:38
@elasticmachine elasticmachine added the Team:Data Management (obsolete) label Mar 28, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine
Collaborator

Hi @joegallo, I've created a changelog YAML for you.

Member

@dakrone dakrone left a comment

LGTM, I left a few minor comments but thanks for working on this!

);
}

public Metadata withLifecycleState(final Index index, final LifecycleExecutionState lifecycleState) {
Member

Since this is a public method, can you add javadoc for it?

Contributor Author

How does ea9c861 look to you?

final ImmutableOpenMap.Builder<String, IndexMetadata> builder = ImmutableOpenMap.builder(indices);
builder.put(index.getName(), indexMetadataBuilder.build());

return new Metadata(
Member

I think it might be worth adding a comment about why this doesn't use Metadata.builder(this).etc(...) but directly constructs the Metadata, so that it doesn't get accidentally undone in the future. What do you think?

Contributor Author

I handled this in ea9c861, too.

@joegallo joegallo merged commit cc51c1a into elastic:master Mar 29, 2022
@joegallo joegallo deleted the faster-ilm-batches branch March 29, 2022 14:39
Development

Successfully merging this pull request may close these issues.

Large ILM Task Batches are Executed too Slowly

4 participants