
Add enrich node cache#76800

Merged
martijnvg merged 18 commits into elastic:master from martijnvg:enrich_cache on Sep 3, 2021

Conversation

@martijnvg
Member

Introduce an LRU cache to avoid repeating searches that the enrich processor executes frequently.

Relates to #48988

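The LRU idea can be sketched minimally with a JDK LinkedHashMap in access order (a rough illustration only; the actual enrich cache in this PR is built on Elasticsearch's internal cache utilities and keys entries by search request):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch: an access-order LinkedHashMap that evicts the
// least-recently-used entry once the configured capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCache(int maxSize) {
        super(16, 0.75f, true); // true = access order, so get() refreshes recency
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }
}
```

With a capacity of 2, inserting a third entry evicts whichever of the first two was touched least recently.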
@martijnvg martijnvg added >enhancement :Distributed/Ingest Node Execution or management of Ingest Pipelines v8.0.0 v7.16.0 labels Aug 23, 2021
@elasticmachine elasticmachine added the Team:Data Management (obsolete) DO NOT USE. This team no longer exists. label Aug 23, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@martijnvg martijnvg mentioned this pull request Aug 23, 2021
Contributor

@danhermann danhermann left a comment

LGTM. Minor comments below but nothing blocking.

return String.valueOf(maxConcurrentRequests * maxLookupsPerRequest);
}, val -> Setting.parseInt(val, 1, Integer.MAX_VALUE, QUEUE_CAPACITY_SETTING_NAME), Setting.Property.NodeScope);

public static final Setting<Long> CACHE_SIZE = Setting.longSetting("enrich.cache_size", 1000, 0, Setting.Property.NodeScope);
Contributor

Were you planning to add documentation for this new setting?

Member Author

Good point, I will add docs for this.

Member Author

I pushed: 8c328a4
This is the first enrich setting that we document.
@jrodewig do you think this is the right place?

Member Author

Maybe the other enrich settings should be documented as well, alongside the cache size setting. I will do this in a follow-up and ping James when he is back. I will revert the mentioned commit for now.

Contributor

Thanks for the ping @martijnvg. I'll be happy to take a look!

EnrichProcessorFactory(Client client, ScriptService scriptService, EnrichCache enrichCache) {
this.client = client;
this.scriptService = scriptService;
this.enrichCache = enrichCache;
Contributor

Maybe require non-null here since the cache is used unconditionally in createSearchRunner?
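The suggestion (fail fast at construction instead of hitting an NPE later in createSearchRunner) would look roughly like this; the field types below are placeholder Objects, not the actual Client, ScriptService, and EnrichCache classes:

```java
import java.util.Objects;

// Hypothetical sketch of the non-null check suggested above.
class EnrichProcessorFactory {
    private final Object client;
    private final Object scriptService;
    private final Object enrichCache;

    EnrichProcessorFactory(Object client, Object scriptService, Object enrichCache) {
        this.client = client;
        this.scriptService = scriptService;
        // Throws NullPointerException with a clear message at construction time.
        this.enrichCache = Objects.requireNonNull(enrichCache, "enrichCache must not be null");
    }
}
```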

Member Author

👍 will do that

Member Author

Fixed: 9deb1b5

Member

@jbaiera jbaiera left a comment

LGTM as well, just a little thing I noticed.

  return executingPolicies.equals(response.executingPolicies) &&
-     coordinatorStats.equals(response.coordinatorStats);
+     coordinatorStats.equals(response.coordinatorStats) &&
+     cacheStats.equals(response.cacheStats);
Member

cacheStats could end up null in mixed clusters

Member Author

Good spot, I will fix this.

Member Author

@martijnvg martijnvg Aug 31, 2021

Fixed: 9070d70

public static class TransportAction extends HandledTransportAction<SearchRequest, SearchResponse> {

private final Coordinator coordinator;
private final EnrichCache enrichCache;
Member

Are we making use of this anywhere in the proxy action?

Member Author

No, I initially did the caching from here, but then moved it to the processor. The reason is that when something is cached, invoking the coordinator action can be skipped as well.

Member Author

Fixed: 55e6e24

@mjmbischoff
Contributor

@martijnvg
Member Author

@mjmbischoff Thanks for noticing this! Using computeIfAbsent(...) is preferable here over put() and get() (there are some other places where I think this can be changed too). I will pull that commit from your branch.

@martijnvg
Member Author

@mjmbischoff I overlooked something: computeIfAbsent(...) can't be used here, because a potential remote call is made and this method can't handle asynchronous execution with an ActionListener. Blocking a thread like that should be avoided; otherwise thread pools may get exhausted (for example, if another node is slow or the search thread pool on the current node is exhausted).

This reverts commit 8c328a4.
@mjmbischoff
Contributor

mjmbischoff commented Sep 3, 2021

@martijnvg Yeah, my bad, I overlooked that the second call was async.

We could keep the get on the normal thread and then do a computeIfAbsent on the client thread pool. While this fixes multiple executions, the problem is that it eats one additional thread. I don't see a way to fire off EnrichCoordinatorProxyAction on the calling thread, which would remove that drawback.

However, we can do one better by caching CompletableFuture<SearchResponse>s instead of SearchResponses. This allows us to:

  • make the compute fast/non-blocking by:
    • creating a CompletableFuture
    • doing the async call (client.execute)
    • setting the result on the CompletableFuture in the callback of the async call
  • consume the retrieved CompletableFuture asynchronously/non-blocking by registering a callback on the CompletableFuture, which gets called by the thread that sets the value

This does have the side effect that exceptions are now cached, so when we 'consume' the exception I currently invalidate the cache entry so that subsequent look-ups can try again. I guess we could do something fancy here and skip the retry when it's a permanent failure, to speed things up.

This results in the cache interaction being non-blocking, and only a single search request is performed when cache look-ups for the same value happen in parallel.

I've updated my branch to reflect these changes.
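The steps above can be sketched as follows. This is a stand-alone illustration: FutureCache, lookup, and the loader function are made-up names, and a real implementation would wire the loader to client.execute and an ActionListener rather than a plain Function:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of caching CompletableFutures instead of responses: concurrent
// look-ups for the same key share one in-flight computation, and the
// cache interaction itself never blocks the calling thread.
class FutureCache<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> cache = new ConcurrentHashMap<>();

    CompletableFuture<V> lookup(K key, Function<K, CompletableFuture<V>> loader) {
        CompletableFuture<V> future = cache.computeIfAbsent(key, k -> {
            CompletableFuture<V> f = new CompletableFuture<>();
            // Fire the async call; its callback completes the cached future.
            loader.apply(k).whenComplete((value, error) -> {
                if (error != null) {
                    f.completeExceptionally(error);
                } else {
                    f.complete(value);
                }
            });
            return f;
        });
        // Invalidate failed entries so subsequent look-ups can try again.
        future.whenComplete((value, error) -> {
            if (error != null) {
                cache.remove(key, future);
            }
        });
        return future;
    }
}
```

Callers register a callback on the returned future instead of blocking, and parallel look-ups for the same key trigger only one loader invocation.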

@martijnvg
Member Author

Thanks @mjmbischoff, I think that should work and is a good improvement. I will merge this PR; can you open a follow-up PR with these changes?

@martijnvg martijnvg merged commit 1ae4f3c into elastic:master Sep 3, 2021
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Sep 3, 2021
Backporting elastic#76800 to 7.x branch.

Introduce an LRU cache to avoid repeating searches that the enrich processor executes frequently.

Relates to elastic#48988
@mjmbischoff
Contributor

> Thanks @mjmbischoff, I think that should work and is a good improvement. I will merge this PR, can you open a followup pr with these changes?

Will do, opened #77259


Labels

:Distributed/Ingest Node Execution or management of Ingest Pipelines >enhancement Team:Data Management (obsolete) DO NOT USE. This team no longer exists. v7.16.0 v8.0.0-alpha2


7 participants